Sunday, May 9, 2010

Bug in MySQL Subqueries with IN Operator

While developing an application with MySQL at the backend I happened to write a query containing a subquery with the IN operator. I was surprised to see that the query crashed MySQL daemon on Windows! On Linux, it was taking ages and I had to abort it. After lots of tests I concluded that MySQL has a limited support for subqueries. I browsed their site for help and found out that they were working on it and it will be fixed in the 6th version.

Similarly I also found that Self-Joins have problems too, for tables with large number of entries.
Here are the queries. If you wanna experiment then download the table structure from here

The query with subquery(takes infinite time):
SELECT DISTINCT(uid) FROM ip WHERE ip IN(SELECT ip FROM ip WHERE uid=3) ORDER BY uid; 

The alternate solution using self-join(takes 5.30 seconds on my notebook):
SELECT DISTINCT(a.uid) FROM ip a, ip b WHERE a.ip=b.ip AND b.uid=3 ORDER BY a.uid;

Another solution using an inner join to an inline view(takes 2.55 seconds on my notebook):
SELECT DISTINCT(a.uid) FROM ip a, (SELECT ip FROM ip WHERE uid=3) b WHERE a.ip=b.ip ORDER BY a.uid;

or in ANSI style using the INNER JOIN keyword


SELECT DISTINCT(a.uid) FROM ip a INNER JOIN (SELECT ip FROM ip WHERE uid=3) b ON a.ip=b.ip ORDER BY a.uid;



The penultimate solution I used was through PHP by splitting the query in parts and processing through PHP. It takes 1.99 seconds to vomit the result:
<?php
$time = microtime();
$time = explode(" ", $time);
$time = $time[1] + $time[0];
$start = $time;
$sql=mysql_query("SELECT ip FROM ip WHERE uid=3");
$condition="";
while($tmp=mysql_fetch_array($sql)){
$condition.=" ip='$tmp[0]' OR";
}
$len=strlen($condition);
$condition=substr($condition,0,len-3);
$sql=mysql_query("SELECT DISTINCT(uid) FROM ip WHERE".$condition);
          if ($sql){
  $i=0;
  while ($sqls=mysql_fetch_array($sql)){
  $iarray[$i++]=$sqls[0];
  }
  }
$time = microtime();
$time = explode(" ", $time);
$time = $time[1] + $time[0];
$finish = $time;
$totaltime = ($finish - $start);
echo count($iarray)." rows in $totaltime seconds
";
print_r($iarray);
?>

The final solution that I implemented take 0.035 seconds!

<?php
$time = microtime();
$time = explode(" ", $time);
$time = $time[1] + $time[0];
$start = $time;
$sql=mysql_query("SELECT ip FROM ip WHERE uid=3");
$condition="";
while($tmp=mysql_fetch_array($sql)){
$condition.="'$tmp[0]',";
}
$len=strlen($condition);
$condition=substr($condition,0,len-1);
$query="SELECT DISTINCT(uid) FROM ip WHERE ip IN(".$condition.")";
$sql=mysql_query($query);
          if ($sql){
 $i=0;
 while ($sqls=mysql_fetch_array($sql)){
 $iarray[$i++]=$sqls[0];
 }
 }
$time = microtime();
$time = explode(" ", $time);
$time = $time[1] + $time[0];
$finish = $time;
$totaltime = ($finish - $start);
echo count($iarray)." rows in $totaltime seconds
";
print_r($iarray);
?>

1 comment:

Caio Iglesias said...

That is truly annoying. I lost 2 days believing I was doing it wrong only to find out it is a bug. http://bugs.mysql.com/bug.php?id=9090 Fixed on a later version, but of course my host won't update.