Saturday, November 20, 2010

Importing MS Access database into Oracle 10g or MySQL

While working on a data mining project I ran into a task where I had to dump all the data from an MS Access .mdb file into Oracle and MySQL database tables. To test implementations of various data mining algorithms you need large real-life data sets, and these are mostly available on the internet as CSV files or MS Access .mdb files. One great place to download such data sets is the UCI Machine Learning Repository at the University of California, Irvine. Here is the link:

http://archive.ics.uci.edu/ml/

Now let's get back to the point of this post. Yesterday I got an .mdb file and needed to load all of its data into my Oracle 10g XE installation, for developing and testing an implementation of the CTU-Tree data mining algorithm. The table had 100,000 (one lakh) records. The first column was the ID, a serial number or transaction ID, and the next 19 were labelled A to S. It was a transactional dataset. I'd like to point out that "transactional" here doesn't refer to the ACID properties of a database, but to actual data about transactions in supermarkets.
I personally hate MS Access as a database and I am a big fan of Oracle (although they're doing nasty things with Android nowadays :P).
First attempt
I fiddled with some options in MS Access and exported the whole data set to XML: in the left column, just right-click the table name and look for the Export menu option. Then I went to Application Express in 10g XE and tried the Load XML Data option. Unfortunately it didn't work; probably Oracle's Load XML Data couldn't parse Access's exported XML properly.
Second attempt
Next, I tried exporting the Access table to a comma-delimited TXT file. The export succeeded, apart from warnings about some null cells in the Access DB. Then I tried Load Text Data in Oracle. This time Oracle's importer crashed; I don't know why. I tried the same thing again, but in vain: the application didn't crash this time, but it showed the status as
Succeeded: 0
Failed: 100000
Third attempt: MySQL
I thought I would dump it all into MySQL instead, since MySQL is more forgiving about such small details. The text export had 100,000 lines like these two:


1,"25","52","164","240","274","328","368","448","538","561","630","687","730","775","825","834","0","0","0","100"
2,"39","120","124","205","401","581","704","814","825","834","0","0","0","0","0","0","0","0","0","0"

I created a table in a MySQL database with 20 columns, all of type INT. Then I ran this command in the mysql command-line client:

LOAD DATA INFILE 'G:\TRANSACTIONS.TXT' INTO TABLE transactions;

I pressed Enter and the output came within a second, but it showed some warnings. I ignored them and launched phpMyAdmin straight away to have a glance at the data. Alas! Only the first column had data; the rest were all set to NULL! Then I noticed the double quotes enclosing the values in the other 19 columns (every one except the first). In my text editor (I use jEdit) I tried a find-and-replace of " with nothing, but the editor crashed. The same thing happened with Notepad++. This was due to the sheer volume of data.
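A stream editor would have handled a file this size without loading it all into memory. A one-line sketch with sed (assuming sed is available on your machine, e.g. via Cygwin or GnuWin32 on Windows):

```shell
# Strip every double quote, streaming the file instead of loading it whole
sed 's/"//g' TRANSACTIONS.TXT > TRANSACTIONS_CLEAN.TXT
```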
So I went straight back to MS Access, and in the third dialog of the export wizard I changed the TEXT QUALIFIER from " to {none}. This time the exported data came out without double quotes. I ran MySQL's LOAD DATA command as before, and still got warnings.
Then I modified the command a bit, as follows, to specify the comma delimiter:
LOAD DATA INFILE 'G:\TRANSACTIONS.TXT' INTO TABLE transactions FIELDS TERMINATED BY ','
and executed it. Voila! The first success! Within a second, all 100,000 entries were inside MySQL, and this time without any errors or warnings.
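In hindsight, the re-export from Access wasn't strictly necessary: MySQL's LOAD DATA can strip the quotes itself. A sketch of the table definition plus the loading command (column names ID and A to S taken from the dataset described above; note MySQL treats backslash as an escape character in strings, so forward slashes are safer in the path):

```sql
-- 20 INT columns: the transaction ID plus fields A to S
CREATE TABLE transactions (
  ID INT, A INT, B INT, C INT, D INT, E INT, F INT, G INT, H INT, I INT,
  J INT, K INT, L INT, M INT, N INT, O INT, P INT, Q INT, R INT, S INT
);

-- OPTIONALLY ENCLOSED BY tells MySQL to strip the double quotes itself,
-- so even the original quoted export would have loaded cleanly
LOAD DATA INFILE 'G:/TRANSACTIONS.TXT' INTO TABLE transactions
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';
```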
Fourth attempt: Oracle
After that success with MySQL I had to get the data into Oracle somehow. I created a table in Oracle with the requisite number of columns, all of type NUMBER. Then I wrote a PHP program to automate the task, and it did the job in 1 minute 47 seconds! Here is the program:


<?php
ini_set("max_execution_time", 1200); // allow plenty of time for 100,000 inserts

// Connect to the Oracle 10g XE database
$conn = oci_connect('user', 'pass', '127.0.0.1/XE');
if (!$conn) {
   $e = oci_error();
   trigger_error(htmlentities($e['message'], ENT_QUOTES), E_USER_ERROR);
}

// Read the comma-delimited export and insert it line by line
$lines = explode("\n", file_get_contents("TRANSACTIONS.TXT"));
foreach ($lines as $line) {
   $line = trim($line); // strip \r left over from Windows line endings
   if ($line === '') {
      continue; // skip the blank line at the end of the file
   }
   // Uncomment the next line if the field values are enclosed in double quotes
   // $line = str_replace('"', '', $line);
   $stid = oci_parse($conn, "INSERT INTO transactions VALUES (" . $line . ")");
   oci_execute($stid); // commits on success by default
   oci_free_statement($stid);
}
echo "Done!";
?>
The latest update is that I successfully loaded the re-exported file (i.e. the one without double-quoted field values) into Oracle using the built-in Load Text Data feature as well. The earlier errors were due to the double quotes, which turned the values into strings while the table's columns were of type NUMBER.
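For completeness, Oracle also ships a command-line bulk loader, SQL*Loader, built for exactly this kind of job. A minimal control-file sketch (column names ID and A to S taken from the dataset above; the file and table names are the ones used throughout this post):

```
LOAD DATA
INFILE 'TRANSACTIONS.TXT'
INTO TABLE transactions
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(ID, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S)
```

You would then run it with something like sqlldr user/pass control=transactions.ctl from the command line.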
