HiveBrain v1.2.0
pattern · php · Minor

Enhance CSV file database import

Submitted by: @import:stackexchange-codereview
file · csv · database · enhance · import

Problem

I'm using the script below to import a large .csv file into my database. If the table is empty, the process takes about 5 minutes to finish on a local machine. If I'm using the file to update existing values in the same table, it takes more than 15 minutes. My .csv file contains about 35,000 rows.

How can I speed up the process?

```
if ( $request->get( $_POST["action"] ) == "import" ) {

    $file = $upload->file_upload( "import", "media/import" );
    if ( file_exists( DIR_UPLOAD_PHOTO . "/media/import/" . $file ) ) {

        $file = DIR_UPLOAD_PHOTO . "/media/import/" . $file;
        $handle = fopen( $file, "r" );

        if ( $handle ) {
            $lines = explode( "\r", fread( $handle, filesize( $file ) ) );
        }

        $total_array = count( $array );

        $x = 0;

        foreach ( $lines as $line ) {

            if ( $x >= 1 ) {
                $data = explode( "|", $line );

                $titlu = trim( addslashes( $data[0] ) );
                $alias = $this->generate_seo_link( $titlu );
                $gramaj = trim( $data[1] );
                $greutate = trim( $data[2] );
                $pret_total = trim( $data[3] );
                $pret_redus = trim( $data[4] );
                $poza = trim( $data[5] );
                $pret_unitar = trim( $data[6] );
                $categorie = trim( $data[7] );
                $brand = trim( addslashes( $data[8] ) );
                $descriere = trim( addslashes( $data[9] ) );
                $vizibil = trim( $data[10] );
                $cod = trim( $data[11] );
                $nou = trim( $data[12] );
                $cant_variabila = trim( $data[13] );
                $congelat = trim( $data[14] );
                $tva = trim( $data[15] );

                // ... (snippet truncated in the original question)
```

Solution

Well, this might just be my shortest post to date:

Why bother parsing, and manually inserting, each and every row individually when you can use something like:

```
LOAD DATA INFILE "/path/to/data.csv"
INTO TABLE produse
COLUMNS TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\r' -- \r or \n; seems to be \r in your case for some reason
IGNORE 1 LINES -- if your CSV starts with a line of field names
(titlu, alias, gramaj); -- list all column names here, after the IGNORE clause
```


Check the full syntax and possibilities in the MySQL LOAD DATA documentation. It covers how to handle duplicate keys (hint: REPLACE is probably what you're after).

Update, some actual code review:

Having said all that, one thing about your code looked too worrying not to mention: you're obviously using a DB abstraction layer of sorts, or some extension. Please make sure to use prepared statements. It's also worth looking into fgetcsv for extracting data from CSV files in the future. Instead of that pesky fread + explode mess, you can simply write:

```
$handle = fopen('/path/to/data.csv', 'r');
$delimiter = '|';//your delimiter
$db->beginTransaction();//use transactions if you want
//create a prepared statement for the inserts
$stmt = $db->prepare('INSERT INTO my.table (field1, field2, field3) VALUES (:val1, :val2, :val3)');
//set the bind keys we're going to use
$bindKeys = array(':val1', ':val2', ':val3');
while ($line = fgetcsv($handle, 1024, $delimiter))
{
    $line = array_map('trim', $line);//trim all of the values in one line
    $bind = array_combine($bindKeys, array_values($line));//set the keys for the bind
    $stmt->execute($bind);//insert the lot
    $stmt->closeCursor();//optional, depending on the db, this might be required
}
$db->commit();//save changes...
fclose($handle);//close the file
$db = null;//close DB connection
echo 'All done!', PHP_EOL;
```


This snippet does pretty much the same thing as your code, only it's a lot shorter, safer, and cleaner. The DB extension I've used here is PDO, in case you were wondering, but you can do the same thing with mysqli_*, too.
If one of the inserts fails, an exception should be thrown (unless you've set the error mode to silent). If that happens, catch the exception and call $db->rollBack(); to avoid storing partial data.
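To make that whole pattern concrete, here's a minimal, self-contained sketch of fgetcsv feeding a prepared statement inside a transaction, with rollback on failure. It reads from an in-memory stream and an in-memory SQLite database so it runs anywhere the PDO SQLite driver is available; the table, column names, and sample data are made up, not taken from the original schema:

```php
<?php
// Hypothetical pipe-delimited data, served from an in-memory stream
// instead of a real uploaded file, so the sketch is self-contained.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "titlu|gramaj\nPaine|500g\nLapte|1l\n");
rewind($handle);

// In-memory SQLite stands in for the real MySQL connection here.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE produse (titlu TEXT, gramaj TEXT)');

$stmt = $db->prepare('INSERT INTO produse (titlu, gramaj) VALUES (:titlu, :gramaj)');

fgetcsv($handle, 1024, '|'); // skip the header line

$db->beginTransaction();
try {
    while (($line = fgetcsv($handle, 1024, '|')) !== false) {
        $line = array_map('trim', $line);
        $stmt->execute([':titlu' => $line[0], ':gramaj' => $line[1]]);
    }
    $db->commit(); // persist all rows at once
} catch (PDOException $e) {
    $db->rollBack(); // discard the partial import rather than storing half the file
    throw $e;
}

$count = (int) $db->query('SELECT COUNT(*) FROM produse')->fetchColumn();
fclose($handle);
```

Batching all inserts in one transaction is also a large part of the speed-up: committing per row forces a disk flush on every insert.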

There's also this rather odd check:

```
$cod = trim($data[11]);//no spaces possible

if ( $cod != "" && $cod != " " )
//second condition is IMPOSIBURU
```
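A quick demonstration of why that second condition is dead code: trim() strips the spaces before the comparison ever runs (the sample value below is made up):

```php
<?php
// trim() removes leading and trailing whitespace, so a field
// consisting only of spaces always becomes the empty string.
$data = [11 => '   ']; // hypothetical raw CSV field containing only spaces
$cod = trim($data[11]);

// $cod can be '' here, but never ' ' — the single check $cod != "" suffices.
$secondConditionCanFire = ($cod === ' ');
```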


You also have this odd habit of adding else clauses that serve no purpose:

```
} else {
    $alias = $alias;//??
}
```

Just leave out the else; it's noise. It's a code smell.

You also increment $x to work out whether you're processing the first line (the first element of $lines). Why bother? Just drop the first element of the array and process the rest as a whole:

```
array_shift($lines);//removes $lines[0]
foreach ($lines as $line)
{
    //process rest
}
```
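A tiny self-contained sketch of that approach, with made-up lines:

```php
<?php
// Hypothetical parsed file contents: header row first, then data rows.
$lines = ['titlu|gramaj', 'Paine|500g', 'Lapte|1l'];

array_shift($lines); // drops the header ($lines[0]) and re-indexes the array

$parsed = [];
foreach ($lines as $line) {
    $parsed[] = explode('|', $line); // only data rows reach this point
}
```

No counter, no per-iteration check, and the loop body only ever sees real data.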


Context

StackExchange Code Review Q#68976, answer score: 3
