HiveBrain v1.2.0
Get Started
← Back to all entries
patternphpMinor

Removing image records if no physical file exists

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
fileimageremovingrecordsexistsphysical

Problem

I have a working script that selects image fields in all tables and empty their values if the physical file doesnt exist.

$query1 = "SELECT table_name,column_name
           FROM information_schema.columns
           WHERE table_schema='schemaname' AND column_name like '%image%' or column_name='video'";

$result1 = mysql_query($query1) or die(mysql_error() . " -- " . $query1);

while($row1 = mysql_fetch_row($result1)){
    if (!strpos($row1[0],'backup') > 0){
            $sql = "Select COLUMN_NAME FROM information_schema.columns WHERE TABLE_NAME = '".$row1[0]."' AND EXTRA = 'auto_increment'";
            $resultcol = mysql_query($sql);
            $rowcol = mysql_fetch_row($resultcol);

        $query2 = "SELECT " . $row1[1] . ", " .$rowcol[0] . "
               FROM " . $row1[0] . "
               WHERE " . $row1[1] . " != '' AND " . $row1[1] . " IS NOT NULL
               ";
        echo $query2 . "";  
        $result2 = mysql_query($query2) or die(mysql_error() . " -- " . $query2);

        while ($rowdb =  mysql_fetch_row($result2)){
            if (!strpos($rowdb[0],'facebook') > 0 && !file_exists($img_root.'/'.$rowdb[0])){
                $sql = "UPDATE ".$row1[0]." SET ". $row1[1] . " = '' WHERE " . $rowcol[0]. "= ".$rowdb[1];
                echo $sql . "";
                $delete_count++;
                //mysql_query("UPDATE ".$row1[0]." SET ". $row1[1] . " = '' WHERE id = ".$row1["id"]);
            }
        }
    }
}


The script is working fine, but it takes time. I was wondering if there is a smarter (more optimized) way to get the same function.

Solution

The mysql_* functions are deprecated. Use mysqli or PDO instead.

Each query that you issue has a rather large overhead: you have to communicate the query to the server; the server has to parse, plan, and execute the query; the results have to be returned to your application. Therefore, you want to avoid running any query in a loop. You're much better off executing fewer complex queries than many simple queries.

First, note that your $query1 has a bug: AND has higher precedence than OR, so you need parentheses. Also, you eventually ignore any column named …backup, so you might as well filter those out immediately.

SELECT TABLE_NAME AS table_name, COLUMN_NAME AS image_col
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE
        TABLE_SCHEMA = 'schemaname'
        AND TABLE_NAME NOT LIKE '%backup%'
        AND (COLUMN_NAME LIKE '%image%' OR COLUMN_NAME = 'video');


Let's clarify the naming.

  • Your $row1[0] is what I call table_name



  • Your $row1[1] is what I call image_col



Notionally, you want to execute this query for each result returned by that $schema query.

UPDATE `$schema['table_name']`
    SET `$schema['image_col']` = ''
    WHERE
        `$schema['image_col']` != '' AND `$schema['image_col']` IS NOT NULL
        AND `$s['image_col']` NOT LIKE '%facebook%'
        AND NOT file_exists('$img_root/$schema['image_col']')


Of course, there are two problems with that idea:

  • There is no file_exists() function in MySQL (and furthermore, if such a function existed, it would be wrong to run it on the database server rather than on the application server, since they could be on different machines with different filesystems).



-
You have to execute many UPDATEs, not just one. Furthermore, the UPDATE queries are dynamic — the table and column names vary.

This is a strong indication that your database schema is poorly designed. There should not be a proliferation of tables! You should also not be doing version control or backups by creating tables named …backup. I encourage you to follow up with these issues on http://dba.stackexchange.com.

A solution to the first problem is to create a temporary table that lists all images on your website's filesystem, then write the UPDATE with an anti-join.

$db = new mysqli(…);
$db->query('CREATE TEMPORARY TABLE fs_images ( filename VARCHAR(128) PRIMARY KEY );');
$ins = $db->prepare('INSERT INTO fs_images VALUES (?)');
$file_it = new RecursiveDirectoryIterator($img_root);
foreach (new RecursiveIteratorIterator($file_it) as $f) {
    $ins->bind_param('s', basename($f));
    $ins->execute();
}
$ins->close();


A solution to the second problem is to use a stored procedure with a loop that executes dynamically composed SQL queries with a cursor.

DELIMITER $
CREATE PROCEDURE clear_images()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE table_name VARCHAR(64);
    DECLARE image_col VARCHAR(64);

    DECLARE schema_cursor CURSOR FOR
        SELECT TABLE_NAME AS table_name, COLUMN_NAME AS image_col
            FROM INFORMATION_SCHEMA.COLUMNS
            WHERE
                TABLE_SCHEMA = 'schemaname'
                AND TABLE_NAME NOT LIKE '%backup%'
                AND (COLUMN_NAME LIKE '%image%' OR COLUMN_NAME = 'video');
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN schema_cursor;
    schema_loop: LOOP
        FETCH schema_cursor INTO table_name, image_col;
        IF done THEN
            LEAVE schema_loop;
        END IF;

        PREPARE stmt FROM CONCAT(
            'UPDATE `', @table_name, '`',
            '    SET `', @image_col, '` = ''''',
            '    WHERE',
            '        `', @image_col, '` != '''' AND `', @image_col, '` IS NOT NULL',
            '        AND `', @image_col, '` NOT LIKE ''%facebook%''',
            '        AND `', @image_col, '` NOT IN (SELECT filename FROM fs_images);'
        );
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END LOOP;
END$
DELIMITER ;


Anyway, this solution is rather nasty (and untested!) — all as a result of your poor database schema design.

Code Snippets

SELECT TABLE_NAME AS table_name, COLUMN_NAME AS image_col
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE
        TABLE_SCHEMA = 'schemaname'
        AND TABLE_NAME NOT LIKE '%backup%'
        AND (COLUMN_NAME LIKE '%image%' OR COLUMN_NAME = 'video');
UPDATE `$schema['table_name']`
    SET `$schema['image_col']` = ''
    WHERE
        `$schema['image_col']` != '' AND `$schema['image_col']` IS NOT NULL
        AND `$s['image_col']` NOT LIKE '%facebook%'
        AND NOT file_exists('$img_root/$schema['image_col']')
$db = new mysqli(…);
$db->query('CREATE TEMPORARY TABLE fs_images ( filename VARCHAR(128) PRIMARY KEY );');
$ins = $db->prepare('INSERT INTO fs_images VALUES (?)');
$file_it = new RecursiveDirectoryIterator($img_root);
foreach (new RecursiveIteratorIterator($file_it) as $f) {
    $ins->bind_param('s', basename($f));
    $ins->execute();
}
$ins->close();
DELIMITER $$
CREATE PROCEDURE clear_images()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE table_name VARCHAR(64);
    DECLARE image_col VARCHAR(64);

    DECLARE schema_cursor CURSOR FOR
        SELECT TABLE_NAME AS table_name, COLUMN_NAME AS image_col
            FROM INFORMATION_SCHEMA.COLUMNS
            WHERE
                TABLE_SCHEMA = 'schemaname'
                AND TABLE_NAME NOT LIKE '%backup%'
                AND (COLUMN_NAME LIKE '%image%' OR COLUMN_NAME = 'video');
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN schema_cursor;
    schema_loop: LOOP
        FETCH schema_cursor INTO table_name, image_col;
        IF done THEN
            LEAVE schema_loop;
        END IF;

        PREPARE stmt FROM CONCAT(
            'UPDATE `', @table_name, '`',
            '    SET `', @image_col, '` = ''''',
            '    WHERE',
            '        `', @image_col, '` != '''' AND `', @image_col, '` IS NOT NULL',
            '        AND `', @image_col, '` NOT LIKE ''%facebook%''',
            '        AND `', @image_col, '` NOT IN (SELECT filename FROM fs_images);'
        );
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END LOOP;
END$$
DELIMITER ;

Context

StackExchange Code Review Q#44161, answer score: 2

Revisions (0)

No revisions yet.