HiveBrain v1.2.0
Get Started
← Back to all entries
patternphpMinor

Find minutes with no data in database

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
withdatabasefindminutesdata

Problem

I'm trying to show a table, for each ID, to see if there is data available. In this case, its measurements. If there's no data, that means the logger isn't doing its job properly.

I currently have two tables: data and times. data contains id, datetime, sensor_id, and value. times contains id and value.

Times is filled with 00:00, 00:01, 00:02, etc., all the way through to 23:59.

I have this query:

SELECT t.`value`, d.`sensor_id`, COUNT(t.`value`) as `numrows`
FROM `data` d
RIGHT JOIN `times` t
ON d.`datetime` LIKE CONCAT('% ', t.`value`, '%')
WHERE d.`datetime` LIKE '%$2014-11-05%'
AND d.`sensor_id` IN(1,2,3,4,5,999)
GROUP BY d.`sensor_id`, t.`value`
ORDER BY d.`sensor_id` ASC, t.`id` ASC

while($s = $select->fetch_assoc()) {
    $checkArray[$s['sensor_id']][$s['value']] = $s['numrows'];
}

foreach($checkArray as $key => $arr) {
    echo 'Sensor: ' . $key;
    for($i = 0; $i = 1) { //See if has at least one row
                echo 'YES DATA FOR ' . $time . '';
            }
        }
    }
}


Of course, I sort this in a table, and the result is this:

Just for sensors 1, 2, 3, 4, and 5, the load time is over 5.5 seconds. I don't know how to optimize this further. I've put indexes on the queried columns, but I can't think of anything else.

My SHOW CREATE TABLE for data:

CREATE TABLE `data` (
 `id` int(13) NOT NULL AUTO_INCREMENT,
 `sensor_id` int(13) NOT NULL,
 `datetime` datetime NOT NULL,
 `value` float NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `sensor_id_2` (`sensor_id`,`datetime`,`value`),
 KEY `sensor_id` (`sensor_id`),
 KEY `value` (`value`),
 KEY `datetime` (`datetime`),
 KEY `sensor_id_3` (`sensor_id`),
 KEY `datetime_2` (`datetime`)
) ENGINE=InnoDB AUTO_INCREMENT=103921 DEFAULT CHARSET=utf8


And my SHOW CREATE TABLE for times:

``
CREATE TABLE
minutes (
id int(13) NOT NULL AUTO_INCREMENT,
value varchar(16) NOT NULL,
PRIMARY KEY (
id),
UNIQUE KEY
id_3 (id),
KEY
value (value`),

Solution

Is this the only query on data that matters? You have too many simple indexes. For example, there's no reason to have an index on just sensor_id if you already have one with sensor_id in the front.

CREATE TABLE `data` (
 `id` int(13) NOT NULL AUTO_INCREMENT,
 `sensor_id` int(13) NOT NULL,
 `datetime` datetime NOT NULL,
 `value` float NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `sensor_id_2` (`sensor_id`,`datetime`,`value`)
)


The minutes table is even worse. There's only two columns but you have six indexes (the PRIMARY KEY is an index on id). Three would be serious overkill.

CREATE TABLE `minutes` (
 `id` int(13) NOT NULL AUTO_INCREMENT,
 `value` varchar(16) NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `value_id` (`value`, `id`),
 UNIQUE KEY `id_value` (`id`, `value`)
)


Your query looks weird to me. You're doing string operations (LIKE) on a datetime field. This probably means that you are skipping the index and almost certainly that you aren't using the index to limit the possible rows. A more common form would be

SELECT t.`value`, d.`sensor_id`, COUNT(t.`value`) as `numrows`
FROM `data` d
RIGHT JOIN `times` t
  ON TIME_FORMAT(d.`datetime`, '%H:%i') = t.`value`
WHERE d.`datetime` >= '2014-11-05 00:00:00'
  AND d.`datetime` <= '2014-11-05 23:59:00'
  AND d.`sensor_id` IN (1,2,3,4,5,999)
GROUP BY d.`sensor_id`, t.`value`
ORDER BY d.`sensor_id` ASC, t.`value` ASC


And you should be aware that when you specify IN, you are repeating the query once for each value. So this query would run six times. You may find it easier to run the six queries on the PHP side, as they'll be smaller. This can also help your programming logic.

You might want to look up EXPLAIN plans, as they should help hint at what your problems are. You might consider posting your SQL and explain plan to Stack Overflow (hint, just add EXPLAIN before the SELECT and look at the results in an admin tool like phpMySQL).

It's possible that the thing to do is to replace your DATETIME column with a DATE column and a minutes_id column. Then your join would be on a fast integer join and you could do a straightforward date check.

Rather than doing a RIGHT JOIN, you might be better off with a regular inner join. Then look for missing times in PHP. Outer joins (left and right) are much slower than properly indexed inner joins. You may find it easier to look for the missing times in PHP than to have your database do it.

I find it questionable to read all the data in and then use it. It's often better to read the data and then use it immediately.

$times = array();
for ( $hour = 0; $hour fetch_assoc() ) {
    if ( $s['sensor_id'] != $last_sensor ) {
        echo 'Sensor: ' . $s['sensor_id'] . PHP_EOL;
        $last_sensor = $s['sensor_id'];

        // clear any remaining times from the last sensor
        while ( list(, $time) = each($times) ) {
            echo "$time 0" . PHP_EOL;
        }
        reset($times);
    }

    // iterate through until we find the current time
    list(, $time) = each($times);
    while ( $time != $s['value'] ) {
        // mark the missing times
        echo "$time 0" . PHP_EOL;
        list(, $time) = each($times);
    }

    echo "$time {$s['numrows']}" . PHP_EOL;
}

Code Snippets

CREATE TABLE `data` (
 `id` int(13) NOT NULL AUTO_INCREMENT,
 `sensor_id` int(13) NOT NULL,
 `datetime` datetime NOT NULL,
 `value` float NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `sensor_id_2` (`sensor_id`,`datetime`,`value`)
)
CREATE TABLE `minutes` (
 `id` int(13) NOT NULL AUTO_INCREMENT,
 `value` varchar(16) NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `value_id` (`value`, `id`),
 UNIQUE KEY `id_value` (`id`, `value`)
)
SELECT t.`value`, d.`sensor_id`, COUNT(t.`value`) as `numrows`
FROM `data` d
RIGHT JOIN `times` t
  ON TIME_FORMAT(d.`datetime`, '%H:%i') = t.`value`
WHERE d.`datetime` >= '2014-11-05 00:00:00'
  AND d.`datetime` <= '2014-11-05 23:59:00'
  AND d.`sensor_id` IN (1,2,3,4,5,999)
GROUP BY d.`sensor_id`, t.`value`
ORDER BY d.`sensor_id` ASC, t.`value` ASC
$times = array();
for ( $hour = 0; $hour < 24; $hour++ ) {
    for ( $minute = 0; $minute < 60; $minute++ ) {
        $times[] = sprintf('%02u:%02u', $hour, $minute);
    }
}

reset($times);
$last_sensor = 0; // assumes that there is no sensor 0
while ( $s = $select->fetch_assoc() ) {
    if ( $s['sensor_id'] != $last_sensor ) {
        echo 'Sensor: ' . $s['sensor_id'] . PHP_EOL;
        $last_sensor = $s['sensor_id'];

        // clear any remaining times from the last sensor
        while ( list(, $time) = each($times) ) {
            echo "$time 0" . PHP_EOL;
        }
        reset($times);
    }

    // iterate through until we find the current time
    list(, $time) = each($times);
    while ( $time != $s['value'] ) {
        // mark the missing times
        echo "$time 0" . PHP_EOL;
        list(, $time) = each($times);
    }

    echo "$time {$s['numrows']}" . PHP_EOL;
}

Context

StackExchange Code Review Q#69146, answer score: 4

Revisions (0)

No revisions yet.