HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Creating many random test database entries

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
randomcreatingdatabasetestmanyentries

Problem

I'm working on a simple time sheet webapp. I have created the following query (simplified - I actually have several mapped columns of a similar type to project_id) to generate test data:

INSERT INTO `entries` (`entry_id`, `user_id`, `project_id`, `date`, `comment`, `hours`)
VALUES
( null,
  0,
  (SELECT `project_id` FROM projects ORDER BY RAND() LIMIT 1),
  CURRENT_DATE(),
  '# TEST DATA #',
  (SELECT ROUND((0.25 + RAND() * (24 - 0.24)), 2))
);


I'm currently running this query inside a PHP loop.

This code is currently adequate for my needs now, as I can iterate 50 loops in 0.3037 seconds. However, I fear that when it comes time to test large data sets (searching and report generation for rows > 1 000 000), I may run into problems creating them.

How can I optimize this algorithm? Should I consider using a stored procedure? Or should I just not worry about it, and run the loop to longer iterations, more times?

DROP PROCEDURE IF EXISTS `create_test_entries`;

DELIMITER //

CREATE PROCEDURE `create_test_entries` (IN number INT)
LANGUAGE SQL
DETERMINISTIC
SQL SECURITY INVOKER
COMMENT 'Creates `number` of random test entries in the timesheet'
BEGIN
    DECLARE i INT;
    SET i = 0;

    WHILE i < number DO
        INSERT INTO `entries` (`entry_id`, `user_id`, `project_id`, `department_id`, `task_id`, `date`, `comment`, `hours`)
        VALUES
        ( null,
          0,
          (SELECT `project_id` FROM projects ORDER BY RAND() LIMIT 1),
          CURRENT_DATE(),
          '# TEST DATA #',
          (SELECT ROUND((0.25 + RAND() * (24 - 0.24)), 2))
        );
        SET i = i + 1;
    END WHILE;
END //

Solution

It feels like (SELECT project_id FROM projects ORDER BY RAND() LIMIT 1) stands for "get random but still valid project_id". You could probably just iterate over all of your projects (with no randomness) and for every project add a random number of records. I don't think it affects your intention but this will probably work faster.

Context

StackExchange Code Review Q#14411, answer score: 2

Revisions (0)

No revisions yet.