HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Any way to speed up this UPDATE?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thisupdateanywayspeed

Problem

SQL:

CREATE FUNCTION dbo.fnRandomForeNames ()
RETURNS VARCHAR(50)
AS
BEGIN
RETURN (
SELECT TOP 1 [FirstName]
FROM [tmp_ForeNames]
ORDER BY (SELECT new_id from GetNewID)
)
END
GO


Similar functions for dbo.fnRandomSurNames() etc.

UPDATE Table1
SET firstname = dbo.fnRandomForeNames(),
lastname = dbo.fnRandomSurNames(),
address1 = dbo.fnRandomAddress1(),
address2 = dbo.fnRandomAddress2(),
address3 = dbo.fnRandomAddress3(),
birthdate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 3650), '1990-01-01')


My C# Code:

private void RunThis(string connString, StreamReader sr)
    {
        sr.BaseStream.Position = 0;
        string sqlQuery = sr.ReadToEnd();
        using (SqlConnection connection = new SqlConnection(connString))
        {
            Server server = new Server(new ServerConnection(connection));
            server.ConnectionContext.StatementTimeout = 4200;
            server.ConnectionContext.ExecuteNonQuery(sqlQuery);
        }
        sr.Close();
    }


........

RunThis(e.Argument.ToString(), _updateClaim);


Where e.Argument.ToString() is the connection string.

The CREATE FUNCTION scripts are run earlier, take very little time to run.
Also, names are stored in tmp databases, these are entered in C# via arrays.
These also take very little time to run.

Table1 contains approx 140,000 rows and takes approx. 14 mins to complete.

I have also tried using parameterised SQL queries, skipping the tmp tables and SQL functions and instead creating the SQL query and executing it from the code, such as the following:

UPDATE Table1
SET lastname = '{0}',
firstname = '{1}',
birthdate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 3650), '1990-01-01'),
address1 = '{2}',
address2 = '{3}',
address3 = '{4}'
WHERE u_id = '{6}'


And some C#:

```
using (SqlConnection connection = new SqlConnection(connString))
{
connection.Open();
fo

Solution

Not sure what that ORDER BY (SELECT new_id from GetNewID), but comparing the following approaches, second is much faster and spends most of the time in COUNT(*), which could be pre-calculated.

SELECT TOP 1 name FROM master.sys.all_objects ORDER BY NEWID()

DECLARE @n int
SELECT @n = RAND() * (SELECT COUNT(*) FROM master.sys.all_objects)

SELECT name FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as n, name
    FROM master.sys.all_objects
) AS names
WHERE n = @n


I guess you could make it even faster by materializing integer sequential id inside your names tables and making a clustered index on that.

Code Snippets

SELECT TOP 1 name FROM master.sys.all_objects ORDER BY NEWID()

DECLARE @n int
SELECT @n = RAND() * (SELECT COUNT(*) FROM master.sys.all_objects)

SELECT name FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as n, name
    FROM master.sys.all_objects
) AS names
WHERE n = @n

Context

StackExchange Code Review Q#14980, answer score: 3

Revisions (0)

No revisions yet.