HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Why would call to scalar function inside a Table Value Function be slower than outside the TVF?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whythescalarslowerfunctionthanvaluetvfwouldcall

Problem

I am writing a Table Value Function, calling the function takes 10x as long as directly running the code. I traced this to a call to a multi-line scalar function inside the TVF. The call to the scalar function is excessively slow WHEN called from within the TVF. The scalar function takes 3 int parameters and returns a single int result.

What would cause it to be slower from within a TVF?

First, the TVF is basically a sort of pivot table, returning just one row, with 13 columns.

The scalar function is a multiline scalar, it looks for a matched set of keys (ie input is columnA, output is columnB, where input = columnA and active) in one of two tables. Searching table 1, then table 2 and finally table 1 again with a slightly changed where clause.

Usage was originally a cte query with a join clause:

With cte as ( select count(*) over (partition by c.c1) recs,
                       c.setId, c.c1, c.c2, n.name
                From tbl c
                     Inner join tbl n on n.id = schma.scal(c.c1, c.c2, c.c3)
                Where c.setId=@setId and c.endDt is null
   )


This cte was originally called 11 times, returning from (about half the time) zero to maybe 3 rows (with a where clause like c1=42). So, I thought, hey, it's getting called a lot, since there are really only 4-15 rows in total, and the base tables are ~1,000,000, I'll cut that down to just 15 calls by putting the values into a table variable, and then doing an update. This wasn't really any faster but it did let me prove that it was the call to the scalar function that was slowing things down.

That looked like:

Insert into @cte(recs, setId, c1, c2)
    select count(*) over (partition by c.c1) recs, c.setId, c.c1, c.c2
    From tbl c
    Where c.setId=@setId and c.endDt is null;

    Update c
     Set nId =schma.scal(c.c1, c.c2, c.c3)
    From @cte c


The @cte had, as I said, from 4 to 15 rows (11 in my most used setId). Putting a return first after, and then before th

Solution

Scalar functions are called once-per-row, when called as part of a query.

Consider the following example.

-
Create a new, blank database for our tests:

USE master;
IF EXISTS (SELECT 1 FROM sys.databases d WHERE d.name = 'mv')
BEGIN
    ALTER DATABASE mv SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE mv;
END
GO
CREATE DATABASE mv;
GO


-
Create a table, a multi-statement function, and a table-valued-function:

USE mv;
GO
CREATE TABLE dbo.t
(
    t_id int NOT NULL
        CONSTRAINT PK_t
        PRIMARY KEY CLUSTERED
);
GO

CREATE FUNCTION dbo.t_func
(
    @t_id int
)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @r bit;
    IF EXISTS (SELECT 1 FROM dbo.t WHERE t.t_id = @t_id)
        SET @r = 1
    ELSE
        SET @r = 0;
    RETURN @r;
END
GO

CREATE FUNCTION dbo.t_tvf
(
    @min_t_id int
    , @max_t_id int
)
RETURNS TABLE 
WITH SCHEMABINDING
AS
RETURN (
    SELECT t_id = t.t_id
        , e = dbo.t_func(dbo.t.t_id)
    FROM dbo.t
    WHERE t.t_id >= @min_t_id
        AND t.t_id <= @max_t_id
);
GO


-
Insert some sample data into the table:

INSERT INTO dbo.t (t_id)
SELECT ROW_NUMBER() OVER (ORDER BY c.id, c.colid)
FROM sys.syscolumns c;
GO


-
Create a table to store function execution stats, and populate it with a start-row showing execution counts for the multi-statement-function, t_func:

CREATE TABLE dbo.function_stats
(
    run_num int NOT NULL
    , object_name sysname NOT NULL
    , execution_count int NULL 
    , CONSTRAINT PK_function_stats
        PRIMARY KEY CLUSTERED (run_num, object_name)
);
GO
INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 1
    , o.name
    , COALESCE(fs.execution_count, 0)
FROM sys.objects o 
    LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';
GO


-
Run a query against the TVF:

SELECT t.*
FROM dbo.t_tvf(1, 2) t;
GO


-
Capture the execution stats now:

INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 2
    , o.name
    , COALESCE(fs.execution_count, 0)
FROM sys.objects o 
    LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';


-
The function stats results:

SELECT *
FROM dbo.function_stats fs
ORDER BY fs.run_num
    , fs.object_name;


╔═════════╦═════════════╦═════════════════╗
║ run_num ║ object_name ║ execution_count ║
╠═════════╬═════════════╬═════════════════╣
║ 1 ║ t_func ║ 0 ║
║ 2 ║ t_func ║ 2 ║
╚═════════╩═════════════╩═════════════════╝

As you can see, the multi-statement-function has execute twice, once per row for the source table accessed by the TVF.

I expect the mutli-statement-function is being called many, many times by the TVF, giving the impression that it is running slowly, whereas in fact it is simply being called many times.

Code Snippets

USE master;
IF EXISTS (SELECT 1 FROM sys.databases d WHERE d.name = 'mv')
BEGIN
    ALTER DATABASE mv SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE mv;
END
GO
CREATE DATABASE mv;
GO
USE mv;
GO
CREATE TABLE dbo.t
(
    t_id int NOT NULL
        CONSTRAINT PK_t
        PRIMARY KEY CLUSTERED
);
GO

CREATE FUNCTION dbo.t_func
(
    @t_id int
)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @r bit;
    IF EXISTS (SELECT 1 FROM dbo.t WHERE t.t_id = @t_id)
        SET @r = 1
    ELSE
        SET @r = 0;
    RETURN @r;
END
GO

CREATE FUNCTION dbo.t_tvf
(
    @min_t_id int
    , @max_t_id int
)
RETURNS TABLE 
WITH SCHEMABINDING
AS
RETURN (
    SELECT t_id = t.t_id
        , e = dbo.t_func(dbo.t.t_id)
    FROM dbo.t
    WHERE t.t_id >= @min_t_id
        AND t.t_id <= @max_t_id
);
GO
INSERT INTO dbo.t (t_id)
SELECT ROW_NUMBER() OVER (ORDER BY c.id, c.colid)
FROM sys.syscolumns c;
GO
CREATE TABLE dbo.function_stats
(
    run_num int NOT NULL
    , object_name sysname NOT NULL
    , execution_count int NULL 
    , CONSTRAINT PK_function_stats
        PRIMARY KEY CLUSTERED (run_num, object_name)
);
GO
INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 1
    , o.name
    , COALESCE(fs.execution_count, 0)
FROM sys.objects o 
    LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';
GO
SELECT t.*
FROM dbo.t_tvf(1, 2) t;
GO

Context

StackExchange Database Administrators Q#76156, answer score: 5

Revisions (0)

No revisions yet.