patternsqlMinor
Why would call to scalar function inside a Table Value Function be slower than outside the TVF?
Viewed 0 times
whythescalarslowerfunctionthanvaluetvfwouldcall
Problem
I am writing a Table Value Function, calling the function takes 10x as long as directly running the code. I traced this to a call to a multi-line scalar function inside the TVF. The call to the scalar function is excessively slow WHEN called from within the TVF. The scalar function takes 3 int parameters and returns a single int result.
What would cause it to be slower from within a TVF?
First, the TVF is basically a sort of pivot table, returning just one row, with 13 columns.
The scalar function is a multiline scalar, it looks for a matched set of keys (ie input is columnA, output is columnB, where input = columnA and active) in one of two tables. Searching table 1, then table 2 and finally table 1 again with a slightly changed where clause.
Usage was originally a cte query with a join clause:
This cte was originally called 11 times, returning from (about half the time) zero to maybe 3 rows (with a where clause like c1=42). So, I thought, hey, it's getting called a lot, since there are really only 4-15 rows in total, and the base tables are ~1,000,000, I'll cut that down to just 15 calls by putting the values into a table variable, and then doing an update. This wasn't really any faster but it did let me prove that it was the call to the scalar function that was slowing things down.
That looked like:
The @cte had, as I said, from 4 to 15 rows (11 in my most used setId). Putting a return first after, and then before th
What would cause it to be slower from within a TVF?
First, the TVF is basically a sort of pivot table, returning just one row, with 13 columns.
The scalar function is a multiline scalar, it looks for a matched set of keys (ie input is columnA, output is columnB, where input = columnA and active) in one of two tables. Searching table 1, then table 2 and finally table 1 again with a slightly changed where clause.
Usage was originally a cte query with a join clause:
With cte as ( select count(*) over (partition by c.c1) recs,
c.setId, c.c1, c.c2, n.name
From tbl c
Inner join tbl n on n.id = schma.scal(c.c1, c.c2, c.c3)
Where c.setId=@setId and c.endDt is null
)This cte was originally called 11 times, returning from (about half the time) zero to maybe 3 rows (with a where clause like c1=42). So, I thought, hey, it's getting called a lot, since there are really only 4-15 rows in total, and the base tables are ~1,000,000, I'll cut that down to just 15 calls by putting the values into a table variable, and then doing an update. This wasn't really any faster but it did let me prove that it was the call to the scalar function that was slowing things down.
That looked like:
Insert into @cte(recs, setId, c1, c2)
select count(*) over (partition by c.c1) recs, c.setId, c.c1, c.c2
From tbl c
Where c.setId=@setId and c.endDt is null;
Update c
Set nId =schma.scal(c.c1, c.c2, c.c3)
From @cte cThe @cte had, as I said, from 4 to 15 rows (11 in my most used setId). Putting a return first after, and then before th
Solution
Scalar functions are called once-per-row, when called as part of a query.
Consider the following example.
-
Create a new, blank database for our tests:
-
Create a table, a multi-statement function, and a table-valued-function:
-
Insert some sample data into the table:
-
Create a table to store function execution stats, and populate it with a start-row showing execution counts for the multi-statement-function,
-
Run a query against the TVF:
-
Capture the execution stats now:
-
The function stats results:
╔═════════╦═════════════╦═════════════════╗
║ run_num ║ object_name ║ execution_count ║
╠═════════╬═════════════╬═════════════════╣
║ 1 ║ t_func ║ 0 ║
║ 2 ║ t_func ║ 2 ║
╚═════════╩═════════════╩═════════════════╝
As you can see, the multi-statement-function has execute twice, once per row for the source table accessed by the TVF.
I expect the mutli-statement-function is being called many, many times by the TVF, giving the impression that it is running slowly, whereas in fact it is simply being called many times.
Consider the following example.
-
Create a new, blank database for our tests:
USE master;
IF EXISTS (SELECT 1 FROM sys.databases d WHERE d.name = 'mv')
BEGIN
ALTER DATABASE mv SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE mv;
END
GO
CREATE DATABASE mv;
GO-
Create a table, a multi-statement function, and a table-valued-function:
USE mv;
GO
CREATE TABLE dbo.t
(
t_id int NOT NULL
CONSTRAINT PK_t
PRIMARY KEY CLUSTERED
);
GO
CREATE FUNCTION dbo.t_func
(
@t_id int
)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
DECLARE @r bit;
IF EXISTS (SELECT 1 FROM dbo.t WHERE t.t_id = @t_id)
SET @r = 1
ELSE
SET @r = 0;
RETURN @r;
END
GO
CREATE FUNCTION dbo.t_tvf
(
@min_t_id int
, @max_t_id int
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
SELECT t_id = t.t_id
, e = dbo.t_func(dbo.t.t_id)
FROM dbo.t
WHERE t.t_id >= @min_t_id
AND t.t_id <= @max_t_id
);
GO-
Insert some sample data into the table:
INSERT INTO dbo.t (t_id)
SELECT ROW_NUMBER() OVER (ORDER BY c.id, c.colid)
FROM sys.syscolumns c;
GO-
Create a table to store function execution stats, and populate it with a start-row showing execution counts for the multi-statement-function,
t_func:CREATE TABLE dbo.function_stats
(
run_num int NOT NULL
, object_name sysname NOT NULL
, execution_count int NULL
, CONSTRAINT PK_function_stats
PRIMARY KEY CLUSTERED (run_num, object_name)
);
GO
INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 1
, o.name
, COALESCE(fs.execution_count, 0)
FROM sys.objects o
LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';
GO-
Run a query against the TVF:
SELECT t.*
FROM dbo.t_tvf(1, 2) t;
GO-
Capture the execution stats now:
INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 2
, o.name
, COALESCE(fs.execution_count, 0)
FROM sys.objects o
LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';-
The function stats results:
SELECT *
FROM dbo.function_stats fs
ORDER BY fs.run_num
, fs.object_name;╔═════════╦═════════════╦═════════════════╗
║ run_num ║ object_name ║ execution_count ║
╠═════════╬═════════════╬═════════════════╣
║ 1 ║ t_func ║ 0 ║
║ 2 ║ t_func ║ 2 ║
╚═════════╩═════════════╩═════════════════╝
As you can see, the multi-statement-function has execute twice, once per row for the source table accessed by the TVF.
I expect the mutli-statement-function is being called many, many times by the TVF, giving the impression that it is running slowly, whereas in fact it is simply being called many times.
Code Snippets
USE master;
IF EXISTS (SELECT 1 FROM sys.databases d WHERE d.name = 'mv')
BEGIN
ALTER DATABASE mv SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE mv;
END
GO
CREATE DATABASE mv;
GOUSE mv;
GO
CREATE TABLE dbo.t
(
t_id int NOT NULL
CONSTRAINT PK_t
PRIMARY KEY CLUSTERED
);
GO
CREATE FUNCTION dbo.t_func
(
@t_id int
)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
DECLARE @r bit;
IF EXISTS (SELECT 1 FROM dbo.t WHERE t.t_id = @t_id)
SET @r = 1
ELSE
SET @r = 0;
RETURN @r;
END
GO
CREATE FUNCTION dbo.t_tvf
(
@min_t_id int
, @max_t_id int
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
SELECT t_id = t.t_id
, e = dbo.t_func(dbo.t.t_id)
FROM dbo.t
WHERE t.t_id >= @min_t_id
AND t.t_id <= @max_t_id
);
GOINSERT INTO dbo.t (t_id)
SELECT ROW_NUMBER() OVER (ORDER BY c.id, c.colid)
FROM sys.syscolumns c;
GOCREATE TABLE dbo.function_stats
(
run_num int NOT NULL
, object_name sysname NOT NULL
, execution_count int NULL
, CONSTRAINT PK_function_stats
PRIMARY KEY CLUSTERED (run_num, object_name)
);
GO
INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 1
, o.name
, COALESCE(fs.execution_count, 0)
FROM sys.objects o
LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';
GOSELECT t.*
FROM dbo.t_tvf(1, 2) t;
GOContext
StackExchange Database Administrators Q#76156, answer score: 5
Revisions (0)
No revisions yet.