SQL Server 2016 concurrency limitations? Tuning for db concurrency
Problem
I have a table with ~1 billion timestamped records, and each record holds an FK to a session table (one session per day; 3-500,000 records per day), so finding the records for a given day is simply an integer join.
I am trying to analyze the data in this table (with data grouped by session), and I can run a complete analysis (every record) in 70 minutes using a C# console app from a client machine. When I try to run a similar analysis directly in T-SQL, it takes over 12 hours. I expect some penalty, as the T-SQL query uses a scalar function and a custom aggregate (CLR).
My question: in C#, I understand how to maximize and tune concurrency, so the 70 minutes is a tuned number. Is it possible to tune a query for maximal concurrency directly in SQL, or is that better left to the C# APIs? (I could also do this work in R, in-database or outside, but the .NET concurrency APIs strike me as superior.)
Query:
SELECT   TypeNumber, SessionId, dbo.udf_SessionName([timestamp]) AS SessionName,
         CAST(MAX(price) - MIN(price) AS REAL) AS Variance,
         SUM(EventNumber) AS Volume,
         dbo.Direction(price, [timestamp]) AS MoveDirection
INTO     temp.AnalysisResults
FROM     MyTable
WHERE    ISNULL(price, 0) <> 0
GROUP BY TypeNumber, SessionId, dbo.udf_SessionName([timestamp])
Misc
- Bulk Logged enabled for this query, due to the insert
- The primary key is not used in this query (it's a composite key across three fields, which is not needed here). However, the query plan shows that this index is being scanned, rather than the index mentioned below (which the plan initially recommended).
- Row-level compression is enabled
- Data spans five years, with a readonly filegroup for each month (partitioned by month); all filegroups reside on the same SSD (not great, I know)
- Index: non-clustered on SessionId asc, include TypeNumber,Timestamp,Price
- 4 CPU cores available
- The scalar function takes each timestamp, converts it to localtime with AT TIME ZONE (two calls), and looks it up in a 5-record table.
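For reference, based on that description, the scalar function presumably has roughly this shape (names, time zone, and lookup logic are assumptions, not taken from the question):

```sql
CREATE FUNCTION dbo.udf_SessionName (@ts DATETIME2)
RETURNS VARCHAR(50)
AS
BEGIN
    -- The two AT TIME ZONE calls: interpret as UTC, then shift to local time.
    DECLARE @local TIME = CAST(@ts AT TIME ZONE 'UTC'
                                   AT TIME ZONE 'Eastern Standard Time' AS TIME);
    -- Lookup against the 5-row table.
    RETURN (SELECT TOP (1) SessionName
            FROM   dbo.MarketSessions
            WHERE  StartTime <= @local
            ORDER  BY StartTime DESC);
END;
```

This gets executed once per qualifying row, which matters in the answer below.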
Solution
Before anything else, I think you need to try a few things to cut down on that 12+ hour query:
- First thing I would check is the index. You have a GROUP BY on TypeNumber, SessionId, dbo.udf_SessionName([timestamp]), yet the index is on SessionId ASC INCLUDE (TypeNumber, Timestamp, Price). Meaning, the order is not the same (and hence likely why the index is being ignored and the table is being scanned). You need the index, at the very least, to start with TypeNumber, SessionId to match the GROUP BY ordering of those columns, and then INCLUDE ([price], [timestamp], [EventNumber]) to make it a covering index. There is more to say about the index, but that leads into the next part...
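A minimal sketch of that covering index (the index name is an assumption; also note that if the target partitions live on read-only filegroups, those would need to be made writable first):

```sql
-- Key order matches the GROUP BY; included columns cover the rest of the query.
CREATE NONCLUSTERED INDEX IX_MyTable_TypeNumber_SessionId   -- name assumed
    ON dbo.MyTable (TypeNumber, SessionId)
    INCLUDE ([price], [timestamp], [EventNumber])
    WITH (DATA_COMPRESSION = ROW);  -- table already uses row compression
```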
- Next is the scalar UDF. These are known to be bad for several reasons, and using AT TIME ZONE is not going to be super fast. So, consider:
  - Converting this to an inline TVF usually works wonders in general, but I'm not sure whether that will conflict with what might be best for the index.
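As a sketch, assuming the lookup table maps a session start time-of-day to a name (names and time zone are assumptions), the scalar UDF could become an inline TVF consumed via CROSS APPLY:

```sql
CREATE FUNCTION dbo.itvf_SessionName (@ts DATETIME2)   -- name assumed
RETURNS TABLE
AS RETURN
    -- The optimizer can expand an inline TVF into the main plan,
    -- unlike a scalar UDF, which executes row-by-row.
    SELECT TOP (1) ms.SessionName
    FROM   dbo.MarketSessions ms
    WHERE  ms.StartTime <= CAST(@ts AT TIME ZONE 'UTC'
                                    AT TIME ZONE 'Eastern Standard Time' AS TIME)
    ORDER  BY ms.StartTime DESC;
GO
-- Usage: replace the scalar call with CROSS APPLY:
-- SELECT ..., sn.SessionName
-- FROM   MyTable
-- CROSS APPLY dbo.itvf_SessionName([timestamp]) AS sn
-- GROUP BY TypeNumber, SessionId, sn.SessionName;
```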
  - Part of what makes scalar UDFs slow is that they prevent parallel plans. A SQLCLR scalar UDF marked with DataAccess and SystemDataAccess both = None, and IsDeterministic = true, won't prevent a parallel plan :-). If you are converting from UTC to local time (or vice versa), you can use a class that can be called in a SAFE assembly. If you need to convert between various time zones, you need to use the TimeZoneInfo class (I think), and that requires the assembly to be marked UNSAFE. Being UNSAFE won't take away the benefit of allowing parallel plans, but if it is at all possible to go with the SAFE method, do that.
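Those attributes look like this in C# (the method body and the "Eastern Standard Time" zone are assumptions for illustration):

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class TimeFunctions
{
    // No data access and deterministic: this UDF will not force a serial plan.
    [SqlFunction(DataAccess = DataAccessKind.None,
                 SystemDataAccess = SystemDataAccessKind.None,
                 IsDeterministic = true)]
    public static SqlDateTime UtcToLocal(SqlDateTime utc)
    {
        if (utc.IsNull)
            return SqlDateTime.Null;

        // TimeZoneInfo is not in the SAFE-permitted set, so this version
        // requires the assembly to be marked UNSAFE.
        TimeZoneInfo tz = TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time");
        return new SqlDateTime(TimeZoneInfo.ConvertTimeFromUtc(utc.Value, tz));
    }
}
```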
- Complicating matters a bit for the SQLCLR UDF approach is that you are doing a lookup against a table: MarketSessions. You mentioned this is just 5 rows. Are those 5 rows fairly static? If so, you can probably still get away with a SQLCLR UDF that does no data access, by creating a static collection in the assembly and populating it from the table in a static class constructor. The static class constructor executes any time the assembly is loaded, and can pre-fill the collection with the values that udf_SessionName needs to check against. The only issue is that no internal context connection is available in the static class constructor, so the assembly will need to be marked EXTERNAL_ACCESS. But the UDF won't be calling SqlConnection; it will just read from a static collection :-).
- If the values within MarketSessions are more volatile, you can always create a SQLCLR UDF or stored procedure that calls the same method the class constructor uses to populate the static collection. Then execute that just prior to running this query, so that the internal static collection holds the "current" records from the table. In that case, though, you probably won't be able to do the following two steps, as the indexed value might be stale / incorrect. But you would still get the benefit of a parallel plan.
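A sketch of that pattern, assuming a StartTime / SessionName column pair in MarketSessions (connection string, column names, and types are all assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public class SessionCache
{
    // Assumed shape: session start time-of-day -> session name.
    private static readonly Dictionary<TimeSpan, string> Sessions =
        new Dictionary<TimeSpan, string>();

    // Runs whenever the assembly is (re)loaded. The context connection is
    // not available here, so a regular connection is required, which is why
    // the assembly must be marked EXTERNAL_ACCESS.
    static SessionCache()
    {
        LoadSessions();
    }

    private static void LoadSessions()
    {
        using (SqlConnection conn = new SqlConnection(
                   "Server=(local);Database=MyDb;Integrated Security=true;"))
        using (SqlCommand cmd = new SqlCommand(
                   "SELECT StartTime, SessionName FROM dbo.MarketSessions;", conn))
        {
            conn.Open();
            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
                lock (Sessions)
                {
                    Sessions.Clear();
                    while (rdr.Read())
                        Sessions[rdr.GetTimeSpan(0)] = rdr.GetString(1);
                }
            }
        }
    }

    // Exposed as a T-SQL stored procedure so the cache can be refreshed
    // on demand, just prior to running the big query.
    [SqlProcedure]
    public static void RefreshSessions()
    {
        LoadSessions();
    }
}
```

The UDF itself then reads from Sessions and never touches a connection, so it keeps the attributes that allow a parallel plan.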
- Regardless of whether you add WITH SCHEMABINDING to the T-SQL UDF (if you are not already doing so) or convert it to a SQLCLR UDF with the attributes set as noted above, you should add a column to the table: a NONpersisted computed column that is merely the call to the UDF.
- Once the NONpersisted computed column exists, you can create the actual index on: TypeNumber, SessionId, computedColumn INCLUDE ([price], [timestamp], [EventNumber]). You might also need to set the SqlFunction attribute IsPrecise = true for the SQLCLR UDF in order to get it to be indexable.
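Those two steps might look like the following (column and index names are assumed; this requires the UDF to be schema-bound / deterministic as described above):

```sql
-- Non-persisted computed column wrapping the UDF call.
ALTER TABLE dbo.MyTable
    ADD SessionName AS dbo.udf_SessionName([timestamp]);

-- Index keyed to match the GROUP BY, with the computed column as the third key.
CREATE NONCLUSTERED INDEX IX_MyTable_Grouping   -- name assumed
    ON dbo.MyTable (TypeNumber, SessionId, SessionName)
    INCLUDE ([price], [timestamp], [EventNumber]);
```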
- You might need to rethink the ISNULL. price might just be an INCLUDE column, so the ISNULL function might not present a problem for using the index here, but you might need to break it out into price <> 0 AND price IS NOT NULL.
- I am not sure of the overall performance impact, but I have never been a fan of the SELECT ... INTO construct. It might be better to create the table first and then do INSERT INTO ... SELECT.
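A sketch of that split (the column types are assumptions; under the Bulk Logged recovery model mentioned in the question, INSERT ... SELECT with TABLOCK into an empty table can still qualify for minimal logging):

```sql
CREATE TABLE temp.AnalysisResults
(
    TypeNumber    INT          NOT NULL,  -- all types assumed
    SessionId     INT          NOT NULL,
    SessionName   VARCHAR(50)  NULL,
    Variance      REAL         NULL,
    Volume        BIGINT       NULL,
    MoveDirection VARCHAR(10)  NULL
);

INSERT INTO temp.AnalysisResults WITH (TABLOCK)
SELECT   TypeNumber, SessionId, dbo.udf_SessionName([timestamp]),
         CAST(MAX(price) - MIN(price) AS REAL), SUM(EventNumber),
         dbo.Direction(price, [timestamp])
FROM     MyTable
WHERE    ISNULL(price, 0) <> 0
GROUP BY TypeNumber, SessionId, dbo.udf_SessionName([timestamp]);
```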
- As @LowlyDBA noted in a comment on the question: be cautious of using REAL (or FLOAT) for financial values. Yes, they compact better and transfer faster, but they can also sometimes pick up extra values at the very low end. I certainly wouldn't use that datatype if you are doing calculations. Merely passing the values back to the app might be OK, but if you are doing calculations, you really should be using DECIMAL or even MONEY.
Context
StackExchange Database Administrators Q#197416, answer score: 9