Recent Entries 10
- pattern minor 112d agoIs this function PARALLEL SAFE?Looking at the function `f_jsonb_select_keys` from this answer: How to select sub-object with given keys from JSONB? https://www.postgresql.org/docs/current/functions-aggregate.html Aggregate functions that support Partial Mode are eligible to participate in various optimizations, such as parallel aggregation. In here, jsonb_object_agg partial node is NO https://www.postgresql.org/docs/15/parallel-safety.html In general, if a function is labeled as being safe when it is restricted or unsafe, or if it is labeled as being restricted when it is in fact unsafe, it may throw errors or produce wrong answers when used in a parallel query. C-language functions could in theory exhibit totally undefined behavior if mislabeled, since there is no way for the system to protect itself against arbitrary C code, but in most likely cases the result will be no worse than for any other function. If in doubt, it is probably best to label functions as UNSAFE. ``` CREATE OR REPLACE FUNCTION f_jsonb_select_keys(_js jsonb, _keys text[]) RETURNS jsonb LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS $func$ SELECT jsonb_object_agg(t.key, t.value) FROM jsonb_each(_js) t WHERE t.key = ANY (_keys); $func$; ``` - `jsonb_each` is parallel safe, but from the above excerpt from manual, `jsonb_object_agg` is parallel unsafe? - Overall, is the function `f_jsonb_select_keys` parallel safe?
- pattern minor 112d agoSQL Server generates a query plan with parallelism and TOP statementI have two simple queries: ``` SELECT TOP(20) * FROM Clients ORDER BY City ``` ``` SELECT TOP(20) Id, Name, City FROM Clients ORDER BY City ``` In the first case I get a result like this Id Name City many more columns 6 Person 6 NULL ... 2 Person 2 NULL ... 3 Person 3 NULL ... In the second case I get a result like this In the first case I get a result like this Id Name City 2 Person 2 NULL 3 Person 3 NULL 6 Person 6 NULL Notice that the order is different. Of course I understand that beacuse City is NULL for these three persons a unique order is not necessarily garantueed. However, upon inspecting the query plans I noticed that the first query used parallelism The second query did not: After reading the documentation about parallel query processing, it specifically mentions that certain constructs inhibit parallelism, such as the TOP operator. The only notable difference, apart form parralelism, is that the Top N Sort node of the parallelised plan has an estimated I/O cost of 16, while the Top N Sort node of the non-parallelised query plan has only an estimated I/O cost of 0.01 So my question is: why would it still use parallelism here when I use the TOP operator, even though microsoft stating it should inhibit the mechanism?
- pattern moderate 112d agoIs SQL Server able to use internal parallelism for an update statement?I am struggling to find the best way to migrate some "varchar" columns to "nvarchar". One of the options i am using is to add new nvarchar column(s) then update the values from the original column, drop the original column and rename the new one to the old name. I know it will generate a lot of UNDO and REDO data. Still, I have other limitations (mostly by SQL Server not supporting parallel DDLs and multi-column ALTER table operations), so let's focus on how to run the update statement faster. My Oracle experience is telling me to use internal parallelism, but is it available in SQL Server? I am not able to run this statement in parallel, although I especially created the table to be a heap table (no clustered index). ``` update t set new_col_1 = col_1 ,new_col_2 = col_2 ... , new_col_N = col_N ; ``` There are 3 text columns holding 400GB of data. There is limited IO performance from AWS RDS (10000 IOPS). We have just 4 hours of downtime window. In this particular migration the online rebuild is not an option as the data must be migrated (to nvarchar) before the application can be started. During startup it is checking whether the actual data types correspond to the defined ones (in the application metadata repository). I am aware of the fragmentation, but we just have no choice. Still, if there is some ONLINE rebuild command it might be useful as we will be able to migrate and de-fragment later. Actually, as one of the preparations steps, we are dropping the clustered index. Later this index will be created again, which i believe will fix the fragmentation issue, as we will move from a heap to b*tree structure. It is very frustrating that we can not use any other "parallel" technique. I am thinking to try manual parallel update, by running a few parallel update statements against not overlapping ranges of the target table. Still, the lock escalation, could be the next issue, as i am going to update millions of records in each of those updates, and
- snippet moderate 112d agoWhy is a parallel top N sort apparently much more CPU efficient than a serial top N sort?I'm testing against SQL Server 2019 CU14. I have a pure row-mode query that selects the top 50 rows from a complicated view. The full query takes 25426 ms of CPU time at MAXDOP 1 and 19068 ms of CPU time at MAXDOP 2. I'm not surprised that the parallel query uses less CPU time overall. The parallel query is eligible for bitmap operators and the query plan is different in a few ways. However, I am surprised by a large reported difference in operator times for the top N sort. In the serial plan, the top N sort is reported to have taken around 10 seconds of CPU time by operator execution statistics: The MAXDOP 2 plan reports around 1.6 seconds of CPU time for the same top N sort: I don't understand why such a large difference is reported between the two different query plans. The compute scalars in the parent operator are very simple and cannot explain the discrepancy in operator times. Here is what they look like: ``` [Expr1055] = Scalar Operator(CASE WHEN COLUMN_1 IS NULL THEN (0) ELSE datediff(day,COLUMN_1,getdate()) END), [Expr1074] = Scalar Operator(CASE WHEN [Expr1074] IS NULL THEN (0) ELSE [Expr1074] END) ``` There are other compute scalars in different parts of the plan. I uploaded anonymized actual plans for the serial plan and the parallel plan if someone wants to review them. When I load the full query results without TOP into a temp table and perform a TOP 50 sort on the temp table, both the parallel and the serial plan take around 1200 ms of CPU time to perform the sort. So, the reported operator time for the parallel sort in the full query feels reasonable to me. The ten seconds for the serial query does not. Why does the serial top N sort have a much higher reported CPU time than the parallel sort? Is it truly so much less efficient or could this be a bug with operation execution statistics?
- pattern minor 112d agoNon parallel plan only in Azure when SESSION_CONTEXT is usedI'm observing strange difference between query plans that I'm getting on my local machine and on Azure SQL. I'm trying to implement row level security, where I read user identifier from SESSION_CONTEXT and then in TVF I check whether the user has access. On my local machine - SQL Server 2019 Developer edition, DB in compatibility level 150 the query plans are as expected. But when I run it on Azure DB which is also 150 compatibility level, I only get non-parallel query plans with `NonParallelPlanReason="NonParallelizableIntrinsicFunction"` . I tried a Hyperscale database as well as a DB that is in Elastic Pool and the result is same on both DBs. You can reproduce that with following code: ``` CREATE TABLE Users ( UserIdentifier nvarchar(100) PRIMARY KEY CLUSTERED ) INSERT INTO Users (UserIdentifier) VALUES ('MyUserIdentifier') CREATE TABLE TableWithRLS ( Id int NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED, DataColumn nvarchar(100) NULL ) INSERT INTO TableWithRLS (DataColumn) SELECT TOP 10000000 A.[name] FROM sys.all_columns AS A CROSS JOIN sys.all_columns AS B CROSS JOIN sys.all_columns AS C CREATE OR ALTER FUNCTION CheckAccess (@userIdentifier varchar(100)) RETURNS TABLE WITH SCHEMABINDING AS RETURN SELECT TOP 1 1 AS HasAccess FROM dbo.Users WHERE UserIdentifier = @userIdentifier EXEC sp_set_session_context N'UserIdentifier', N'MyUserIdentifier', 1 -- This query gets always non-parallel query plan on Azure SELECT MAX(DataColumn) FROM TableWithRLS AS X CROSS APPLY CheckAccess(CAST(SESSION_CONTEXT(N'UserIdentifier') AS nvarchar(100))) ``` When I select the value from session context into a variable first, then it generates parallelizable query plan even in Azure. ``` DECLARE @userIdentifier AS nvarchar(100) = CAST(SESSION_CONTEXT(N'UserIdentifier') AS nvarchar(100)) SELECT MAX(DataColumn) FROM TableWithRLS AS X CROSS APPLY CheckAccess(@userIdentifier) ``` Unfortunately I can't do that (or at least I'm not aware how to do that) because
- pattern minor 112d agoRunning parallel functions with union allI'm trying to run 3 functions at the same time using parallelism and union all. I've read that union queries are candidates for parallelism but i can't get it to work. The query looks like this (the function is the same, but it's on a different schema, but the result is the same even on the same schema) `-- The real function is a medium size sql query with CTEs, but i get the same result with this CREATE OR REPLACE FUNCTION fn(jsonb) RETURNS table( i int ) LANGUAGE sql PARALLEL SAFE SET search_path from current AS $function$ select 1 $function$; select * from catalog.fn('{}') union all select * from customer01.fn('{}') union all select * from customer02.fn('{}') ` And i'm using this setting to force parallelism (i've tried different numbers) I get this plan ``` Gather (cost=0.25..28.00 rows=3000 width=4) (actual time=7.320..10.669 rows=3 loops=1) Output: fn.i Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=388 -> Parallel Append (cost=0.25..28.00 rows=1500 width=4) (actual time=2.283..2.313 rows=2 loops=2) Buffers: shared hit=388 Worker 0: actual time=2.420..2.422 rows=1 loops=1 Buffers: shared hit=193 Worker 1: actual time=2.146..2.205 rows=2 loops=1 Buffers: shared hit=195 -> Function Scan on catalog.fn (cost=0.25..10.25 rows=1000 width=4) (actual time=2.145..2.145 rows=1 loops=1) Output: fn.i Function Call: catalog.fn('{}'::jsonb) Buffers: shared hit=193 Worker 1: actual time=2.145..2.145 rows=1 loops=1 Buffers: shared hit=193 -> Function Scan on customer01.fn fn_1 (cost=0.25..10.25 rows=1000 width=4) (actual time=2.419..2.419 rows=1 loops=1) Output: fn_1.i Function Call: customer01.fn('{}'::jsonb) Buffers: shared hit=193 Worker 0: actual time=2.419..2.419 rows=1 loops=1 Buffers: shared hit=193 -> Functio
- pattern minor 112d agoSQL Server Scalar UDF Parallelism Mystery!Referencing this blog post by Erik Darling on the website of my favorite SQL Server guru - Brent Ozar: When you select from that table alone, it shows "CouldNotGenerateValidParallelPlan". But, when you join that table to another that does not have a check constraint/computed column calling a Scalar UDF, the query goes parallel with "Good Enough Plan Found" ``` USE tempdb; SET NOCOUNT ON; SELECT TOP 10000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )) AS ID, DATEADD(MINUTE, m.message_id, SYSDATETIME()) AS SomeDate INTO dbo.constraint_test_1 FROM sys.messages AS m, sys.messages AS m2; GO SELECT TOP 10000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )) AS ID, DATEADD(MINUTE, m.message_id, SYSDATETIME()) AS SomeDate INTO dbo.constraint_test_2 FROM sys.messages AS m, sys.messages AS m2; GO CREATE FUNCTION dbo.DateCheck ( @d DATETIME2(7)) RETURNS BIT WITH RETURNS NULL ON NULL INPUT AS BEGIN DECLARE @Out BIT; SELECT @Out = CASE WHEN @d < DATEADD(DAY, 30, SYSDATETIME()) THEN 1 ELSE 0 END; RETURN @Out; END; GO ALTER TABLE dbo.constraint_test_1 ADD CONSTRAINT ck_cc_dt CHECK ( dbo.DateCheck(SomeDate) = 1 ); SELECT * FROM dbo.constraint_test_1 OPTION (QUERYTRACEON 8649, MAXDOP 0, RECOMPILE); -- Does not go parallel SELECT T1.ID, T2.SomeDate FROM dbo.constraint_test_1 T1 INNER JOIN dbo.constraint_test_2 T2 ON T1.ID = T2.ID OPTION (QUERYTRACEON 8649, MAXDOP 0, RECOMPILE); -- Goes parallel ```
- pattern minor 112d agoCan a server wide setting of 'Max Degree of Parallelism' = 1 cause Brent Ozar's sp_BlitzCache to flag execution plan as 'forced-serialization'?I am using Brent Ozar's sp_BlitzCache store procedure and I'm attempting to nail down why it is reporting: "Something in your plan is forcing a serial query. Further investigation is needed if this is not by design." Upon investigation I found that the server configuration has set: ``` 'Max Degree of Parallelism = 1' ``` (That is on my laundry list to configure correctly. It is a hold over from days of ignorance.) Is that setting the cause of Brent to report forced serialization?
- pattern minor 112d agoNo parallelism in Express Edition of SQL ServerI have SQL Server 2019 Express Edition (CU8) on Windows Server 2019 (1809 version) and all of my queries go serial with `NoParallelPlansInDesktopOrExpressEdition` in `NoparallelPlanReason` property. Is it that Express Edition never goes parallel? I cannot find anything about this in Microsoft documentation.
- pattern minor 112d agoSlow Parallel SQL Server query, almost instant in serialI have a SQL Server query as follows (obfuscated): ``` UPDATE [TABLE1] SET [COLUMN1] = CAST('N' AS CHAR(1)) FROM [TABLE1] WHERE (COLUMN1 = '2' AND COLUMN2 IN('VAL1', 'VAL2', 'VAL3')) OR (COLUMN1 <> 'N' AND ( SELECT COUNT(*) FROM TABLE2 wle JOIN TABLE3 wl ON wl.COLUMN3 = wle.COLUMN3 WHERE TABLE1.COLUMN4 = wle.COLUMN4 AND (wl.COLUMN5 = '1' OR wl.COLUMN6 = '1') AND wle.COLUMN7 = ( SELECT MIN(alias.COLUMN7) FROM TABLE2 AS alias WHERE TABLE1.COLUMN4 = alias.COLUMN4 ) ) > 0 ) ``` We have just upgraded our (test) server to SQL Server 2016 SP2 from SQL Server 2014 SP3. The performance of the query above appears to have fallen off a cliff as a result of this. When the server was on SQL Server 2014, the database compatibility level was 120. Now it is on SQL Server 2016, the compatibility level for the database is still 120, however I have tried the query in 110,120 and 130, all with the same result. When I run `sp_whoisactive`, I can see the wait_info is `(48847425ms)CXCONSUMER` suggesting that the query has been waiting for `CXCONSUMER` for the last `48847425ms` (814 minutes) The query has currently been running for 13:34:07.587. The wait info suggests that the query has been waiting on `CXCONSUMER` for the majority, if not all of its execution time. This to me suggests some issue with paralellism so I ran the query with the hint