patternMinor
Using CROSS APPLY OPENJSON causes Azure to hang
Viewed 0 times
crossapplyhangopenjsonazureusingcauses
Problem
I have a table with around 8 million rows with a schema of:
The json documents are of a structure of
I am attempting to extract the JSON array from within the document and insert it into a new table using the following query:
What I'm finding is that if I extract and query a sample of the data, say 1000 rows. Everything works as expected and the data is inserted into the destination table. However, when I query the main table the server appears to just hang and become unresponsive.
I suspect that it is attempting to run a cross apply across all the rows before inserting data. Is there any way to improve performance? Or allow the job to begin "streaming" results into the destination table rather than attempting to batch them up?
Finally, as you can see I am employing the use of "TOP 10" results. I am ho
CREATE TABLE [dbo].[Documents](
[Id] [uniqueidentifier] NOT NULL,
[RemoteId] [int] NOT NULL,
[Json] [nvarchar](max) NULL,
[WasSuccessful] [bit] NOT NULL,
[StatusCode] [int] NULL,
[Created] [datetime2](7) NULL,
CONSTRAINT [PK_Documents] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[Documents] ADD CONSTRAINT [DF_Documents_Id] DEFAULT (newsequentialid()) FOR [Id]
GO
ALTER TABLE [dbo].[Documents] ADD CONSTRAINT [DF_Documents_Created] DEFAULT (getdate()) FOR [Created]
GOThe json documents are of a structure of
{
"Id": 1,
"Data": [
{
"Id": 99,
"Name": "Person 1"
},
{
"Id": 100,
"Name": "Person 2"
}
]
}I am attempting to extract the JSON array from within the document and insert it into a new table using the following query:
;WITH CTE (Json) AS
(
SELECT TOP 10
JSON_QUERY([Json], '$.Data')
FROM
Documents
WHERE
ISJSON([Json]) > 0
)
INSERT INTO [dbo].[ParsedDocuments] (Id, Name)
SELECT
JSON_VALUE([Value], '$.Id') AS [Id],
JSON_VALUE([Value], '$.Name') AS [Name],
FROM
CTE
CROSS APPLY
OPENJSON([Json]) as XWhat I'm finding is that if I extract and query a sample of the data, say 1000 rows. Everything works as expected and the data is inserted into the destination table. However, when I query the main table the server appears to just hang and become unresponsive.
I suspect that it is attempting to run a cross apply across all the rows before inserting data. Is there any way to improve performance? Or allow the job to begin "streaming" results into the destination table rather than attempting to batch them up?
Finally, as you can see I am employing the use of "TOP 10" results. I am ho
Solution
The fact that you cannot obtain even an estimated execution plan for the query against the large table suggests that compilation is waiting on statistics creation (or update) for the
When running against the smaller sample table, statistics creation completes quickly, so an execution plan is quick to create.
I can reproduce your situation on a low-powered (S0) Azure SQL Database. On a slightly less anaemic S3 instance, the compilation still takes around 30 seconds due to the large amount of I/O involved in reading the
You can confirm the cause by temporarily disabling automatic statistics creation:
The SQL server will then produce a plan quickly, albeit with warnings about missing statistics:
Be sure to re-enable this feature after testing because good statistics are vital to general execution plan quality. Or commit to creating necessary statistics manually.
If the issue is instead automatic refresh of stale statistics, you could choose to have these updated asynchronously:
Alternatively, scale your Azure SQL Database up to a performance level that suits your workload and budget.
nvarchar(max) column.When running against the smaller sample table, statistics creation completes quickly, so an execution plan is quick to create.
I can reproduce your situation on a low-powered (S0) Azure SQL Database. On a slightly less anaemic S3 instance, the compilation still takes around 30 seconds due to the large amount of I/O involved in reading the
nvarchar(max) column to generate statistics. The premium tier is needed to get good I/O performance - even a cold P1 can get these statistics created in 3 seconds or so.You can confirm the cause by temporarily disabling automatic statistics creation:
ALTER DATABASE CURRENT SET AUTO_CREATE_STATISTICS OFF;The SQL server will then produce a plan quickly, albeit with warnings about missing statistics:
Be sure to re-enable this feature after testing because good statistics are vital to general execution plan quality. Or commit to creating necessary statistics manually.
If the issue is instead automatic refresh of stale statistics, you could choose to have these updated asynchronously:
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS_ASYNC ON;Alternatively, scale your Azure SQL Database up to a performance level that suits your workload and budget.
Code Snippets
ALTER DATABASE CURRENT SET AUTO_CREATE_STATISTICS OFF;ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS_ASYNC ON;Context
StackExchange Database Administrators Q#211622, answer score: 7
Revisions (0)
No revisions yet.