HiveBrain v1.2.0

How to make my query use the available index

Submitted by: @import:stackexchange-dba

Problem

I have the following time-series table in SQL Server:

CREATE TABLE [dbo].[SensorData](
    [DateTimeUtc] [datetime2](2) NOT NULL,
    [SensorId] [int] NOT NULL,
    [Key] [varchar](20) NOT NULL,
    [Value] [decimal](19, 4) NULL,
    CONSTRAINT [PK_SensorData] PRIMARY KEY CLUSTERED
    (
        [SensorId] ASC,
        [Key] ASC,
        [DateTimeUtc] ASC
    ) WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
            OPTIMIZE_FOR_SEQUENTIAL_KEY = ON, DATA_COMPRESSION = PAGE)
      ON PS_Daily(DateTimeUtc)
)


Because of this clustered index, every query needs to filter on all three key columns in its WHERE clause:
[SensorId],
[Key],
[DateTimeUtc]


When the query filters on all three, it returns very quickly, as expected.
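For illustration (the literal values and key name here are hypothetical, not from the original post), a fully-keyed query of this shape can seek directly on the clustered index:

```sql
-- Hypothetical example: all three clustered key columns are filtered,
-- so SQL Server can satisfy this with a clustered index seek.
SELECT [Value]
FROM [dbo].[SensorData]
WHERE [SensorId]    = 1234
  AND [Key]         = 'temperature'
  AND [DateTimeUtc] >= '2024-01-01'
  AND [DateTimeUtc] <  '2024-01-02';
```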

Now I am stuck with a specific query that does not have a specific value for [Key]. For example:

Check if there is ANY data for sensor 1234 in the past 12 hours. In this case we have filter values for [DateTimeUtc] and [SensorId], but the query returns very slowly.

Would adding WHERE [Key] IS NOT NULL make SQL Server hit that index?

I know the easy answer is to add a new nonclustered index on ([SensorId], [DateTimeUtc]); however, given the table's size this would add a substantial amount of space to the database, and it would also slow down inserts.
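For reference, the index being ruled out here would look something like the sketch below; the index name and the compression option are assumptions, not from the original post. PAGE compression could reduce (but not eliminate) the space overhead being worried about.

```sql
-- Sketch of the nonclustered index the post rules out due to its space
-- and insert cost. Name and options are illustrative assumptions.
CREATE NONCLUSTERED INDEX IX_SensorData_SensorId_DateTimeUtc
    ON [dbo].[SensorData] ([SensorId], [DateTimeUtc])
    WITH (DATA_COMPRESSION = PAGE)
    ON PS_Daily(DateTimeUtc);
```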

Is there any way I can get the above query to hit the clustered index?

I chose this clustered index key order after reading guidance that the most selective columns (those with the most unique values) should come first.

I ran EXEC sp_spaceused [SensorData] to check the table size.

Query:

SELECT
    CASE WHEN (EXISTS (SELECT
        1 AS [C1]
        FROM [dbo].[SensorData] AS [Extent1]
        WHERE ([Extent1].[DateTimeUtc] > @p__linq__0)
          AND ([Extent1].[DateTimeUtc] <= @p__linq__1)
          AND ([Extent1].[SensorId] = @p__linq__2)
    )) THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS [C1]
FROM (SELECT 1 AS X) AS [SingleRowTable1]


There are generally about 10 to 40 distinct keys per SensorId per time period, but we obviously store those 10 to 40 keys thousands of times.
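Not part of the original post, but the low key cardinality described above is exactly the situation where a manual "skip scan" can keep using the clustered index: enumerate the distinct [Key] values and probe the index once per key. The sketch below assumes the distinct keys can be obtained cheaply (a small lookup table of known keys would be better than the DISTINCT subquery shown); parameter values are hypothetical.

```sql
-- Manual "skip scan" sketch: one clustered-index probe per distinct key.
-- Feasible here only because there are roughly 10-40 keys per sensor.
DECLARE @SensorId int = 1234;                        -- hypothetical value
DECLARE @From datetime2(2) = DATEADD(HOUR, -12, SYSUTCDATETIME());

SELECT CASE WHEN EXISTS (
    SELECT 1
    FROM (SELECT DISTINCT [Key]                      -- assumption: cheap source
          FROM [dbo].[SensorData]                    -- of distinct keys; a small
          WHERE [SensorId] = @SensorId) AS k         -- lookup table is preferable
    CROSS APPLY (
        SELECT TOP (1) 1 AS hit                      -- seek on (SensorId, Key,
        FROM [dbo].[SensorData] s                    -- DateTimeUtc) per key
        WHERE s.[SensorId]    = @SensorId
          AND s.[Key]         = k.[Key]
          AND s.[DateTimeUtc] > @From
    ) AS probe
) THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS HasData;
```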

Solution

Have you thought about using Azure Data Explorer (ADX) to store this table? Queries like this are a breeze in ADX.
ADX is a time-series-optimized cluster service on Azure, and it seems like the natural fit for this type of problem.

It manages the partitions and indexes for you, and you can set up a retention policy that is applied automatically.

If you are planning to consume the data with Power BI (or Grafana), you could also use DirectQuery against ADX, so there is no need to import a huge amount of data when processing your data model. The ADX connector also supports query folding very well, so if you are using Power BI, for example, the queries behind your visuals will be translated to Kusto on the fly.

If your sensor data comes from Event Hubs or IoT Hub, you can ingest it directly into ADX (if it is a supported format), or even use Stream Analytics with a custom deserializer to ingest diverse data.

Please let me know if you need more context.

Context

StackExchange Database Administrators Q#307553, answer score: 5
