patternsqlMinor
Most efficient way of caching queries / processed data
Viewed 0 times
cachingefficientwayprocessedqueriesdatamost
Problem
I have doubts on database design.
I'm trying to know if there is a way to increase query performance, and the easiest way for me to explain my doubt is with a small example.
I have a news website with news posts, and those have comments. The Comments table is like this:
This news website has in its mainpage two rankings, making them heavily read, but with a low number of writes:
The "normal" way to do this is to create the appropriate queries and leave them on that index page:
Which could be heavy to process, since it has to process ALL the news comments rows every time the page is accessed. Well, I'm not sure about this because I don't know how efficient caching is.
That's why I'm thinking in a alternative, that would be to create two tables, that would have the processed data for those two rankings.
Would this be a good alternative?
If so, is there an automated way to update these tables based on updates on the Comments table besides triggers? Or are triggers the only good way to achieve this?
Or is caching enough to keep high performance? (SQL Server 2014)
Or is there another model that could be more efficient?
EDIT:
After accepting the answer, I'm posting how I created and queried the materialized view.
It's based on the steps provided by the documentation:
https://msdn.microsoft.com/en-us/library/ms191432.aspx
Creation
```
--Create view with schemabinding.
IF OBJECT_ID ('dbo.commentsposts', 'view') IS NOT NULL
DROP VIEW dbo.commentsposts;
GO
CREATE VIEW dbo.commentsposts
WITH SCHEMABINDING
I'm trying to know if there is a way to increase query performance, and the easiest way for me to explain my doubt is with a small example.
I have a news website with news posts, and those have comments. The Comments table is like this:
Table Comments:
comment_id | text | post_id | user_id | ...This news website has in its mainpage two rankings, making them heavily read, but with a low number of writes:
- One with the most commented news posts (The 10 most "popular" news posts)
- Another with the users with most comments (The 10 most "active" users)
The "normal" way to do this is to create the appropriate queries and leave them on that index page:
Top10 of Posts:
SELECT TOP 10 post_id, COUNT(comment_id) AS num_comments
FROM Comments
GROUP BY post_id
ORDER BY num_comments DESCWhich could be heavy to process, since it has to process ALL the news comments rows every time the page is accessed. Well, I'm not sure about this because I don't know how efficient caching is.
That's why I'm thinking in a alternative, that would be to create two tables, that would have the processed data for those two rankings.
Table Comments_PostsTop:
post_id | num_comments
Table Comments_UsersTop:
user_id | num_commentsWould this be a good alternative?
If so, is there an automated way to update these tables based on updates on the Comments table besides triggers? Or are triggers the only good way to achieve this?
Or is caching enough to keep high performance? (SQL Server 2014)
Or is there another model that could be more efficient?
EDIT:
After accepting the answer, I'm posting how I created and queried the materialized view.
It's based on the steps provided by the documentation:
https://msdn.microsoft.com/en-us/library/ms191432.aspx
Creation
```
--Create view with schemabinding.
IF OBJECT_ID ('dbo.commentsposts', 'view') IS NOT NULL
DROP VIEW dbo.commentsposts;
GO
CREATE VIEW dbo.commentsposts
WITH SCHEMABINDING
Solution
Have you looked into using SQL Server's support for Indexed Views? (Generically this is called a materialized view.) If your code really needs an assist in getting the answers this is a option, though not without its own difficulties.
An Indexed View is a duplicate set of data taken from the main tables and stored within the Indexed View. There are rules and limitations, but it basically creates a specialized "table" that should be index-tuned to suite your purpose for using it.
This MSDN topic points you to the details that you need:
https://msdn.microsoft.com/en-us/library/ms191432.aspx
The definition of an indexed view must be deterministic, needs certain settings to be adjusted, and there are actions such as OUTER JOIN and ORDER BY that are forbidden.
EDIT: You should also use the NOEXPAND hint, as explained by Aaron Bertrand, to "ensure the uniqueness guarantee provided by the view indexes is used by the optimizer." See: http://sqlperformance.com/2015/12/sql-performance/noexpand-hints
An Indexed View is a duplicate set of data taken from the main tables and stored within the Indexed View. There are rules and limitations, but it basically creates a specialized "table" that should be index-tuned to suite your purpose for using it.
This MSDN topic points you to the details that you need:
https://msdn.microsoft.com/en-us/library/ms191432.aspx
The definition of an indexed view must be deterministic, needs certain settings to be adjusted, and there are actions such as OUTER JOIN and ORDER BY that are forbidden.
EDIT: You should also use the NOEXPAND hint, as explained by Aaron Bertrand, to "ensure the uniqueness guarantee provided by the view indexes is used by the optimizer." See: http://sqlperformance.com/2015/12/sql-performance/noexpand-hints
Context
StackExchange Database Administrators Q#129590, answer score: 2
Revisions (0)
No revisions yet.