snippetsqlMinor
How can I implement a viewed system for my website's posts?
Viewed 0 times
websitecanimplementsystemviewedforhowposts
Problem
Here is my current structure:
Here is my query:
Currently I've faced with a huge date for
A website like stackoverflow has almost 12 million posts. Each post has (on average) 500 viewed. So the number of
Hah
// posts
+----+--------+----------+-----------+------------+
| id | title | content | author_id | date_time |
+----+--------+----------+-----------+------------+
| 1 | title1 | content1 | 435 | 1468111492 |
| 2 | title2 | content2 | 657 | 1468113910 |
| 3 | title3 | content3 | 712 | 1468113791 |
+----+--------+----------+-----------+------------+
// viewed
+----+---------------+---------+------------+
| id | user_id_or_ip | post_id | date_tiem |
+----+---------------+---------+------------+
| 1 | 324 | 1 | 1468111493 |
| 2 | 546 | 3 | 1468111661 |
| 3 | 135.54.12.1 | 1 | 1468111691 |
| 5 | 75 | 1 | 1468112342 |
| 6 | 56.26.32.1 | 2 | 1468113190 |
| 7 | 56.26.32.1 | 3 | 1468113194 |
| 5 | 75 | 2 | 1468112612 |
+----+---------------+---------+------------+Here is my query:
SELECT p.*,
(SELECT count(*) FROM viewed WHERE post_id = :id) AS total_viewed
FROM posts p
WHERE id = :idCurrently I've faced with a huge date for
viewed table. Well what's wrong with my table structure (or database design)? In other word how can I improve it?A website like stackoverflow has almost 12 million posts. Each post has (on average) 500 viewed. So the number of
viewed's rows should be: 12000000 * 500 = 6,000,000,000 rowsHah
:-) .. Honestly I cannot even read that number (btw that number will grow up per sec). Well how stackoverflow handles the number of viewed for each post? Will it always calculate count(*) from viewed per post showing?Solution
What makes you think that there is a
If you had to implement
For several reasons, in some cases a separate table may be preferred (e.g. to avoid making the main table hot; if there is a very normalized table such as post_properties; to save space if most posts have 0 views)- it all depends on external factors. In that case, you would have:
And then:
(The
In both cases, after a "view" is added, additionally (or alternatively) add 1 to the number of views in a SERIALIZED way.
Try to avoid logs on relational databases (specially MySQL). You can bring the summaries back after being analysed.
viewed table like yours on stackoverflow's database? Raw logs are very costly to maintain on a relational database, and on high-traffic websites, caching has a very important role, and it is almost impossible to conserve full logs of everything (they are summarized). For Wikipedia, for example, we maintain a complete different infrastructure for analytics (fed with Apache Kafka from Varnish and Mediawiki), and then an API is offered that can be called from Mediawiki itself: https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=Main_PageIf you had to implement
viewed X times functionality within the datbase, you could denormalize the tables by maintaining an extra column as part of the main posts table:// posts
+----+--------+----------+-----------+------------+------------+
| id | title | content | author_id | date_time | view_count |
+----+--------+----------+-----------+------------+------------+
| 1 | title1 | content1 | 435 | 1468111492 | 3 |
| 2 | title2 | content2 | 657 | 1468113910 | 2 |
| 3 | title3 | content3 | 712 | 1468113791 | 2 |
+----+--------+----------+-----------+------------+------------+For several reasons, in some cases a separate table may be preferred (e.g. to avoid making the main table hot; if there is a very normalized table such as post_properties; to save space if most posts have 0 views)- it all depends on external factors. In that case, you would have:
// posts_views
+---------+------------+
| post_id | view_count |
+---------+------------+
| 1 | 3 |
| 2 | 2 |
| 3 | 2 |
+---------+------------+And then:
SELECT posts.*, post_views.view_count
FROM posts
LEFT JOIN post_views
ON posts.id = post_views.post_id
WHERE id = :id(The
LEFT JOIN is in case you have posts without view counts, if not, just use a regular JOIN)In both cases, after a "view" is added, additionally (or alternatively) add 1 to the number of views in a SERIALIZED way.
Try to avoid logs on relational databases (specially MySQL). You can bring the summaries back after being analysed.
Code Snippets
// posts
+----+--------+----------+-----------+------------+------------+
| id | title | content | author_id | date_time | view_count |
+----+--------+----------+-----------+------------+------------+
| 1 | title1 | content1 | 435 | 1468111492 | 3 |
| 2 | title2 | content2 | 657 | 1468113910 | 2 |
| 3 | title3 | content3 | 712 | 1468113791 | 2 |
+----+--------+----------+-----------+------------+------------+// posts_views
+---------+------------+
| post_id | view_count |
+---------+------------+
| 1 | 3 |
| 2 | 2 |
| 3 | 2 |
+---------+------------+SELECT posts.*, post_views.view_count
FROM posts
LEFT JOIN post_views
ON posts.id = post_views.post_id
WHERE id = :idContext
StackExchange Database Administrators Q#143413, answer score: 4
Revisions (0)
No revisions yet.