HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How can I implement a viewed system for my website's posts?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
websitecanimplementsystemviewedforhowposts

Problem

Here is my current structure:

// posts
+----+--------+----------+-----------+------------+
| id | title  | content  | author_id | date_time  |
+----+--------+----------+-----------+------------+
| 1  | title1 | content1 | 435       | 1468111492 |
| 2  | title2 | content2 | 657       | 1468113910 |
| 3  | title3 | content3 | 712       | 1468113791 |
+----+--------+----------+-----------+------------+

// viewed
+----+---------------+---------+------------+
| id | user_id_or_ip | post_id | date_tiem  |
+----+---------------+---------+------------+
| 1  | 324           | 1       | 1468111493 |
| 2  | 546           | 3       | 1468111661 |
| 3  | 135.54.12.1   | 1       | 1468111691 |
| 5  | 75            | 1       | 1468112342 |
| 6  | 56.26.32.1    | 2       | 1468113190 |
| 7  | 56.26.32.1    | 3       | 1468113194 |
| 5  | 75            | 2       | 1468112612 |
+----+---------------+---------+------------+


Here is my query:

SELECT p.*,
       (SELECT count(*) FROM viewed WHERE post_id = :id) AS total_viewed
 FROM posts p
WHERE id = :id


Currently I've faced with a huge date for viewed table. Well what's wrong with my table structure (or database design)? In other word how can I improve it?

A website like stackoverflow has almost 12 million posts. Each post has (on average) 500 viewed. So the number of viewed's rows should be:

12000000 * 500 = 6,000,000,000 rows


Hah :-) .. Honestly I cannot even read that number (btw that number will grow up per sec). Well how stackoverflow handles the number of viewed for each post? Will it always calculate count(*) from viewed per post showing?

Solution

What makes you think that there is a viewed table like yours on stackoverflow's database? Raw logs are very costly to maintain on a relational database, and on high-traffic websites, caching has a very important role, and it is almost impossible to conserve full logs of everything (they are summarized). For Wikipedia, for example, we maintain a complete different infrastructure for analytics (fed with Apache Kafka from Varnish and Mediawiki), and then an API is offered that can be called from Mediawiki itself: https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=Main_Page

If you had to implement viewed X times functionality within the datbase, you could denormalize the tables by maintaining an extra column as part of the main posts table:

// posts
+----+--------+----------+-----------+------------+------------+
| id | title  | content  | author_id | date_time  | view_count |
+----+--------+----------+-----------+------------+------------+
| 1  | title1 | content1 | 435       | 1468111492 | 3          |
| 2  | title2 | content2 | 657       | 1468113910 | 2          |
| 3  | title3 | content3 | 712       | 1468113791 | 2          |
+----+--------+----------+-----------+------------+------------+


For several reasons, in some cases a separate table may be preferred (e.g. to avoid making the main table hot; if there is a very normalized table such as post_properties; to save space if most posts have 0 views)- it all depends on external factors. In that case, you would have:

// posts_views
+---------+------------+
| post_id | view_count |
+---------+------------+
| 1       | 3          |
| 2       | 2          |
| 3       | 2          |
+---------+------------+


And then:

SELECT posts.*, post_views.view_count
FROM posts
LEFT JOIN post_views
ON posts.id = post_views.post_id
WHERE id = :id


(The LEFT JOIN is in case you have posts without view counts, if not, just use a regular JOIN)

In both cases, after a "view" is added, additionally (or alternatively) add 1 to the number of views in a SERIALIZED way.

Try to avoid logs on relational databases (specially MySQL). You can bring the summaries back after being analysed.

Code Snippets

// posts
+----+--------+----------+-----------+------------+------------+
| id | title  | content  | author_id | date_time  | view_count |
+----+--------+----------+-----------+------------+------------+
| 1  | title1 | content1 | 435       | 1468111492 | 3          |
| 2  | title2 | content2 | 657       | 1468113910 | 2          |
| 3  | title3 | content3 | 712       | 1468113791 | 2          |
+----+--------+----------+-----------+------------+------------+
// posts_views
+---------+------------+
| post_id | view_count |
+---------+------------+
| 1       | 3          |
| 2       | 2          |
| 3       | 2          |
+---------+------------+
SELECT posts.*, post_views.view_count
FROM posts
LEFT JOIN post_views
ON posts.id = post_views.post_id
WHERE id = :id

Context

StackExchange Database Administrators Q#143413, answer score: 4

Revisions (0)

No revisions yet.