HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Estimating distribution of row access in mysql

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
estimatingmysqldistributionrowaccess

Problem

I am trying to get an idea of the size of the "hot data" part of a rather large table, and I was wondering if this could be done directly in mysql. I know that with the percona version of mysql, I can have access to figures like "number of rows accessed per table", but I would actually need those data on a per row basis (e.g. row with id 1 was read 200 times, row with id 2 was read 300 times, where id is the auto increment column)

Solution

If you want that kind of granular statistics per id, it sounds like you may want to try something a little convoluted to collect such info. Let's explore this scenario:

If you have a table with the following layout in the wp database:

CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `post_author` bigint(20) unsigned NOT NULL DEFAULT '0',
  `post_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_content` longtext NOT NULL,
  `post_title` text NOT NULL,
  `post_excerpt` text NOT NULL,
  `post_status` varchar(20) NOT NULL DEFAULT 'publish',
  `comment_status` varchar(20) NOT NULL DEFAULT 'open',
  `ping_status` varchar(20) NOT NULL DEFAULT 'open',
  `post_password` varchar(20) NOT NULL DEFAULT '',
  `post_name` varchar(200) NOT NULL DEFAULT '',
  `to_ping` text NOT NULL,
  `pinged` text NOT NULL,
  `post_modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_modified_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_content_filtered` text NOT NULL,
  `post_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
  `guid` varchar(255) NOT NULL DEFAULT '',
  `menu_order` int(11) NOT NULL DEFAULT '0',
  `post_type` varchar(20) NOT NULL DEFAULT 'post',
  `post_mime_type` varchar(100) NOT NULL DEFAULT '',
  `comment_count` bigint(20) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `post_name` (`post_name`),
  KEY `type_status_date` (`post_type`,`post_status`,`post_date`,`ID`),
  KEY `post_parent` (`post_parent`),
  KEY `post_author` (`post_author`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


and you want to track access to this table, you must force the recording of the table's id using a series of operations combining the use of Triggers, BLACKHOLE tables, MySQL Replication, and "Socially Responsible" Coding.

TRIGGERS

Triggers follow the course of any INSERT, UPDATE, and DELETE. In this scenario, you will need three(3) types of Triggers: 1) After INSERT, 2) After UPDATE, 3) After DELETE.

Each time wp_posts has an INSERT, UPDATE or DELETE, record the ID in another table. What kind of table ???

BLACKHOLE TABLES

Let's create a table for recording IDs

CREATE TABLE wp_posts_idtracker
(
  `ID` bigint(20) unsigned NOT NULL,
  `DTSTAMP` datetime NOT NULL
) ENGINE=BLACKHOLE;


Now here are triggers to record the ID accesses:

CREATE TRIGGER wp_posts_insertafter AFTER INSERT ON wp_posts FOR EACH ROW
INSERT INTO wp_posts_idtracker VALUES (NEW.ID,NOW());

CREATE TRIGGER wp_posts_updateafter AFTER UPDATE ON wp_posts FOR EACH ROW
INSERT INTO wp_posts_idtracker VALUES (OLD.ID,NOW());

CREATE TRIGGER wp_posts_deleteafter AFTER DELETE ON wp_posts FOR EACH ROW
INSERT INTO wp_posts_idtracker VALUES (OLD.ID,NOW());


WAIT A MINUTE !!! The table wp_posts_idtracker is a BLACKHOLE table. It stores nothing. So where do the statistics get written ??? Make sure the MySQL Instance with wp_posts has binary logging enabled. Big deal, the stats get written to the binary logs. How can you read those stats ???

MySQL REPLICATION

Using commodity hardware, employee the use of a MySQL Replication Slave. Setup the slave to only accept one table: wp_posts_idtracker. Place this line in /etc/my.cnf in the slave:

[mysqld]
replicate-do-table=wp.wp_posts_idtracker


Since this BLACKHOLE table would be replicated over to the slave, how will it store data ??? Convert it to MyISAM on the slave. Also, index it by ID and DTSTAMP and by DTSTAMP:

use wp
ALTER TABLE wp_posts_idtracker ENGINE=MyISAM;
ALTER TABLE wp_posts_idtracker ADD INDEX (ID,DTSTAMP);
ALTER TABLE wp_posts_idtracker ADD INDEX (DTSTAMP);


Now the Master will simply record every access to a row in wp_posts into the binary logs. MySQL Replicaton takes responsibility to record that over to the Slave. As an alternative to using a separate server for recording this information, you may want to create a second instance of MySQL on port 3307 and have it act as slave of MySQL running on port 3306. You must make sure that the datadir of MySQL on port 3307 is on a separate data volume from that of MySQL on port 3306. Another variation would be to alter the storage engine for wp_posts_idtracker on the slave with the MEMORY storage engine to reduce disk I/O (Caution: If you go with the MEMORY storage engine fro the table wp_posts_idtracker, remember to make the indexes BTREE instead of the default HASH index because running range queries against a HASH-indexed table has horrible performance, even for a MEMORY table). Still another variation would be to place the binary logs in a RAM Disk for even faster replication or placing the relay logs in a RAM Disk as well, along with further reducing disk I/O.

Thus far, IDs involved in INSERTs, UPDATEs and DELETEs are stored safely in a Replication Slave. Are we forgetting any other types of access to wp_posts ??? Oh yes, SELECT statements. How do we record SELECTs ??? There are no triggers for

Code Snippets

CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `post_author` bigint(20) unsigned NOT NULL DEFAULT '0',
  `post_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_content` longtext NOT NULL,
  `post_title` text NOT NULL,
  `post_excerpt` text NOT NULL,
  `post_status` varchar(20) NOT NULL DEFAULT 'publish',
  `comment_status` varchar(20) NOT NULL DEFAULT 'open',
  `ping_status` varchar(20) NOT NULL DEFAULT 'open',
  `post_password` varchar(20) NOT NULL DEFAULT '',
  `post_name` varchar(200) NOT NULL DEFAULT '',
  `to_ping` text NOT NULL,
  `pinged` text NOT NULL,
  `post_modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_modified_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_content_filtered` text NOT NULL,
  `post_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
  `guid` varchar(255) NOT NULL DEFAULT '',
  `menu_order` int(11) NOT NULL DEFAULT '0',
  `post_type` varchar(20) NOT NULL DEFAULT 'post',
  `post_mime_type` varchar(100) NOT NULL DEFAULT '',
  `comment_count` bigint(20) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `post_name` (`post_name`),
  KEY `type_status_date` (`post_type`,`post_status`,`post_date`,`ID`),
  KEY `post_parent` (`post_parent`),
  KEY `post_author` (`post_author`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE wp_posts_idtracker
(
  `ID` bigint(20) unsigned NOT NULL,
  `DTSTAMP` datetime NOT NULL
) ENGINE=BLACKHOLE;
CREATE TRIGGER wp_posts_insertafter AFTER INSERT ON wp_posts FOR EACH ROW
INSERT INTO wp_posts_idtracker VALUES (NEW.ID,NOW());

CREATE TRIGGER wp_posts_updateafter AFTER UPDATE ON wp_posts FOR EACH ROW
INSERT INTO wp_posts_idtracker VALUES (OLD.ID,NOW());

CREATE TRIGGER wp_posts_deleteafter AFTER DELETE ON wp_posts FOR EACH ROW
INSERT INTO wp_posts_idtracker VALUES (OLD.ID,NOW());
[mysqld]
replicate-do-table=wp.wp_posts_idtracker
use wp
ALTER TABLE wp_posts_idtracker ENGINE=MyISAM;
ALTER TABLE wp_posts_idtracker ADD INDEX (ID,DTSTAMP);
ALTER TABLE wp_posts_idtracker ADD INDEX (DTSTAMP);

Context

StackExchange Database Administrators Q#3255, answer score: 4

Revisions (0)

No revisions yet.