patternsqlMinor
Which database system can break me free?
Viewed 0 times
canfreesystemdatabasewhichbreak
Problem
I am currently running a MySQL Database for logging and analyzing those logs.
My current table schema looks like this:
I run a service where I deliver a lot of downloads that go over 1 extra hosts between the destination and origin host.
They are represented by the integer interpretation of the 32 bit IP address.
My system currently handles about 500 inserts/second during peak hours.
I run a master-slave system. The master has an apache webserver with a PHP file that gets called from remote hosts and inserts a line into the log table.
Then the changes get replicated to the slaves where queries happen.
My queries are primarily aggregations over the mb_transferred field over a range in the time field filtered by client_id.
The maser server runs an apache webserver with a simple php file that does the insert and is called by other servers.
My server is now almost at the limit. I already upgraded to big hardware.
I thought about using a GUID as primary key and using master master replication, that will for sure relieve something, but I think its short sighted, because it does not decrease the insert amount per server.
I am expecting higher trough-puts in the future and I am also worried about database size.
Also in future I plan to have a second table which defines "weights" for certain services.
Something like:
``
My current table schema looks like this:
CREATE TABLE `mylogs` (
`transfer_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`client_id` INT UNSIGNED NOT NULL ,
`client_ip` INT UNSIGNED NOT NULL ,
`server_1_ip` INT UNSIGNED NOT NULL ,
`server_2_ip` INT UNSIGNED NOT NULL ,
`service` ENUM( 'service1', 'service2', 'service3', '...', 'service500' ) NOT NULL ,
`mb_transferred` FLOAT UNSIGNED NOT NULL ,
`time` TIMESTAMP NOT NULL
) ENGINE = MYISAM ;I run a service where I deliver a lot of downloads that go over 1 extra hosts between the destination and origin host.
They are represented by the integer interpretation of the 32 bit IP address.
My system currently handles about 500 inserts/second during peak hours.
I run a master-slave system. The master has an apache webserver with a PHP file that gets called from remote hosts and inserts a line into the log table.
Then the changes get replicated to the slaves where queries happen.
My queries are primarily aggregations over the mb_transferred field over a range in the time field filtered by client_id.
SELECT SUM(mb_transferred) FROM mylogs WHERE client_id = 123 AND time > '2012-01-01 00:00:00'The maser server runs an apache webserver with a simple php file that does the insert and is called by other servers.
My server is now almost at the limit. I already upgraded to big hardware.
I thought about using a GUID as primary key and using master master replication, that will for sure relieve something, but I think its short sighted, because it does not decrease the insert amount per server.
I am expecting higher trough-puts in the future and I am also worried about database size.
Also in future I plan to have a second table which defines "weights" for certain services.
Something like:
``
CREATE TABLE mylogs.service_weight (
plan TINYINT NOT NULL ,
service_name` ENUM( 'service_1', 'service_2' ) NOT NSolution
There's a lot to consider. I'm going to narrow this down to a single problem and offer one solution, although there's many approaches.
Problem: You need AD-HOC queries on lots of rows.
First, setup a server to handle writes and a replicated slave (running some flavor of MySQL 5.5.x) to handle reads. My personal preference is Percona Server. Your mileage may vary.
Once you have replication caught up, ALTER your slave table to: engine=InnoDB, ROW_FORMAT=COMPRESSED, partition by date, and index the data to optimize your query WHERE clauses. Each of these have an optimization for your case and there is further tuning required.
Finally, setup a schedule to archive/shard the data when it falls out of scope. Say 2-3 years. Depends on your SLA for providing back reporting. You can always union it if you have to.
MySQL 5.6 will have Memcached built in which will allow you to store pre-processed results. Map/Reduce is great, as long as it's not the first run of the reduce function. NoSQL is great as long as the data is already processed. Neither is really your problem.
Additions 2012-06-13:
Since it appears my recommendations have stirred another response, I'll submit my reasons:
you can use InnoDB to avoid implicit read locks at the table level.
Problem: You need AD-HOC queries on lots of rows.
First, setup a server to handle writes and a replicated slave (running some flavor of MySQL 5.5.x) to handle reads. My personal preference is Percona Server. Your mileage may vary.
Once you have replication caught up, ALTER your slave table to: engine=InnoDB, ROW_FORMAT=COMPRESSED, partition by date, and index the data to optimize your query WHERE clauses. Each of these have an optimization for your case and there is further tuning required.
Finally, setup a schedule to archive/shard the data when it falls out of scope. Say 2-3 years. Depends on your SLA for providing back reporting. You can always union it if you have to.
MySQL 5.6 will have Memcached built in which will allow you to store pre-processed results. Map/Reduce is great, as long as it's not the first run of the reduce function. NoSQL is great as long as the data is already processed. Neither is really your problem.
Additions 2012-06-13:
Since it appears my recommendations have stirred another response, I'll submit my reasons:
- Why InnoDB in this case? With the constant insertions of many rows,
you can use InnoDB to avoid implicit read locks at the table level.
- Why COMPRESSED + Barracuda? "The compression means less data is transferred between disk and memory, and takes up less space in memory. The benefits are amplified for tables with secondary indexes, because index data is compressed also."
- Why partition by date? Reduce scan breadth.
- Why index properly? Decrease search times by reducing cpu bottleneck through elimination of full table scans.
Context
StackExchange Database Administrators Q#19091, answer score: 5
Revisions (0)
No revisions yet.