patternsqlMinor
Large accounting table, best way to partitioning by two dates
Viewed 0 times
partitioningdateswayaccountingtwolargetablebest
Problem
I need to create a log table to stores the connections to out network (last 3 years, after that the log will go into backup). The hardware/software used is proprietary, and for accounting it just call our custom script with some arguments like this:
Sometimes we do not receive the disconnect message. So we need to adapt to this.
So far I came up with this structure for the accounting table:
The query used for writing are (these needs to be real fast):
A better workaround, is to update the stop_datetime for rows without it, at every start, and use another column for storing that this is not a norma
- when a user connects (
our_script START user mac ip);
- when a user disconnects (
our_script STOP user mac ip in_bytes out_bytes more)
Sometimes we do not receive the disconnect message. So we need to adapt to this.
So far I came up with this structure for the accounting table:
CREATE TABLE `accounting` (
`user` varchar(50) NOT NULL DEFAULT '',
`mac` varchar(20) NOT NULL DEFAULT '',
`ip` varchar(15) NOT NULL DEFAULT '',
`ipv6` varchar(39) DEFAULT NULL,
`start_datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`stop_datetime` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`in_bytes` bigint(32) unsigned DEFAULT '0',
`out_bytes` bigint(32) unsigned DEFAULT '0',
`more_columns` varchar(255) default NULL,
PRIMARY KEY (`user`,`mac`,`ip`,`start_datetime`,`stop_datetime`),
KEY `prim_ipv6` (`user`,`mac`,`ipv6`,`start_datetime`,`stop_datetime`),
KEY `user` (`user`) USING HASH,
KEY `mac` (`mac`) USING HASH,
KEY `ip` (`ip`) USING HASH,
KEY `ipv6` (`ipv6`) USING HASH,
KEY `start_datetime` (`start_datetime`),
KEY `stop_datetime` (`stop_datetime`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1The query used for writing are (these needs to be real fast):
- when a user connects a simple insert
- when a user disconnects
update accounting set stop_datetime=now(), in_bytes=$in_bytes, out_bytes=$out_bytes, more_columns=$more where user="$user" and ip="$ip" and stop_datetime="0000-00-00 00:00:00" order by start_datetime DESC limit 1;, this is a workaround in the case we have more then one row for start, we just update the last start.
A better workaround, is to update the stop_datetime for rows without it, at every start, and use another column for storing that this is not a norma
Solution
Since stop_datetime will not be known at insert time the partition must be done by start_datetime.
I think partitioning by day (
Updating indexes is expensive and it happens at inserts, updates and deletes. Keep the indexes to a minimum. If the main queries will be those selects then I would keep the primary key like the following and trash all the other keys:
I think partitioning by day (
date_format(start_datetime, '%Y-%m-%d')) would be adequate since 1 billion rows divided by (3 years * 365 days) ≃ 900,000 rows per partitionUpdating indexes is expensive and it happens at inserts, updates and deletes. Keep the indexes to a minimum. If the main queries will be those selects then I would keep the primary key like the following and trash all the other keys:
PRIMARY KEY (`user`,`start_datetime`,`stop_datetime`,`mac`,`ip`)Code Snippets
PRIMARY KEY (`user`,`start_datetime`,`stop_datetime`,`mac`,`ip`)Context
StackExchange Database Administrators Q#15258, answer score: 2
Revisions (0)
No revisions yet.