
Need suggestions for best way to archive MySQL (InnoDB) tables

Submitted by: @import:stackexchange-dba

Problem

I have two fairly large tables: a 'wall' table for messages between users with 9 million rows at 2 GB, and a 'chapter' table with 2 million rows at 18 GB. I want to keep the number of active rows in the 'wall' table small, and I want to shrink the overall size of the 'chapter' table. I made the mistake of not compressing text data to begin with, and I'd like to start compressing data in the archives.

For the 'wall' table, I'm thinking that everything older than a certain wall id will be moved into a compressed 'wall_archive' table. Anyone wanting to view older posts will be given a "view archive" link, and queries for older posts will run against the archive table. A cron job will do the migration every now and then, and the last wall id archived will be stored somewhere for reference. Am I going in the right direction here?
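That plan can be sketched roughly as below. The table and column names (wall, wall_archive, id) and the cutoff id are assumptions, not your actual schema; InnoDB's compressed row format needs the Barracuda file format (innodb_file_format=Barracuda), which Percona Server supports.

```sql
-- One-time setup: a compressed archive table with the same structure.
CREATE TABLE wall_archive LIKE wall;
ALTER TABLE wall_archive ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- Cron job body: move rows up to the stored cutoff in small batches
-- so locks stay short. 5000000 is a placeholder for the saved wall id.
START TRANSACTION;
INSERT INTO wall_archive
  SELECT * FROM wall WHERE id <= 5000000 ORDER BY id LIMIT 10000;
DELETE FROM wall WHERE id <= 5000000 ORDER BY id LIMIT 10000;
COMMIT;
```

Batching by primary key keeps each transaction cheap; the job simply repeats until no rows below the cutoff remain.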

I'm less sure how to keep the 'chapter' table manageable. Perhaps it's less a matter of archiving and more one of partitioning the table (or both). But what's the best way to do this? I was thinking of splitting 'story' IDs into odds and evens and dividing the chapter table in two, but I'd run into the same problem again down the road. Or I could archive stories modified before a certain date, or before a certain story ID. Any suggestions for a scalable solution?

Lastly, how should I go about compressing text data? Should I use PHP's gzcompress function at level 9 to store text data in a BLOB column, then gzuncompress the data on retrieval? Or should I use MySQL's COMPRESS/UNCOMPRESS functions? I'm leaning towards PHP in case I later separate the web server(s) from the DB server, so that PHP does the compression work instead of the more valuable DB server, but I'd like to know what best practice is.
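For reference, the MySQL-side version looks like the sketch below (the table and column names are assumptions). One caveat worth knowing before choosing: the two formats are not byte-compatible. COMPRESS() prepends a 4-byte uncompressed-length header to the zlib stream, while PHP's gzcompress() emits a plain zlib stream, so UNCOMPRESS() cannot read gzcompress output directly. Pick one side and stay with it.

```sql
-- Assumed: a BLOB column body_zip alongside (or replacing) the text column.
UPDATE chapter SET body_zip = COMPRESS(body) WHERE id = 42;

-- Inflate on read; returns NULL if the stored bytes are not valid
-- COMPRESS() output.
SELECT UNCOMPRESS(body_zip) AS body FROM chapter WHERE id = 42;
```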

Considerations:
I'll still need to be able to access old 'chapter' data easily. 'wall' data can be put into slower storage if needed but it isn't necessary at the moment.

Environment:
6-core AMD Opteron, 16 GB RAM, 256 GB SSD for MySQL,
Percona Server 5.

Solution

Performance

I would develop analytics that tell me how frequently older data is accessed and over what date intervals. If very few people look at anything older than 1 month, your options are numerous. If access is spread all over the board, that calls for a different strategy.

Let's pretend that content older than 1 month is only accessed occasionally, say in < 2% of requests. Two options available to you are partitioning and horizontal sharding, both by date. With partitioning, you simply partition by RANGE using the 'created' field in your table. With horizontal sharding you do the same thing, except you use a cron job or event to create a new periodic version of your table and migrate rows into it. With horizontal sharding, you can use UNION statements to span multiple tables.
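A range-partitioning sketch on an assumed `created` DATETIME column might look like this; partition names and boundary dates are placeholders. Note that MySQL requires the partitioning column to appear in every unique key, so the primary key may need to become (id, created).

```sql
-- Hypothetical monthly partitions on the chapter table.
ALTER TABLE chapter
  PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p201201 VALUES LESS THAN (TO_DAYS('2012-02-01')),
    PARTITION p201202 VALUES LESS THAN (TO_DAYS('2012-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
  );
```

New months are added with ALTER TABLE ... ADD PARTITION, and an old month can be dropped or exported in one cheap operation.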

Partitioning will have little or no effect on your data model. Sharding will require data model intelligence.
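With sharding, a query that spans periods has to be stitched together explicitly; the per-month shard table names below are hypothetical.

```sql
-- Spanning two monthly shard tables for one story.
SELECT * FROM chapter_2012_01 WHERE story_id = 7
UNION ALL
SELECT * FROM chapter_2012_02 WHERE story_id = 7
ORDER BY created;
```

The application (or a view maintained by the cron job) has to know which shard tables exist, which is the "data model intelligence" referred to above.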

Context

StackExchange Database Administrators Q#21291, answer score: 4
