patternsqlMinor
MySQL - How much physical space will be reduced by changing the column type?
Viewed 0 times
spacemuchthecolumntypemysqlwillhowphysicalchanging
Problem
Lets say I have the following table:
and lets say this table have lots of records: 100M
I have 2 possible values to
since I want to reduce the size of DB (size on hard drive), how would it effect if I would change the
which column type would you use? tinyint?
CREATE TABLE `my_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`relationship_status` varchar(48) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1and lets say this table have lots of records: 100M
I have 2 possible values to
relationship_status: 'following' or 'not_following'since I want to reduce the size of DB (size on hard drive), how would it effect if I would change the
relationship_status to Boolean instead of varchar(48) (where if can define 0 as not following and 1 as following)?which column type would you use? tinyint?
Solution
I think
That way, in the worst case, the Query Optimizer would perform an index scan on queries like:
The EXPLAIN plan should improve in that sorting and temp table usage would be minimized or eliminated.
Now concerning diskspace, let's compute the savings in space. First, What is the length of each string?
Each VARCHAR has an additional byte for tracking string length. The values are really 10 and 14. That gets ignored since you are shrinking the field to 1 byte. If you collapse
Since the the number of rows is about 100M, the Savings in Diskspace would have to be between 900M and 1.3G.
UPDATE 2012-08-23 13:00 EDT
To answer your comment
lets say the table has 2 more columns, 1 which is varchar(255) and another one which is DATETIME, lets say I can change both column types to int(11) unsigned, how much disk space that would save?
First of all, a
Now, the varchar(255) field. As I mentioned before, a varchar has an extra byte for length management. Sum up all the lengths of all the varchar fields and add the row count (the extra from field on every row) and subtract 4 bytes times the row count.
Putting the DATETIME and VARCHAR calculations together, you get this
TINYINT would be a good choice. I would suggest not indexing relationship_status at all. You should not index relationship_status by itself because the index cardinality is 2. The Query Optimizer would never use the index. You could index as follows:ALTER TABLE my_table ADD INDEX (relationship_status,user_id);That way, in the worst case, the Query Optimizer would perform an index scan on queries like:
SELECT user_id FROM my_table WHERE relationship_status = 1;
SELECT relationship_status,COUNT(1) statuscount
FROM my_table GROUP BY relationship_status;The EXPLAIN plan should improve in that sorting and temp table usage would be minimized or eliminated.
Now concerning diskspace, let's compute the savings in space. First, What is the length of each string?
mysql> select length('following') followlength,length('not_following') notfollowlength;
+--------------+-----------------+
| followlength | notfollowlength |
+--------------+-----------------+
| 9 | 13 |
+--------------+-----------------+
1 row in set (0.01 sec)Each VARCHAR has an additional byte for tracking string length. The values are really 10 and 14. That gets ignored since you are shrinking the field to 1 byte. If you collapse
relationship_status to TINYINT, the savings would be as followsSELECT
BB SavingsBytes,
BB/POWER(1024,1) SavingsKB,
BB/POWER(1024,2) SavingsMB,
BB/POWER(1024,3) SavingsGB
FROM
(
SELECT
fcount*length('following')+ncount*length('not_following') BB
FROM
(
SELECT COUNT(1) fcount FROM my_table
WHERE relationship_status='following'
) F,
(
SELECT COUNT(1) ncount FROM my_table
WHERE relationship_status='not_following'
) N
);Since the the number of rows is about 100M, the Savings in Diskspace would have to be between 900M and 1.3G.
UPDATE 2012-08-23 13:00 EDT
To answer your comment
lets say the table has 2 more columns, 1 which is varchar(255) and another one which is DATETIME, lets say I can change both column types to int(11) unsigned, how much disk space that would save?
First of all, a
DATETIME is 8 bytes and INT(11) is 4 bytes. Since 8 - 4 = 4, your savings would be 4 bytes per row. For the table, SELECT COUNT(1)*4 FROM my_table;Now, the varchar(255) field. As I mentioned before, a varchar has an extra byte for length management. Sum up all the lengths of all the varchar fields and add the row count (the extra from field on every row) and subtract 4 bytes times the row count.
Putting the DATETIME and VARCHAR calculations together, you get this
SELECT
BB SavingsBytes,
BB/POWER(1024,1) SavingsKB,
BB/POWER(1024,2) SavingsMB,
BB/POWER(1024,3) SavingsGB
FROM
(
SELECT
(VarCharBytes - (RowCount * 3)) BB
FROM
(
SELECT SUM(LENGTH(varcharfield)) VarCharBytes
FROM my_table
) A,
(SELECT COUNT(1) RowCount FROM my_table) T
);Code Snippets
ALTER TABLE my_table ADD INDEX (relationship_status,user_id);SELECT user_id FROM my_table WHERE relationship_status = 1;
SELECT relationship_status,COUNT(1) statuscount
FROM my_table GROUP BY relationship_status;mysql> select length('following') followlength,length('not_following') notfollowlength;
+--------------+-----------------+
| followlength | notfollowlength |
+--------------+-----------------+
| 9 | 13 |
+--------------+-----------------+
1 row in set (0.01 sec)SELECT
BB SavingsBytes,
BB/POWER(1024,1) SavingsKB,
BB/POWER(1024,2) SavingsMB,
BB/POWER(1024,3) SavingsGB
FROM
(
SELECT
fcount*length('following')+ncount*length('not_following') BB
FROM
(
SELECT COUNT(1) fcount FROM my_table
WHERE relationship_status='following'
) F,
(
SELECT COUNT(1) ncount FROM my_table
WHERE relationship_status='not_following'
) N
);SELECT
BB SavingsBytes,
BB/POWER(1024,1) SavingsKB,
BB/POWER(1024,2) SavingsMB,
BB/POWER(1024,3) SavingsGB
FROM
(
SELECT
(VarCharBytes - (RowCount * 3)) BB
FROM
(
SELECT SUM(LENGTH(varcharfield)) VarCharBytes
FROM my_table
) A,
(SELECT COUNT(1) RowCount FROM my_table) T
);Context
StackExchange Database Administrators Q#23012, answer score: 3
Revisions (0)
No revisions yet.