HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

MySQL - How much physical space will be reduced by changing the column type?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
spacemuchthecolumntypemysqlwillhowphysicalchanging

Problem

Lets say I have the following table:

CREATE TABLE `my_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `relationship_status` varchar(48) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1


and lets say this table have lots of records: 100M

I have 2 possible values to relationship_status: 'following' or 'not_following'

since I want to reduce the size of DB (size on hard drive), how would it effect if I would change the relationship_status to Boolean instead of varchar(48) (where if can define 0 as not following and 1 as following)?

which column type would you use? tinyint?

Solution

I think TINYINT would be a good choice. I would suggest not indexing relationship_status at all. You should not index relationship_status by itself because the index cardinality is 2. The Query Optimizer would never use the index. You could index as follows:

ALTER TABLE my_table ADD INDEX (relationship_status,user_id);


That way, in the worst case, the Query Optimizer would perform an index scan on queries like:

SELECT user_id FROM my_table WHERE relationship_status = 1;
SELECT relationship_status,COUNT(1) statuscount
FROM my_table GROUP BY relationship_status;


The EXPLAIN plan should improve in that sorting and temp table usage would be minimized or eliminated.

Now concerning diskspace, let's compute the savings in space. First, What is the length of each string?

mysql> select length('following') followlength,length('not_following') notfollowlength;
+--------------+-----------------+
| followlength | notfollowlength |
+--------------+-----------------+
|            9 |              13 |
+--------------+-----------------+
1 row in set (0.01 sec)


Each VARCHAR has an additional byte for tracking string length. The values are really 10 and 14. That gets ignored since you are shrinking the field to 1 byte. If you collapse relationship_status to TINYINT, the savings would be as follows

SELECT
    BB SavingsBytes,
    BB/POWER(1024,1) SavingsKB,
    BB/POWER(1024,2) SavingsMB,
    BB/POWER(1024,3) SavingsGB
FROM
(
    SELECT
        fcount*length('following')+ncount*length('not_following') BB
    FROM
        (
            SELECT COUNT(1) fcount FROM my_table
            WHERE relationship_status='following'
        ) F,
        (
            SELECT COUNT(1) ncount FROM my_table
            WHERE relationship_status='not_following'
        ) N
);


Since the the number of rows is about 100M, the Savings in Diskspace would have to be between 900M and 1.3G.
UPDATE 2012-08-23 13:00 EDT

To answer your comment

lets say the table has 2 more columns, 1 which is varchar(255) and another one which is DATETIME, lets say I can change both column types to int(11) unsigned, how much disk space that would save?

First of all, a DATETIME is 8 bytes and INT(11) is 4 bytes. Since 8 - 4 = 4, your savings would be 4 bytes per row. For the table, SELECT COUNT(1)*4 FROM my_table;

Now, the varchar(255) field. As I mentioned before, a varchar has an extra byte for length management. Sum up all the lengths of all the varchar fields and add the row count (the extra from field on every row) and subtract 4 bytes times the row count.

Putting the DATETIME and VARCHAR calculations together, you get this

SELECT
    BB SavingsBytes,
    BB/POWER(1024,1) SavingsKB,
    BB/POWER(1024,2) SavingsMB,
    BB/POWER(1024,3) SavingsGB
FROM
(
    SELECT 
        (VarCharBytes - (RowCount * 3)) BB
    FROM
    (
        SELECT SUM(LENGTH(varcharfield)) VarCharBytes
        FROM my_table
    ) A,
    (SELECT COUNT(1) RowCount FROM my_table) T
);

Code Snippets

ALTER TABLE my_table ADD INDEX (relationship_status,user_id);
SELECT user_id FROM my_table WHERE relationship_status = 1;
SELECT relationship_status,COUNT(1) statuscount
FROM my_table GROUP BY relationship_status;
mysql> select length('following') followlength,length('not_following') notfollowlength;
+--------------+-----------------+
| followlength | notfollowlength |
+--------------+-----------------+
|            9 |              13 |
+--------------+-----------------+
1 row in set (0.01 sec)
SELECT
    BB SavingsBytes,
    BB/POWER(1024,1) SavingsKB,
    BB/POWER(1024,2) SavingsMB,
    BB/POWER(1024,3) SavingsGB
FROM
(
    SELECT
        fcount*length('following')+ncount*length('not_following') BB
    FROM
        (
            SELECT COUNT(1) fcount FROM my_table
            WHERE relationship_status='following'
        ) F,
        (
            SELECT COUNT(1) ncount FROM my_table
            WHERE relationship_status='not_following'
        ) N
);
SELECT
    BB SavingsBytes,
    BB/POWER(1024,1) SavingsKB,
    BB/POWER(1024,2) SavingsMB,
    BB/POWER(1024,3) SavingsGB
FROM
(
    SELECT 
        (VarCharBytes - (RowCount * 3)) BB
    FROM
    (
        SELECT SUM(LENGTH(varcharfield)) VarCharBytes
        FROM my_table
    ) A,
    (SELECT COUNT(1) RowCount FROM my_table) T
);

Context

StackExchange Database Administrators Q#23012, answer score: 3

Revisions (0)

No revisions yet.