patternMinor

Redshift DISTSTYLE ALL table takes less disk space?

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

spacediskallredshiftlessdiststyletakestable

Problem

(Self-migrating from StackOverflow.)

We have a fat table of exactly 591 columns in Redshift, distributed by a key.

We were unaware that fat tables chew up space - 2 blocks per column per slice, minimum, and we aren't close to filling up the first block on any slice.

I misunderstood a coworker to recommend using "DISTSTYLE ALL." That seemed maddeningly counter-intuitive, but I set about trying this, and got a surprise that the disk usage of the new table, with the same rows and columns, is actually 1/20th of the size of the original table.

I'm using this query to get the sizes, but Aginity agrees when reporting disk usage for these tables:

select trim(pgdb.datname) as Database,
       trim(a.name) as Table, 
       b.mbytes
   from stv_tbl_perm a
   join pg_database as pgdb on pgdb.oid = a.db_id
   join (select tbl, sum(decode(unsorted, 1, 1, 0)) as unsorted_mbytes,   
                count(*) as mbytes
from stv_blocklist group by tbl) b on a.id=b.tbl
join ( select sum(capacity) as  total
  from stv_partitions where part_begin=0 ) as part on 1=1
where a.slice=0 and pgdb.datname = 'my_schema' 
AND a.name IN ('my_table',
               'my_table_distall')
order by Database, Name;

Yielding this output:

Schema       Table                mbytes
my_schema    my_table             222080
my_schema    my_table_distall      10740

The reason I mentioned the exact number of columns is that 222080=5945322, which means we use two blocks for every 532 slices for every column (including Redshift's three hidden columns.)

The new table, with DISTSTYLE ALL, is taking up an average of 3.6 block per column per node.

This is very confusing. Why is my new table smaller by a factor of 20?

Due to the size of our data, I actually didn't expect it to get bigger, because even with all the data, we would not take up more than one block per slice per column, but smaller seems insane.

Is it just as simple as "DISTSTYLE ALL duplicates the table to one slice on each node?" So

Solution

Well, after lots of experimentation, we have found that, yes, if you have a fat table with not enough rows to fill much of a block at the slice level using DISTSTYLE KEY, it is much more efficient, space-wise, to use DISTSTYLE ALL, even though this replicates more data.

For example, our fat tables have, say, 600000 rows and 1000 columns. Then, on a 5-node 32-slice setup, they were using up 1Mb per column per node per slice, or 160 Gb, and we were only filling less than 1% of that space.

DISTSTYLE ALL appears to not use all the slices.

So using DISTSTYLE ALL, we multiplied the data by 5, but this still takes up less than 1/20th of the space that the DISTSTYLE KEY tables did.

It slows down our queries a little (due to the need for IPC between the slices, and because the DISTSTYLE KEY means we have only the data on the node that matches our other data) but, since these tables are loaded once and never change after, we don't suffer from the other problems of DISTSTYLE ALL.

Context

StackExchange Database Administrators Q#128197, answer score: 4

Revisions (0)

No revisions yet.