HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Index Generation in Postgres

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexgenerationpostgres

Problem

We have a database with a single table made up of around 700 million entries.
We update the database by adding new entries on a server then transfer the server to the production server using pg_dump:

pg_dump -c database > /tmp/database_gen

(by the way we use postgres 8.4)
we export the database to the production server by using psql.
The pg_dump generated file has instruction on how to create and fill the table.

The problem is with the index creation. Postgres fills the table then spend days creating the index. That was OK, until postgres couldn't create the index anymore due to the fact that it has no more diskspace because it uses a lot of disk space to for the temporary files for sorting and creating the index. Normaly the database takes around 200GB but during the index creation the used disk space increases to 600GB then after the creation it goes back to 200 GB.

My question is : can we create the index in several steps, like create the index for half the table then add the rest of the table and update the index?

Has anyone had the same issues ?

Thanks

Solution

If you create the index before loading the table, the time taken to load the data will be significantly increased.

pre load:

create table my_table1(val integer);
create index my_index1 on my_table1(val);
insert into my_table1(val) select generate_series(1,100000) order by random();
Time: 31755.858 ms


post load:

create table my_table2(val integer);
insert into my_table2(val) select generate_series(1,100000) order by random();
Time: 15344.130 ms
create index my_index2 on my_table2(val);
Time: 4073.686 ms


If you are ok with that, with pg_restore you can:

  • Load just the schema using --schema-only



  • Create the index with --index



  • Load the data using --data-only



Of course "Buy more storage" may well be the best answer here...

Code Snippets

create table my_table1(val integer);
create index my_index1 on my_table1(val);
insert into my_table1(val) select generate_series(1,100000) order by random();
Time: 31755.858 ms
create table my_table2(val integer);
insert into my_table2(val) select generate_series(1,100000) order by random();
Time: 15344.130 ms
create index my_index2 on my_table2(val);
Time: 4073.686 ms

Context

StackExchange Database Administrators Q#4200, answer score: 9

Revisions (0)

No revisions yet.