patternsqlMinor
Index Generation in Postgres
Viewed 0 times
indexgenerationpostgres
Problem
We have a database with a single table made up of around 700 million entries.
We update the database by adding new entries on a server then transfer the server to the production server using pg_dump:
pg_dump -c database > /tmp/database_gen
(by the way we use postgres 8.4)
we export the database to the production server by using psql.
The pg_dump generated file has instruction on how to create and fill the table.
The problem is with the index creation. Postgres fills the table then spend days creating the index. That was OK, until postgres couldn't create the index anymore due to the fact that it has no more diskspace because it uses a lot of disk space to for the temporary files for sorting and creating the index. Normaly the database takes around 200GB but during the index creation the used disk space increases to 600GB then after the creation it goes back to 200 GB.
My question is : can we create the index in several steps, like create the index for half the table then add the rest of the table and update the index?
Has anyone had the same issues ?
Thanks
We update the database by adding new entries on a server then transfer the server to the production server using pg_dump:
pg_dump -c database > /tmp/database_gen
(by the way we use postgres 8.4)
we export the database to the production server by using psql.
The pg_dump generated file has instruction on how to create and fill the table.
The problem is with the index creation. Postgres fills the table then spend days creating the index. That was OK, until postgres couldn't create the index anymore due to the fact that it has no more diskspace because it uses a lot of disk space to for the temporary files for sorting and creating the index. Normaly the database takes around 200GB but during the index creation the used disk space increases to 600GB then after the creation it goes back to 200 GB.
My question is : can we create the index in several steps, like create the index for half the table then add the rest of the table and update the index?
Has anyone had the same issues ?
Thanks
Solution
If you create the index before loading the table, the time taken to load the data will be significantly increased.
pre load:
post load:
If you are ok with that, with
Of course "Buy more storage" may well be the best answer here...
pre load:
create table my_table1(val integer);
create index my_index1 on my_table1(val);
insert into my_table1(val) select generate_series(1,100000) order by random();
Time: 31755.858 mspost load:
create table my_table2(val integer);
insert into my_table2(val) select generate_series(1,100000) order by random();
Time: 15344.130 ms
create index my_index2 on my_table2(val);
Time: 4073.686 msIf you are ok with that, with
pg_restore you can:- Load just the schema using
--schema-only
- Create the index with
--index
- Load the data using
--data-only
Of course "Buy more storage" may well be the best answer here...
Code Snippets
create table my_table1(val integer);
create index my_index1 on my_table1(val);
insert into my_table1(val) select generate_series(1,100000) order by random();
Time: 31755.858 mscreate table my_table2(val integer);
insert into my_table2(val) select generate_series(1,100000) order by random();
Time: 15344.130 ms
create index my_index2 on my_table2(val);
Time: 4073.686 msContext
StackExchange Database Administrators Q#4200, answer score: 9
Revisions (0)
No revisions yet.