HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Postgres speed up index creation for large table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgreslargeforindexcreationspeedtable

Problem

I have a large Postgres table with 2+ billion entries (1.5TB) and mostly non-null, char var columns. To speed up inserts, I dropped the indexes before bulk uploading. However, it is now taking forever for the b-tree indexes to be created. For one of the runs that I cut short, it had spent >12 hours creating the indexes.

Sample table and indexes that I'm trying to make:

Column         |            Type             | Modifiers 
-----------------------+-----------------------------+-----------
 name                  | character varying           | not null
 id                    | character varying           | 
 lifecycle_id          | character varying           | 
 dt                    | character varying           | 
 address               | character varying           | 
 ...

Indexes: 
"name_idx" PRIMARY KEY, btree (name)

"id_idx" btree (rec_id)

"lifecycle_id_idx" btree (lifecycle_id)


The actual table has 18 columns. I've set the maintenance_work_mem to 15GB. This is running on Postgres 9.6.11 on RDS. The instance class is db.m4.4xlarge.

Since there are three indexes, it would be hard to sort the data before inserting. Would it be faster to just insert the data without dropping the indexes? Any other suggestions for speeding up the index creation?

Solution

Before I posted this question, I had two tests running but I wasn't sure if they would ever finish. I was hoping to get some insight into how to speed it up if they never finished. Thankfully, one of the tests did finish. I also got good results by trying the suggestions that jjanes posted.

Using Postgres 9, I ran two tests:

  • Inserting the 2 billion entries with indexes



  • Building indexes for a table that already had the 2 billion entries inserted



The first test has been running for a week and has only inserted 20% of the 2 billion entries. Inserting with index is definitely not the way to go.

The second test took 45 hours. Without knowing how long index creation was going to take, I prematurely cut off some of the previous tests and never got to see them complete. It takes a long time but for now, I'm happy that it doesn't run on forever.

Taking jjanes suggestions, I upgraded to Postgres 11, set the parallel workers to 8 and set the maintenance work memory to 7GB. I followed this guide to pick the parameters: https://www.cybertec-postgresql.com/en/postgresql-parallel-create-index-for-better-performance/.
This sped up the index creation and it only took 35 hours. Shaving off 10 hours was a nice surprise. Thanks!

Context

StackExchange Database Administrators Q#240527, answer score: 10

Revisions (0)

No revisions yet.