HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

PostgreSQL 9.4 analysis, performance of normal column, indexed column and jsonb key

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlcolumnanalysisnormalindexedperformanceandjsonbkey

Problem

If I have a table containing:

CREATE TABLE test(
  id SERIAL PRIMARY KEY,
  name VARCHAR(200),
  age INT,
  data JSONB
);


and data column populated with {"name": xxx, "age": yyy}, sometimes {"name": xxx, "age": yyy, "somethingElse": zzz}

which one would be faster? querying test.data->>name or test.name ? and test.data->>age or test.age?

Could anyone do an analysis for me for those case and case below?

INSERT INTO test(name,age,data) 
  SELECT 
      z
    , gs.y
    , ('{"name":"' || z || '","age":' || gs.y || '}')::JSONB
      FROM (SELECT md5(z::text) z, random() AS y FROM generate_series(1,1000000) AS gs(z)) AS gs;

EXPLAIN SELECT * FROM test ORDER BY random() LIMIT 30;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Limit  (cost=30412.00..30412.07 rows=30 width=458)
   ->  Sort  (cost=30412.00..31139.32 rows=290928 width=458)
         Sort Key: (random())
         ->  Seq Scan on test  (cost=0.00..21819.60 rows=290928 width=458)


integer test

EXPLAIN SELECT * FROM test WHERE age   Sort  (cost=43861.14..44694.47 rows=333333 width=116)
         Sort Key: (random())
         ->  Seq Scan on test  (cost=0.00..34016.33 rows=333333 width=116)
               Filter: ((age)::numeric >'age')::FLOAT   Sort  (cost=48861.14..49694.47 rows=333333 width=116)
         Sort Key: (random())
         ->  Seq Scan on test  (cost=0.00..39016.33 rows=333333 width=116)
               Filter: (((data ->> 'age'::text))::double precision >'age')::NUMERIC   Sort  (cost=48861.14..49694.47 rows=333333 width=116)
         Sort Key: (random())
         ->  Seq Scan on test  (cost=0.00..39016.33 rows=333333 width=116)
               Filter: (((data ->> 'age'::text))::numeric < 0.5)


string test

```
EXPLAIN SELECT * FROM test WHERE name LIKE '%aaa%' ORDER BY random() LIMIT 30;
QUERY PLAN

Solution


  1. Table definition



Better:

CREATE TABLE test(
  test_id serial PRIMARY KEY,
  age INT,
  name text,
  data JSONB
);


Due to alignment requirements of the types integer and varchar / text it's better to put the two int columns first and together. More:

  • Configuring PostgreSQL for read performance



Also, "age" is a dubious column to begin with. Normally it's better to store a "birthday" or "creation_date" as absolute information, while "age" is rotting quickly.
  1. Test data



INSERT INTO test(name,age,data) 
SELECT name, age, row_to_json(sub)::jsonb
FROM  (
   SELECT md5(g::text) AS name, (random() * 100)::int AS age
   FROM   generate_series(1,1000000) g
   ) sub;


-
row_to_json() is much more elegant than building the string by hand.

-
If you cast random() to integer, you get 0 or 1. Largely useless data for any kind of test. In particular, indexes are useless with just two distinct common values. Multiplying by 100 to get something halfway useful.

  1. Why are indexes not used?



Your btree index:

CREATE INDEX test_name ON test(name);


is useless for LIKE queries that are not left-anchored (name LIKE '%aaa%'). Details here:

  • PostgreSQL LIKE query performance variations



  • How is LIKE implemented?



  • Pattern matching with LIKE, SIMILAR TO or regular expressions



  • Difference between LIKE and ~ in Postgres



Your btree index:

CREATE INDEX test_age on test(age);


was useless, since you only had 0 and 1 in your column age, and an idex is typically only used if a small portion (~ below 5 %, it depends) of the table is expected to qualify. Try again with more realistic data.
  1. Test method



EXPLAIN only shows query plans. To get actual performance use EXPLAIN ANALYZE or, for most accurate total times: EXPLAIN ( ANALYZE, TIMING OFF).
General advice

Columns with scalar values are generally much smaller and faster in every respect than document types like jsonb. jsonb (pg 9.4) has received major improvements over json, but this is still true. Queries with predicates on values inside a JSON object are slower than predicates on scalar columns. With optimized (functional) indexes, jsonb can get close, but it will never be as fast.

Code Snippets

CREATE TABLE test(
  test_id serial PRIMARY KEY,
  age INT,
  name text,
  data JSONB
);
INSERT INTO test(name,age,data) 
SELECT name, age, row_to_json(sub)::jsonb
FROM  (
   SELECT md5(g::text) AS name, (random() * 100)::int AS age
   FROM   generate_series(1,1000000) g
   ) sub;
CREATE INDEX test_name ON test(name);
CREATE INDEX test_age on test(age);

Context

StackExchange Database Administrators Q#83842, answer score: 3

Revisions (0)

No revisions yet.