HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Use case for hstore

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
caseusehstorefor

Problem

This is the first time (I think) I have the chance to use the hstore data type, but I would like to hear from more experienced folks if what I have in mind is actually a good idea.

Now, there's this web app in which we import payroll data in the form of XML files, which look roughly like this simplified version:


    
        Smith
        John
        9999.99
        
    
    
        
     


Each employee carries extremely detailed information, and when I say "several other hundreds of tags" it's because they will usually range between 800 and upwards of 1400. And this is for each month. Apart from a set of core tags, each employee can have a different combination of them, hence the very fluctuating but very realistic figures I gave above.

Now, part of this data is imported with a long, slow and very complex process, and I've observed that, with increasing frequency, we are finding ourselves saying "gosh, if only we had always imported that particular tag!".

While the import process is highly configurable, making it run for just a small set of data is slow, impractical and downright painful. Adding whatever customization is needed to import the hypothetical new tag from now on is much easier, but for building up historic data as if we always imported it, it's messy and error prone.

As an added bonus, the task always falls on those two guys, and I'm one of them, and I would love to make our life a tad simpler.

This is why I'm thinking of writing a quick tool that, during the night, will crack those XML files open and, for each month and each employee, create a record with an hstore column containing all of the employee's tags for that month.

Being an absolute beginner at hstore, this looks to me like a very good use case for it, especially if we consider that:

-
since the tags can differ for each employee, this is essentially schema-less data.

-
storing the tags as an EAV with one row per tag would mean approximately 240k rows per month f

Solution

At the scope at which you're working. I think JSONB is ideal. It handles deeply nested structures and structures with array-keys. It's also standardized and in the spec for sql2016.

In addition, as I answered here, there is an extension that should help you with space consumption called ZSON,


ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is based on a shared dictionary of strings most frequently used in specific JSONB documents (not only keys, but also values, array elements, etc).


In some cases ZSON can save half of your disk space and give you about 10% more TPS. Memory is saved as well. See docs/benchmark.md. Everything depends on your data and workload, though. Don't believe any benchmarks, re-check everything on your data, configuration, hardware, workload and PostgreSQL version.

You may want to look into ZSON.

Context

StackExchange Database Administrators Q#122950, answer score: 2

Revisions (0)

No revisions yet.