HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Should I use the PostgreSQL bit string?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlthebitshouldusestring

Problem

I've been learning about the bit string data type lately, and I am quite curious about:

-
At the bottom of this doc page there is the sentence:


... plus 5 or 8 bytes overhead depending on the length of the string

-
How are bit strings handled in other languages such as PHP, Java, C#, C++, etc., through drivers like Npgsql, ODBC, etc.

For question #1, using smallint or bigint will be much more storage efficient, and would perhaps offer a performance gain since integers are supported everywhere. Most programming languages handle bit operations on integers with ease. If that is the case, what is the point of introducing the bit-string data type? Is it Only for cases that need a large amount of bit masks? Bit field indexing maybe? I'm more curious about how bit field indexing is done in PostgreSQL.

For #2, I'm confused, more than curious. For example, what if I store week day bit masks in a bit(7) field, one bit for a day, with lowest bit representing Monday. Then I query for the value in PHP and C++. What will I get? The documentation says I'll have a bit string, however a bit string is not something I can use directly - as with integers. Then in this case, should I give up on bit field?

Can anyone elaborate why and when I should use bit or bit varying?

Solution

If you only have a few variables I would consider keeping separate boolean columns.

  • Indexing is easy. In particular, indexes on expressions are easy.



  • Conditions for queries and partial indexing are easy to write and read and meaningful.



  • A boolean column occupies 1 byte. For only a few variables this occupies the least space.



  • Unlike the other options boolean columns allow NULL values for individual bits if you should need that. You can always define columns NOT NULL if you don't.



Optimizing storage

If you have more than a hand full variables but less than 33, an integer column may serve you best. (Or a bigint for up to 64 variables.)

  • Occupies 4 bytes on disk.



  • Very fast indexing for exact matches (= operator).



  • Handling individual values may be slower / less convenient than with bit string or boolean.



With even more variables, or if you want to manipulate the values a lot, or if you don't have huge tables and disk space / RAM is no issue, or if you are not sure what to pick, I would consider bit(n) or bit varying(n).

  • Occupies at least 5 bytes (or 8 for very long strings) plus 1 byte for each group of 8 bits (rounded up).



  • You can use bit string functions and operators directly.



Examples

For just 3 bits of information, individual boolean columns get by with 3 bytes, an integer needs 4 bytes and a bit string 6 bytes (5 + 1).

For 32 bits of information, an integer still needs 4 bytes, a bit string occupies 9 bytes for the same (5 + 4) and boolean columns occupy 32 bytes.
Further reading

  • Is there any difference between integer and bit(n) data types for a bitmask?



  • Calculating and saving space in PostgreSQL



  • Can I convert a bunch of boolean columns to a single bitmap in PostgreSQL?

Context

StackExchange Database Administrators Q#25073, answer score: 24

Revisions (0)

No revisions yet.