snippetsqlMinor
Parsing COPY's binary format to access a tsrange
Viewed 0 times
tsrangeformatbinaryparsingaccesscopy
Problem
How is tsrange stored in binary?
For example create table
It outputs:
One field is:
There:
How to convert it to integer timestamp?
For example create table
CREATE TABLE public.test (t tsrange);
INSERT INTO test VALUES ('[2010-01-01 14:30, 2010-01-01 15:30)');
INSERT INTO test VALUES ('[2011-01-01 14:31, 2015-11-01 15:30)');
INSERT INTO test VALUES ('[2017-01-01 14:31, 2018-11-01 15:30)');
COPY test TO '/tmp/pgcopy' WITH (FORMAT binary);
COPY test TO '/tmp/pgcopy.csv' WITH (FORMAT csv);It outputs:
cat /tmp/pgcopy.csv
"[""2010-01-01 14:30:00"",""2010-01-01 15:30:00"")"
"[""2011-01-01 14:31:00"",""2015-11-01 15:30:00"")"
"[""2017-01-01 14:31:00"",""2018-11-01 15:30:00"")"
hexdump -C /tmp/pgcopy
00000000 50 47 43 4f 50 59 0a ff 0d 0a 00 00 00 00 00 00 |PGCOPY..........|
00000010 00 00 00 00 01 00 00 00 19 02 00 00 00 08 00 01 |................|
00000020 1f 19 f9 a9 aa 00 00 00 00 08 00 01 1f 1a d0 3d |...............=|
00000030 4e 00 00 01 00 00 00 19 02 00 00 00 08 00 01 3b |N..............;|
00000040 c8 89 51 11 00 00 00 00 08 00 01 c6 7b 1a 3a 0e |..Q.........{.:.|
00000050 00 00 01 00 00 00 19 02 00 00 00 08 00 01 e8 08 |................|
00000060 0d 77 11 00 00 00 00 08 00 02 1c 9a dc 4d 0e 00 |.w...........M..|
00000070 ff ff |..|
00000072One field is:
00 00 00 19 02 00 00 00 08 00 01 e8 08 0d 77 11 00 00 00 00 08 00 02 1c 9a dc 4d 0e 00There:
00000019 - is 25 bytes length02 - brackets00000008 - subfield length0001e808 0d771100 and 00021c9a dc4d0e00 - stored timestamp with miroseconds.How to convert it to integer timestamp?
Solution
As a minor note,
From the docs on
To determine the appropriate binary format for the actual tuple data you should consult the PostgreSQL source, in particular the
Further the docs say binary format (currently) has
Then the tuple has
Then the fields have
So essentially we skip 25 bytes to get to the first column
The
So it's in the format specified by
Binary representation: The first byte is the flags, then the lower bound (if present), then the upper bound (if present). Each bound is represented by a 4-byte length header and the binary representation of that bound (as returned by a call to the send function for the subtype).
The
In your case, that subtype is timestamp and the send is
You can see a timestamp is stored as 8 bytes, and that's just sent with a simple
I'm not going to more-entertain this here, but maybe next go around.
Playing around
We try first
Now we swap that out..
Result: paren-comments added.
And that's your number for the first time zone. For
So to access the first internal timestamp, we skip an addition 5 bytes, on top of already 25 bytes of skipage for 30 bytes total.
This gives us the same result as above..
Simply change
COPY .. (WITH BINARY) doesn't have brackets. It's the flags (which represent amongst other things the brackets).COPY ... (WITH BINARY)From the docs on
COPYTo determine the appropriate binary format for the actual tuple data you should consult the PostgreSQL source, in particular the
send and recv functions for each column's data type (typically these functions are found in the src/backend/utils/adt/ directory of the source distribution).Further the docs say binary format (currently) has
- 11 bytes signature
- 4 bytes for flags
- 4 byte potential variable-width field, currently not in use so we skip the size (4 byte
\0\0\0\0) this technically not nice. If these four bytes had 15, we'd have to skip over not just the four, but an addition 15.
Then the tuple has
- 2 bytes for field count
Then the fields have
- 4 byte length qualifier followed by that many bytes of field data. (Which we already in the case of
timestamportsrange)
So essentially we skip 25 bytes to get to the first column
The
tsrangeSo it's in the format specified by
range_send You can see that explained a bit below in the comments above range_recvBinary representation: The first byte is the flags, then the lower bound (if present), then the upper bound (if present). Each bound is represented by a 4-byte length header and the binary representation of that bound (as returned by a call to the send function for the subtype).
The
timestamp subtypeIn your case, that subtype is timestamp and the send is
timestamp_send.You can see a timestamp is stored as 8 bytes, and that's just sent with a simple
pq_sendint64 (a 64 bit/8 byte int). You'll have to read how timestamp_recv works to see how you should handle a binary representation of a timestamp. Hint: it gets into a struct for in-memory representation in timestamp2tm/* timestamp2tm()
* Convert timestamp data type to POSIX time structure.
* Note that year is _not_ 1900-based, but is an explicit full value.
* Also, month is one-based, _not_ zero-based.
* Returns:
* 0 on success
* -1 on out of rangeI'm not going to more-entertain this here, but maybe next go around.
Playing around
We try first
DEADBEEFin it which an isolation to track the marker 8 byte marker.psql -d test -c 'COPY ( SELECT E'\''DEADBEEF'\'' ) TO STDOUT WITH ( FORMAT BINARY );' |
od --skip-bytes=25 --endian big --read-bytes=8 -cNow we swap that out..
psql -d test -c 'COPY ( SELECT $2010-01-01 14:30$::timestamp without time zone ) TO STDOUT WITH ( FORMAT BINARY );' |
od --skip-bytes=25 --endian big --read-bytes=8 --format=d8 -xResult: paren-comments added.
0000031 315671400000000 (timestamp in int8)
0001 1f19 f9a9 aa00 (hex representation)
0000041And that's your number for the first time zone. For
tsrange as per the above section we have- one byte for on the interval
- upper: 4 bytes (header) + 8 bytes (timestamp)
- lower: 4 bytes (header) + 8 bytes (timestamp)
So to access the first internal timestamp, we skip an addition 5 bytes, on top of already 25 bytes of skipage for 30 bytes total.
psql -d test -c 'COPY ( SELECT $[2010-01-01 14:30, 2010-01-01 15:30)$::tsrange ) TO STDOUT WITH ( FORMAT BINARY );' |
od --skip-bytes=30 --endian big --read-bytes=8 --format=d8 -xThis gives us the same result as above..
0000036 315671400000000
0001 1f19 f9a9 aa00
0000046Simply change
--skip-bytes to 42 to skip over that 8 byte time stamp, and the next 4 bytes of header for the lower and you'll get another time stamp.Code Snippets
/* timestamp2tm()
* Convert timestamp data type to POSIX time structure.
* Note that year is _not_ 1900-based, but is an explicit full value.
* Also, month is one-based, _not_ zero-based.
* Returns:
* 0 on success
* -1 on out of rangepsql -d test -c 'COPY ( SELECT E'\''DEADBEEF'\'' ) TO STDOUT WITH ( FORMAT BINARY );' |
od --skip-bytes=25 --endian big --read-bytes=8 -cpsql -d test -c 'COPY ( SELECT $$2010-01-01 14:30$$::timestamp without time zone ) TO STDOUT WITH ( FORMAT BINARY );' |
od --skip-bytes=25 --endian big --read-bytes=8 --format=d8 -x0000031 315671400000000 (timestamp in int8)
0001 1f19 f9a9 aa00 (hex representation)
0000041psql -d test -c 'COPY ( SELECT $$[2010-01-01 14:30, 2010-01-01 15:30)$$::tsrange ) TO STDOUT WITH ( FORMAT BINARY );' |
od --skip-bytes=30 --endian big --read-bytes=8 --format=d8 -xContext
StackExchange Database Administrators Q#190780, answer score: 5
Revisions (0)
No revisions yet.