gotchasqlMinor
Why does PostgreSQL perform a seq scan when comparing a numeric value with a bigint column?
Viewed 0 times
postgresqlwhyscancomparingnumericwithseqcolumnvalueperform
Problem
Given a table
Integer comparison:
Index Scan using vp_ts_idx on vp (cost=0.57..257.87 rows=2381 width=57) (actual time=0.014..38.669 rows=80323 loops=1)
Index Cond: ("timestamp" > 1470752584)
Total runtime: 48.322 ms
Numeric comparison:
Seq Scan on vp (cost=0.00..7378353.16 rows=95403915 width=57) (actual time=62625.420..103122.701 rows=98240 loops=1)
Filter: (("timestamp")::numeric > 1470752584.1)
Rows Removed by Filter: 285945491
Total runtime: 103134.333 ms
Context: A query for recent vehicle positions compared
Why wouldn't the query planner doesn't do this automatically when the column type is
vp with column timestamp type bigint, and a btree index on timestamp, why would Postgres ignore the index and run a seq scan on comparison of timestamp with a floating point value, when an index scan would produce identical results?Integer comparison:
SELECT * FROM vp WHERE vp.timestamp > 1470752584 takes 48 ms:Index Scan using vp_ts_idx on vp (cost=0.57..257.87 rows=2381 width=57) (actual time=0.014..38.669 rows=80323 loops=1)
Index Cond: ("timestamp" > 1470752584)
Total runtime: 48.322 ms
Numeric comparison:
SELECT * FROM vp WHERE vp.timestamp > 1470752584.1 takes 103 seconds because it ignores vp_ts_idx and performs a seq scan of the entire table:Seq Scan on vp (cost=0.00..7378353.16 rows=95403915 width=57) (actual time=62625.420..103122.701 rows=98240 loops=1)
Filter: (("timestamp")::numeric > 1470752584.1)
Rows Removed by Filter: 285945491
Total runtime: 103134.333 ms
Context: A query for recent vehicle positions compared
timestamp with EXTRACT(EPOCH FROM NOW()) - %s, where %s was the desired number of seconds, without explicitly casting to a bigint. The workaround is to use CAST(EXTRACT(EPOCH FROM NOW()) - %s AS bigint).Why wouldn't the query planner doesn't do this automatically when the column type is
bigint? Is this a bug, or am I not considering some edge case where this behavior would be useful?Solution
The key is that you don't compare the same types. When comparing a
So, what you get in your comparison, is a
It's worth a look which casts are possible and which is performed in these cases. For this, here are the two rows from
According to the linked documentation page,
Indicates what contexts the cast can be invoked in.
So this means that the
Notes:
bigint to a numeric, the easier way is to 'expand' the bigint with the decimal places being 0 (like 1 -> 1.0), while the other way around it would mean rounding/truncation. (In this specific case it is easy to see that the both lead to the same result, but what if the values are negative?)So, what you get in your comparison, is a
numeric to numeric comparison, which is not something a bigint index could serve.It's worth a look which casts are possible and which is performed in these cases. For this, here are the two rows from
pg_cast:SELECT castsource::regtype, casttarget::regtype, castcontext
FROM pg_cast
WHERE castsource::regtype = 'bigint'::regtype
AND casttarget::regtype = 'numeric'::regtype;
castsource │ casttarget │ castcontext
────────────┼────────────┼─────────────
bigint │ numeric │ i
SELECT castsource::regtype, casttarget::regtype, castcontext
FROM pg_cast
WHERE castsource::regtype = 'numeric'::regtype
AND casttarget::regtype = 'bigint'::regtype;
castsource │ casttarget │ castcontext
────────────┼────────────┼─────────────
numeric │ bigint │ aAccording to the linked documentation page,
castcontextIndicates what contexts the cast can be invoked in.
e means only as an explicit cast (using CAST or :: syntax). a means implicitly in assignment to a target column, as well as explicitly. i means implicitly in expressions, as well as the other cases.So this means that the
numeric -> bigint direction is happening 'by itself' (i. e. you not calling one of the casting operators explicitly) only if you assign the former to the latter. In expressions like your comparison, this is not the case, so the parser will only consider the other way (marked with an i above). That means, you get a numeric to numeric comparison, unless you force it otherwise.Notes:
psqlis a command line client to PostgreSQL, it does not do any sort of these things itself (I've edited the title accordingly)
- using keywords as column names (like
timestamp) is never a good idea - you'll possibly get unexpected parsing errors here and there, unless you are careful enough to double-quote them everywhere
- using the Unix epoch as a timestamp might be cumbersome. There is a rich functionality for 'real' timestamps in PostgreSQL - in most cases it is much easier to use it.
Code Snippets
SELECT castsource::regtype, casttarget::regtype, castcontext
FROM pg_cast
WHERE castsource::regtype = 'bigint'::regtype
AND casttarget::regtype = 'numeric'::regtype;
castsource │ casttarget │ castcontext
────────────┼────────────┼─────────────
bigint │ numeric │ i
SELECT castsource::regtype, casttarget::regtype, castcontext
FROM pg_cast
WHERE castsource::regtype = 'numeric'::regtype
AND casttarget::regtype = 'bigint'::regtype;
castsource │ casttarget │ castcontext
────────────┼────────────┼─────────────
numeric │ bigint │ aContext
StackExchange Database Administrators Q#146294, answer score: 7
Revisions (0)
No revisions yet.