HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Separate month and year columns, or date with day always set to 1?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
yearcolumnswithalwaysseparatedatemonthanddayset

Problem

I'm building a database with Postgres where there's going to be a lot of grouping of things by month and year, but never by the date.

  • I could create integer month and year columns and use those.



  • Or I could have a month_year column and always set the day to 1.



The former seems a bit simpler and clearer if someone is looking at the data, but the latter is nice in that it uses a proper type.

Solution

Personally if it's a date, or can be a date, I suggest always storing it as one. It's just easier to work with as a rule of thumb.

  • A date is 4 bytes.



  • A smallint is 2 bytes (we need two)



  • ... 2 bytes: one smallint for year



  • ... 2 bytes: one smallint for month



You can have one date which will support day if you ever need it, or one smallint for year and month which will never support the extra precision.

Sample data

Let's look at an example now.. Let's create 1 million dates for our sample. This is approximately 5,000 rows for 200 years between 1901, and 2100. Every year should have something for every month.

CREATE TABLE foo
AS
  SELECT
    x,
    make_date(year,month,1)::date AS date,
    year::smallint,
    month::smallint
  FROM generate_series(1,1e6) AS gs(x)
  CROSS JOIN LATERAL CAST(trunc(random()*12+1+x-x) AS int) AS month
  CROSS JOIN LATERAL CAST(trunc(random()*200+1901+x-x) AS int) AS year
;
CREATE INDEX ON foo(date);
CREATE INDEX ON foo (year,month);
VACUUM FULL ANALYZE foo;


Testing

Simple WHERE

Now we can test these theories of not using date.. I ran each of these a few times so as to warm things up.

EXPLAIN ANALYZE SELECT * FROM foo WHERE date = '2014-1-1'
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=11.56..1265.16 rows=405 width=14) (actual time=0.164..0.751 rows=454 loops=1)
   Recheck Cond: (date = '2014-04-01'::date)
   Heap Blocks: exact=439
   ->  Bitmap Index Scan on foo_date_idx  (cost=0.00..11.46 rows=405 width=0) (actual time=0.090..0.090 rows=454 loops=1)
         Index Cond: (date = '2014-04-01'::date)
 Planning time: 0.090 ms
 Execution time: 0.795 ms


Now, let's try the other method with them separate

EXPLAIN ANALYZE SELECT * FROM foo WHERE year = 2014 AND month = 1;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=12.75..1312.06 rows=422 width=14) (actual time=0.139..0.707 rows=379 loops=1)
   Recheck Cond: ((year = 2014) AND (month = 1))
   Heap Blocks: exact=362
   ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..12.64 rows=422 width=0) (actual time=0.079..0.079 rows=379 loops=1)
         Index Cond: ((year = 2014) AND (month = 1))
 Planning time: 0.086 ms
 Execution time: 0.749 ms
(7 rows)


In fairness, they're not all 0.749.. some are a bit more or less, but it doesn't matter. They're all relatively the same. It simply isn't needed.

Within one month

Now, let's have fun with it.. Let's say you want to find all intervals within 1 month of Jan 2014 (the same month we used above).

EXPLAIN ANALYZE
  SELECT *
  FROM foo
  WHERE date
    BETWEEN
      ('2014-1-1'::date - '1 month'::interval)::date 
      AND ('2014-1-1'::date + '1 month'::interval)::date;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=21.27..2310.97 rows=863 width=14) (actual time=0.384..1.644 rows=1226 loops=1)
   Recheck Cond: ((date >= '2013-12-01'::date) AND (date   Bitmap Index Scan on foo_date_idx  (cost=0.00..21.06 rows=863 width=0) (actual time=0.208..0.208 rows=1226 loops=1)
         Index Cond: ((date >= '2013-12-01'::date) AND (date <= '2014-02-01'::date))
 Planning time: 0.104 ms
 Execution time: 1.727 ms
(7 rows)


Compare that to the combined method

```
EXPLAIN ANALYZE
SELECT *
FROM foo
WHERE year = 2013 AND month = 12
OR ( year = 2014 AND ( month = 1 OR month = 2) );

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=38.79..2999.66 rows=1203 width=14) (actual time=0.664..2.291 rows=1226 loops=1)
Recheck Cond: (((year = 2013) AND (month = 12)) OR (((year = 2014) AND (month = 1)) OR ((year = 2014) AND (month = 2))))
Heap Blocks: exact=1083
-> BitmapOr (cost=38.79..38.79 rows=1237 width=0) (actual time=0.479..0.479 rows=0 loops=1)
-> Bitmap Index Scan on foo_year_month_idx (cost=0.00..12.64 rows=421 width=0) (actual time=0.112..0.112 rows=402 loops=1)
Index Cond: ((year = 2013) AND (month = 12))
-> BitmapOr (cost=25.60..25.60 rows=816 width=0) (actual time=0.218..0.218 rows=0 loops=1)
-> Bitmap Index Scan on foo_yea

Code Snippets

CREATE TABLE foo
AS
  SELECT
    x,
    make_date(year,month,1)::date AS date,
    year::smallint,
    month::smallint
  FROM generate_series(1,1e6) AS gs(x)
  CROSS JOIN LATERAL CAST(trunc(random()*12+1+x-x) AS int) AS month
  CROSS JOIN LATERAL CAST(trunc(random()*200+1901+x-x) AS int) AS year
;
CREATE INDEX ON foo(date);
CREATE INDEX ON foo (year,month);
VACUUM FULL ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE date = '2014-1-1'
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=11.56..1265.16 rows=405 width=14) (actual time=0.164..0.751 rows=454 loops=1)
   Recheck Cond: (date = '2014-04-01'::date)
   Heap Blocks: exact=439
   ->  Bitmap Index Scan on foo_date_idx  (cost=0.00..11.46 rows=405 width=0) (actual time=0.090..0.090 rows=454 loops=1)
         Index Cond: (date = '2014-04-01'::date)
 Planning time: 0.090 ms
 Execution time: 0.795 ms
EXPLAIN ANALYZE SELECT * FROM foo WHERE year = 2014 AND month = 1;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=12.75..1312.06 rows=422 width=14) (actual time=0.139..0.707 rows=379 loops=1)
   Recheck Cond: ((year = 2014) AND (month = 1))
   Heap Blocks: exact=362
   ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..12.64 rows=422 width=0) (actual time=0.079..0.079 rows=379 loops=1)
         Index Cond: ((year = 2014) AND (month = 1))
 Planning time: 0.086 ms
 Execution time: 0.749 ms
(7 rows)
EXPLAIN ANALYZE
  SELECT *
  FROM foo
  WHERE date
    BETWEEN
      ('2014-1-1'::date - '1 month'::interval)::date 
      AND ('2014-1-1'::date + '1 month'::interval)::date;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=21.27..2310.97 rows=863 width=14) (actual time=0.384..1.644 rows=1226 loops=1)
   Recheck Cond: ((date >= '2013-12-01'::date) AND (date <= '2014-02-01'::date))
   Heap Blocks: exact=1083
   ->  Bitmap Index Scan on foo_date_idx  (cost=0.00..21.06 rows=863 width=0) (actual time=0.208..0.208 rows=1226 loops=1)
         Index Cond: ((date >= '2013-12-01'::date) AND (date <= '2014-02-01'::date))
 Planning time: 0.104 ms
 Execution time: 1.727 ms
(7 rows)
EXPLAIN ANALYZE
  SELECT *
  FROM foo
  WHERE year = 2013 AND month = 12
    OR ( year = 2014 AND ( month = 1 OR month = 2) );

                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=38.79..2999.66 rows=1203 width=14) (actual time=0.664..2.291 rows=1226 loops=1)
   Recheck Cond: (((year = 2013) AND (month = 12)) OR (((year = 2014) AND (month = 1)) OR ((year = 2014) AND (month = 2))))
   Heap Blocks: exact=1083
   ->  BitmapOr  (cost=38.79..38.79 rows=1237 width=0) (actual time=0.479..0.479 rows=0 loops=1)
         ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..12.64 rows=421 width=0) (actual time=0.112..0.112 rows=402 loops=1)
               Index Cond: ((year = 2013) AND (month = 12))
         ->  BitmapOr  (cost=25.60..25.60 rows=816 width=0) (actual time=0.218..0.218 rows=0 loops=1)
               ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..12.62 rows=420 width=0) (actual time=0.108..0.108 rows=423 loops=1)
                     Index Cond: ((year = 2014) AND (month = 1))
               ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..12.38 rows=395 width=0) (actual time=0.108..0.108 rows=401 loops=1)
                     Index Cond: ((year = 2014) AND (month = 2))
 Planning time: 0.256 ms
 Execution time: 2.421 ms
(13 rows)

Context

StackExchange Database Administrators Q#166117, answer score: 21

Revisions (0)

No revisions yet.