HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Updating a PostgreSQL Table With Different Random Dates

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlrandomwithupdatingdatesdifferenttable

Problem

I have two tables in PostgreSQL, Employee and Leave like following:

CREATE TEMP TABLE employee_table AS SELECT
  id::int,
  name::text
FROM ( VALUES
  (1, 'John' ),
  (2, 'David')  
) AS t(id, name);

CREATE TEMP TABLE leave_table AS SELECT
  id::int,
  leave_date::date,
  emp_id::int
FROM ( VALUES
  (1, '01/10/1993' ,1),
  (2, Null         ,1),
  (3, Null         ,1),
  (4, '02/12/1990' ,2),
  (5, Null         ,2),
  (6, Null         ,2) 
) AS t(ID,Leave_Date,Emp_ID);


I want to update the Leave table and set the Leave_Date column to a random date maybe 01/01/2000 for each employee but if an employee has more than one Null entry in Leave table, I want to update his null entries with two different dates which means one employee should not have two same Leave_Date value in Leave table and after update my leave table should look like following:

ID        Leave_Date    Emp_ID  
1         01/10/1993      1  
2         01/01/2000      1  
3         01/01/2001      1


As shown above, initially john had two null entries in Leave table and the result shows that those entries are update with two different dates. Is there any way to this?

Solution

First you never gave a test case for when there is only one date and it is null, so we create that.

INSERT INTO employee_table (id, name) VALUES (3, 'Evan Carroll');
INSERT INTO leave_table VALUES ( 10, null, 3 );


Then we run a command to check whether or not an emp_id has more than one entry in leave_table. The results are in that derived table. We update accordingly. Here we generate a date that represents the year start between 1900-2020. Just update this for what you mean by "random date" you didn't define it in your question.

UPDATE leave_table
SET leave_date = CASE
  WHEN t.count = 1 OR t.count IS NULL
  THEN '01/01/2000'::date
  ELSE '1/1/1900'::date + ('1 year'::interval*floor(random()*120))
END
FROM (
  SELECT emp_id, count(*) FROM leave_table
  WHERE leave_date IS NULL
  GROUP BY emp_id
) AS t
WHERE leave_date IS NULL
AND t.emp_id = leave_table.emp_id;


Then we have it

TABLE leave_table;
 id | leave_date | emp_id 
----+------------+--------
  1 | 1993-01-10 |      1
  4 | 1990-02-12 |      2
  2 | 1964-01-01 |      1
  3 | 1929-01-01 |      1
  5 | 1933-01-01 |      2
  6 | 1902-01-01 |      2
 10 | 2000-01-01 |      3


Now, as @McNets pointed out yesterday, I am kind of cheating. Instead, try this (much more complex query) which suffices the question's update his [emp_id] null entries with two different dates

WITH t AS (
  SELECT
    id,
    emp_id,
    leave_date,
    count(*) OVER (PARTITION BY emp_id) AS max_nulls,
    row_number() OVER (PARTITION BY emp_id)
  FROM leave_table
  WHERE leave_table.leave_date IS NULL
)
UPDATE leave_table
SET leave_date = CASE
  WHEN t.max_nulls = 1 OR t.max_nulls IS NULL
  THEN '01/01/2000'::date
  ELSE date_series_emp.ds
END
FROM t
INNER JOIN (
  SELECT distinct_emps.emp_id,
    gs.ds,
    count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
  FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
  CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
    AS gs(ds)
) AS date_series_emp
  USING ( emp_id, row_number )
WHERE t.id = leave_table.id;


Breaking it apart, the CTE does this

SELECT
    id,
    emp_id,
    leave_date,
    count(*) FILTER (WHERE leave_date IS NULL) OVER (PARTITION BY emp_id) AS max_nulls,
    row_number() OVER (PARTITION BY emp_id)
  FROM leave_table


That generates how many nulls are in the set, and row numbers from within the set that we can join on for a 1:1 with the update query,

id │ emp_id │ leave_date │ max_nulls │ row_number 
────┼────────┼────────────┼───────────┼────────────
  2 │      1 │            │         2 │          1
  3 │      1 │            │         2 │          2
  5 │      2 │            │         2 │          1
  6 │      2 │            │         2 │          2
 10 │      3 │            │         1 │          1


The only other tricky part is the inner-join select,

SELECT distinct_emps.emp_id,
    gs.ds,
    count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
  FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
  CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
    AS gs(ds)


There we're taking the distinct emp_ids, and joining them on a sequence of dates that you're calling random. We count(*) over that sequence to give it a corresponding random number from within the cardinal sequences generated.

Then we join this to the table and perform the update..

This method does have a one drawback, if the input size ever exhausts your pool of "random dates" (only 1081 of them), the update on rows past that max won't be performed at all.

Code Snippets

INSERT INTO employee_table (id, name) VALUES (3, 'Evan Carroll');
INSERT INTO leave_table VALUES ( 10, null, 3 );
UPDATE leave_table
SET leave_date = CASE
  WHEN t.count = 1 OR t.count IS NULL
  THEN '01/01/2000'::date
  ELSE '1/1/1900'::date + ('1 year'::interval*floor(random()*120))
END
FROM (
  SELECT emp_id, count(*) FROM leave_table
  WHERE leave_date IS NULL
  GROUP BY emp_id
) AS t
WHERE leave_date IS NULL
AND t.emp_id = leave_table.emp_id;
TABLE leave_table;
 id | leave_date | emp_id 
----+------------+--------
  1 | 1993-01-10 |      1
  4 | 1990-02-12 |      2
  2 | 1964-01-01 |      1
  3 | 1929-01-01 |      1
  5 | 1933-01-01 |      2
  6 | 1902-01-01 |      2
 10 | 2000-01-01 |      3
WITH t AS (
  SELECT
    id,
    emp_id,
    leave_date,
    count(*) OVER (PARTITION BY emp_id) AS max_nulls,
    row_number() OVER (PARTITION BY emp_id)
  FROM leave_table
  WHERE leave_table.leave_date IS NULL
)
UPDATE leave_table
SET leave_date = CASE
  WHEN t.max_nulls = 1 OR t.max_nulls IS NULL
  THEN '01/01/2000'::date
  ELSE date_series_emp.ds
END
FROM t
INNER JOIN (
  SELECT distinct_emps.emp_id,
    gs.ds,
    count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
  FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
  CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
    AS gs(ds)
) AS date_series_emp
  USING ( emp_id, row_number )
WHERE t.id = leave_table.id;
SELECT
    id,
    emp_id,
    leave_date,
    count(*) FILTER (WHERE leave_date IS NULL) OVER (PARTITION BY emp_id) AS max_nulls,
    row_number() OVER (PARTITION BY emp_id)
  FROM leave_table

Context

StackExchange Database Administrators Q#159425, answer score: 6

Revisions (0)

No revisions yet.