patternsqlMinor
Updating a PostgreSQL Table With Different Random Dates
Viewed 0 times
postgresqlrandomwithupdatingdatesdifferenttable
Problem
I have two tables in PostgreSQL, Employee and Leave like following:
I want to update the Leave table and set the Leave_Date column to a random date maybe 01/01/2000 for each employee but if an employee has more than one Null entry in Leave table, I want to update his null entries with two different dates which means one employee should not have two same Leave_Date value in Leave table and after update my leave table should look like following:
As shown above, initially john had two null entries in Leave table and the result shows that those entries are update with two different dates. Is there any way to this?
CREATE TEMP TABLE employee_table AS SELECT
id::int,
name::text
FROM ( VALUES
(1, 'John' ),
(2, 'David')
) AS t(id, name);
CREATE TEMP TABLE leave_table AS SELECT
id::int,
leave_date::date,
emp_id::int
FROM ( VALUES
(1, '01/10/1993' ,1),
(2, Null ,1),
(3, Null ,1),
(4, '02/12/1990' ,2),
(5, Null ,2),
(6, Null ,2)
) AS t(ID,Leave_Date,Emp_ID);I want to update the Leave table and set the Leave_Date column to a random date maybe 01/01/2000 for each employee but if an employee has more than one Null entry in Leave table, I want to update his null entries with two different dates which means one employee should not have two same Leave_Date value in Leave table and after update my leave table should look like following:
ID Leave_Date Emp_ID
1 01/10/1993 1
2 01/01/2000 1
3 01/01/2001 1As shown above, initially john had two null entries in Leave table and the result shows that those entries are update with two different dates. Is there any way to this?
Solution
First you never gave a test case for when there is only one date and it is null, so we create that.
Then we run a command to check whether or not an
Then we have it
Now, as @McNets pointed out yesterday, I am kind of cheating. Instead, try this (much more complex query) which suffices the question's update his [
Breaking it apart, the CTE does this
That generates how many nulls are in the set, and row numbers from within the set that we can join on for a 1:1 with the update query,
The only other tricky part is the inner-join select,
There we're taking the distinct emp_ids, and joining them on a sequence of dates that you're calling random. We count(*) over that sequence to give it a corresponding random number from within the cardinal sequences generated.
Then we join this to the table and perform the update..
This method does have a one drawback, if the input size ever exhausts your pool of "random dates" (only 1081 of them), the update on rows past that max won't be performed at all.
INSERT INTO employee_table (id, name) VALUES (3, 'Evan Carroll');
INSERT INTO leave_table VALUES ( 10, null, 3 );Then we run a command to check whether or not an
emp_id has more than one entry in leave_table. The results are in that derived table. We update accordingly. Here we generate a date that represents the year start between 1900-2020. Just update this for what you mean by "random date" you didn't define it in your question.UPDATE leave_table
SET leave_date = CASE
WHEN t.count = 1 OR t.count IS NULL
THEN '01/01/2000'::date
ELSE '1/1/1900'::date + ('1 year'::interval*floor(random()*120))
END
FROM (
SELECT emp_id, count(*) FROM leave_table
WHERE leave_date IS NULL
GROUP BY emp_id
) AS t
WHERE leave_date IS NULL
AND t.emp_id = leave_table.emp_id;Then we have it
TABLE leave_table;
id | leave_date | emp_id
----+------------+--------
1 | 1993-01-10 | 1
4 | 1990-02-12 | 2
2 | 1964-01-01 | 1
3 | 1929-01-01 | 1
5 | 1933-01-01 | 2
6 | 1902-01-01 | 2
10 | 2000-01-01 | 3Now, as @McNets pointed out yesterday, I am kind of cheating. Instead, try this (much more complex query) which suffices the question's update his [
emp_id] null entries with two different datesWITH t AS (
SELECT
id,
emp_id,
leave_date,
count(*) OVER (PARTITION BY emp_id) AS max_nulls,
row_number() OVER (PARTITION BY emp_id)
FROM leave_table
WHERE leave_table.leave_date IS NULL
)
UPDATE leave_table
SET leave_date = CASE
WHEN t.max_nulls = 1 OR t.max_nulls IS NULL
THEN '01/01/2000'::date
ELSE date_series_emp.ds
END
FROM t
INNER JOIN (
SELECT distinct_emps.emp_id,
gs.ds,
count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
AS gs(ds)
) AS date_series_emp
USING ( emp_id, row_number )
WHERE t.id = leave_table.id;Breaking it apart, the CTE does this
SELECT
id,
emp_id,
leave_date,
count(*) FILTER (WHERE leave_date IS NULL) OVER (PARTITION BY emp_id) AS max_nulls,
row_number() OVER (PARTITION BY emp_id)
FROM leave_tableThat generates how many nulls are in the set, and row numbers from within the set that we can join on for a 1:1 with the update query,
id │ emp_id │ leave_date │ max_nulls │ row_number
────┼────────┼────────────┼───────────┼────────────
2 │ 1 │ │ 2 │ 1
3 │ 1 │ │ 2 │ 2
5 │ 2 │ │ 2 │ 1
6 │ 2 │ │ 2 │ 2
10 │ 3 │ │ 1 │ 1The only other tricky part is the inner-join select,
SELECT distinct_emps.emp_id,
gs.ds,
count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
AS gs(ds)There we're taking the distinct emp_ids, and joining them on a sequence of dates that you're calling random. We count(*) over that sequence to give it a corresponding random number from within the cardinal sequences generated.
Then we join this to the table and perform the update..
This method does have a one drawback, if the input size ever exhausts your pool of "random dates" (only 1081 of them), the update on rows past that max won't be performed at all.
Code Snippets
INSERT INTO employee_table (id, name) VALUES (3, 'Evan Carroll');
INSERT INTO leave_table VALUES ( 10, null, 3 );UPDATE leave_table
SET leave_date = CASE
WHEN t.count = 1 OR t.count IS NULL
THEN '01/01/2000'::date
ELSE '1/1/1900'::date + ('1 year'::interval*floor(random()*120))
END
FROM (
SELECT emp_id, count(*) FROM leave_table
WHERE leave_date IS NULL
GROUP BY emp_id
) AS t
WHERE leave_date IS NULL
AND t.emp_id = leave_table.emp_id;TABLE leave_table;
id | leave_date | emp_id
----+------------+--------
1 | 1993-01-10 | 1
4 | 1990-02-12 | 2
2 | 1964-01-01 | 1
3 | 1929-01-01 | 1
5 | 1933-01-01 | 2
6 | 1902-01-01 | 2
10 | 2000-01-01 | 3WITH t AS (
SELECT
id,
emp_id,
leave_date,
count(*) OVER (PARTITION BY emp_id) AS max_nulls,
row_number() OVER (PARTITION BY emp_id)
FROM leave_table
WHERE leave_table.leave_date IS NULL
)
UPDATE leave_table
SET leave_date = CASE
WHEN t.max_nulls = 1 OR t.max_nulls IS NULL
THEN '01/01/2000'::date
ELSE date_series_emp.ds
END
FROM t
INNER JOIN (
SELECT distinct_emps.emp_id,
gs.ds,
count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
AS gs(ds)
) AS date_series_emp
USING ( emp_id, row_number )
WHERE t.id = leave_table.id;SELECT
id,
emp_id,
leave_date,
count(*) FILTER (WHERE leave_date IS NULL) OVER (PARTITION BY emp_id) AS max_nulls,
row_number() OVER (PARTITION BY emp_id)
FROM leave_tableContext
StackExchange Database Administrators Q#159425, answer score: 6
Revisions (0)
No revisions yet.