patternsqlMinor
Best way to map different JSON keys to same target columns
Viewed 0 times
mapsametargetcolumnswaydifferentkeysjsonbest
Problem
I have CSV import data that comes from multiple vendors with 50+ different columns, and each vendor has slightly different naming.
Instead of trying to spend hours mapping all of the column names to a master table with tons of columns, instead I want to store the original CSV row as a JSON column, such as
I also want to store a unique identifier that is shared by all of the vendors, like
So my columns in my `contacts table would be:
Then I was hoping I could create a view or materialized view which I could start with a small subset of columns that I know I will need (and will spend time mapping). Then, if I ever need the new columns later I could modify the view with new mappings.
I was hoping I could do some kind of case switch, like this:
Unfortunately this is a syntax error... is there any more sane way to do this? I couldn't come up with good ideas. I truly don't want to create some enormous mapping as so many of the column names are messy and probably will never be used. I want to store the data in case I need it, but for my 'results' query I want to create a nice clean mapping that I only add to when I need it.
I'm open to better ways of doing this overall as well. I was trying to avoid the nuclear route of the massive massive table. But I guess I would if I absolutely advised as the best way.
Instead of trying to spend hours mapping all of the column names to a master table with tons of columns, instead I want to store the original CSV row as a JSON column, such as
csv_data.I also want to store a unique identifier that is shared by all of the vendors, like
email.So my columns in my `contacts table would be:
vendor_id
email
csv_dataThen I was hoping I could create a view or materialized view which I could start with a small subset of columns that I know I will need (and will spend time mapping). Then, if I ever need the new columns later I could modify the view with new mappings.
I was hoping I could do some kind of case switch, like this:
SELECT
CASE
WHEN vendor_id = 100 THEN
csv_data->>'fullname' AS full_name,
csv_data->>'age' AS age,
... etc ...
WHEN vendor_id = 101 THEN
csv_data->>'FULL NAME' AS full_name,
csv_data->>'AGE' AS age,
... etc ...
WHEN vendor_id = 102 THEN
csv_data->>'full name' AS full_name,
csv_data->>'cust_age' AS age,
... etc ...
END
FROM contactsUnfortunately this is a syntax error... is there any more sane way to do this? I couldn't come up with good ideas. I truly don't want to create some enormous mapping as so many of the column names are messy and probably will never be used. I want to store the data in case I need it, but for my 'results' query I want to create a nice clean mapping that I only add to when I need it.
I'm open to better ways of doing this overall as well. I was trying to avoid the nuclear route of the massive massive table. But I guess I would if I absolutely advised as the best way.
Solution
If you go that route, storing original JSON data (I suggest as
Then your query can simply be:
db<>fiddle here
This is even fit for any varying set of key --> column translation per vendor. Every field that has no JSON key for the given vendor in
The
jsonb rather than json - cleaner and faster to extract), then consider a translation table with key names for every relevant target column per vendor. Conveniently as another single jsonb value. Like:CREATE TABLE key2col (
vendor_id int PRIMARY KEY
, keys jsonb NOT NULL
);
INSERT INTO key2col VALUES
(100, '{"full_name":"fullname", "age":"age"}')
, (101, '{"full_name":"FULL NAME", "age":"AGE"}')
-- more ...
;Then your query can simply be:
SELECT c.vendor_id, c.email
, c.csv_data->>(k.keys->>'full_name') AS full_name
, c.csv_data->>(k.keys->>'age') AS age
FROM contacts c
LEFT JOIN key2col k USING (vendor_id);db<>fiddle here
This is even fit for any varying set of key --> column translation per vendor. Every field that has no JSON key for the given vendor in
key2col.keys will be NULL. Same if the vendor did not actually provide the declared key in contacts.csv_data.The
LEFT JOIN in the query retains rows from contacts without any registered columns. You may want a plain JOIN instead ...Code Snippets
CREATE TABLE key2col (
vendor_id int PRIMARY KEY
, keys jsonb NOT NULL
);
INSERT INTO key2col VALUES
(100, '{"full_name":"fullname", "age":"age"}')
, (101, '{"full_name":"FULL NAME", "age":"AGE"}')
-- more ...
;SELECT c.vendor_id, c.email
, c.csv_data->>(k.keys->>'full_name') AS full_name
, c.csv_data->>(k.keys->>'age') AS age
FROM contacts c
LEFT JOIN key2col k USING (vendor_id);Context
StackExchange Database Administrators Q#308684, answer score: 4
Revisions (0)
No revisions yet.