snippetsqlMinor
How do I disable escaping when using COPY FROM in PostgreSQL?
Viewed 0 times
postgresqlescapinghowdisableusingwhenfromcopy
Problem
I have a large tab delimited file that I want to read into a table in PostgreSQL 9.5. It contains double-quotes and backslashes that I want to treat as regular characters.
I think COPY FROM is the way to go, but I can't figure out how to disable escaping.
Here is a sample of the data (it's from Google's ngram dataset):
And the table:
If I try without modifiers, the backslash escapes the tab:
So I switch to CSV, and then the double quote characters quote the tabs:
Using CSV lets me use the DELIMITER keyword. It works if I pick a delimiter that doesn't occur in the sample (space, in this case):
But I want to be able to include any character (except for tab and newline). So, how can I disable QUOTE? Or what can I use instead of COPY FROM?
Edit: For slightly arbitrary reasons, ideally I'd like an option t
I think COPY FROM is the way to go, but I can't figure out how to disable escaping.
Here is a sample of the data (it's from Google's ngram dataset):
aX13_X 2006 8 5
aX13_X 2007 4 3
aX13_X 2008 2 1
a\ 1852 1 1
a\ 1935 1 1
a\ 1937 2 2
ACT1V1T1ES 2003 15 11
ACT1V1T1ES 2004 63 6
ACT1V1T1ES 2005 1 1
ACT1V1T1ES 2006 5 4
ACT1V1T1ES 2008 4 3
ACTION=" 1995 3 3
ACTION=" 1996 6 5
ACTION=" 1997 9 7
ACTION=" 1998 19 11
ACTION=" 1999 11 5And the table:
CREATE TABLE onegram (
id SERIAL,
ngram character text,
year integer NOT NULL,
match_count integer NOT NULL,
volume_count integer NOT NULL
);If I try without modifiers, the backslash escapes the tab:
ngram=# copy onegram (ngram, year, match_count, volume_count)
from '/home/tims/data/ngram/test.tsv';
ERROR: missing data for column "volume_count"
CONTEXT: COPY onegram, line 4: "a\ 1852 1 1"So I switch to CSV, and then the double quote characters quote the tabs:
ngram=# copy onegram (ngram, year, match_count, volume_count)
from '/home/tims/data/ngram/test.tsv' WITH CSV DELIMITER E'\t';
ERROR: unterminated CSV quoted field
CONTEXT: COPY onegram, line 17: "ACTION=" 1999 11 5
"Using CSV lets me use the DELIMITER keyword. It works if I pick a delimiter that doesn't occur in the sample (space, in this case):
ngram=# copy onegram (ngram, year, match_count, volume_count)
from '/home/tims/data/ngram/test.tsv' WITH CSV DELIMITER E'\t' QUOTE E' ';
COPY 16But I want to be able to include any character (except for tab and newline). So, how can I disable QUOTE? Or what can I use instead of COPY FROM?
Edit: For slightly arbitrary reasons, ideally I'd like an option t
Solution
The
copy command by default uses text format with tab delimiter. So only one thing you need is to escape backslashes:copy onegram (ngram, year, match_count, volume_count)
from program 'sed ''s/\\/\\\\/g'' < /home/tims/data/ngram/test.tsv';
select * from onegram;
╔════╤════════════╤══════╤═════════════╤══════════════╗
║ id │ ngram │ year │ match_count │ volume_count ║
╠════╪════════════╪══════╪═════════════╪══════════════╣
║ 1 │ aX13_X │ 2006 │ 8 │ 5 ║
║ 2 │ aX13_X │ 2007 │ 4 │ 3 ║
║ 3 │ aX13_X │ 2008 │ 2 │ 1 ║
║ 4 │ a\ │ 1852 │ 1 │ 1 ║
║ 5 │ a\ │ 1935 │ 1 │ 1 ║
║ 6 │ a\ │ 1937 │ 2 │ 2 ║
║ 7 │ ACT1V1T1ES │ 2003 │ 15 │ 11 ║
║ 8 │ ACT1V1T1ES │ 2004 │ 63 │ 6 ║
║ 9 │ ACT1V1T1ES │ 2005 │ 1 │ 1 ║
║ 10 │ ACT1V1T1ES │ 2006 │ 5 │ 4 ║
║ 11 │ ACT1V1T1ES │ 2008 │ 4 │ 3 ║
║ 12 │ ACTION=" │ 1995 │ 3 │ 3 ║
║ 13 │ ACTION=" │ 1996 │ 6 │ 5 ║
║ 14 │ ACTION=" │ 1997 │ 9 │ 7 ║
║ 15 │ ACTION=" │ 1998 │ 19 │ 11 ║
║ 16 │ ACTION=" │ 1999 │ 11 │ 5 ║
╚════╧════════════╧══════╧═════════════╧══════════════╝Code Snippets
copy onegram (ngram, year, match_count, volume_count)
from program 'sed ''s/\\/\\\\/g'' < /home/tims/data/ngram/test.tsv';
select * from onegram;
╔════╤════════════╤══════╤═════════════╤══════════════╗
║ id │ ngram │ year │ match_count │ volume_count ║
╠════╪════════════╪══════╪═════════════╪══════════════╣
║ 1 │ aX13_X │ 2006 │ 8 │ 5 ║
║ 2 │ aX13_X │ 2007 │ 4 │ 3 ║
║ 3 │ aX13_X │ 2008 │ 2 │ 1 ║
║ 4 │ a\ │ 1852 │ 1 │ 1 ║
║ 5 │ a\ │ 1935 │ 1 │ 1 ║
║ 6 │ a\ │ 1937 │ 2 │ 2 ║
║ 7 │ ACT1V1T1ES │ 2003 │ 15 │ 11 ║
║ 8 │ ACT1V1T1ES │ 2004 │ 63 │ 6 ║
║ 9 │ ACT1V1T1ES │ 2005 │ 1 │ 1 ║
║ 10 │ ACT1V1T1ES │ 2006 │ 5 │ 4 ║
║ 11 │ ACT1V1T1ES │ 2008 │ 4 │ 3 ║
║ 12 │ ACTION=" │ 1995 │ 3 │ 3 ║
║ 13 │ ACTION=" │ 1996 │ 6 │ 5 ║
║ 14 │ ACTION=" │ 1997 │ 9 │ 7 ║
║ 15 │ ACTION=" │ 1998 │ 19 │ 11 ║
║ 16 │ ACTION=" │ 1999 │ 11 │ 5 ║
╚════╧════════════╧══════╧═════════════╧══════════════╝Context
StackExchange Database Administrators Q#157212, answer score: 5
Revisions (0)
No revisions yet.