HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Postgres: Insert distinct values from one table into another table with constrains in place

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
distinctinsertpostgreswithintoplaceoneconstrainsanothervalues

Problem

I'm trying to insert data from one PostgreSQL 9.3 table to another while removing duplicate entries.

I have two tables containing email addresses.

The main table contains emails and a tag for each email address. The combination (email, tag) must be unique, and for this there is a constraint Unique(email, tag) on it.

The second table get created on the fly from text files which contain only the email address. There are many duplicates there.

I need to import the data from the temp table into the main one without breaking the constraint above. For a certain file containing email addresses the tag is constant.

Table structures:

CREATE TABLE emails (   
  email character varying(128),
  tag bigint,
  CONSTRAINT "unique-tag-email" UNIQUE (email, tag) )


and

CREATE TABLE emails_temp (email character varying(128)


Here is my query:

insert into emails(tag,email) 
select 
 655,t.email 
from 
 emails_temp as t 
where 
 not exists ( select email from emails where email = t.email )


Note: 655 is just a tag of a certain group of email addresses.

This is the error I get:


ERROR: duplicate key value violates unique constraint
"unique-tag-email" SQL state: 23505 Detail: Key (email,
tag)=(user@hotmail.com, 655) already exists.

There are indeed, two email addresses user@hotmail.com in the file.

Useless to say, because of this error, there is nothing added in the main table (email).

What am I doing wrong?

Solution

There are 3 possible kinds of duplicates:

-
Duplicates within the rows of the bulk insert.

-
Duplicates between inserted rows and existing rows.

-
Duplicates between inserted rows and concurrently inserted / updated rows from other transactions.

Just like I explained in this closely related answer:

  • Using EXCEPTION to ignore duplicates during bulk inserts



But things have become easier for 2. and 3. since Postgres 9.5 introduced UPSERT (INSERT .. ON CONFLICT DO NOTHING).

INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM   emails_temp
ON CONFLICT (email) DO NOTHING;


If your duplicates only stem from duplicate entries in the source (1.), like you indicated, then all you need is DISTINCT. Works in any version of Postgres:

INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM   emails_temp;

Code Snippets

INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM   emails_temp
ON CONFLICT (email) DO NOTHING;
INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM   emails_temp;

Context

StackExchange Database Administrators Q#160469, answer score: 5

Revisions (0)

No revisions yet.