HiveBrain v1.2.0
Get Started
← Back to all entries
snippetrubyMinor

How to properly call an "upsert" using parameterized raw SQL to Postgresql in ActiveRecord?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
postgresqlproperlysqlupsertactiverecordrawcallusinghowparameterized

Problem

I need to call an upsert to my Category table in PostgreSQL. My current solution uses the exec_query API in ActiveRecord, using binding.

What I got so far looks like this:

VERIFY_CATEGORY_SQL = <<SQL
  WITH sel AS (SELECT id FROM categories WHERE name=$1 AND source=$2),
  ins AS (INSERT INTO categories (name, source, created_at, updated_at) SELECT
    $3, $4, 'now', 'now' WHERE NOT EXISTS (SELECT * from sel) RETURNING id)
  select id from ins union all select id from sel
SQL


VERIFY_CATEGORY_SQL = <<SQL
  WITH sel AS (SELECT id FROM categories WHERE name=$1 AND source=$2),
  ins AS (INSERT INTO categories (name, source, created_at, updated_at) SELECT
    $1, $2, now(), now() WHERE NOT EXISTS (SELECT * from sel) RETURNING id)
  select id from ins union all select id from sel
SQL


def verify_category_existence(name, source)
ActiveRecord::Base.connection.exec_query(VERIFY_CATEGORY_SQL, 'SQL',
[[nil, name], [nil, source]]).first['id'].to_i
end


Notes

  • In order to make this work I needed to add prepared_statements: true attribute to my database.yml file.



  • Although I need to reuse my name and source parameters, I can't and have to duplicate them as parameters $1, $3 and $2, $4 respectively (Apparently this works fine...)



  • The method call seems a bit hackish, and I believe maybe it was not meant as a public method, and only intended for internal uses (hence the awkward [[nil, x]...] syntax and the 'SQL' parameter, which apparently is for logging purposes)



What do you think of this solution?

Do you know of a better way to do this?

Edit:

I've received great responses to the SQL/Postgresql side of my solution, but I would love to also hear from our rubyists regarding the Ruby/ActiveRecord part of the solution - is this way correct? is there a better solution?

Edit:

Just found out that using 'now' produced unexpected results - timestamp values were re-used rather than calculated on each insert. Changed it to `now()

Solution

Statement, formatted readably:

WITH 
sel AS (
  SELECT id FROM categories WHERE name=$1 AND source=$2
),
ins AS (
  INSERT INTO categories (name, source, created_at, updated_at )
  SELECT  $1, $2, 'now', 'now'
  WHERE NOT EXISTS (SELECT * FROM sel) 
  RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM sel;


Dummy table:

CREATE TABLE categories (
    id serial not null PRIMARY KEY,
    name text not null,
    source text not null,
    created_at timestamp,
    updated_at timestamp,
    UNIQUE(name, source)
);


Demo, with three concurrent psql sessions:

  • Session1: PREPARE q(text,text) AS WITH ...



  • Session2: PREPARE q(text,text) AS WITH ...



  • Session0: BEGIN;



  • Session0: LOCK TABLE categories IN ACCESS EXCLUSIVE MODE;



  • Session1: EXECUTE q('n', 's');



  • Session2: EXECUTE q('n', 's');



  • Session0: ROLLBACK;



One session will report:

regress=> EXECUTE q('n', 's');
 id 
----
  1
(1 row)


and one will report:

regress=> EXECUTE q('n', 's');
ERROR:  duplicate key value violates unique constraint "categories_name_source_key"
DETAIL:  Key (name, source)=(n, s) already exists.


So it might be OK if you have a unique constraint on (name, source) and are willing to retry errors, but if you don't have that unique constraint:

ALTER TABLE categories DROP CONSTRAINT categories_name_source_key;


then you'll get a double insert, because your SELECT for IDs can run on both sessions before either session executes the INSERT.

If you have a unique constraint, there's no point wasting your time with all this hoop jumping. You're better off just running the insert and if it fails, querying for the ID of the existing row.

Code Snippets

WITH 
sel AS (
  SELECT id FROM categories WHERE name=$1 AND source=$2
),
ins AS (
  INSERT INTO categories (name, source, created_at, updated_at )
  SELECT  $1, $2, 'now', 'now'
  WHERE NOT EXISTS (SELECT * FROM sel) 
  RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM sel;
CREATE TABLE categories (
    id serial not null PRIMARY KEY,
    name text not null,
    source text not null,
    created_at timestamp,
    updated_at timestamp,
    UNIQUE(name, source)
);
regress=> EXECUTE q('n', 's');
 id 
----
  1
(1 row)
regress=> EXECUTE q('n', 's');
ERROR:  duplicate key value violates unique constraint "categories_name_source_key"
DETAIL:  Key (name, source)=(n, s) already exists.
ALTER TABLE categories DROP CONSTRAINT categories_name_source_key;

Context

StackExchange Code Review Q#41277, answer score: 5

Revisions (0)

No revisions yet.