patternsqlMinor
Recursive CTE to find unique slug
Viewed 0 times
uniquecterecursiveslugfind
Problem
I have an article table where I want the slug to be unique.
When the user enters a title e.g.
It's possible that my understanding of recursive CTE is incorrect and there's no better way to find a unique slug. Please suggest any alternatives.
CREATE TABLE article (
title char(50) NOT NULL,
slug char(50) NOT NULL
);When the user enters a title e.g.
News on Apple, I want to check the database to see if a corresponding slug exists e.g. news-on-apple. If it does, I'll suffix a numerical value until I find a unique one e.g. news-on-apple-1. Can that be achieved with a recursive CTE query instead of doing recursion in my ORM. Is there a good ballpark number where I should stop recursing and error out. I can imagine people using the same title a 1000 times which would result in 1000 queries just to create 1 article. It's possible that my understanding of recursive CTE is incorrect and there's no better way to find a unique slug. Please suggest any alternatives.
Solution
First off, you do not want to use the data type
Assuming the following rules:
Note that all of the following methods are subject to a race conditions: concurrent operations might identify the same "free" name for the next slug.
To defend against it, you can impose a UNIQUE constraint on
If you glue the suffix to the basic slug name with a dash and allow basic slugs to end in separate numbers, the specification is a tiny bit ambiguous (see comments). I suggest a unique delimiter of your choice instead (which is otherwise disallowed).
Efficient rCTE
Better performance without rCTE
If you worry about thousands of slugs competing for the same slug or generally want to optimize performance, I'd consider a different, faster approach.
If the basic slug isn't taken yet, the more expensive second
Check for the leading string and the suffix separately, so the
%%CODEBLOCK_2%%
Detailed explanation:
Convert the suffix to integer before you apply
Optimize performance
To get the optimum, consider storing the suffix separated from the basic slug and concatenate the slug as needed:
%%CODEBLOCK_3%%
The query for a new slug then becomes:
%%CODEBLOCK_4%%
Ideally supported with a unique index on
Query for list of slugs
In any version of Postgres you can provide rows in a
%%CODEBLOCK_5%%
You can also use
%%CODEBLOCK_6%%
Details under this related question (as commented below):
For long lists, the
Since Postgres 9.4 you can also use the new variant of
Given an array of basic slugs and a corresponding array of suffixes (as per comment):
%%CODEBLOCK_7%%
char(50). Use varchar(50) or just text. See:- Any downsides of using data type “text” for storing strings?
Assuming the following rules:
- Basic slugs never end with a dash.
- Duplicate slugs are suffixed with a dash and a sequential number (
-123).
Note that all of the following methods are subject to a race conditions: concurrent operations might identify the same "free" name for the next slug.
To defend against it, you can impose a UNIQUE constraint on
slug and be prepared to repeat an INSERT upon duplicate key violation or you to take out a write lock on the table at the start of the transaction.If you glue the suffix to the basic slug name with a dash and allow basic slugs to end in separate numbers, the specification is a tiny bit ambiguous (see comments). I suggest a unique delimiter of your choice instead (which is otherwise disallowed).
Efficient rCTE
WITH RECURSIVE
input AS (SELECT 'news-on-apple'::text AS slug) -- input basic slug here once
, cte AS (
SELECT slug || '-' AS slug -- append '-' once, if basic slug exists
, 1 as suffix -- start with suffix 1
FROM article
JOIN input USING (slug)
UNION ALL
SELECT c.slug, c.suffix + 1 -- increment by 1 ...
FROM cte c
JOIN article a ON a.slug = c.slug || c.suffix -- ... if slug-n already exists
)
(
SELECT slug || suffix AS slug
FROM cte
ORDER BY suffix DESC -- pick the last (free) one
LIMIT 1
) -- parentheses required
UNION ALL -- if the basic slug wasn't taken, fall back to that
SELECT slug FROM input
LIMIT 1;Better performance without rCTE
If you worry about thousands of slugs competing for the same slug or generally want to optimize performance, I'd consider a different, faster approach.
WITH input AS (SELECT 'news-on-apple'::text AS slug
, 'news-on-apple-'::text AS slug1) -- input basic slug here
SELECT i.slug
FROM input i
LEFT JOIN article a USING (slug)
WHERE a.slug IS NULL -- doesn't exist yet.
UNION ALL
( -- parentheses required
SELECT i.slug1 || COALESCE(right(a.slug, length(i.slug1) * -1)::int + 1, 1)
FROM input i
LEFT JOIN article a ON a.slug LIKE (i.slug1 || '%') -- match up to last "-"
AND right(a.slug, length(i.slug1) * -1) ~ '^\d+
If the basic slug isn't taken yet, the more expensive second SELECT is never executed - same as above, but much more important here. Check with EXPLAIN ANALYZE, Postgres is smart that way with LIMIT queries. See:
- Optimize a query on two big tables
Check for the leading string and the suffix separately, so the LIKE expression can use a basic btree index with text_pattern_ops like
CREATE INDEX article_slug_idx ON article (slug text_pattern_ops);
Detailed explanation:
- Pattern matching with LIKE, SIMILAR TO or regular expressions
Convert the suffix to integer before you apply max(). Numbers in text representation don't work.
Optimize performance
To get the optimum, consider storing the suffix separated from the basic slug and concatenate the slug as needed: concat_ws('-' , slug, suffix::text) AS slug
CREATE TABLE article (
article_id serial PRIMARY KEY
, title text NOT NULL
, slug text NOT NULL
, suffix int
);
The query for a new slug then becomes:
SELECT slug
|| COALESCE((
SELECT '-'::text || (max(suffix) + 1)::text
FROM article a
WHERE a.slug = i.slug), '') As slug
FROM (SELECT 'news-on-apple'::text AS slug) i -- input basic slug here
Ideally supported with a unique index on (slug, suffix).
Query for list of slugs
In any version of Postgres you can provide rows in a VALUES expression.
SELECT *
FROM article
JOIN (
VALUES
('slug-foo'::text, 1)
('slug-bar',7)
) u(slug,suffix) USING (slug,suffix);
You can also use IN with a set of row-type expressions Which is shorter:
SELECT *
FROM article
WHERE (slug,suffix) IN (('slug-foo', 1), ('slug-bar',7));
Details under this related question (as commented below):
- the 's
For long lists, the JOIN to a VALUES expression is typically faster.
Since Postgres 9.4 you can also use the new variant of unnest() to unnest multiple arrays in parallel.
Given an array of basic slugs and a corresponding array of suffixes (as per comment):
SELECT *
FROM article
JOIN unnest('{slug-foo,slug-bar}'::text[]
, '{1,7}'::int[]) AS u(slug,suffix) USING (slug,suffix);
-- suffix numbers only
ORDER BY right(a.slug, length(i.slug1) * -1)::int DESC
)
LIMIT 1;If the basic slug isn't taken yet, the more expensive second
SELECT is never executed - same as above, but much more important here. Check with EXPLAIN ANALYZE, Postgres is smart that way with LIMIT queries. See:- Optimize a query on two big tables
Check for the leading string and the suffix separately, so the
LIKE expression can use a basic btree index with text_pattern_ops like%%CODEBLOCK_2%%
Detailed explanation:
- Pattern matching with LIKE, SIMILAR TO or regular expressions
Convert the suffix to integer before you apply
max(). Numbers in text representation don't work.Optimize performance
To get the optimum, consider storing the suffix separated from the basic slug and concatenate the slug as needed:
concat_ws('-' , slug, suffix::text) AS slug%%CODEBLOCK_3%%
The query for a new slug then becomes:
%%CODEBLOCK_4%%
Ideally supported with a unique index on
(slug, suffix).Query for list of slugs
In any version of Postgres you can provide rows in a
VALUES expression.%%CODEBLOCK_5%%
You can also use
IN with a set of row-type expressions Which is shorter:%%CODEBLOCK_6%%
Details under this related question (as commented below):
- the 's
For long lists, the
JOIN to a VALUES expression is typically faster.Since Postgres 9.4 you can also use the new variant of
unnest() to unnest multiple arrays in parallel.Given an array of basic slugs and a corresponding array of suffixes (as per comment):
%%CODEBLOCK_7%%
Code Snippets
WITH RECURSIVE
input AS (SELECT 'news-on-apple'::text AS slug) -- input basic slug here once
, cte AS (
SELECT slug || '-' AS slug -- append '-' once, if basic slug exists
, 1 as suffix -- start with suffix 1
FROM article
JOIN input USING (slug)
UNION ALL
SELECT c.slug, c.suffix + 1 -- increment by 1 ...
FROM cte c
JOIN article a ON a.slug = c.slug || c.suffix -- ... if slug-n already exists
)
(
SELECT slug || suffix AS slug
FROM cte
ORDER BY suffix DESC -- pick the last (free) one
LIMIT 1
) -- parentheses required
UNION ALL -- if the basic slug wasn't taken, fall back to that
SELECT slug FROM input
LIMIT 1;WITH input AS (SELECT 'news-on-apple'::text AS slug
, 'news-on-apple-'::text AS slug1) -- input basic slug here
SELECT i.slug
FROM input i
LEFT JOIN article a USING (slug)
WHERE a.slug IS NULL -- doesn't exist yet.
UNION ALL
( -- parentheses required
SELECT i.slug1 || COALESCE(right(a.slug, length(i.slug1) * -1)::int + 1, 1)
FROM input i
LEFT JOIN article a ON a.slug LIKE (i.slug1 || '%') -- match up to last "-"
AND right(a.slug, length(i.slug1) * -1) ~ '^\d+$' -- suffix numbers only
ORDER BY right(a.slug, length(i.slug1) * -1)::int DESC
)
LIMIT 1;CREATE INDEX article_slug_idx ON article (slug text_pattern_ops);CREATE TABLE article (
article_id serial PRIMARY KEY
, title text NOT NULL
, slug text NOT NULL
, suffix int
);SELECT slug
|| COALESCE((
SELECT '-'::text || (max(suffix) + 1)::text
FROM article a
WHERE a.slug = i.slug), '') As slug
FROM (SELECT 'news-on-apple'::text AS slug) i -- input basic slug hereContext
StackExchange Database Administrators Q#86370, answer score: 7
Revisions (0)
No revisions yet.