principlesqlMinor
Compare words in a string without considering their positions?
Viewed 0 times
withoutpositionswordsconsideringcomparestringtheir
Problem
In Postgres 9.6 I want to test whether two strings like these are considered the same:
So I have created this function:
which actually looks to be working correctly.
I was wondering, is there any better way to do this?
All the strings are already stripped out (spaces and punctuation) and duplication is not a problem. String length 50 characters as max estimation.
'this is a test number 01', 'number this is 01 a test' So I have created this function:
CREATE OR REPLACE FUNCTION sort_text(a text) RETURNS text AS $
declare t1 text;
BEGIN
select(array_to_string (
array(
select * from unnest(string_to_array(a, ' ')) order by 1), ' ')) into t1;
RETURN t1;
END;
$ LANGUAGE plpgsql;
select (sort_text('this is a test number 01') = sort_text('number this is 01 a test'));which actually looks to be working correctly.
I was wondering, is there any better way to do this?
'this and this' and 'and this' are considered to be different.All the strings are already stripped out (spaces and punctuation) and duplication is not a problem. String length 50 characters as max estimation.
Solution
I suggest a single SELECT in a plain SQL function:
Call:
This is assuming:
The function returns NULL for any empty string or NULL input.
The
You could use
BTW, your simple function
CREATE OR REPLACE FUNCTION strings_equivalent(a text, b text)
RETURNS bool AS
$func$
SELECT a1 = b1
FROM (
SELECT string_agg(w, ' ') AS a1
FROM (
SELECT w
FROM unnest(string_to_array(a, ' ')) w
ORDER BY w
) a1
) a2
, (
SELECT string_agg(w, ' ') AS b1
FROM (
SELECT w
FROM unnest(string_to_array(b, ' ')) w
ORDER BY w
) b1
) b2
WHERE length(a) = length(b)
UNION ALL
SELECT FALSE
LIMIT 1; -- for clarity, not needed
$func$ LANGUAGE sql IMMUTABLE;Call:
SELECT strings_equivalent('this is a test number 01', 'number this is 01 a test');This is assuming:
- Duplicate words count like any other words
- Separator is a single space
- No leading or trailing spaces
The function returns NULL for any empty string or NULL input.
The
UNION ALL construct is a shortcut to return FALSE immediately if the input strings don't have the same length and avoid more expensive processing. Related:- Return a value if no record is found
- SQL order by query results in any arbitrary (but reproducible) manner
LIMIT 1 is not needed because the function only returns the first column of the first row anyway and ignores the rest if there are more rows.IMMUTABLE (since the result never changes for the same input) helps performance with repeated evaluation and allows indexes on functional expressions.You could use
regexp_split_to_table(a, ' ') instead of unnest(string_to_array(a, ' ')), but regular expression functions are typically more expensive. (You can cover more sophisticated separator characters with the regex, though, like '\s+' for any white space). Related:- How to preserve the original order of elements in an unnested array?
BTW, your simple function
sort_text() looks good. But use string_agg() instead of array_to_string(ARRAY(...)) in a simple SQL function. No variable assignment needed:CREATE OR REPLACE FUNCTION sort_text(a text)
RETURNS text AS
$
SELECT string_agg(w, ' ')
FROM (
SELECT w
FROM unnest(string_to_array(a, ' ')) w
ORDER BY 1
) sub
$ LANGUAGE sql IMMUTABLE;Code Snippets
CREATE OR REPLACE FUNCTION strings_equivalent(a text, b text)
RETURNS bool AS
$func$
SELECT a1 = b1
FROM (
SELECT string_agg(w, ' ') AS a1
FROM (
SELECT w
FROM unnest(string_to_array(a, ' ')) w
ORDER BY w
) a1
) a2
, (
SELECT string_agg(w, ' ') AS b1
FROM (
SELECT w
FROM unnest(string_to_array(b, ' ')) w
ORDER BY w
) b1
) b2
WHERE length(a) = length(b)
UNION ALL
SELECT FALSE
LIMIT 1; -- for clarity, not needed
$func$ LANGUAGE sql IMMUTABLE;SELECT strings_equivalent('this is a test number 01', 'number this is 01 a test');CREATE OR REPLACE FUNCTION sort_text(a text)
RETURNS text AS
$$
SELECT string_agg(w, ' ')
FROM (
SELECT w
FROM unnest(string_to_array(a, ' ')) w
ORDER BY 1
) sub
$$ LANGUAGE sql IMMUTABLE;Context
StackExchange Database Administrators Q#175368, answer score: 3
Revisions (0)
No revisions yet.