HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

SELECT all substrings (n-grams) of length n?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
substringsgramsalllengthselect

Problem

How could I select all n-grams, ie. substrings of length n from a string using SQL? For example, the 3-grams of string example are exa, xam, amp, mpl, ple.

I'm using PostgreSQL to be more precise.

Solution

This does the trick:

SELECT SUBSTRING('example' FROM n FOR 3) 
FROM GENERATE_SERIES(1, LENGTH( 'example' )-2, 1) n;
exa
xam
amp
mpl
ple


Here it is in a function:

CREATE OR REPLACE FUNCTION ngrams(varchar, integer) RETURNS SETOF TEXT AS $
SELECT SUBSTRING($1 FROM n FOR $2)::TEXT
FROM GENERATE_SERIES(1, LENGTH($1)-($2-1), 1) n;
$ LANGUAGE SQL;


Use it like:

SELECT ngrams('example', 3)

Code Snippets

SELECT SUBSTRING('example' FROM n FOR 3) 
FROM GENERATE_SERIES(1, LENGTH( 'example' )-2, 1) n;
exa
xam
amp
mpl
ple
CREATE OR REPLACE FUNCTION ngrams(varchar, integer) RETURNS SETOF TEXT AS $$
SELECT SUBSTRING($1 FROM n FOR $2)::TEXT
FROM GENERATE_SERIES(1, LENGTH($1)-($2-1), 1) n;
$$ LANGUAGE SQL;
SELECT ngrams('example', 3)

Context

StackExchange Database Administrators Q#81369, answer score: 5

Revisions (0)

No revisions yet.