snippetsqlMinor
How to find the count of words (all and unique) in a column row-wise?
Viewed 0 times
uniquethehowallcolumnwordswisefindandcount
Problem
I have a column in my Postgresql database named
id
Description
1
What a great day
2
This is a product. It is useful
I want to have following output:
id
word_count
unique_word_count
Description
1
4
4
What a great day
2
7
6
This is a product. It is useful
The
Description which contains text. For some analytics purpose, I want to find total number of words (seperated by spaces) - all words and unique words - in that description and set those counts in columns word_count and unique_word_count. For eg.id
Description
1
What a great day
2
This is a product. It is useful
I want to have following output:
id
word_count
unique_word_count
Description
1
4
4
What a great day
2
7
6
This is a product. It is useful
The
unique_word_count for id = 2 is 6 because the word is has been repeated 2 times.Solution
EDIT:
Not being 100% happy with my answer, I awarded a bonus for a question of my own on this topic here.
Erwin Brandstetter (as usual) provided a sublime answer - please use that instead of what I wrote below - unless your use case is very simple and even then...
==============================================================
To solve your issue I did the following:
Populate it:
Then performed the following query (fiddle available here):
Result:
id
Word_Count
Unique_Word_Count
Description
1
4
4
What a great day
2
7
6
This is a product. It is useful
If you're new to all this, I would advise you to go through the fiddle here which shows the development of my thought processes which led to the query above.
I would advise you to put this logic into an onInsert/onUpdate trigger - it's probably fairly computationally intensive.
Not being 100% happy with my answer, I awarded a bonus for a question of my own on this topic here.
Erwin Brandstetter (as usual) provided a sublime answer - please use that instead of what I wrote below - unless your use case is very simple and even then...
==============================================================
To solve your issue I did the following:
CREATE TABLE wordcount (id SERIAL NOT NULL PRIMARY KEY, description TEXT NOT NULL);Populate it:
INSERT INTO wordcount (description) VALUES ('What a great day');
INSERT INTO wordcount (description) VALUES ('This is a product. It is useful');Then performed the following query (fiddle available here):
WITH arranged AS
(
SELECT id,
UNNEST
(
STRING_TO_ARRAY
(
REGEXP_REPLACE(description, '[^\w\s]', '', 'g'), ' '
)
) AS word,
description
FROM wordcount
)
SELECT a.id, COUNT(a.word), COUNT(DISTINCT(a.word)), a.description
FROM arranged a
GROUP BY a.id, a.description;Result:
id
Word_Count
Unique_Word_Count
Description
1
4
4
What a great day
2
7
6
This is a product. It is useful
If you're new to all this, I would advise you to go through the fiddle here which shows the development of my thought processes which led to the query above.
I would advise you to put this logic into an onInsert/onUpdate trigger - it's probably fairly computationally intensive.
Code Snippets
CREATE TABLE wordcount (id SERIAL NOT NULL PRIMARY KEY, description TEXT NOT NULL);INSERT INTO wordcount (description) VALUES ('What a great day');
INSERT INTO wordcount (description) VALUES ('This is a product. It is useful');WITH arranged AS
(
SELECT id,
UNNEST
(
STRING_TO_ARRAY
(
REGEXP_REPLACE(description, '[^\w\s]', '', 'g'), ' '
)
) AS word,
description
FROM wordcount
)
SELECT a.id, COUNT(a.word), COUNT(DISTINCT(a.word)), a.description
FROM arranged a
GROUP BY a.id, a.description;Context
StackExchange Database Administrators Q#239873, answer score: 6
Revisions (0)
No revisions yet.