HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to find the count of words (all and unique) in a column row-wise?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
uniquethehowallcolumnwordswisefindandcount

Problem

I have a column in my Postgresql database named Description which contains text. For some analytics purpose, I want to find total number of words (seperated by spaces) - all words and unique words - in that description and set those counts in columns word_count and unique_word_count. For eg.

id
Description

1
What a great day

2
This is a product. It is useful

I want to have following output:

id
word_count
unique_word_count
Description

1
4
4
What a great day

2
7
6
This is a product. It is useful

The unique_word_count for id = 2 is 6 because the word is has been repeated 2 times.

Solution

EDIT:

Not being 100% happy with my answer, I awarded a bonus for a question of my own on this topic here.

Erwin Brandstetter (as usual) provided a sublime answer - please use that instead of what I wrote below - unless your use case is very simple and even then...

==============================================================

To solve your issue I did the following:

CREATE TABLE wordcount (id SERIAL NOT NULL PRIMARY KEY, description TEXT NOT NULL);


Populate it:

INSERT INTO wordcount (description) VALUES ('What a great day');
INSERT INTO wordcount (description) VALUES ('This is a product. It is useful');


Then performed the following query (fiddle available here):

WITH arranged AS
(
  SELECT id, 
  UNNEST
  (
    STRING_TO_ARRAY
    (
      REGEXP_REPLACE(description,  '[^\w\s]', '', 'g'), ' '
    )
  ) AS word, 
  description 
  FROM wordcount
)  
SELECT a.id, COUNT(a.word), COUNT(DISTINCT(a.word)), a.description
FROM arranged a
GROUP BY a.id, a.description;


Result:

id
Word_Count
Unique_Word_Count
Description

1
4
4
What a great day

2
7
6
This is a product. It is useful

If you're new to all this, I would advise you to go through the fiddle here which shows the development of my thought processes which led to the query above.

I would advise you to put this logic into an onInsert/onUpdate trigger - it's probably fairly computationally intensive.

Code Snippets

CREATE TABLE wordcount (id SERIAL NOT NULL PRIMARY KEY, description TEXT NOT NULL);
INSERT INTO wordcount (description) VALUES ('What a great day');
INSERT INTO wordcount (description) VALUES ('This is a product. It is useful');
WITH arranged AS
(
  SELECT id, 
  UNNEST
  (
    STRING_TO_ARRAY
    (
      REGEXP_REPLACE(description,  '[^\w\s]', '', 'g'), ' '
    )
  ) AS word, 
  description 
  FROM wordcount
)  
SELECT a.id, COUNT(a.word), COUNT(DISTINCT(a.word)), a.description
FROM arranged a
GROUP BY a.id, a.description;

Context

StackExchange Database Administrators Q#239873, answer score: 6

Revisions (0)

No revisions yet.