HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Cassandra - Query a column with collection type

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
columnwithquerycollectiontypecassandra

Problem

I am pretty new to cassandra, so pardon me if this turns out to be a silly question.

I have a table structure as below

CREATE TABLE data_points (
  id text PRIMARY KEY,
  created_at timestamp,
  previous_event_id varchar,
  properties map
);


I wanted to know, whether I can execute a query which gives matching records from the map type fields.

For instance, if I insert values in the table as below

INSERT INTO datapoints (id, properties) VALUES ('1', { 'fruit' : 'apple', 'band' : 'Beatles' });


Will I be able to fetch it as

SELECT * from data_points WHERE properties.band='Beatles';

Please Help.

Solution

You can index collection types in cassandra 2.1 and later. You are after:

SELECT * FROM WHERE CONTAINS

Detailed example:

cqlsh> USE ks;
cqlsh:ks> CREATE TABLE data_points (
            id text PRIMARY KEY,
            created_at timestamp,
            previous_event_id varchar,
            properties map
         );
cqlsh:ks> create index on data_points (properties);
cqlsh:ks> INSERT INTO data_points (id, properties) VALUES ('1', { 'fruit' : 'apple', 'band' : 'Beatles' });
cqlsh:ks> INSERT INTO data_points (id, properties) VALUES ('2', { 'fruit' : 'cherry', 'band' : 'Beatles' });
cqlsh:ks> SELECT * FROM data_points WHERE properties CONTAINS 'Beatles';

 id | created_at | previous_event_id | properties
----+------------+-------------------+----------------------------------------
  2 |       null |              null | {'band': 'Beatles', 'fruit': 'cherry'}
  1 |       null |              null |  {'band': 'Beatles', 'fruit': 'apple'}

(2 rows)


Word of warning, secondary indexes don't scale out well as they use a scatter/gather algorithm to find what you need, if you plan to use them for heavy tagging it might be better to denormalize the properties field int a separate table and carry out multiple queries.

Further reading:

  • Is it possible to use cql to query collections in a row?

Code Snippets

cqlsh> USE ks;
cqlsh:ks> CREATE TABLE data_points (
            id text PRIMARY KEY,
            created_at timestamp,
            previous_event_id varchar,
            properties map<text,text>
         );
cqlsh:ks> create index on data_points (properties);
cqlsh:ks> INSERT INTO data_points (id, properties) VALUES ('1', { 'fruit' : 'apple', 'band' : 'Beatles' });
cqlsh:ks> INSERT INTO data_points (id, properties) VALUES ('2', { 'fruit' : 'cherry', 'band' : 'Beatles' });
cqlsh:ks> SELECT * FROM data_points WHERE properties CONTAINS 'Beatles';

 id | created_at | previous_event_id | properties
----+------------+-------------------+----------------------------------------
  2 |       null |              null | {'band': 'Beatles', 'fruit': 'cherry'}
  1 |       null |              null |  {'band': 'Beatles', 'fruit': 'apple'}

(2 rows)

Context

StackExchange Database Administrators Q#55165, answer score: 17

Revisions (0)

No revisions yet.