Working with Storage-attached indexing (SAI)

Prerequisites

Create SAI index

To create an SAI index, you must define the index name, table name, and column name for the column to be indexed.

To create a simple SAI index:

  1. CREATE CUSTOM INDEX lastname_sai_idx ON cycling.cyclist_semi_pro (lastname)
  2. USING 'StorageAttachedIndex'
  3. WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};
  4. CREATE CUSTOM INDEX age_sai_idx ON cycling.cyclist_semi_pro (age)
  5. USING 'StorageAttachedIndex';
  6. CREATE CUSTOM INDEX country_sai_idx ON cycling.cyclist_semi_pro (country)
  7. USING 'StorageAttachedIndex'
  8. WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};
  9. CREATE CUSTOM INDEX registration_sai_idx ON cycling.cyclist_semi_pro (registration)
  10. USING 'StorageAttachedIndex';

For most SAI indexes, the column name is defined in the CREATE CUSTOM INDEX statement that also uses USING 'StorageAttachedIndex'. The SAI index options are defined in the WITH OPTIONS clause. The case_sensitive option is set to false to allow case-insensitive searches. The normalize option is set to true to allow searches to be normalized for Unicode characters. The ascii_only option is set to true to allow searches to be limited to ASCII characters.

The map collection data type is the one exception, as shown in the example below.

Partition key SAI error

SAI indexes cannot be created on the partition key, as a primary index already exists and is used for queries. If you attempt to create an SAI on the partition key column, an error will be returned:

  • CQL

  • Result

  1. CREATE CUSTOM INDEX ON demo2.person_id_name_primarykey (id)
  2. USING 'StorageAttachedIndex';
  1. InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot create secondary index on the only partition key column id"

map collection in SAI index

Map collections do have a different format than other SAI indexes:

  1. // Create an index on a map key to find all cyclist/team combos for a year
  2. // tag::keysidx[]
  3. CREATE INDEX IF NOT EXISTS team_year_keys_idx
  4. ON cycling.cyclist_teams ( KEYS (teams) );
  5. // end::keysidx[]
  6. // Create an index on a map key to find all cyclist/team combos for a year
  7. // tag::valuesidx[]
  8. CREATE INDEX IF NOT EXISTS team_year_values_idx
  9. ON cycling.cyclist_teams ( VALUES (teams) );
  10. // end::valuesidx[]
  11. // Create an index on a map key to find all cyclist/team combos for a year
  12. // tag::entriesidx[]
  13. CREATE INDEX IF NOT EXISTS team_year_entries_idx
  14. ON cycling.cyclist_teams ( ENTRIES (teams) );
  15. // end::entriesidx[]

This example uses the following table:

  1. CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  2. record_id timeuuid,
  3. id uuid,
  4. commenter text,
  5. comment text,
  6. comment_vector VECTOR <FLOAT, 5>,
  7. created_at timestamp,
  8. PRIMARY KEY (id, created_at)
  9. )
  10. WITH CLUSTERING ORDER BY (created_at DESC);

To check if comment_vector has a particular similarity function set, use the similarity-function option set to one of the supported similarity functions: DOT_PRODUCT, COSINE, or EUCLIDEAN. The default similarity function is COSINE.

This index creates an index on the comment_vector column with the similarity function set to DOT_PRODUCT:

  1. CREATE CUSTOM INDEX sim_comments_idx
  2. ON cycling.comments_vs (comment_vector)
  3. USING 'StorageAttachedIndex'
  4. WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT'};

Other resources

See CREATE CUSTOM INDEX for more information about creating SAI indexes.

Alter SAI index

SAI indexes cannot be altered. If you need to modify an SAI index, you will need to drop the current index, create a new index, and rebuild the cycling.

  1. Drop index

    1. DROP INDEX IF EXISTS cycling.lastname_sai_idx;
  2. Create new index

    1. CREATE CUSTOM INDEX lastname_sai_idx ON cycling.cyclist_semi_pro (lastname)
    2. USING 'StorageAttachedIndex'
    3. WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};
    4. CREATE CUSTOM INDEX age_sai_idx ON cycling.cyclist_semi_pro (age)
    5. USING 'StorageAttachedIndex';
    6. CREATE CUSTOM INDEX country_sai_idx ON cycling.cyclist_semi_pro (country)
    7. USING 'StorageAttachedIndex'
    8. WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};
    9. CREATE CUSTOM INDEX registration_sai_idx ON cycling.cyclist_semi_pro (registration)
    10. USING 'StorageAttachedIndex';

Drop SAI index

SAI indexes can be dropped (deleted).

To drop an SAI index:

  1. DROP INDEX IF EXISTS cycling.lastname_sai_idx;

This command does not return a result.

Querying with SAI

The SAI quickstart focuses only on defining multiple indexes based on non-primary key columns (a very useful feature). Let’s explore other options using some examples of how you can run queries on tables that have differently defined SAI indexes.

Unresolved include directive in modules/cassandra/pages/developing/cql/indexing/sai/sai-query.adoc - include::developing:partial$sai/sai-only-select.adoc[]

This example uses the following table and index:

  1. CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  2. record_id timeuuid,
  3. id uuid,
  4. commenter text,
  5. comment text,
  6. comment_vector VECTOR <FLOAT, 5>,
  7. created_at timestamp,
  8. PRIMARY KEY (id, created_at)
  9. )
  10. WITH CLUSTERING ORDER BY (created_at DESC);
  11. CREATE CUSTOM INDEX IF NOT EXISTS ann_index
  12. ON cycling.comments_vs(comment_vector) USING 'StorageAttachedIndex';

Query vector data with CQL

To query data using Vector Search, use a SELECT query:

  • CQL

  • Result

  1. SELECT * FROM cycling.comments_vs
  2. ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
  3. LIMIT 3;
  1. id | created_at | comment | comment_vector | commenter | record_id
  2. --------------------------------------+---------------------------------+----------------------------------------+------------------------------+-----------+--------------------------------------
  3. e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | [0.9, 0.54, 0.12, 0.1, 0.95] | Alex | 616e77e0-22a2-11ee-b99d-1f350647414a
  4. c7fceba0-c141-4207-9494-a29f9809de6f | 2017-02-17 08:43:20.234000+0000 | Glad you ran the race in the rain | [0.3, 0.34, 0.2, 0.78, 0.25] | Amy | 6170c1d0-22a2-11ee-b99d-1f350647414a
  5. c7fceba0-c141-4207-9494-a29f9809de6f | 2017-04-01 13:43:08.030000+0000 | Last climb was a killer | [0.3, 0.75, 0.2, 0.2, 0.5] | Amy | 62105d30-22a2-11ee-b99d-1f350647414a

The limit has to be 1,000 or fewer.

Scrolling to the right on the results shows the comments from the table that most closely matched the embeddings used for the query.

Single index match on a column

This example uses the following table and indexes:

  1. CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  2. record_id timeuuid,
  3. id uuid,
  4. commenter text,
  5. comment text,
  6. comment_vector VECTOR <FLOAT, 5>,
  7. created_at timestamp,
  8. PRIMARY KEY (id, created_at)
  9. )
  10. WITH CLUSTERING ORDER BY (created_at DESC);
  11. CREATE CUSTOM INDEX commenter_idx
  12. ON cycling.comments_vs (commenter)
  13. USING 'StorageAttachedIndex';
  14. CREATE CUSTOM INDEX created_at_idx
  15. ON cycling.comments_vs (created_at)
  16. USING 'StorageAttachedIndex';
  17. CREATE CUSTOM INDEX ann_index
  18. ON cycling.comments_vs (comment_vector)
  19. USING 'StorageAttachedIndex';

The column commenter is not the partition key in this table, so an index is required to query on it.

Query for a match on that column:

  • Query

  • Result

  1. SELECT * FROM cycling.comments_vs
  2. WHERE commenter = 'Alex';
  1. id | created_at | comment | comment_vector | commenter | record_id
  2. --------------------------------------+---------------------------------+----------------------------------------+------------------------------+-----------+--------------------------------------
  3. e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | [0.9, 0.54, 0.12, 0.1, 0.95] | Alex | 6d0cdaa0-272b-11ee-859f-b9098002fcac
  4. e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac

Failure with index

Note that a failure will occur if you try this query before creating the index:

  • Query

  • Result

  1. SELECT * FROM cycling.comments_vs
  2. WHERE commenter = 'Alex';
  1. InvalidRequest: Error from server: code=2200
  2. [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.
  3. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

Single index match on a column with options

This example uses the following table and indexes:

  1. CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  2. record_id timeuuid,
  3. id uuid,
  4. commenter text,
  5. comment text,
  6. comment_vector VECTOR <FLOAT, 5>,
  7. created_at timestamp,
  8. PRIMARY KEY (id, created_at)
  9. )
  10. WITH CLUSTERING ORDER BY (created_at DESC);
  11. CREATE CUSTOM INDEX commenter_cs_idx ON cycling.comments_vs (commenter)
  12. USING 'StorageAttachedIndex'
  13. WITH OPTIONS = {'case_sensitive': 'true', 'normalize': 'true', 'ascii': 'true'};

Case-sensitivty

The column commenter is not the partition key in this table, so an index is required to query on it. If we want to check commenter as a case-sensitive value, we can use the case_sensitive option set to true.

Note that no results are returned if you use an inappropriately case-sensitive value in the query:

  • Query

  • Result

  1. SELECT * FROM comments_vs WHERE commenter ='alex';
  1. id | created_at | comment | comment_vector | commenter | record_id
  2. ----+------------+---------+----------------+-----------+-----------
  3. (0 rows)

When we switch the case of the cyclist’s name to match the case in the index, the query succeeds:

  • Query

  • Result

  1. SELECT comment,commenter FROM comments_vs WHERE commenter ='Alex';
  1. comment | commenter
  2. ----------------------------------------+-----------
  3. LATE RIDERS SHOULD NOT DELAY THE START | Alex
  4. Second rest stop was out of water | Alex
  5. (2 rows)

Index match on a composite partition key column

This example uses the following table and indexes:

  1. CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
  2. race_year int,
  3. race_name text,
  4. cyclist_name text,
  5. rank int,
  6. PRIMARY KEY ((race_year, race_name), rank)
  7. );
  8. CREATE CUSTOM INDEX race_name_idx
  9. ON cycling.rank_by_year_and_name (race_name)
  10. USING 'StorageAttachedIndex';
  11. CREATE CUSTOM INDEX race_year_idx
  12. ON cycling.rank_by_year_and_name (race_year)
  13. USING 'StorageAttachedIndex';

Composite partition keys have a partition defined by multiple columns in a table. Normally, you would need to specify all the columns in the partition key to query the table with a WHERE clause. However, an SAI index makes it possible to define an index using a single column in the table’s composite partition key. You can create an SAI index on each column in the composite partition key, if you need to query based on just one column.

SAI indexes also allow you to query tables without using the inefficient ALLOW FILTERING directive. The ALLOW FILTERING directive requires scanning all the partitions in a table, leading to poor performance.

The race_year and race_name columns comprise the composite partition key for the cycling.rank_by_year_and_name table.

Query for a match on the column race_name:

  • Query

  • Result

  1. SELECT * FROM cycling.rank_by_year_and_name
  2. WHERE race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu';
  1. race_year | race_name | rank | cyclist_name
  2. -----------+--------------------------------------------+------+----------------------
  3. 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN
  4. 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Johan Esteban CHAVES
  5. 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Benjamin PRADES
  6. 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES
  7. 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Adam PHELAN
  8. 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Thomas LEBAS

Query for a match on the column race_year:

  • Query

  • Result

  1. SELECT * FROM cycling.rank_by_year_and_name
  2. WHERE race_year = 2014;
  1. race_year | race_name | rank | cyclist_name
  2. -----------+--------------------------------------------+------+----------------------
  3. 2014 | 4th Tour of Beijing | 1 | Phillippe GILBERT
  4. 2014 | 4th Tour of Beijing | 2 | Daniel MARTIN
  5. 2014 | 4th Tour of Beijing | 3 | Johan Esteban CHAVES
  6. 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN
  7. 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Johan Esteban CHAVES
  8. 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Benjamin PRADES

Multiple indexes matched with AND

This example uses the following table and indexes:

  1. CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  2. record_id timeuuid,
  3. id uuid,
  4. commenter text,
  5. comment text,
  6. comment_vector VECTOR <FLOAT, 5>,
  7. created_at timestamp,
  8. PRIMARY KEY (id, created_at)
  9. )
  10. WITH CLUSTERING ORDER BY (created_at DESC);
  11. CREATE CUSTOM INDEX commenter_idx
  12. ON cycling.comments_vs (commenter)
  13. USING 'StorageAttachedIndex';
  14. CREATE CUSTOM INDEX created_at_idx
  15. ON cycling.comments_vs (created_at)
  16. USING 'StorageAttachedIndex';
  17. CREATE CUSTOM INDEX ann_index
  18. ON cycling.comments_vs (comment_vector)
  19. USING 'StorageAttachedIndex';

Several indexes are created for the table to demonstrate how to query for matches on more than one column.

Query for matches on more than one column, and both columns must match:

  • Query

  • Result

  1. SELECT * FROM cycling.comments_vs
  2. WHERE
  3. created_at='2017-03-21 21:11:09.999000+0000'
  4. AND commenter = 'Alex';
  1. id | created_at | comment | comment_vector | commenter | record_id
  2. --------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
  3. e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac

Multiple indexes matched with OR

This example uses the following table and indexes:

  1. CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  2. record_id timeuuid,
  3. id uuid,
  4. commenter text,
  5. comment text,
  6. comment_vector VECTOR <FLOAT, 5>,
  7. created_at timestamp,
  8. PRIMARY KEY (id, created_at)
  9. )
  10. WITH CLUSTERING ORDER BY (created_at DESC);
  11. CREATE CUSTOM INDEX commenter_idx
  12. ON cycling.comments_vs (commenter)
  13. USING 'StorageAttachedIndex';
  14. CREATE CUSTOM INDEX created_at_idx
  15. ON cycling.comments_vs (created_at)
  16. USING 'StorageAttachedIndex';
  17. CREATE CUSTOM INDEX ann_index
  18. ON cycling.comments_vs (comment_vector)
  19. USING 'StorageAttachedIndex';

Several indexes are created for the table to demonstrate how to query for matches on more than one column.

Query for a match on either one column or the other:

  • Query

  • Result

  1. SELECT * FROM cycling.comments_vs
  2. WHERE
  3. created_at='2017-03-21 21:11:09.999000+0000'
  4. OR created_at='2017-03-22 01:16:59.001000+0000';
  1. id | created_at | comment | comment_vector | commenter | record_id
  2. --------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
  3. e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
  4. c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 | Great snacks at all reststops | [0.1, 0.4, 0.1, 0.52, 0.09] | Amy | 6d0fc0d0-272b-11ee-859f-b9098002fcac

Multiple indexes matched with IN

This example uses the following table and indexes:

  1. CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  2. record_id timeuuid,
  3. id uuid,
  4. commenter text,
  5. comment text,
  6. comment_vector VECTOR <FLOAT, 5>,
  7. created_at timestamp,
  8. PRIMARY KEY (id, created_at)
  9. )
  10. WITH CLUSTERING ORDER BY (created_at DESC);
  11. CREATE CUSTOM INDEX commenter_idx
  12. ON cycling.comments_vs (commenter)
  13. USING 'StorageAttachedIndex';
  14. CREATE CUSTOM INDEX created_at_idx
  15. ON cycling.comments_vs (created_at)
  16. USING 'StorageAttachedIndex';
  17. CREATE CUSTOM INDEX ann_index
  18. ON cycling.comments_vs (comment_vector)
  19. USING 'StorageAttachedIndex';

Several indexes are created for the table to demonstrate how to query for matches on more than one column.

Query for match with column values in a list of values:

  • Query

  • Result

  1. SELECT * FROM cycling.comments_vs
  2. WHERE created_at IN
  3. ('2017-03-21 21:11:09.999000+0000'
  4. ,'2017-03-22 01:16:59.001000+0000');
  1. id | created_at | comment | comment_vector | commenter | record_id
  2. --------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
  3. e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
  4. c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 | Great snacks at all reststops | [0.1, 0.4, 0.1, 0.52, 0.09] | Amy | 6d0fc0d0-272b-11ee-859f-b9098002fcac

User-defined type

SAI can index either a user-defined type (UDT) or a list of UDTs. This example shows how to index a list of UDTs.

This example uses the following user-defined type (UDT), table and index:

  1. CREATE TYPE IF NOT EXISTS cycling.race (
  2. race_title text,
  3. race_date timestamp,
  4. race_time text
  5. );
  6. CREATE TABLE IF NOT EXISTS cycling.cyclist_races (
  7. id UUID PRIMARY KEY,
  8. lastname text,
  9. firstname text,
  10. races list<FROZEN <race>>
  11. );
  12. CREATE CUSTOM INDEX races_idx
  13. ON cycling.cyclist_races (races)
  14. USING 'StorageAttachedIndex';

An index is created on the list of UDTs column races in the cycling.cyclist_races table.

Query with CONTAINS from the list races column:

  • CQL

  • Result

  1. SELECT * FROM cycling.cyclist_races
  2. WHERE races CONTAINS {
  3. race_title:'Rabobank 7-Dorpenomloop Aalburg',
  4. race_date:'2015-05-09',
  5. race_time:'02:58:33'};
  1. id | firstname | lastname | races
  2. --------------------------------------+-----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3. 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS | [{race_title: 'Rabobank 7-Dorpenomloop Aalburg', race_date: '2015-05-09 00:00:00.000000+0000', race_time: '02:58:33'}, {race_title: 'Ronde van Gelderland', race_date: '2015-04-19 00:00:00.000000+0000', race_time: '03:22:23'}]
  4. (1 rows)

SAI indexing with collections

SAI supports collections of type map, list, and set. Collections allow you to group and store data together in a column.

In a relational database, a grouping such as a user’s multiple email addresses is achieved via many-to-one joined relationship between (for example) a user table and an email table. Apache Cassandra avoids joins between two tables by storing the user’s email addresses in a collection column in the user table. Each collection specifies the data type of the data held.

A collection is appropriate if the data for collection storage is limited. If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use collections. Instead, use a table with a compound primary key where data is stored in the clustering columns.

In CQL queries of database tables with SAI indexes, the CONTAINS clauses are supported with, and specific to:

  • SAI collection maps with keys, values, and entries

  • SAI collections with list and set types

Using the set type

This example uses the following table and index:

  1. CREATE TABLE IF NOT EXISTS cycling.cyclist_career_teams (
  2. id UUID PRIMARY KEY,
  3. lastname text,
  4. teams set<text>
  5. );
  6. CREATE CUSTOM INDEX teams_idx
  7. ON cycling.cyclist_career_teams (teams)
  8. USING 'StorageAttachedIndex';

An index is created on the set column teams in the cyclist_career_teams table.

Query with CONTAINS from the set teams column:

  • CQL

  • Result

  1. SELECT * FROM cycling.cyclist_career_teams
  2. WHERE teams CONTAINS 'Rabobank-Liv Giant';
  1. id | lastname | teams
  2. --------------------------------------+----------+------------------------------------------------------------------------------------------------------
  3. 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | VOS | {'Nederland bloeit', 'Rabobank Women Team', 'Rabobank-Liv Giant', 'Rabobank-Liv Woman Cycling Team'}

Using the list type

This example uses the following table and index:

  1. CREATE TABLE IF NOT EXISTS cycling.upcoming_calendar (
  2. year int,
  3. month int,
  4. events list<text>,
  5. PRIMARY KEY (year, month)
  6. );
  7. CREATE CUSTOM INDEX events_idx
  8. ON cycling.upcoming_calendar (events)
  9. USING 'StorageAttachedIndex';

An index is created on the list column events in the upcoming_calendar table.

Query with CONTAINS from the list events column:

  • CQL

  • Result

  1. SELECT * FROM cycling.upcoming_calendar
  2. WHERE events CONTAINS 'Criterium du Dauphine';
  1. year | month | events
  2. ------+-------+-----------------------------------------------
  3. 2015 | 6 | ['Criterium du Dauphine', 'Tour de Sui\nsse']

A slightly more complex query selects rows that either contain a particular event or have a particular month date:

  • CQL

  • Result

  1. SELECT * FROM cycling.upcoming_calendar
  2. WHERE events CONTAINS 'Criterium du Dauphine'
  3. OR month = 7;
  1. year | month | events
  2. ------+-------+-----------------------------------------------
  3. 2015 | 6 | ['Criterium du Dauphine', 'Tour de Sui\nsse']
  4. 2015 | 7 | ['Tour de France']

Using the map type

This example uses the following table and indexes:

  1. CREATE TABLE IF NOT EXISTS cycling.cyclist_teams (
  2. id uuid PRIMARY KEY,
  3. firstname text,
  4. lastname text,
  5. teams map<int, text>
  6. );
  7. CREATE INDEX IF NOT EXISTS team_year_keys_idx
  8. ON cycling.cyclist_teams ( KEYS (teams) );
  9. CREATE INDEX IF NOT EXISTS team_year_entries_idx
  10. ON cycling.cyclist_teams ( ENTRIES (teams) );
  11. CREATE INDEX IF NOT EXISTS team_year_values_idx
  12. ON cycling.cyclist_teams ( VALUES (teams) );

Indexes created on the map column teams in the cyclist_career_teams table target the keys, values, and full entries of the column data.

Query with KEYS from the map teams column:

  • CQL

  • Result

  1. SELECT * FROM cyclist_teams WHERE teams CONTAINS KEY 2014;
  1. id | firstname | lastname | teams
  2. --------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3. cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
  4. 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS | {2014: 'Rabobank-Liv Woman Cycling Team', 2015: 'Rabobank-Liv Woman Cycling Team'}

Query a value from the map teams column, noting that only the keyword CONTAINS is included:

  • CQL

  • Result

  1. SELECT * FROM cyclist_teams WHERE teams CONTAINS 'Team Garmin - Cervelo';
  1. id | firstname | lastname | teams
  2. --------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3. cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}

Query entries from the map teams column, noting the difference in the WHERE clause:

  • CQL

  • Result

  1. SELECT * FROM cyclist_teams
  2. WHERE
  3. teams[2014] = 'Boels:Dolmans Cycling Team'
  4. AND teams[2015] = 'Boels:Dolmans Cycling Team';
  1. id | firstname | lastname | teams
  2. --------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3. cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}

This example looks for a row where two entries are present in the map teams column.

For more information, see: