Working with secondary indexing (2i)

Prerequisites

Create a secondary index (2i)

Create indexes on one or more columns after defining a table. Secondary indexes created with 2i can be used to query a table using a column other than the table’s partition key.

In a production environment, certain columns might not be good choices, depending on their cardinality.

Do not add an storage-attached index (SAI) to the same table. See the difference between these index types in the overview.

Create simple 2i

Create simple 2i indexes on a table to see how indexing works. Start by creating a table, cycling.alt_stats, that yields statistics about cyclists:

  1. CREATE TABLE IF NOT EXISTS cycling.cyclist_alt_stats (
  2. id UUID PRIMARY KEY,
  3. lastname text,
  4. birthday date,
  5. nationality text,
  6. weight float,
  7. w_units text,
  8. height float,
  9. first_race date,
  10. last_race date
  11. );

Now create indexes on the columns birthday and nationality:

  1. CREATE INDEX IF NOT EXISTS birthday_idx
  2. ON cycling.cyclist_alt_stats (birthday);
  3. CREATE INDEX IF NOT EXISTS nationality_idx
  4. ON cycling.cyclist_alt_stats (nationality);

The following query attempts to retrieve the cyclists with a specified birthday and nationality. The query returns an error:

  • CQL

  • Result

  1. SELECT *
  2. FROM cycling.cyclist_alt_stats
  3. WHERE birthday = '1982-01-29'
  4. AND nationality = 'Russia';
  1. InvalidRequest: Error from server: code=2200 [Invalid query]
  2. message="Cannot execute this query as it might involve data
  3. filtering and thus may have unpredictable performance.
  4. If you want to execute this query despite the performance
  5. unpredictability, use ALLOW FILTERING"

The indexes have been created on appropriate low-cardinality columns, but the previous query still fails. Why?

The answer lies with the partition key, which has not been defined. When you attempt a potentially expensive query, such as searching a range of rows, the database requires the ALLOW FILTERING directive. The error is not due to multiple indexes, but the lack of a partition key definition in the query.

  • CQL

  • Result

  1. SELECT *
  2. FROM cycling.cyclist_alt_stats
  3. WHERE birthday = '1982-01-29'
  4. AND nationality = 'Russia'
  5. ALLOW FILTERING;
  1. id | birthday | first_race | height | last_race | lastname | nationality | w_units | weight
  2. --------------------------------------+------------+------------+--------+------------+----------+-------------+---------+--------
  3. e0953617-07eb-4c82-8f91-3b2757981625 | 1982-01-29 | 1998-02-15 | 1.78 | 2017-04-16 | BRUTT | Russia | kg | 68
  4. (1 rows)

Thus, one of the difficulties of using 2is is illustrated. SAI is almost always a better option.

Create a 2i on a collection column

Collections can be indexed and queried to find a collection containing a particular value. Sets and lists are indexed a bit differently from maps, given the key-value nature of maps.

Sets and lists can index all values found by indexing the collection column. Maps can index a map key, map value, or map entry using the methods shown below. Multiple indexes can be created on the same map column in a table so that map keys, values, or entries can be queried. In addition, frozen collections can be indexed using FULL to index the full content of a frozen collection.

All the cautions about using secondary indexes apply to indexing collections.

  • For set and list collections, create an index on the column name. Create an index on a set to find all the cyclists that have been on a particular team.
  1. CREATE INDEX IF NOT EXISTS teams_idx
  2. ON cycling.cyclist_career_teams (teams);
  1. SELECT *
  2. FROM cycling.cyclist_career_teams
  3. WHERE teams CONTAINS 'Rabobank-Liv Giant';
  1. id | lastname | teams
  2. --------------------------------------+----------+-----------------------------------------------------------------------------
  3. -----------------------
  4. 1c9ebc13-1eab-4ad5-be87-dce433216d40 | BRAND | {'AA Drink - Leontien.nl', 'Leontien.nl', 'Rabobank-Liv Giant', 'Rabobank-Li
  5. v Woman Cycling Team'}
  6. (1 rows)
  • For map collections, create an index on the map key, map value, or map entry. Create an index on a map key to find all cyclist/team combinations for a particular year.
  1. CREATE INDEX IF NOT EXISTS team_year_keys_idx
  2. ON cycling.cyclist_teams ( KEYS (teams) );
  1. SELECT *
  2. FROM cycling.cyclist_teams
  3. WHERE teams CONTAINS KEY 2015;
  1. id | firstname | lastname | teams
  2. --------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3. cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
  4. 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS | {2015: 'Rabobank-Liv Woman Cycling Team'}
  5. (2 rows)
  • Create an index on the map entries and find cyclists who are the same age. An index using ENTRIES is only valid for maps.
  1. CREATE TABLE IF NOT EXISTS cycling.birthday_list (
  2. cyclist_name text PRIMARY KEY,
  3. blist map<text, text>
  4. );
  1. CREATE INDEX IF NOT EXISTS blist_idx
  2. ON cycling.birthday_list ( ENTRIES(blist) );
  1. SELECT *
  2. FROM cycling.birthday_list
  3. WHERE blist[ 'age' ] = '23';
  1. cyclist_name | blist
  2. ------------------+----------------------------------------------------------
  3. Claudio HEINEN | {'age': '23', 'bday': '27/07/1992', 'nation': 'GERMANY'}
  4. Laurence BOURQUE | {'age': '23', 'bday': '27/07/1992', 'nation': 'CANADA'}
  5. (2 rows)
  • Using the same index, find cyclists from the same country.
  1. SELECT *
  2. FROM cycling.birthday_list
  3. WHERE blist[ 'nation' ] = 'NETHERLANDS';
  1. cyclist_name | blist
  2. ---------------+--------------------------------------------------------------
  3. Luc HAGENAARS | {'age': '28', 'bday': '27/07/1987', 'nation': 'NETHERLANDS'}
  4. Toine POELS | {'age': '52', 'bday': '27/07/1963', 'nation': 'NETHERLANDS'}
  5. (2 rows)
  • Create an index on the map values and find cyclists who have a particular value found in the specified map.
  1. CREATE TABLE IF NOT EXISTS cycling.birthday_list (
  2. cyclist_name text PRIMARY KEY,
  3. blist map<text, text>
  4. );
  1. CREATE INDEX IF NOT EXISTS blist_values_idx
  2. ON cycling.birthday_list ( VALUES(blist) );
  1. SELECT *
  2. FROM cycling.birthday_list
  3. WHERE blist CONTAINS 'NETHERLANDS';

+

  1. cyclist_name | blist
  2. ---------------+--------------------------------------------------------------
  3. Luc HAGENAARS | {'age': '28', 'bday': '27/07/1987', 'nation': 'NETHERLANDS'}
  4. Toine POELS | {'age': '52', 'bday': '27/07/1963', 'nation': 'NETHERLANDS'}
  5. (2 rows)
  • Create an index on the full content of a FROZEN map. The table in this example stores the number of Pro wins, Grand Tour races, and Classic races that a cyclist has competed in. The SELECT statement finds any cyclist who has 39 Pro race wins, 7 Grand Tour starts, and 14 Classic starts.
  1. CREATE TABLE IF NOT EXISTS cycling.race_starts (
  2. cyclist_name text PRIMARY KEY,
  3. rnumbers FROZEN<LIST<int>>
  4. );
  1. CREATE INDEX IF NOT EXISTS rnumbers_idx
  2. ON cycling.race_starts ( FULL(rnumbers) );
  1. SELECT *
  2. FROM cycling.race_starts
  3. WHERE rnumbers = [39, 7, 14];
  1. cyclist_name | rnumbers
  2. ----------------+-------------
  3. John DEGENKOLB | [39, 7, 14]
  4. (1 rows)

Check secondary index (2i) existence

Verify that an index exists:

  • CQL

  • Result

  1. DESCRIBE TABLE cycling.birthday_list;
  1. TBD

Alter a secondary index (2i)

Secondary indexes cannot be altered. If you wish to change a 2i, you need to drop the index and create a new one.

Drop a secondary index (2i)

Drop a 2i:

  1. DROP INDEX IF EXISTS cycling.teams_idx;

Querying using secondary indexes (2i)