Secondary Indexes Beta
AttentionThis page documents an earlier version. Go to the latest (v2.1)version.
A database index is a data structure that improves the speed of data retrieval operations on a database table. Secondary indexes require additional writes and storage space to maintain the index data structure. They can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
YugabyteDB provides consistent (ACID), performant secondary indexes. They are built on top of distributed ACID transactions. You can read more about transactions in our architecture docs.
If you haven’t installed YugabyteDB yet, do so first by following the Quick Start guide.
NOTE: Secondary indexes are a work in progress. Here are some requirements to keep in mind currently when using secondary indexes in Yugabyte:
- To create a secondary index on a table, the primary table needs to be created with distributed transaction enabled using the
with transactions = { 'enabled' : true }
clause. - The secondary index needs to be created before any data is inserted into the primary table.
- A secondary index will be used to execute a query only when the index covers all columns selected by the query.
These requirements may be removed in the future.
1. Setup - create universe
If you have a previously running local universe, destroy it using the following.
$ ./bin/yb-ctl destroy
Start a new local cluster - by default, this will create a 3 node universe with a replication factor of 3.
$ ./bin/yb-ctl create
2. Create a table with secondary indexes
Connect to the cluster using cqlsh
.
$ ./bin/cqlsh
Connected to local cluster at 127.0.0.1:9042.[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]Use HELP for help.cqlsh>
Create a keyspace.
cqlsh> CREATE KEYSPACE store;
Create a table with the transactions
property set to enabled.
cqlsh> CREATE TABLE store.orders ( customer_id int, order_date timestamp, amount double, PRIMARY KEY ((customer_id), order_date)) with transactions = { 'enabled' : true };
Now create a secondary index on the order_date
column. Note that we include the amount
column in the secondary index in order to respond to queries selecting the amount
column directly from the secondary index table with just one read.
cqlsh> create index orders_by_date on store.orders (order_date, customer_id) include (amount);
3. Insert sample data
Let us seed this table with some sample data. Paste the following into the cqlsh prompt.
INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-02', 100.30);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (2, '2018-04-02', 50.45);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-06', 20.25);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (3, '2018-04-06', 200.80);
4. Perform some queries
- Get the total amount for a customer
Let us write a query to fetch the sum total of the order amount
column across all orders for a customer. This query will be executed against the primary table using the partition key customer_id
, and therefore does not use the secondary index.
cqlsh> select sum(amount) from store.orders where customer_id = 1;
sum(amount)——————- 120.55(1 rows)
- Get the total amount for a specific date
Now, let us write a query to fetch the sum total of order amount
across all orders for a specific date. Because we have a secondary index on the order_date
column of the table, the query analyzer will execute the query against the secondary index using the partition key order_date
and avoid a full-table scan of the primary table.
cqlsh> select sum(amount) from store.orders where order_date = '2018-04-02';
sum(amount)——————- 150.75(1 rows)
5. Clean up (optional)
Optionally, you can shutdown the local cluster created in Step 1.
$ ./bin/yb-ctl destroy
1. Setup - create universe
If you have a previously running local universe, destroy it using the following.
$ ./bin/yb-ctl destroy
Start a new local cluster - by default, this will create a 3 node universe with a replication factor of 3.
$ ./bin/yb-ctl create
2. Create a table with secondary indexes
Connect to the cluster using cqlsh
.
$ ./bin/cqlsh
Connected to local cluster at 127.0.0.1:9042.[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]Use HELP for help.cqlsh>
Create a keyspace.
cqlsh> CREATE KEYSPACE store;
Create a table with the transactions
property set to enabled.
cqlsh> CREATE TABLE store.orders ( customer_id int, order_date timestamp, amount double, PRIMARY KEY ((customer_id), order_date)) with transactions = { 'enabled' : true };
Now create a secondary index on the order_date
column. Note that we include the amount
column in the secondary index in order to respond to queries selecting the amount
column directly from the secondary index table with just one read.
cqlsh> create index orders_by_date on store.orders (order_date, customer_id) include (amount);
3. Insert sample data
Let us seed this table with some sample data. Paste the following into the cqlsh prompt.
INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-02', 100.30);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (2, '2018-04-02', 50.45);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-06', 20.25);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (3, '2018-04-06', 200.80);
4. Perform some queries
- Get the total amount for a customer
Let us write a query to fetch the sum total of the order amount
column across all orders for a customer. This query will be executed against the primary table using the partition key customer_id
, and therefore does not use the secondary index.
cqlsh> select sum(amount) from store.orders where customer_id = 1;
sum(amount)——————- 120.55(1 rows)
- Get the total amount for a specific date
Now, let us write a query to fetch the sum total of order amount
across all orders for a specific date. Because we have a secondary index on the order_date
column of the table, the query analyzer will execute the query against the secondary index using the partition key order_date
and avoid a full-table scan of the primary table.
cqlsh> select sum(amount) from store.orders where order_date = '2018-04-02';
sum(amount)——————- 150.75(1 rows)
5. Clean up (optional)
Optionally, you can shutdown the local cluster created in Step 1.
$ ./bin/yb-ctl destroy
1. Setup - create universe
If you have a previously running local universe, destroy it using the following.
$ ./yb-docker-ctl destroy
Start a new local cluster - by default, this will create a 3 node universe with a replication factor of 3.
$ ./yb-docker-ctl create
2. Create a table with secondary indexes
Connect to cqlsh on node 1.
$ docker exec -it yb-tserver-n1 /home/yugabyte/bin/cqlsh
Connected to local cluster at 127.0.0.1:9042.[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]Use HELP for help.cqlsh>
Create a keyspace.
cqlsh> CREATE KEYSPACE store;
Create a table with the transactions
property set to enabled.
cqlsh> CREATE TABLE store.orders ( customer_id int, order_date timestamp, amount double, PRIMARY KEY ((customer_id), order_date)) with transactions = { 'enabled' : true };
Now create a secondary index on the order_date
column. Note that we include the amount
column in the secondary index in order to respond to queries selecting the amount
column directly from the secondary index table with just one read.
cqlsh> create index orders_by_date on store.orders (order_date, customer_id) include (amount);
3. Insert sample data
Let us seed this table with some sample data. Paste the following into the cqlsh prompt.
INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-02', 100.30);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (2, '2018-04-02', 50.45);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-06', 20.25);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (3, '2018-04-06', 200.80);
4. Perform some queries
- Get the total amount for a customer
Let us write a query to fetch the sum total of the order amount
column across all orders for a customer. This query will be executed against the primary table using the partition key customer_id
, and therefore does not use the secondary index.
cqlsh> select sum(amount) from store.orders where customer_id = 1;
sum(amount)——————- 120.55(1 rows)
- Get the total amount for a specific date
Now, let us write a query to fetch the sum total of order amount
across all orders for a specific date. Because we have a secondary index on the order_date
column of the table, the query analyzer will execute the query against the secondary index using the partition key order_date
and avoid a full-table scan of the primary table.
cqlsh> select sum(amount) from store.orders where order_date = '2018-04-02';
sum(amount)——————- 150.75(1 rows)
5. Clean up (optional)
Optionally, you can shutdown the local cluster created in Step 1.
$ ./yb-docker-ctl destroy
1. Setup - create universe and table
If you have a previously running local universe, destroy it using the following.
$ kubectl delete -f yugabyte-statefulset.yaml
Start a new local cluster - by default, this will create a 3 node universe with a replication factor of 3.
$ kubectl apply -f yugabyte-statefulset.yaml
Make sure there are 3 yb-tserver and 3 yb-master pods representing the 3 nodes of the cluster.
$ kubectl get pods
NAME READY STATUS RESTARTS AGEyb-master-0 1/1 Running 0 13syb-master-1 1/1 Running 0 13syb-master-2 1/1 Running 0 13syb-tserver-0 1/1 Running 1 12syb-tserver-1 1/1 Running 1 12syb-tserver-2 1/1 Running 1 12s
2. Create a table with secondary indexes
Connect to the cluster using cqlsh
.
Connect to cqlsh on node 1.
$ kubectl exec -it yb-tserver-0 /home/yugabyte/bin/cqlsh
Connected to local cluster at 127.0.0.1:9042.[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]Use HELP for help.cqlsh>
Create a keyspace.
cqlsh> CREATE KEYSPACE store;
Create a table with the transactions
property set to enabled.
cqlsh> CREATE TABLE store.orders ( customer_id int, order_date timestamp, amount double, PRIMARY KEY ((customer_id), order_date)) with transactions = { 'enabled' : true };
Now create a secondary index on the order_date
column. Note that we include the amount
column in the secondary index in order to respond to queries selecting the amount
column directly from the secondary index table with just one read.
cqlsh> create index orders_by_date on store.orders (order_date, customer_id) include (amount);
3. Insert sample data
Let us seed this table with some sample data. Paste the following into the cqlsh prompt.
INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-02', 100.30);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (2, '2018-04-02', 50.45);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-06', 20.25);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (3, '2018-04-06', 200.80);
4. Perform some queries
- Get the total amount for a customer
Let us write a query to fetch the sum total of the order amount
column across all orders for a customer. This query will be executed against the primary table using the partition key customer_id
, and therefore does not use the secondary index.
cqlsh> select sum(amount) from store.orders where customer_id = 1;
sum(amount)
120.55
(1 rows)
- Get the total amount for a specific date
Now, let us write a query to fetch the sum total of order amount
across all orders for a specific date. Because we have a secondary index on the order_date
column of the table, the query analyzer will execute the query against the secondary index using the partition key order_date
and avoid a full-table scan of the primary table.
cqlsh> select sum(amount) from store.orders where order_date = '2018-04-02';
sum(amount)
150.75
(1 rows)
5. Clean up (optional)
Optionally, you can shutdown the local cluster created in Step 1.
$ kubectl delete -f yugabyte-statefulset.yaml
</code></code>