Partitioned Tables

Table of Contents

Introduction

A partitioned table is a virtual table that can be created by naming one or more columns by which it is split into separate internal tables, called partitions.

When a record with a new distinct combination of values for the configured PARTITIONED BY columns is inserted, a new partition is created and the document will be inserted into this partition.

You will end up with separate partitions under the hood that can be queried like a single table.

If you are usually interested in separate partitions of your data only, as might be the case for e.g. analyzing time based log data, you can query them much much faster because you don’t have to iterate over all rows of all partitions.

Deletion is faster too if you delete whole partitions at once, as a whole table can be deleted and no expensive query is involved.

Note

Keep in mind that the values of the columns used for partitioning are internally base32 encoded into the partition name (which is a separate table). So for every partition, the partition table name includes: (optional)schema_name + table_name + base32 encoded partition column value(s) + an internal overhead of 14 bytes

Altogether this must not exceed the 255 bytes length limitation. See Naming Restrictions.

Caution

Every table partition is clustered into as many shards as you configure for the table. Because of this, a good partition configuration depends on good shard allocation.

Well tuned shard allocation is vital. Read the Sharding Guide to make sure you’re getting the best performance out ot CrateDB.

Creation

It can be created using the CREATE TABLE statement using the PARTITIONED BY:

  1. cr> CREATE TABLE parted_table (
  2. ... id long,
  3. ... title string,
  4. ... content string,
  5. ... width double,
  6. ... day timestamp
  7. ... ) CLUSTERED BY (title) INTO 4 SHARDS PARTITIONED BY (day);
  8. CREATE OK, 1 row affected (... sec)

This creates an empty partitioned table which is not yet backed by real partitions. Nonetheless does it behave like a normal table.

When the value to partition by references one or more Base Columns, their values must be supplied upon INSERT or COPY FROM. Often these values are computed on client side. If this is not possible, a generated column can be used to create a suitable partition value from the given values on database-side:

  1. cr> CREATE TABLE computed_parted_table (
  2. ... id long,
  3. ... data double,
  4. ... created_at timestamp,
  5. ... month timestamp GENERATED ALWAYS AS date_trunc('month', created_at)
  6. ... ) PARTITIONED BY (month);
  7. CREATE OK, 1 row affected (... sec)

Information Schema

This table shows up in the information_schema.tables table, recognizable as partitioned table by a non null partitioned_by column (aliased as p_b here):

  1. cr> SELECT table_schema as schema,
  2. ... table_name,
  3. ... number_of_shards as num_shards,
  4. ... number_of_replicas as num_reps,
  5. ... clustered_by as c_b,
  6. ... partitioned_by as p_b,
  7. ... blobs_path
  8. ... FROM information_schema.tables
  9. ... WHERE table_name='parted_table';
  10. +--------+--------------+------------+----------+-------+---------+------------+
  11. | schema | table_name | num_shards | num_reps | c_b | p_b | blobs_path |
  12. +--------+--------------+------------+----------+-------+---------+------------+
  13. | doc | parted_table | 4 | 0-1 | title | ["day"] | NULL |
  14. +--------+--------------+------------+----------+-------+---------+------------+
  15. SELECT 1 row in set (... sec)
  1. cr> SELECT table_schema as schema, table_name, column_name, data_type
  2. ... FROM information_schema.columns
  3. ... WHERE table_schema = 'doc' AND table_name = 'parted_table'
  4. ... ORDER BY table_schema, table_name, column_name;
  5. +--------+--------------+-------------+-----------+
  6. | schema | table_name | column_name | data_type |
  7. +--------+--------------+-------------+-----------+
  8. | doc | parted_table | content | string |
  9. | doc | parted_table | day | timestamp |
  10. | doc | parted_table | id | long |
  11. | doc | parted_table | title | string |
  12. | doc | parted_table | width | double |
  13. +--------+--------------+-------------+-----------+
  14. SELECT 5 rows in set (... sec)

And so on.

You can get information about the partitions of a partitioned table by querying the information_schema.table_partitions table:

  1. cr> SELECT count(*) as partition_count
  2. ... FROM information_schema.table_partitions
  3. ... WHERE schema_name = 'doc' AND table_name = 'parted_table';
  4. +-----------------+
  5. | partition_count |
  6. +-----------------+
  7. | 0 |
  8. +-----------------+
  9. SELECT 1 row in set (... sec)

As this table is still empty, no partitions have been created.

Insert

  1. cr> INSERT INTO parted_table (id, title, width, day)
  2. ... VALUES (1, 'Don''t Panic', 19.5, '2014-04-08');
  3. INSERT OK, 1 row affected (... sec)
  1. cr> SELECT partition_ident, "values", number_of_shards
  2. ... FROM information_schema.table_partitions
  3. ... WHERE schema_name = 'doc' AND table_name = 'parted_table'
  4. ... ORDER BY partition_ident;
  5. +--------------------------+------------------------+------------------+
  6. | partition_ident | values | number_of_shards |
  7. +--------------------------+------------------------+------------------+
  8. | 04732cpp6osj2d9i60o30c1g | {"day": 1396915200000} | 4 |
  9. +--------------------------+------------------------+------------------+
  10. SELECT 1 row in set (... sec)

On subsequent inserts with the same PARTITIONED BY column values, no additional partition is created:

  1. cr> INSERT INTO parted_table (id, title, width, day)
  2. ... VALUES (2, 'Time is an illusion, lunchtime doubly so', 0.7, '2014-04-08');
  3. INSERT OK, 1 row affected (... sec)
  1. cr> REFRESH TABLE parted_table;
  2. REFRESH OK, 1 row affected (... sec)
  1. cr> SELECT partition_ident, "values", number_of_shards
  2. ... FROM information_schema.table_partitions
  3. ... WHERE schema_name = 'doc' AND table_name = 'parted_table'
  4. ... ORDER BY partition_ident;
  5. +--------------------------+------------------------+------------------+
  6. | partition_ident | values | number_of_shards |
  7. +--------------------------+------------------------+------------------+
  8. | 04732cpp6osj2d9i60o30c1g | {"day": 1396915200000} | 4 |
  9. +--------------------------+------------------------+------------------+
  10. SELECT 1 row in set (... sec)

Update

Updating partitioned tables has one big limitation. PARTITIONED BY columns cannot be changed, because this would involve moving all affected documents which is no atomic operation and could lead to inconsistent state:

  1. cr> UPDATE parted_table set content = 'now panic!', day = '2014-04-07'
  2. ... WHERE id = 1;
  3. SQLActionException[ColumnValidationException: Validation failed for day: Updating a partitioned-by column is not supported]

When using a generated column as PARTITIONED BY column all the columns referenced in its generation expression cannot be updated as well:

  1. cr> UPDATE computed_parted_table set created_at='1970-01-01'
  2. ... WHERE id = 1;
  3. SQLActionException[ColumnValidationException: Validation failed for created_at: Updating a column which is referenced in a partitioned by generated column expression is not supported]
  1. cr> UPDATE parted_table set content = 'now panic!'
  2. ... WHERE id = 2;
  3. UPDATE OK, 1 row affected (... sec)
  1. cr> REFRESH TABLE parted_table;
  2. REFRESH OK, 1 row affected (... sec)
  1. cr> SELECT * from parted_table WHERE id = 2;
  2. +------------+---------------+----+----------------------------------...-+-------+
  3. | content | day | id | title | width |
  4. +------------+---------------+----+----------------------------------...-+-------+
  5. | now panic! | 1396915200000 | 2 | Time is an illusion, lunchtime do... | 0.7 |
  6. +------------+---------------+----+----------------------------------...-+-------+
  7. SELECT 1 row in set (... sec)

Delete

Deleting with a WHERE clause matching all rows of a partition will drop the whole partition instead of deleting every matching document, which is a lot faster:

  1. cr> delete from parted_table where day = 1396915200000;
  2. DELETE OK, -1 rows affected (... sec)
  1. cr> SELECT count(*) as partition_count
  2. ... FROM information_schema.table_partitions
  3. ... WHERE schema_name = 'doc' AND table_name = 'parted_table';
  4. +-----------------+
  5. | partition_count |
  6. +-----------------+
  7. | 0 |
  8. +-----------------+
  9. SELECT 1 row in set (... sec)

Querying

UPDATE, DELETE and SELECT queries are all optimized to only affect as few partitions as possible based on the partitions referenced in the WHERE clause.

The WHERE clause is analyzed for referenced partitions by checking conditions on columns used in the PARTITIONED BY clause. For example the following query will only operate on the partition for day=1396915200000:

  1. cr> SELECT count(*) FROM parted_table
  2. ... WHERE day='1970-01-01'
  3. ... ORDER by 1;
  4. +----------+
  5. | count(*) |
  6. +----------+
  7. | 2 |
  8. +----------+
  9. SELECT 1 row in set (... sec)

Any combination of conditions that can be evaluated to a partition before actually executing the query is supported:

  1. cr> SELECT id, title FROM parted_table
  2. ... WHERE date_trunc('year', day) > '1970-01-01'
  3. ... OR extract(day_of_week from day) = 1
  4. ... ORDER BY id DESC;
  5. +----+--------------------+
  6. | id | title |
  7. +----+--------------------+
  8. | 4 | Spice Pork And haM |
  9. | 1 | The incredible foo |
  10. +----+--------------------+
  11. SELECT 2 rows in set (... sec)

Internally the WHERE clause is evaluated against the existing partitions and their partition values. These partitions are then filtered to obtain the list of partitions that need to be accessed.

Generated Columns in PARTITIONED BY

Querying on tables partitioned by generated columns is also optimized to infer a minimum list of partitions from the PARTITIONED BY columns referenced in the WHERE clause:

  1. cr> SELECT id, date_format('%Y-%m', month) as m FROM computed_parted_table
  2. ... WHERE created_at = '2015-11-16T13:27:00.000Z'
  3. ... ORDER BY id;
  4. +----+---------+
  5. | id | m |
  6. +----+---------+
  7. | 1 | 2015-11 |
  8. +----+---------+
  9. SELECT 1 row in set (... sec)

Alter

Parameters of partitioned tables can be changed as usual (see Altering Tables for more information on how to alter regular tables) with the ALTER TABLE statement. Common ALTER TABLE parameters affect both existing partitions and partitions that will be created in the future.

  1. cr> ALTER TABLE parted_table SET (number_of_replicas = '0-all')
  2. ALTER OK, -1 rows affected (... sec)

Altering schema information (such as the column policy or adding columns) can only be done on the table (not on single partitions) and will take effect on both existing and new partitions of the table.

  1. cr> ALTER TABLE parted_table ADD COLUMN new_col string
  2. ALTER OK, -1 rows affected (... sec)

Changing the Number of Shards

It is possible at any time to change the number of shards of a partitioned table.

  1. cr> ALTER TABLE parted_table SET (number_of_shards = 10)
  2. ALTER OK, -1 rows affected (... sec)

Note

This will not change the number of shards of existing partitions, but the new number of shards will be taken into account when new partitions are created.

  1. cr> INSERT INTO parted_table (id, title, width, day)
  2. ... VALUES (2, 'All Good', 3.1415, '2014-04-08');
  3. INSERT OK, 1 row affected (... sec)
  1. cr> SELECT count(*) as num_shards, sum(num_docs) as num_docs
  2. ... FROM sys.shards
  3. ... WHERE schema_name = 'doc' AND table_name = 'parted_table';
  4. +------------+----------+
  5. | num_shards | num_docs |
  6. +------------+----------+
  7. | 10 | 1 |
  8. +------------+----------+
  9. SELECT 1 row in set (... sec)
  1. cr> SELECT partition_ident, "values", number_of_shards
  2. ... FROM information_schema.table_partitions
  3. ... WHERE schema_name = 'doc' AND table_name = 'parted_table'
  4. ... ORDER BY partition_ident;
  5. +--------------------------+------------------------+------------------+
  6. | partition_ident | values | number_of_shards |
  7. +--------------------------+------------------------+------------------+
  8. | 04732cpp6osj2d9i60o30c1g | {"day": 1396915200000} | 10 |
  9. +--------------------------+------------------------+------------------+
  10. SELECT 1 row in set (... sec)

Altering a Single Partition

We also provide the option to change the number of shards that are already allocated for an existing partition. This option operates on a partition basis, thus a specific partition needs to be specified

  1. cr> ALTER TABLE parted_table PARTITION (day=1396915200000) SET ("blocks.write" = true)
  2. ALTER OK, -1 rows affected (... sec)
  3. cr> ALTER TABLE parted_table PARTITION (day=1396915200000) SET (number_of_shards = 5)
  4. ALTER OK, 0 rows affected (... sec)
  5. cr> ALTER TABLE parted_table PARTITION (day=1396915200000) SET ("blocks.write" = false)
  6. ALTER OK, -1 rows affected (... sec)
  1. cr> SELECT partition_ident, "values", number_of_shards
  2. ... FROM information_schema.table_partitions
  3. ... WHERE schema_name = 'doc' AND table_name = 'parted_table'
  4. ... ORDER BY partition_ident;
  5. +--------------------------+------------------------+------------------+
  6. | partition_ident | values | number_of_shards |
  7. +--------------------------+------------------------+------------------+
  8. | 04732cpp6osj2d9i60o30c1g | {"day": 1396915200000} | 5 |
  9. +--------------------------+------------------------+------------------+
  10. SELECT 1 row in set (... sec)

Note

The same prerequisites and restrictions as with normal tables apply. See Changing the Number of Shards.

Alter Partitions

It is also possible to alter parameters of single partitions of a partitioned table. However, unlike with partitioned tables, it is not possible to alter the schema information of single partitions.

To change table parameters such as number_of_replicas or other table settings use the Clauses.

  1. cr> ALTER TABLE parted_table PARTITION (day=1396915200000) RESET (number_of_replicas)
  2. ALTER OK, -1 rows affected (... sec)

Alter Table ONLY

Sometimes one wants to alter a partitioned table, but the changes should only affect new partitions and not existing ones. This can be done by using the ONLY keyword.

  1. cr> ALTER TABLE ONLY parted_table SET (number_of_replicas = 1);
  2. ALTER OK, -1 rows affected (... sec)

Closing and Opening a Partition

A single partition within a partitioned table can be opened and closed in the same way a normal table can.

  1. cr> ALTER TABLE parted_table PARTITION (day=1396915200000) CLOSE;
  2. ALTER OK, -1 rows affected (... sec)

This will all operations beside ALTER TABLE ... OPEN to fail on this partition. The partition will also not be included in any query on the partitioned table.

Limitations

  • PARTITIONED BY columns cannot be updated
  • WHERE clauses cannot contain queries like partitioned_by_column='x' OR normal_column=x

Consistency Notes Related to Concurrent DML Statement

If a partition is deleted during an active insert or update bulk operation this partition won’t be re-created.

The number of affected rows will always reflect the real number of inserted/updated documents.