Version 3.0.0
Released on 2018/05/16.
Note
If you are upgrading a cluster, you must be running CrateDB 2.0.4 or higher before you upgrade to 3.0.0.
We recommend that you upgrade to the latest 2.3 release before moving to 3.0.0.
You cannot perform a rolling upgrade to this version. Any upgrade to this version will require a full restart upgrade.
When restarting, CrateDB will migrate indexes to a newer format. Depending on the amount of data, this may delay node start-up time.
Please consult the Upgrade Notes before upgrading.
Warning
Tables that were created prior to upgrading to CrateDB 2.x will not function with 3.0 and must be recreated before moving to 3.0.x.
You can recreate tables using COPY TO
and COPY FROM
while running a 2.x release into a new table, or by inserting the data into a new table.
Before upgrading, you should back up your data.
Table of contents
Changelog
Breaking Changes
Dropped support for tables that have been created with CrateDB prior to version 2.0. Tables which require upgrading are indicated in the cluster checks, including visually shown in the Admin UI, if running the latest 2.2 or 2.3 release. The upgrade of tables needs to happen before updating CrateDB to this version. This can be done by exporting the data with
COPY TO
and importing it into a new table withCOPY FROM
. Alternatively you can useINSERT
with query.Data paths as defined in
path.data
must not contain the cluster name as a folder. Data paths which are not compatible with this version are indicated in the node checks, including visually shown in the Admin UI, if running the latest 2.2 or 2.3 release.The
region
setting forCREATE REPOSITORY
has been removed. It is automatically inferred but can still be manually specified by using theendpoint
setting.Store level throttling settings
indices.store.throttle.*
have been removed.The gateway recovery table setting
recovery.initial_shards
has been removed. Nodes will recover their unassigned local primary shards immediatelly after restart.The discovery setting
discovery.type
has been removed. To enable EC2 discovery, thediscovery.zen.hosts_provider
setting must be set toec2
.Dropped support for reading AWS credentials used for S3 and EC2 discovery from environment variables
AWS_ACCESS_KEY_ID
andAWS_SECRET_ACCESS_KEY
as well as Java system propertiesaws.accessKeyId
andaws.secretKey
.EC2
cloud.aws.*
settings have been renamed todiscovery.ec2.*
.The setting that controls system call filters
bootstrap.seccomp
has been has been renamed tobootstrap.system_call_filter
.The columns
number_of_shards
,number_of_replicas
, andself_referencing_column_name
ininformation_schema.tables
changed to returnNULL
for non-sharded tables.Adapted queries in the Admin UI to be compatible with CrateDB 3.0 and greater.
For HTTP authentication, support was dropped for the
X-User
header, used to provide a username, which has been deprecated in2.3.0.
in favour of the standard HTTPAuthorization
header.The
error_trace
GET parameter of the HTTP endpoint only allowstrue
andfalse
in lower case. Other values are not allowed any more and will result in a parsing exception.The
_node
column onsys.shards
andsys.operations
has been renamed tonode
, is now visible by default and has been trimmed to only includenode['id']
andnode['name']
. In order to get all information a join query can be used withsys.nodes
.
Changes
CrateDB is now based on Elasticsearch 6.1.4 and Lucene 7.1.0.
Multiple Admin UI improvements.
Added a new tab for views in the Admin UI which lists available views and their properties.
Updated the bundled CrateDB Shell (
crash
) to version0.24.0
which adds support for default schema for connections.Added support in the PostgreSQL Wire Protocol’s SimpleQuery mode to process a query string which contains multiple queries delimited by semicolons.
Added support for
DEALLOCATE
statement which is used by certain PostgreSQL Wire Protocol clients (e.g. libpq) to deallocate a prepared statement and release its resources.Added support for ordering on analysed columns and partition columns.
Added support for views which can be created using the new
CREATE VIEW
statement and dropped using theDROP VIEW
statement. Views are listed ininformation_schema.views
and they show up ininformation_schema.tables
as well asinformation_schema.columns
.Enterprise: Added the VIEW privilege class which can be used to grant/deny access to views.
Added support for
INSERT INTO ... ON CONFLICT DO NOTHING
. The statement ignores insert values which would cause duplicate keys.Added support for
ON CONFLICT
clause in insert statements.INSERT INTO ... ON CONFLICT (pk_col) DO UPDATE SET col = val
is identical toINSERT INTO ... ON DUPLICATE KEY UPDATE col = val
. The specialEXCLUDED
table can be used to refer to the insert values:INSERT INTO ... ON CONFLICT (pk_col) DO UPDATE SET col = EXCLUDED.col
DEPRECATED: The
ON DUPLICATE KEY UPDATE
clause has been deprecated in favor of theON CONFLICT DO UPDATE SET
clause.Implemented the Block Hash Join algorithm which is now used for Equi-Joins.
Added new
sys.health
system information table to expose the health of all tables and table partitions.Added new
cluster.routing.allocation.disk.watermark.flood_stage
setting, that controls at which disk usage indices should become read-only to prevent running out of disk space. There is also a new node check that indicates whether the threshold is exceeded.Added a new
bengali
language analyzer and abengali_normalization
token filter.Add
max_token_length
parameter to whitespace tokenizer.Added new tokenizers
simple_pattern
andsimple_pattern_split
which allow to tokenize text for the fulltext index by a regular expression pattern.Added support for CSV file inputs in
COPY FROM
statements. Input type is inferred using the file’s extension or can be set using the optionalWITH
clause and specifying theformat
.Fully qualified column names including a schema name will no longer match on table aliases.
The default user if enterprise is disabled changed from
null
tocrate
. This causes entries insys.jobs
to show up withcrate
as username. Functions likeuser
will also returncrate
if enterprise is enabled but the user module is not available.Display the node information (name and id) of jobs in the
sys.jobs
table.Changed the primary key constraints of the information schema tables
table_constraints
,referential_constraints
,table_partitions
,key_column_usage
,columns
, andtables
to be SQL compliant.Arrays can now contain mixed types if they’re safely convertible. JSON libraries tend to encode values like
[0.0, 1.2]
as[0, 1.2]
, this caused an error because of the strict type match we enforced before.Implemented
constraint_schema
andtable_schema
ininformation_schema.key_column_usage
correctly and documented the full table schema.Statistics for jobs and operations are enabled by default. If you don’t need any statistics, please set
stats.enabled
tofalse
.Changed
BEGIN
andSET SESSION
to no longer requireDQL
permissions on theCLUSTER
level.Added
epoch
argument to theEXTRACT
function which returns the number of seconds since Jan 1, 1970. For example:extract(epoch from '1970-01-01T00:00:01')
returns1.0
seconds.Enable logging of JVM garbage collection times that help to debug memory pressure and garbage collection issues. GC log files are stored separately to the standard CrateDB logs and the files are log-rotated.
CrateDB will now by default create a heap dump in case of a crash caused by an out of memory error. This makes it necessary to account for the additional disk space requirements.
Implemented a
Ready
node status JMX metric expressing if the node is ready for processing SQL statements.Implemented a
NodeInfo
JMX MBean to expose useful information (id, name) about the node.Fixed path of logfile name in rotation pattern in
log4j2.properties
. It now writes into the correct logging directory instead of the parent directory.ALTER TABLE <name> OPEN
will now wait for all shards to become active before returning to be consistent with the behaviour of other statements.Added note about the newly available
JMX HTTP Exporter
to the monitoring documentation section.The first argument (
field
) of theEXTRACT
function has been limited to string literals and identifiers, as it was documented.
Upgrade Notes
Configuration Changes
There are a few configuration changes that you should be aware of before restarting the nodes.
Removed Settings
All store level throttle settings (under
indices.store.throttle.*
) have been removed, and should be removed from your node configuration.Similarly, the
recovery.initial_shards
configuration option has been removed, and should also be removed from your configuration.
Renamed Settings
The
discovery.type
setting which was previously used to specify whether a cluster should use DNS discovery or the EC2 API, has been removed. Configuring the use of the EC2 API has now been moved to thediscovery.zen.hosts_provider
setting.The
bootstrap.seccomp
setting, which controls system call filters, has been renamed tobootstrap.system_call_filter
.
Altered Settings
The
path.data
setting specifies the path or paths where the CrateDB node should store its table data and cluster metadata.In CrateDB 3.0.0 and later, this path must not contain the cluster name as a directory. For example, if you have set
cluster.name: abcdef
, the settingpath.data: /mnt/abcdef/data
would be incompatible. Moving or renaming the directory, such as to/mnt/data
, and altering yourpath.data
setting accordingly will allow you to continue using the node’s data.Data paths that are incompatible with 3.0.0 will be indicated visually in the admin UI if you are running the latest 2.2.x or 2.3.x release.
Other Changes
The
CREATE REPOSITORY
statement for creating backup repositories has been changed.Previously, when using Amazon S3 for backup storage, bucket regions had to be configured explicitly. Bucket regions are now inferred automatically.
If you want to override this, you can use the endpoint parameter.
Previously, the
X-User
HTTP header could be used to provide a username. This head is now deprecated in favour of the standard HTTP Authorization header.The
_node
column in thesys.shards
andsys.operations
tables has been renamed tonode
.Additionally,
node
object now only includesid
andname
of the node, i.e.node['id']
andnode['name']
.To get the full node information, use
node['id']
to join thesys.nodes
table.