Version 2.0.0
Released on 2017/05/16.
Warning
CrateDB 2.x versions prior to 2.0.4 (including this version) contain a critical bug which leads to deletion of blob data upon node shutdown. It is recommended not to install those versions.
Changelog
Breaking Changes
To accommodate user-defined functions, some new reserved keywords have been added to the CrateDB SQL dialect: RETURNS, CALLED, REPLACE, FUNCTION, LANGUAGE, INPUT.
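Existing schemas may use some of these words as unquoted table or column names. The following is a minimal sketch, not an official tool, of how such collisions could be detected before upgrading. The function names are hypothetical, and the double-quoting approach assumes standard SQL identifier quoting (quoted identifiers are case-sensitive, so check the stored case of each name).

```python
# Reserved keywords added in this release, as listed above.
NEW_RESERVED_KEYWORDS = {
    "RETURNS", "CALLED", "REPLACE", "FUNCTION", "LANGUAGE", "INPUT",
}


def find_keyword_collisions(identifiers):
    """Return the identifiers that match a new reserved keyword (case-insensitive)."""
    return [name for name in identifiers if name.upper() in NEW_RESERVED_KEYWORDS]


def quote_identifier(name):
    """Wrap an identifier in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


if __name__ == "__main__":
    # Hypothetical column names taken from an application schema.
    columns = ["id", "language", "input", "created_at"]
    for column in find_keyword_collisions(columns):
        print(f"{column} is now reserved; use {quote_identifier(column)} in statements")
```

Any identifier reported by `find_keyword_collisions` needs to be quoted (or renamed) in client-side SQL.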
The Enterprise License setting is set to true by default. This enables the CrateDB Enterprise Edition. Enabling this setting requires a valid enterprise license for production use.
If you disable this setting, CrateDB will run with the standard feature set.
All custom node.* style attributes must now be written as node.attr.* to distinguish them from attributes that CrateDB uses internally. Consult the node attribute docs for information.
The node.client setting has been removed.
The default value of the node.attr.max_local_storage_nodes node setting has been changed to 1 to prevent running multiple nodes on the same data path by default.
Previous versions of CrateDB defaulted to allowing up to 50 nodes running on the same data path. This caused confusion when users accidentally started multiple nodes and then thought they had lost data, because the second node started with an empty directory.
Running multiple nodes on the same data path tends to be an exception, so this is a safer default.
Parsing support of time values has been changed:
- The unit w representing weeks is no longer supported.
- Fractional time values (e.g. 0.5s) are no longer supported. For example, when setting timeouts, 0.5s will be rejected and should instead be input as 500ms.
The already unused path.work node setting has been removed.
The node setting bootstrap.mlockall has been renamed to bootstrap.memory_lock.
The keyword_repeat and type_as_payload built-in token filters have been removed.
The classic built-in analyzer has been removed.
The shard balance related cluster settings cluster.routing.allocation.balance.primary and cluster.routing.allocation.balance.replica have been removed.
Some recovery related cluster settings have been removed or replaced:
- The indices.recovery.concurrent_streams cluster setting is now superseded by cluster.routing.allocation.node_concurrent_recoveries.
- The indices.recovery.activity_timeout cluster setting has been renamed to indices.recovery.recovery_activity_timeout.
- The following recovery cluster settings have been removed:
indices.recovery.file_chunk_size
indices.recovery.translog_ops
indices.recovery.translog_size
indices.recovery.compress
Logging is now configured by log4j2.properties instead of logging.yml.
The plugin interface has changed: injecting classes on shard or index levels is no longer supported.
It’s no longer possible to run CrateDB as the unix root user.
Some translog related table settings have been removed or replaced:
- The index.translog.interval, translog.disable_flush and translog.flush_threshold_period table settings have been removed.
- The index.translog.sync_interval table setting doesn’t accept a value less than 100ms, which prevents fsyncing too often if async durability is enabled. The special value 0 is no longer supported.
- The index.translog.flush_threshold_ops table setting is not supported anymore. In order to control flushes based on the transaction log growth use index.translog.flush_threshold_size instead.
The COPY FROM statement now requires column names to be quoted in the JSON file being imported.
Queries on columns with INDEX OFF will now fail instead of always resulting in an empty result.
Configuration support using system properties has been dropped.
It’s no longer possible to use Hadoop 1.x as a repository for snapshots.
Changed the default bind and publish address from 0.0.0.0 to the system loopback addresses, which means CrateDB listens only on local addresses by default.
The discovery.ec2.ping_timeout setting has been removed and the discovery.zen.ping_timeout setting is now also used for EC2 discovery.
The monitor.jvm.gc.[old|young].[debug|info|warn] settings used to configure logging of garbage collection have been renamed (adding collector) to monitor.jvm.gc.collector.[old|young].[debug|info|warn].
Recovery timeout settings changes:
- indices.recovery.retry_internal_action_timeout has been renamed to indices.recovery.internal_action_timeout
- indices.recovery.retry_internal_long_action_timeout has been renamed to indices.recovery.internal_action_long_timeout
- indices.recovery.retry_activity_timeout has been renamed to indices.recovery.recovery_activity_timeout
The thread pool settings prefix has been changed from threadpool to thread_pool. E.g.: thread_pool.<name>.type.
The cluster name is not part of the effective path where data is stored anymore.
The blobs data directory layout has changed.
Changes
Extended the subselect support. See SELECT Reference for details.
Added support for host based authentication (HBA). Please see Host Based Authentication.
Added support for renaming tables using the ALTER ... RENAME TO ... statement.
Added support for CREATE USER and DROP USER.
Added support for opening and closing a table or single partition.
Information on the state of tables/partitions is now exposed by a new column closed on the information_schema.tables and information_schema.table_partitions tables.
Added full support for DISTINCT on queries where GROUP BY is present.
UDC pings will send licence.ident if defined from now on.
Added support for GROUP BY in combination with subselect. E.g.:
SELECT x, COUNT(*) FROM (SELECT x FROM t LIMIT 1) AS tt GROUP BY x;
Implemented hash sum scalar functions (MD5, SHA1). Please see sha1.
Various admin UI improvements.
Added support for GROUP BY on joins.
Added support for user-defined functions.
Added JavaScript language for functions.
Added cluster check and warning for unlicensed usage of CrateDB Enterprise.
Added built-in fingerprint, keep_types, min_hash and serbian_normalization token filters.
Added a fingerprint built-in analyzer.
Upgraded to Elasticsearch 5.0.2.
Improved performance of blob stats computation by calculating them in an incremental manner.
Optimized performance of negation queries on NOT NULL columns. E.g.:
SELECT * FROM t WHERE not_null_col != 10
Updated documentation to indicate that it’s not possible to use object, geo_point, geo_shape, or array in the ORDER BY clause.
Removed psql.enabled and psql.port settings from sys.cluster because they were wrongly exposed in this table.
Use the region of the EC2 instance for EC2 discovery when neither cloud.aws.ec2.endpoint nor cloud.aws.region is specified, or when they do not resolve to a valid service endpoint.
It is now possible to restore an empty partitioned table.
Added validation that ORDER BY symbols are included in the SELECT list when DISTINCT is used.
Fixes
- Fixed an issue which could result in queries being stuck if the thread pools are exhausted.
- Fixed an issue which caused sys.snapshot queries to fail if the data.path of an existing fs repository was not configured anymore.
- Fixed an issue where sys.snapshot queries hung instead of throwing an error if something went wrong.
Upgrade Notes
Daemon User
You can no longer run CrateDB as the superuser on Unix-like systems. You should create a new crate user for running the CrateDB daemon.
Logging
The logging.yml file has been removed. You must migrate your logging configuration to the new log4j2.properties file.
System Properties
You can no longer use the JAVA_OPTIONS or CRATE_JAVA_OPTS environment variables to pass configuration to CrateDB itself, for example:
JAVA_OPTIONS=-Dcluster.name=crate
Or:
CRATE_JAVA_OPTS=-Dcluster.name=crate
Instead, you must pass these options on the command line (see Running CrateDB).
You can continue to use the JAVA_OPTIONS and CRATE_JAVA_OPTS environment variables to set general JVM properties and CrateDB specific JVM properties, respectively.
Configuration Changes
Many configuration settings and files have been renamed or removed. You must review the Breaking Changes section above and update your setup as necessary.
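As a rough aid for that review, the renames and removals listed under Breaking Changes can be collected into a lookup table. The sketch below is illustrative only: `migrate_settings` is a hypothetical helper that works on a flat dictionary of setting names, it covers only settings named in this document, and it deliberately leaves out changes that are not a plain rename (custom node.* attributes, indices.recovery.concurrent_streams, and the translog table settings).

```python
# Exact renames taken from the Breaking Changes section.
RENAMED_SETTINGS = {
    "bootstrap.mlockall": "bootstrap.memory_lock",
    "indices.recovery.activity_timeout": "indices.recovery.recovery_activity_timeout",
    "indices.recovery.retry_internal_action_timeout": "indices.recovery.internal_action_timeout",
    "indices.recovery.retry_internal_long_action_timeout": "indices.recovery.internal_action_long_timeout",
    "indices.recovery.retry_activity_timeout": "indices.recovery.recovery_activity_timeout",
}

# Prefix renames (thread pool settings and GC logging thresholds).
RENAMED_PREFIXES = {
    "threadpool.": "thread_pool.",
    "monitor.jvm.gc.old.": "monitor.jvm.gc.collector.old.",
    "monitor.jvm.gc.young.": "monitor.jvm.gc.collector.young.",
}

# Settings that were removed without a direct replacement.
REMOVED_SETTINGS = {
    "node.client",
    "path.work",
    "cluster.routing.allocation.balance.primary",
    "cluster.routing.allocation.balance.replica",
    "indices.recovery.file_chunk_size",
    "indices.recovery.translog_ops",
    "indices.recovery.translog_size",
    "indices.recovery.compress",
    "discovery.ec2.ping_timeout",
}


def migrate_settings(settings):
    """Return (migrated_settings, removed_keys) for a flat {name: value} dict."""
    migrated, removed = {}, []
    for key, value in settings.items():
        if key in REMOVED_SETTINGS:
            removed.append(key)
            continue
        if key in RENAMED_SETTINGS:
            key = RENAMED_SETTINGS[key]
        else:
            for old_prefix, new_prefix in RENAMED_PREFIXES.items():
                if key.startswith(old_prefix):
                    key = new_prefix + key[len(old_prefix):]
                    break
        migrated[key] = value
    return migrated, removed
```

Settings that are not mentioned in the tables pass through unchanged, so the output still needs a manual check against the full list of breaking changes.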
SQL Changes
Several breaking changes were made to CrateDB’s SQL. This includes changes to time parsing, syntax changes, and new reserved keywords. You must review the Breaking Changes section above and update your client code as necessary.
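The time parsing change (no w unit, no fractional values) lends itself to a small conversion helper. The following is a sketch under stated assumptions: `normalize_time_value` is a hypothetical function, and the set of accepted units (ms, s, m, h, d) is assumed rather than taken from this document. It rewrites whole weeks as days and fractional values as whole milliseconds, and rejects anything that cannot be expressed that way.

```python
import re
from decimal import Decimal

# Milliseconds per unit. The unit set is an assumption; "w" is only
# accepted as input so that old values can be rewritten.
_MS_PER_UNIT = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
}
_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(ms|s|m|h|d|w)$")


def normalize_time_value(value):
    """Rewrite a pre-2.0 time value so it has no 'w' unit and no fraction."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    number = Decimal(match.group(1))
    unit = match.group(2)
    is_whole = number == number.to_integral_value()
    if is_whole and unit != "w":
        return f"{int(number)}{unit}"          # already acceptable
    if is_whole:
        return f"{int(number) * 7}d"           # whole weeks become days
    total_ms = number * _MS_PER_UNIT[unit]
    if total_ms != total_ms.to_integral_value():
        raise ValueError(f"cannot express {value!r} in whole milliseconds")
    return f"{int(total_ms)}ms"                # fractions become milliseconds
```

For instance, `normalize_time_value("0.5s")` yields the `500ms` form recommended under Breaking Changes.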
Bind Address
The default bind address has been changed from 0.0.0.0 to the loopback address (meaning it will only be accessible on localhost). See Hosts for more.
If you want to keep the original behaviour (i.e. bind to every available network interface) you must add the following line to your Configuration file:
network.host: 0.0.0.0
Note
If you bind to a network reachable IP address, you must follow the instructions in the new bootstrap checks guide.
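Whether the note above applies depends on whether the configured address is a loopback address. As an illustration only, the hypothetical helper below classifies a literal IP address (or the name localhost) using Python's standard `ipaddress` module. It does not resolve other hostnames or any special placeholder values your configuration may use; those raise `ValueError`.

```python
import ipaddress


def binds_beyond_loopback(host):
    """Return True if `host` is reachable from outside the local machine.

    Accepts literal IPv4/IPv6 addresses and the name "localhost".
    """
    if host == "localhost":
        return False
    return not ipaddress.ip_address(host).is_loopback


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.10"):
        scope = "network reachable" if binds_beyond_loopback(candidate) else "loopback only"
        print(f"{candidate}: {scope}")
```

An address reported as network reachable, including the wildcard 0.0.0.0, is the case in which the bootstrap checks guide should be consulted.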
Heap Size
If you have previously set or configured CRATE_MIN_MEM or CRATE_MAX_MEM in your startup scripts or environment, you must remove both, and replace them with a single variable CRATE_HEAP_SIZE. The CRATE_HEAP_SIZE variable sets both the minimum and maximum memory to allocate, and should be set to whatever your previous CRATE_MAX_MEM was set to.
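The replacement rule can be sketched as a small transformation over an environment mapping. `migrate_heap_env` is a hypothetical helper, not part of CrateDB; it only mirrors the instructions above and leaves an already defined CRATE_HEAP_SIZE untouched.

```python
def migrate_heap_env(env):
    """Return a copy of `env` using CRATE_HEAP_SIZE instead of the old variables."""
    old_names = ("CRATE_MIN_MEM", "CRATE_MAX_MEM")
    # Drop both old variables, keep everything else as it is.
    migrated = {name: value for name, value in env.items() if name not in old_names}
    # Reuse the previous maximum as the single heap size, unless one is already set.
    if "CRATE_HEAP_SIZE" not in migrated and "CRATE_MAX_MEM" in env:
        migrated["CRATE_HEAP_SIZE"] = env["CRATE_MAX_MEM"]
    return migrated


if __name__ == "__main__":
    import os

    # Print the variables a startup script would need after migration.
    for name, value in migrate_heap_env(dict(os.environ)).items():
        if name.startswith("CRATE_"):
            print(f"{name}={value}")
```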
Cluster name in path data
The computation of the effective data directory path has changed so that the cluster name is no longer part of the path. In previous versions it was $PATH_DATA_DIR/$CLUSTER_NAME/nodes/ and now it is $PATH_DATA_DIR/nodes/. There’s a fallback that still accepts the old data structure, but it will be removed in future versions of CrateDB. You will then have to either move the data directory to the new location or change the path.data setting to point to the old location by appending the cluster name to it (e.g. /data/ becomes /data/yourclustername). As a consequence, it’s no longer possible for multiple clusters to share the exact same path.data directory.
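To make the two layouts concrete, the sketch below builds both paths and the adjusted path.data value described above. The helper names are hypothetical, POSIX-style paths are assumed, and `uses_legacy_layout` is only a heuristic check for an old-style directory on disk.

```python
import os
import posixpath


def legacy_nodes_dir(path_data, cluster_name):
    """Old layout: $PATH_DATA_DIR/$CLUSTER_NAME/nodes"""
    return posixpath.join(path_data, cluster_name, "nodes")


def current_nodes_dir(path_data):
    """New layout: $PATH_DATA_DIR/nodes"""
    return posixpath.join(path_data, "nodes")


def path_data_for_legacy_layout(path_data, cluster_name):
    """path.data value that keeps pointing at the old location."""
    return posixpath.join(path_data, cluster_name)


def uses_legacy_layout(path_data, cluster_name):
    """Heuristic: the old nodes directory exists and the new one does not."""
    return (os.path.isdir(legacy_nodes_dir(path_data, cluster_name))
            and not os.path.isdir(current_nodes_dir(path_data)))


if __name__ == "__main__":
    # Example values from the text above.
    print(legacy_nodes_dir("/data/", "yourclustername"))
    print(current_nodes_dir("/data/"))
    print(path_data_for_legacy_layout("/data/", "yourclustername"))
```

If `uses_legacy_layout` returns True for your installation, either move the contents to the new location or set path.data to the value returned by `path_data_for_legacy_layout`.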
Boolean Data Type
Tables that have been created with CrateDB version 0.54.x or earlier and that contain a column of type BOOLEAN must be re-created to be able to perform all supported operations on that column.