Elasticsearch version 7.9.0

Also see Breaking changes in 7.9.

Security updates

  • A field disclosure flaw was found in Elasticsearch when running a scrolling search with field level security. If a user runs the same query that another, more privileged user recently ran, the scrolling search can leak fields that should be hidden. This could result in an attacker gaining additional permissions against a restricted index. All versions of Elasticsearch before 7.9.0 and 6.8.12 are affected by this flaw. You must upgrade to Elasticsearch version 7.9.0 or 6.8.12 to obtain the fix. (CVE-2020-7019)

Known issues

  • Upgrading to 7.9.0 from an earlier version will result in incorrect mappings on the machine learning annotations index, and possibly also on the machine learning config index. This will lead to some pages in the machine learning UI not displaying correctly, and may prevent machine learning jobs from being created or updated. If you learn of this issue before upgrading, the best way to avoid it is to manually update the mappings on these indices in your old Elasticsearch version before upgrading to 7.9.0. If you find out about the issue after upgrading, you must reindex to recover. Full details of the mitigations are in Upgrade to 7.9.0 causes incorrect mappings.
  • Lucene 8.6.0, on which Elasticsearch 7.9.0 is based, contains a memory leak. This memory leak manifests in Elasticsearch when a single document is updated repeatedly with a forced refresh. The cluster state storage layer in Elasticsearch is based on Lucene and uses single-document updates with forced refreshes, so the leak manifests in Elasticsearch under normal conditions. It also manifests when user-controlled workloads update a single document in an index repeatedly with a forced refresh. In both cases, the leak is around 500 bytes per update, so it takes some time to have any meaningful impact on the system. Symptoms are the size of the used heap slowly rising over time, requests eventually being rejected by the real memory circuit breaker, and potentially out-of-memory errors. A workaround is to restart any nodes exhibiting these symptoms. We are actively working with the Lucene community to release a fix in Lucene 8.6.2, to be delivered in Elasticsearch 7.9.1.
  • SQL: If a WHERE clause contains at least two relational operators joined by AND, of which one is a comparison (<=, <, >=, >) and another one is an inequality (!=, <>), both against literals or foldable expressions, the inequality will be ignored. The workaround is to substitute the inequality with a NOT IN operator.

    We have fixed this issue in Elasticsearch 7.10.1 and later versions. For more details, see #65488.

  • Snapshot and restore: If an index is deleted while the cluster is concurrently taking more than one snapshot then there is a risk that one of the snapshots may never complete and also that some shard data may be lost from the repository, causing future restore operations to fail. To mitigate this problem, set snapshot.max_concurrent_operations: 1 to prevent concurrent snapshot operations:

    PUT _cluster/settings
    {
      "persistent" : {
        "snapshot.max_concurrent_operations" : 1
      }
    }

    This issue is fixed in Elasticsearch versions 7.13.1 and later. It is not possible to repair a repository once it is affected by this issue, so you must restore the repository from a backup, or clear the repository by executing DELETE _snapshot/<repository name>/*, or move to a fresh repository. For more details, see #73456.

  • Parsing a request fails when the last element in an array is filtered out (for instance using _source_includes). This is due to a bug in the Jackson parser. Fixed in Elasticsearch 8.6.1. (#91456)

  • The deprecated index.mapper.dynamic setting can break your cluster. It can only be set using the Update index settings API. Symptoms include nodes failing to start or shards failing to allocate. Do not use this setting in versions prior to 7.17.22. The bug is fixed in 7.17.22. (issue: #109160)
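The SQL known issue above (an inequality joined by AND to a comparison being ignored) can be worked around by rewriting the inequality as NOT IN. The following is a minimal sketch of that rewrite. The index name `emp` and the columns `name`, `salary`, and `dept` are hypothetical, and the helper only builds the JSON body for a `POST _sql` request; it does not send anything.

```python
import json

# Hypothetical index and column names, used only for illustration.
# Affected form: a comparison (>) ANDed with an inequality (!=) against literals.
affected = "SELECT name FROM emp WHERE salary > 1000 AND dept != 3"

# Workaround form: the inequality is replaced with NOT IN.
workaround = "SELECT name FROM emp WHERE salary > 1000 AND dept NOT IN (3)"


def sql_request_body(query: str) -> str:
    """Return the JSON body for a POST _sql request carrying the given query."""
    return json.dumps({"query": query})


print(sql_request_body(workaround))
```

Once the cluster runs a version that contains the fix, the original `!=` form can be used again.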

Breaking changes

Script Cache

  • Script cache size and rate limiting are per-context #55753 (issue: #50152)

Field capabilities API

  • Constant_keyword fields are now described by their family type keyword instead of constant_keyword #58483 (issue: #53175)

Snapshot restore throttling

  • Restoring from a snapshot (which is a particular form of recovery) now takes recovery throttling into account (that is, the indices.recovery.max_bytes_per_sec setting). The max_restore_bytes_per_sec setting now defaults to unlimited; previously it defaulted to 40mb, which is also the default for indices.recovery.max_bytes_per_sec. Clusters that have not changed the recovery and restore settings from their defaults will therefore observe no behavioral change. #58658
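To see why default clusters observe no change, it can help to treat the restore rate as bounded by whichever applicable limit is tighter. The sketch below is a simplified model under that assumption, not Elasticsearch's actual rate limiter; the function name and the use of `None` for "unlimited" or "not applied" are illustrative choices.

```python
from typing import Optional

MB = 1024 * 1024


def effective_restore_rate(restore_limit: Optional[int],
                           recovery_limit: Optional[int]) -> Optional[int]:
    """Return the tighter of two bytes-per-second limits; None means no limit applies."""
    limits = [limit for limit in (restore_limit, recovery_limit) if limit is not None]
    return min(limits) if limits else None


# Before 7.9.0, defaults: max_restore_bytes_per_sec is 40mb and the
# recovery throttle is not applied to restores.
before = effective_restore_rate(40 * MB, None)

# From 7.9.0, defaults: max_restore_bytes_per_sec is unlimited and
# indices.recovery.max_bytes_per_sec (40mb) now applies to restores.
after = effective_restore_rate(None, 40 * MB)

print(before == after)
```

Under this model, a cluster that had raised only max_restore_bytes_per_sec would now be held to the recovery limit unless that limit is raised too.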

Thread pool write queue size

  • The WRITE thread pool default queue size (thread_pool.write.queue_size) has been increased from 200 to 10000. The small queue size of 200 caused issues when users wanted to send small indexing requests with a high client count. Additional memory-oriented back pressure has been introduced with the indexing_pressure.memory.limit setting, which limits the number of bytes that outstanding indexing requests may consume. #59263
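The idea behind byte-based back pressure can be sketched as a simple admission check: a request is accepted only if the bytes held by outstanding requests, plus its own size, stay under the limit. The class below is a toy model with hypothetical names and numbers. It is not how Elasticsearch implements indexing_pressure.memory.limit, and it ignores how bytes are accounted for at different stages of an indexing request.

```python
class IndexingPressure:
    """Toy model of byte-based back pressure; not Elasticsearch's implementation."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.outstanding_bytes = 0

    def try_acquire(self, request_bytes: int) -> bool:
        """Admit the request only if it fits under the byte limit."""
        if self.outstanding_bytes + request_bytes > self.limit_bytes:
            return False  # the caller would reject the request
        self.outstanding_bytes += request_bytes
        return True

    def release(self, request_bytes: int) -> None:
        """Give the bytes back once the request has completed."""
        self.outstanding_bytes = max(0, self.outstanding_bytes - request_bytes)


pressure = IndexingPressure(limit_bytes=100)   # hypothetical limit
first = pressure.try_acquire(60)               # fits under the limit
second = pressure.try_acquire(60)              # would exceed the limit
pressure.release(60)                           # first request completes
third = pressure.try_acquire(60)               # fits again
print(first, second, third)
```

Unlike a queue-length limit, a byte limit treats many small requests and a few large ones according to the memory they actually hold, which is why it pairs well with a larger queue.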

Dangling indices

  • Automatically importing dangling indices is now deprecated, disabled by default, and will be removed in Elasticsearch 8.0. See the migration notes. #58176 #58898 (issue: #48366)
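With automatic import disabled, dangling indices are handled explicitly through the dangling indices API listed under Recovery in these notes (#50920). The sketch below only builds the method and path for each call. The paths and the accept_data_loss parameter reflect my understanding of that API and should be checked against the API reference, and the UUID shown is a made-up placeholder.

```python
from typing import Tuple


def list_dangling() -> Tuple[str, str]:
    """Request that lists dangling indices known to the cluster."""
    return ("GET", "/_dangling")


def import_dangling(index_uuid: str) -> Tuple[str, str]:
    """Request that imports one dangling index, acknowledging possible data loss."""
    return ("POST", f"/_dangling/{index_uuid}?accept_data_loss=true")


def delete_dangling(index_uuid: str) -> Tuple[str, str]:
    """Request that deletes one dangling index, acknowledging possible data loss."""
    return ("DELETE", f"/_dangling/{index_uuid}?accept_data_loss=true")


# Hypothetical UUID; a real one would come from the list response.
example_uuid = "zmM4e0JtBkeUjiHD-MihPQ"
print(import_dangling(example_uuid))
```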

Breaking Java changes

Aggregations

  • Improve cardinality measure used to build aggs #56533 (issue: #56487)

Features/Ingest

  • Add optional description parameter to ingest processors. #57906 (issue: #56000)

New features

Aggregations

  • Add moving percentiles pipeline aggregation #55441 (issue: #49452)
  • Add normalize pipeline aggregation #56399 (issue: #51005)
  • Add variable width histogram aggregation #42035 (issues: #9572, #50863)
  • Add pipeline inference aggregation #58193
  • Speed up time interval rounding around daylight savings time (DST) #56371 (issue: #55559)

Geo

  • Override doc_value parameter in Spatial XPack module #53286 (issue: #37206)

Machine Learning

  • Add update data frame analytics jobs API #58302 (issue: #45720)
  • Introduce model_plot_config.annotations_enabled setting for anomaly detection jobs #57539 (issue: #55781)
  • Report significant changes to anomaly detection models in annotations of the results #1247, #56342, #56417, #57144, #57278, #57539

Mapping

  • Merge mappings for composable index templates #58521 (issue: #53101)
  • Wildcard field optimised for wildcard queries #49993 (issue: #48852)

Search

  • Allow index filtering in field capabilities API #57276 (issue: #56195)

Enhancements

Aggregations

  • Add support for numeric range keys #56452 (issue: #56402)
  • Added standard deviation / variance sampling to extended stats #49782 (issue: #49554)
  • Give significance lookups their own home #57903
  • Increase search.max_buckets to 65,535 #57042 (issue: #51731)
  • Optimize date_histograms across daylight savings time #55559
  • Return clear error message if aggregation type is invalid #58255 (issue: #58146)
  • Save memory on numeric significant terms when not top #56789 (issue: #55873)
  • Save memory when auto_date_histogram is not on top #57304 (issue: #56487)
  • Save memory when date_histogram is not on top #56921 (issues: #55873, #56487)
  • Save memory when histogram agg is not on top #57277
  • Save memory when numeric terms agg is not top #55873
  • Save memory when parent and child are not on top #57892 (issue: #55873)
  • Save memory when rare_terms is not on top #57948 (issue: #55873)
  • Save memory when significant_text is not on top #58145 (issue: #55873)
  • Save memory when string terms are not on top #57758
  • Speed up reducing auto_date_histo with a time zone #57933 (issue: #56124)
  • Speed up rounding in auto_date_histogram #56384 (issue: #55559)

Allocation

  • Account for remaining recovery in disk allocator #58029

Analysis

  • Add max_token_length setting to the CharGroupTokenizer #56860 (issue: #56676)
  • Expose discard_compound_token option to kuromoji_tokenizer #57421
  • Support multiple tokens on LHS in stemmer_override rules (#56113) #56484 (issue: #56113)

Authentication

  • Add http proxy support for OIDC realm #57039 (issue: #53379)
  • Improve threadpool usage and error handling for API key validation #58090 (issue: #58088)
  • Support handling LogoutResponse from SAML idP #56316 (issues: #40901, #43264)

Authorization

  • Add cache for application privileges #55836 (issue: #54317)
  • Add monitor and view_index_metadata privileges to built-in kibana_system role #57755
  • Improve role cache efficiency for API key roles #58156 (issue: #53939)

CCR

  • Allow follower indices to override leader settings #58103

CRUD

  • Retry failed replication due to transient errors #55633

Engine

  • Don’t log on RetentionLeaseSync error handler after an index has been deleted #58098 (issue: #57864)

Features/Data streams

  • Add support for snapshot and restore to data streams #57675 (issues: #53100, #57127)
  • Data stream creation validation allows for prefixed indices #57750 (issue: #53100)
  • Disallow deletion of composable template if in use by data stream #57957 (issue: #57004)
  • Validate alias operations don’t target data streams #58327 (issue: #53100)

Features/ILM+SLM

  • Add data stream support to searchable snapshot action #57873 (issue: #53100)
  • Add data stream support to the shrink action #57616 (issue: #53100)
  • Add support for rolling over data streams #57295 (issues: #53100, #53488)
  • Check the managed index is not a data stream write index #58239 (issue: #53100)

Features/Indices APIs

  • Add default composable templates for new indexing strategy #57629 (issue: #56709)
  • Add index block api #58094
  • Add new flag to check whether alias exists on remove #58100
  • Add prefer_v2_templates parameter to reindex #56253 (issue: #53101)
  • Add template simulation API for simulating template composition #56842 (issues: #53101, #55686, #56255, #56390)

Features/Ingest

  • Add ignore_empty_value parameter in set ingest processor #57030 (issue: #54783)
  • Support if_seq_no and if_primary_term for ingest #55430 (issue: #41255)

Features/Java High Level REST Client

Features/Java Low Level REST Client

  • Add isRunning method to RestClient #57973 (issue: #42133)
  • Add RequestConfig support to RequestOptions #57972

Infra/Circuit Breakers

  • Enhance real memory circuit breaker with G1 GC #58674 (issue: #57202)

Infra/Core

  • Introduce node.roles setting #54998

Infra/Packaging

Infra/Plugins

  • Improved ExtensiblePlugin #58234

Infra/Resiliency

  • Adds resiliency to read-only filesystems #45286 #52680 (issue: #45286)

Machine Learning

  • Accounting for model size when models are not cached. #58670
  • Adds new for_export flag to GET _ml/inference API #57351
  • Adds WKT geometry detection in find_file_structure #57014 (issue: #56967)
  • Calculate cache misses for inference and return in stats #58252
  • Delete auto-generated annotations when job is deleted. #58169 (issue: #57976)
  • Delete auto-generated annotations when model snapshot is reverted #58240 (issue: #57982)
  • Delete expired data by job #57337
  • Introduce Annotation.event field #57144 (issue: #55781)
  • Add support for larger forecasts in memory via max_model_memory setting #1238, #57254
  • Don’t lose precision when saving model state #1274
  • Parallelize the feature importance calculation for classification and regression over trees #1277
  • Add an option to do categorization independently for each partition #1293, #1318, #1356, #57683
  • Memory usage is reported during job initialization #1294
  • More realistic memory estimation for classification and regression means that these analyses will require lower memory limits than before #1298
  • Checkpoint state to allow efficient failover during coarse parameter search for classification and regression #1300
  • Improve data access patterns to speed up classification and regression #1312
  • Performance improvements for classification and regression, particularly running multithreaded #1317
  • Improve runtime and memory usage training deep trees for classification and regression #1340
  • Improvement in handling large inference model definitions #1349
  • Add a peak_model_bytes field to model_size_stats #1389

Mapping

  • Add regex query support to wildcard field #55548 (issue: #54725)
  • Make keyword a family of field types #58315 (issue: #53175)
  • Store parsed mapping settings in IndexSettings #57492 (issue: #57395)
  • Wildcard field - add support for custom null values #57047

Network

  • Make the number of transport threads equal to the number of available CPUs #56488
  • Share Netty event loops between transports #46346

Recovery

  • Implement dangling indices API #50920 (issue: #48366)
  • Reestablish peer recovery after network errors #55274
  • Sending operations concurrently in peer recovery #58018 (issue: #58011)

Reindex

  • Throw an illegal_argument_exception when max_docs is less than slices #54901 (issue: #52786)

SQL

  • Implement TIME_PARSE function for parsing strings into TIME values #55223 (issues: #54963, #55095)
  • Implement TOP as an alternative to LIMIT #57428 (issue: #41195)
  • Implement TRIM function #57518 (issue: #41195)
  • Improve performance of LTRIM/RTRIM #57603 (issue: #57594)
  • Make CASTing string to DATETIME more lenient #57451
  • Redact credentials in connection exceptions #58650 (issue: #56474)
  • Relax parsing of date/time escaped literals #58336 (issue: #58262)
  • Add support for scalars within LIKE/RLIKE #56495 (issue: #55058)

Search

  • Add description to submit and get async search, as well as cancel tasks #57745
  • Add matchBoolPrefix static method in query builders #58637 (issue: #58388)
  • Add range query support to wildcard field #57881 (issue: #57816)
  • Group docIds by segment in FetchPhase to better use LRU cache #57273
  • Improve error handling when decoding async execution ids #56285
  • Specify reason whenever async search gets cancelled #57761
  • Use index sort range query when possible. #56657 (issue: #48665)

Security

  • Add machine learning admin permissions to the kibana_system role #58061
  • Just log 401 stacktraces #55774

Snapshot/Restore

  • Deduplicate Index Metadata in BlobStore #50278 (issues: #45736, #46250, #49800)
  • Default to zero replicas for searchable snapshots #57802 (issue: #50999)
  • Enable fully concurrent snapshot operations #56911
  • Support cloning of searchable snapshot indices #56595
  • Track GET/LIST Azure Storage API calls #56773
  • Track GET/LIST GoogleCloudStorage API calls #56585
  • Track PUT/PUT_BLOCK operations on AzureBlobStore. #56936
  • Track multipart/resumable uploads GCS API calls #56821
  • Track upload requests on S3 repositories #56826

Task Management

  • Add index name to refresh mapping task #57598
  • Cancel task and descendants on channel disconnects #56620 (issues: #56327, #56619)

Transform

  • Add support for terms agg in transforms #56696
  • Adds geotile_grid support in group_by #56514 (issue: #56121)

Bug fixes

Aggregations

  • Fix auto_date_histogram interval #56252 (issue: #56116)
  • Fix bug in faster interval rounding #56433 (issue: #56400)
  • Fix bug in parent and child aggregators when parent field not defined #57089 (issue: #42997)
  • Fix missing null values for std_deviation_bounds in ext. stats aggs #58000

Allocation

Authentication

  • Map only specific type of OIDC Claims #58524

Authorization

  • Change privilege of enrich stats API to monitor #52027 (issue: #51677)

Engine

  • Fix local translog recovery not updating safe commit in edge case #57350 (issue: #57010)
  • Hide AlreadyClosedException on IndexCommit release #57986 (issue: #57797)

Features/ILM+SLM

Features/Indices APIs

Features/Ingest

  • Fix ingest simulate verbose on failure with conditional #56478 (issue: #56004)

Geo

  • Check for degenerated lines when calculating the centroid #58027 (issue: #55851)
  • Fix bug in circuit-breaker check for geoshape grid aggregations #57962 (issue: #57847)

Infra/Scripting

Machine Learning

  • Fix wire serialization for flush acknowledgements #58413
  • Make waiting for renormalization optional for internally flushing job #58537 (issue: #58395)
  • Tail the C++ logging pipe before connecting other pipes #56632 (issue: #56366)
  • Fix numerical issues leading to blow up of the model plot bounds #1268
  • Fix causes for inverted forecast confidence interval bounds #1369 (issue: #1357)
  • Restrict growth of max matching string length for categories #1406

Mapping

  • Wildcard field fix for scripts - changed value type from BytesRef to String #58060 (issue: #58044)

SQL

  • Introduce JDBC option for meta pattern escaping #40661 (issue: #40640)

Search

Snapshot/Restore

  • Account for recovery throttling when restoring snapshot #58658 (issue: #57023)
  • Fix noisy logging during snapshot delete #56264
  • Fix S3ClientSettings leak #56703 (issue: #56702)

Upgrades

Search