Elasticsearch version 8.9.0
Also see Breaking changes in 8.9.
Known issues
- Question Answering fails on long input text. If the context supplied to the task is longer than the model’s max_sequence_length and truncate is set to none, then inference fails with the message “question answering result has invalid dimension”. (issue: #97917)
- High Memory Pressure due to a GC JVM setting change
This version of Elasticsearch is bundled with JDK 20. In JDK 20 Preventive GC is disabled by default. This may lead to increased memory pressure and an increased number of CircuitBreakerExceptions when retrieving large documents under some load patterns. (issue: #99592)
If this change affects your use of Elasticsearch, consider re-enabling the previous behaviour by adding the JVM arguments -XX:+UnlockDiagnosticVMOptions -XX:+G1UsePreventiveGC (reference: JDK 20 release notes). Note that this workaround is temporary and works only with JDK 20, which is bundled with Elasticsearch up to and including version 8.10.2. Later versions bundle JDK 21+, where this setting has been removed; specifying these JVM arguments on those versions will prevent the JVM (and therefore Elasticsearch nodes) from starting.
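As a rough sketch of the workaround above, the two arguments could be placed in a custom JVM options file, assuming the standard config/jvm.options.d/ directory. The file name preventive-gc.options is only an illustration. The 20: prefix is the Elasticsearch JVM options syntax for restricting a line to a single JDK major version, which should keep the flags from being passed once a later release bundles JDK 21+.

```
# config/jvm.options.d/preventive-gc.options (hypothetical file name)
# Re-enable Preventive GC; the "20:" prefix applies each line only on JDK 20
20:-XX:+UnlockDiagnosticVMOptions
20:-XX:+G1UsePreventiveGC
```

A restart of each node is needed for the options to take effect.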
Breaking changes
Aggregations
- Switch TDigestState to use HybridDigest by default #96904
Bug fixes
Allocation
- Attempt to fix delay allocation #95921
- Fix NPE in Desired Balance API #97775
- Fix autoexpand during node replace #96281
Authorization
CRUD
- Fix retry_on_conflict parameter in update API to not retry indefinitely #96262
- Handle failure in TransportUpdateAction#handleUpdateFailureWithRetry #97290 (issue: #97286)
Cluster Coordination
- Avoid getStateForMasterService where possible #97304
- Become candidate on publication failure #96490 (issue: #96273)
- Fix cluster settings update task acknowledgment #97111
Data streams
- Accept timestamp as object at root level #97401
Geo
- Fix bug when creating empty geo_lines #97509 (issue: #97311)
- Fix time-series geo_line to include reduce phase in MergedGeoLines #96953 (issue: #96983)
- Support for Byte and Short as vector tiles features #97619 (issue: #97612)
ILM+SLM
Infra/CLI
Infra/Core
- Capture max processors in static init #97119 (issue: #97088)
- Interpret microseconds cpu stats from cgroups2 properly as nanos #96924 (issue: #96089)
Infra/Logging
- Add slf4j-nop in order to prevent startup warnings #95459
Infra/REST API
- Fix tchar pattern in RestRequest #96406
Infra/Scripting
Infra/Settings
Ingest Node
- Fixing DateProcessor when the format is epoch_millis #95996
- Fixing GeoIpDownloaderStatsAction$NodeResponse serialization by defensively copying inputs #96777 (issue: #96438)
- Trim field references in reroute processor #96941 (issue: #96939)
Machine Learning
- Catch exceptions thrown during inference and report as errors #2542
- Fix WordPiece tokenization where stripping accents results in an empty string #97354
- Improve model downloader robustness #97274
- Prevent high memory usage by evaluating batch inference singularly #2538
Mapping
- Avoid stack overflow while parsing mapping #95705 (issue: #52098)
- Fix mapping parsing logic to determine synthetic source is active #97355 (issue: #97320)
Ranking
- Fix sub_searches serialization bug #97587
Recovery
Search
- Prevent instantiation of top_metrics when sub-aggregations are present #96180 (issue: #95663)
- Set new providers before building FetchSubPhaseProcessors #97460 (issue: #96284)
Snapshot/Restore
- Fix blob cache races/assertion errors #96458
- Fix reused/recovered bytes for files that are only partially recovered from cache #95987 (issues: #95970, #95994)
- Fix reused/recovered bytes for files that are recovered from cache #97278 (issue: #95994)
- Refactor RestoreClusterStateListener to use ClusterStateObserver #96662 (issue: #96425)
TSDB
Task Management
- Improve cancellability in TransportTasksAction #96279
Transform
- Improve reporting status of the transform that is about to finish #95672
Enhancements
Aggregations
- Add cluster setting to SearchExecutionContext to configure TDigestExecutionHint #96943
- Add support for dynamic pruning to cardinality aggregations on low-cardinality keyword fields #92060
- Make TDigestState configurable #96794
- Skip SortingDigest when merging a large digest in HybridDigest #97099
- Support value retrieval in top_hits #95828
Allocation
- Take into account expectedShardSize when initializing shard in simulation #95734
Analysis
- Create .synonyms system index #95548
Application
- Add template parameters to Search Applications #95674
- Chunk profiling stacktrace response #96340
- [Profiling] Add status API #96272
- [Profiling] Allow to upgrade managed ILM policy #96550
- [Profiling] Introduce ILM for K/V indices #96268
- [Profiling] Require POST to retrieve stacktraces #96790
- [Profiling] Tweak default ILM policy #96516
- [Search Applications] Support arrays in stored mustache templates #96197
Authentication
- Header validator with Security #95112
Authorization
- Add Search ACL filter index prefix to the enterprise search user #96885
- Ensure checking application privileges work with nested-limited roles #96970
Autoscaling
DLM
- Add auto force merge functionality to DLM #95204
- Adding data_lifecycle to the _xpack/usage API #96177
- Adding manage_data_stream_lifecycle index privilege and expanding view_index_metadata for access to data stream lifecycle APIs #95512
- Allow for the data lifecycle and the retention to be explicitly nullified #95979
Data streams
- Add support for logs@custom component template for logs-*-* data streams #95481 (issue: #95469)
- Adding ECS dynamic mappings component and applying it to logs data streams by default #96171 (issue: #95538)
- Adjust ECS dynamic templates to support subobjects: false #96712
- Automatically parse log events in logs data streams, if their message field contains JSON content #96083 (issue: #95522)
- Change default of ignore_malformed to true in logs-*-* data streams #95329 (issue: #95224)
- Set @timestamp for documents in logs data streams if missing and add support for custom pipeline #95971 (issues: #95537, #95551)
- Update data streams implicit timestamp ignore_malformed settings #96051
Engine
- Cache modification time of translog writer file #95107
- Trigger refresh when shard becomes search active #96321 (issue: #95544)
Geo
- Add brute force approach to GeoHashGridTiler #96863
- Asset tracking - geo_line in time-series aggregations #94954
ILM+SLM
- Chunk the GET _ilm/policy response #97251 (issue: #96569)
- Move get lifecycle API to Management thread pool and make cancellable #97248 (issue: #96568)
- Reduce WaitForNoFollowersStep requests indices shard stats #94510
Indices APIs
- Bootstrap profiling indices at startup #95666
Infra/Node Lifecycle
- SIGTERM node shutdown type #95430
Ingest Node
- Add mappings for enrich fields #96056
- Ingest: expose reroute inquiry/reset via Elastic-internal API bridge #96958
Machine Learning
- Improved compliance with memory limitations #2469
- Improve detection of calendar cyclic components with long bucket lengths #2493
- Improve detection of time shifts, for example for daylight saving #2479
Mapping
Ranking
- Add multiple queries for ranking to the search endpoint #96224
Recovery
- Implement StartRecoveryRequest#getDescription #95731
Search
- Add search shards endpoint #94534
- Don’t generate stacktrace in EarlyTerminationException and TimeExceededException #95910
- Feature/speed up binary vector decoding #96716
- Improve brute force vector search speed by using Lucene functions #96617
- Include search idle info to shard stats #95740 (issue: #95727)
- Integrate CCS with new search_shards API #95894 (issue: #93730)
- Introduce a filtered collector manager #96824
- Introduce minimum score collector manager #96834
- Skip shards when querying constant keyword fields #96161 (issue: #95541)
- Support CCS minimize round trips in async search #96012
- Support for pattern_replace filter in keyword normalizer #96588
- Support null_value for rank_feature field type #95811
Security
- Add “_storage” internal user #95694
Snapshot/Restore
- Reduce overhead in blob cache service get #96399
Stats
- Add ingest information to the cluster info endpoint #96328 (issue: #95392)
- Add script information to the cluster info endpoint #96613 (issue: #95394)
- Add thread_pool information to the cluster info endpoint #96407 (issue: #95393)
TSDB
Vector Search
New features
Application
- Enable analytics geoip in behavioral analytics #96624
Authorization
- Support restricting access of API keys to only certain workflows #96744
Data streams
- Adding ability to auto-install ingest pipelines and refer to them from index templates #95782
Geo
- Geometry simplifier #94859
ILM+SLM
- Enhance ILM Health Indicator #96092
Infra/Node Lifecycle
- Gracefully shutdown elasticsearch #96363
Infra/Plugins
Machine Learning
- Add support for xlm_roberta tokenized models #94089
- Removes the technical preview admonition from query_vector_builder docs #96735
Snapshot/Restore
Stats
Upgrades
Infra/Transport API
- Bump TransportVersion to the first non-release version number. Transport protocol is now versioned independently of release version. #95286
Network
Search