Machine learning anomaly detection APIs - Update datafeeds - 《Elasticsearch v7.9 Reference》

Update datafeeds API

Updates certain properties of a datafeed.

Request

POST _ml/datafeeds/<feed_id>/_update

Prerequisites

If Elasticsearch security features are enabled, you must have the manage_ml or manage cluster privilege to use this API. See Security privileges and Machine learning security privileges.

Description

If you update a datafeed property, you must stop and start the datafeed for the change to be applied.
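One possible ordering of the stop, update, and start requests can be sketched in Python as follows. This is a minimal sketch that only builds request descriptions and sends nothing. The helper name update_sequence is hypothetical, and the _stop and _start paths are assumed to be those of the stop datafeeds and start datafeeds APIs.

```python
import json

def update_sequence(feed_id, changes):
    """Return (method, path, body) tuples for a stop/update/start cycle.

    Hypothetical helper: it only describes the requests. Sending them
    with an HTTP client is left to the caller.
    """
    base = f"_ml/datafeeds/{feed_id}"
    return [
        ("POST", f"{base}/_stop", None),
        ("POST", f"{base}/_update", json.dumps(changes)),
        ("POST", f"{base}/_start", None),
    ]

steps = update_sequence("datafeed-total-requests", {"scroll_size": 500})
for method, path, body in steps:
    print(method, path, body or "")
```

Only the _update request carries a body; it contains just the properties that change.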

When Elasticsearch security features are enabled, your datafeed remembers which roles the user who updated it had at the time of update and runs the query using those same roles. If you provide secondary authorization headers, those credentials are used instead.

Path parameters

<feed_id>

(Required, string) A character string that uniquely identifies the datafeed. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.
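The identifier rules above can be checked on the client side before a request is sent. The following Python sketch encodes only the rules stated here. The names FEED_ID_PATTERN and is_valid_feed_id are hypothetical, and the server may enforce additional constraints that this check does not cover.

```python
import re

# Covers only the rules stated above: lowercase a-z, 0-9, hyphens, and
# underscores, starting and ending with an alphanumeric character.
FEED_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")

def is_valid_feed_id(feed_id):
    """Return True if feed_id satisfies the character rules described above."""
    return FEED_ID_PATTERN.fullmatch(feed_id) is not None

print(is_valid_feed_id("datafeed-total-requests"))  # True
print(is_valid_feed_id("_datafeed"))                # False
```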

Request body

The following properties can be updated after the datafeed is created:

aggregations

(Optional, object) If set, the datafeed performs aggregation searches. Support for aggregations is limited and should be used only with low cardinality data. For more information, see Aggregating data for faster performance.

chunking_config

(Optional, object) Datafeeds might be required to search over long time periods, for several months or years. This search is split into time chunks in order to ensure the load on Elasticsearch is managed. Chunking configuration controls how the size of these time chunks is calculated and is an advanced configuration option.

Properties of chunking_config

mode

(string) There are three available modes:
- auto: The chunk size is dynamically calculated. This is the default and recommended value.
- manual: Chunking is applied according to the specified time_span.
- off: No chunking is applied.

time_span

(time units) The time span that each search will be querying. This setting is only applicable when the mode is set to manual. For example: 3h.
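The relationship between mode and time_span can be sketched as a small builder for the chunking_config object. This is an illustrative Python sketch, not the server's validation logic. The function name chunking_config is hypothetical, and requiring time_span in manual mode is an assumption of the sketch.

```python
VALID_MODES = {"auto", "manual", "off"}

def chunking_config(mode="auto", time_span=None):
    """Build a chunking_config object for the update request body."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if mode == "manual":
        # Assumption of this sketch: manual mode needs an explicit time_span.
        if time_span is None:
            raise ValueError("manual mode requires time_span, for example '3h'")
        return {"mode": mode, "time_span": time_span}
    if time_span is not None:
        # time_span only applies in manual mode, so reject it elsewhere.
        raise ValueError("time_span is only applicable when mode is manual")
    return {"mode": mode}

body = {"chunking_config": chunking_config("manual", "3h")}
print(body)
```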

delayed_data_check_config

(Optional, object) Specifies whether the datafeed checks for missing data and the size of the window. For example: {"enabled": true, "check_window": "1h"}.

The datafeed can optionally search over indices that have already been read in an effort to determine whether any data has subsequently been added to the index. If missing data is found, it is a good indication that the query_delay option is set too low and the data is being indexed after the datafeed has passed that moment in time. See Working with delayed data.

This check runs only on real-time datafeeds.

Properties of delayed_data_check_config

check_window

(time units) The window of time that is searched for late data. This window of time ends with the latest finalized bucket. It defaults to null, which causes an appropriate check_window to be calculated when the real-time datafeed runs. In particular, the default check_window span calculation is based on the maximum of 2h or 8 * bucket_span.

enabled

(boolean) Specifies whether the datafeed periodically checks for delayed data. Defaults to true.
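The default check_window rule described above (the maximum of 2h or 8 * bucket_span) can be worked through in Python. This is a sketch of the stated rule only. The function name default_check_window is hypothetical, and the server may apply further limits that are not described here.

```python
from datetime import timedelta

def default_check_window(bucket_span):
    """Return the larger of 2h and 8 * bucket_span, per the rule above."""
    return max(timedelta(hours=2), 8 * bucket_span)

print(default_check_window(timedelta(minutes=5)))  # 2:00:00
print(default_check_window(timedelta(hours=1)))    # 8:00:00
```

With a 5-minute bucket span, 8 buckets cover only 40 minutes, so the 2h floor applies. With a 1-hour bucket span, the window grows to 8 hours.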

frequency

(Optional, time units) The interval at which scheduled queries are made while the datafeed runs in real time. The default value is either the bucket span for short bucket spans, or, for longer bucket spans, a sensible fraction of the bucket span. For example: 150s. When frequency is shorter than the bucket span, interim results for the last (partial) bucket are written then eventually overwritten by the full bucket results. If the datafeed uses aggregations, this value must be divisible by the interval of the date histogram aggregation.
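For datafeeds that use aggregations, the divisibility requirement above can be checked before sending an update. The following Python sketch assumes both values have already been converted to seconds. The function name frequency_is_compatible is hypothetical.

```python
def frequency_is_compatible(frequency_s, histogram_interval_s):
    """Check that frequency is a whole multiple of the date histogram interval.

    Both arguments are durations in seconds.
    """
    if frequency_s <= 0 or histogram_interval_s <= 0:
        raise ValueError("both values must be positive")
    return frequency_s % histogram_interval_s == 0

print(frequency_is_compatible(300, 60))  # True
print(frequency_is_compatible(150, 60))  # False
```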

indices

(Optional, array) An array of index names. Wildcards are supported. For example: ["it_ops_metrics", "server*"].

If any indices are in remote clusters then node.remote_cluster_client must not be set to false on any machine learning nodes.

max_empty_searches

(Optional, integer) If a real-time datafeed has never seen any data (including during any initial training period) then it will automatically stop itself and close its associated job after this many real-time searches that return no documents. In other words, it will stop after frequency times max_empty_searches of real-time operation. If not set then a datafeed with no end time that sees no data will remain started until it is explicitly stopped. By default this setting is not set.

The special value -1 unsets this setting.
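The "frequency times max_empty_searches" arithmetic above can be worked through in Python. This is an illustrative sketch. The function name time_until_auto_stop is hypothetical, and the result approximates real-time operation rather than giving an exact stop time.

```python
from datetime import timedelta

def time_until_auto_stop(frequency, max_empty_searches=None):
    """Approximate real-time run time before a datafeed that sees no data stops."""
    if max_empty_searches is None:
        # Not set: the datafeed stays started until it is explicitly stopped.
        return None
    return frequency * max_empty_searches

print(time_until_auto_stop(timedelta(seconds=150), 10))  # 0:25:00

# Request body that unsets the option again, using the special value -1.
unset_body = {"max_empty_searches": -1}
```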

query

(Optional, object) The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {"boost": 1}}.

If you change the query, the analyzed data also changes. The job might therefore need a long time to relearn, and the results can be hard to interpret. If you want to make significant changes to the source data, we recommend that you clone the job and create a second job containing the amendments. Let both run in parallel and close one when you are satisfied with the results of the other.

query_delay

(Optional, time units) The number of seconds behind real time that data is queried. For example, if data from 10:04 a.m. might not be searchable in Elasticsearch until 10:06 a.m., set this property to 120 seconds. The default value is randomly selected between 60s and 120s. This randomness improves the query performance when there are multiple jobs running on the same node. For more information, see Handling delayed data.
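The 10:04 a.m. and 10:06 a.m. example above can be expressed as a small calculation. This Python sketch shows the latest point in time that a search issued at a given moment would cover. The function name latest_queryable_time and the calendar date are hypothetical.

```python
from datetime import datetime, timedelta

def latest_queryable_time(now, query_delay):
    """Return the most recent timestamp a search issued at `now` would cover."""
    return now - query_delay

# A search issued at 10:06 with a 120 second delay reaches data up to 10:04.
now = datetime(2020, 1, 1, 10, 6)
print(latest_queryable_time(now, timedelta(seconds=120)))  # 2020-01-01 10:04:00
```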

script_fields

(Optional, object) Specifies scripts that evaluate custom expressions and return script fields to the datafeed. The detector configuration objects in a job can contain functions that use these script fields. For more information, see Transforming data with script fields and Script fields.

scroll_size

(Optional, unsigned integer) The size parameter that is used in Elasticsearch searches. The default value is 1000.

indices_options

(Optional, object) Specifies index expansion options that are used during search.

For example:

{
   "expand_wildcards": ["all"],
   "ignore_unavailable": true,
   "allow_no_indices": false,
   "ignore_throttled": true
}

For more information about these options, see Multi-target syntax.

Examples

POST _ml/datafeeds/datafeed-total-requests/_update
{
  "query": {
    "term": {
      "level": "error"
    }
  }
}

When the datafeed is updated, you receive the full datafeed configuration with the updated values:

{
  "datafeed_id": "datafeed-total-requests",
  "job_id": "total-requests",
  "query_delay": "83474ms",
  "indices": ["server-metrics"],
  "query": {
    "term": {
      "level": {
        "value": "error",
        "boost": 1.0
      }
    }
  },
  "scroll_size": 1000,
  "chunking_config": {
    "mode": "auto"
  }
}