Data enrichment
The ES|QL ENRICH processing command combines, at query time, data from one or more source indices with field-value combinations found in Elasticsearch enrich indices.
For example, you can use ENRICH to:
- Identify web services or vendors based on known IP addresses
- Add product information to retail orders based on product IDs
- Supplement contact information based on an email address
How the ENRICH command works
The ENRICH command adds new columns to a table with data from Elasticsearch indices. It requires a few special components:
Enrich policy
A set of configuration options used to add the right enrich data to the input table.
An enrich policy contains:
- A list of one or more source indices which store enrich data as documents
- The policy type which determines how the processor matches the enrich data to incoming documents
- A match field from the source indices used to match incoming documents
- Enrich fields containing enrich data from the source indices you want to add to incoming documents
After you create a policy, you must execute it before it can be used. Executing an enrich policy uses data from the policy’s source indices to create a streamlined system index called the enrich index. The ENRICH command uses this index to match and enrich an input table.
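The four parts of a policy and the effect of executing it can be sketched as a toy Python model. This is not Elasticsearch's implementation: `EnrichPolicy`, `execute_policy`, and the `languages` source index are hypothetical names, and the "enrich index" is a plain dict keyed by the match field that keeps only the enrich fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichPolicy:
    """Toy model of the four parts of an enrich policy (not an Elasticsearch type)."""
    policy_type: str      # "match", "range", or "geo_match"
    indices: tuple        # source indices that store the enrich data
    match_field: str      # field used to match incoming rows
    enrich_fields: tuple  # fields copied onto matching rows


def execute_policy(policy, source_data):
    """Build a simplified enrich index: match value -> enrich field values.

    source_data maps an index name to a list of document dicts. Only the
    enrich fields are kept under each match value; every other field is
    dropped. If two documents share a match value, the later one wins
    (a simplification of this model).
    """
    enrich_index = {}
    for index_name in policy.indices:
        for doc in source_data[index_name]:
            if policy.match_field not in doc:
                continue  # a document without the match field can never match
            enrich_index[doc[policy.match_field]] = {
                field: doc.get(field) for field in policy.enrich_fields
            }
    return enrich_index


policy = EnrichPolicy("match", ("languages",), "language_code", ("language_name",))
source_data = {
    "languages": [  # hypothetical source index
        {"language_code": "1", "language_name": "English", "notes": "not copied"},
        {"language_code": "2", "language_name": "French"},
    ]
}
languages_index = execute_policy(policy, source_data)
print(languages_index)  # {'1': {'language_name': 'English'}, '2': {'language_name': 'French'}}
```

The frozen dataclass reflects the fact that a policy cannot be changed after it is created.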
Source index
An index that stores enrich data that the ENRICH command can add to input tables. You can create and manage these indices just like regular Elasticsearch indices. You can use multiple source indices in an enrich policy, and you can use the same source index in multiple enrich policies.
Enrich index
A special system index tied to a specific enrich policy.
Directly matching rows from input tables to documents in source indices could be slow and resource-intensive. To speed things up, the ENRICH command uses an enrich index.
Enrich indices contain enrich data from source indices but have a few special properties to help streamline them:
- They are system indices, meaning they’re managed internally by Elasticsearch and only intended for use with enrich processors and the ES|QL ENRICH command.
- They always begin with .enrich-*.
- They are read-only, meaning you can’t directly change them.
- They are force merged for fast retrieval.
Set up an enrich policy
To start using ENRICH, follow these steps:
- Check the prerequisites.
- Add enrich data.
- Create an enrich policy.
- Execute the enrich policy.
- Use the enrich policy.
Once you have enrich policies set up, you can update your enrich data and update your enrich policies.
The ENRICH command performs several operations and may impact the speed of your query.
Prerequisites
To use enrich policies, you must have:
- read index privileges for any indices used
- The enrich_user built-in role
Add enrich data
To begin, add documents to one or more source indices. These documents should contain the enrich data you eventually want to add to incoming data.
You can manage source indices just like regular Elasticsearch indices using the document and index APIs.
You can also set up Beats, such as Filebeat, to automatically send and index documents to your source indices. See Getting started with Beats.
Create an enrich policy
After adding enrich data to your source indices, use the create enrich policy API or Index Management in Kibana to create an enrich policy.
Once created, you can’t update or change an enrich policy. See Update an enrich policy.
Execute the enrich policy
Once the enrich policy is created, you need to execute it using the execute enrich policy API or Index Management in Kibana to create an enrich index.
The enrich index contains documents from the policy’s source indices. Enrich indices always begin with .enrich-*, are read-only, and are force merged.
Enrich indices should only be used by the enrich processor or the ES|QL ENRICH command. Avoid using enrich indices for other purposes.
Use the enrich policy
After the policy has been executed, you can use the ENRICH command to enrich your data.
The following example uses the languages_policy enrich policy to add a new column for each enrich field defined in the policy. The match is performed using the match_field defined in the enrich policy, and it requires the input table to have a column with the same name (language_code in this example). ENRICH looks for records in the enrich index based on the match field value.
ROW language_code = "1"
| ENRICH languages_policy
language_code:keyword | language_name:keyword |
---|---|
1 | English |
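The lookup performed by the query above can be modeled as a per-row dictionary lookup on the match field. This is a toy Python sketch, not how Elasticsearch executes ENRICH. The function `enrich` and the index contents are hypothetical. Filling unmatched rows with `None` is an assumption of the model, standing in for null.

```python
def enrich(rows, enrich_index, match_field, enrich_fields):
    """Toy model of `ENRICH <policy>`: look up each row by the match field.

    The input table must have a column named after the policy's match_field
    (a missing column raises KeyError in this model). Each output row keeps its
    input columns and gains one column per enrich field. Rows without a match
    get None (an assumption of this model).
    """
    output = []
    for row in rows:
        hit = enrich_index.get(row[match_field], {})
        new_row = dict(row)
        for field in enrich_fields:
            new_row[field] = hit.get(field)
        output.append(new_row)
    return output


# Hypothetical contents of the enrich index behind languages_policy.
languages_index = {"1": {"language_name": "English"}, "2": {"language_name": "French"}}

# Mirrors: ROW language_code = "1" | ENRICH languages_policy
result = enrich([{"language_code": "1"}], languages_index, "language_code", ["language_name"])
print(result)  # [{'language_code': '1', 'language_name': 'English'}]
```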
To use a column with a different name than the match_field defined in the policy as the match field, use ON <column-name>:
ROW a = "1"
| ENRICH languages_policy ON a
a:keyword | language_name:keyword |
---|---|
1 | English |
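The only change ON makes is where the lookup value comes from. The toy Python model below shows this. `enrich_on` and the index contents are hypothetical, and the sketch is not Elasticsearch's implementation.

```python
def enrich_on(rows, enrich_index, on_column, enrich_fields):
    """Toy model of `ENRICH <policy> ON <column>`.

    The lookup value is read from on_column instead of from a column named
    after the policy's match_field. Only the enrich fields are added; the
    match field itself does not become a new column.
    """
    output = []
    for row in rows:
        hit = enrich_index.get(row[on_column], {})
        output.append({**row, **{field: hit.get(field) for field in enrich_fields}})
    return output


# Hypothetical enrich index keyed by the policy's match field (language_code).
languages_index = {"1": {"language_name": "English"}}

# Mirrors: ROW a = "1" | ENRICH languages_policy ON a
rows = enrich_on([{"a": "1"}], languages_index, "a", ["language_name"])
print(rows)  # [{'a': '1', 'language_name': 'English'}]
```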
By default, each of the enrich fields defined in the policy is added as a column. To explicitly select the enrich fields that are added, use WITH <field1>, <field2>, ...:
ROW a = "1"
| ENRICH languages_policy ON a WITH language_name
a:keyword | language_name:keyword |
---|---|
1 | English |
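Because the example policy appears to define only language_name, the WITH clause above does not visibly change the output. The toy Python model below gives the hypothetical index a second enrich field, language_family, which is not part of the example above, so the filtering effect is visible. `enrich_with` is a hypothetical helper, not an Elasticsearch API.

```python
def enrich_with(rows, enrich_index, on_column, selected_fields):
    """Toy model of `ENRICH <policy> ON <column> WITH <field1>, <field2>, ...`.

    Only the listed enrich fields are added, even if the policy defines more.
    """
    output = []
    for row in rows:
        hit = enrich_index.get(row[on_column], {})
        output.append({**row, **{field: hit.get(field) for field in selected_fields}})
    return output


# Hypothetical enrich index whose policy defines two enrich fields;
# language_family is illustrative only.
languages_index = {"1": {"language_name": "English", "language_family": "Germanic"}}

# Mirrors: ROW a = "1" | ENRICH languages_policy ON a WITH language_name
rows = enrich_with([{"a": "1"}], languages_index, "a", ["language_name"])
print(rows)  # [{'a': '1', 'language_name': 'English'}]
```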
You can rename the columns that are added using WITH new_name=<field1>:
ROW a = "1"
| ENRICH languages_policy ON a WITH name = language_name
a:keyword | name:keyword |
---|---|
1 | English |
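Renaming can be modeled as a mapping from each new column name to the enrich field it is read from. This is a toy Python sketch with a hypothetical helper, `enrich_with_rename`, and hypothetical index contents.

```python
def enrich_with_rename(rows, enrich_index, on_column, renames):
    """Toy model of `WITH new_name = field`: renames maps new column -> enrich field."""
    output = []
    for row in rows:
        hit = enrich_index.get(row[on_column], {})
        output.append({**row, **{new: hit.get(field) for new, field in renames.items()}})
    return output


languages_index = {"1": {"language_name": "English"}}  # hypothetical contents

# Mirrors: ROW a = "1" | ENRICH languages_policy ON a WITH name = language_name
rows = enrich_with_rename([{"a": "1"}], languages_index, "a", {"name": "language_name"})
print(rows)  # [{'a': '1', 'name': 'English'}]
```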
In case of name collisions, the newly created columns will override existing columns.
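The override rule can be pictured as a dictionary update in which enrich values win. This toy Python sketch uses a hypothetical helper, `add_enrich_columns`, and a hypothetical input row. It ignores column order, which the sketch does not attempt to model.

```python
def add_enrich_columns(row, enrich_values):
    """Toy model of a name collision: enrich columns replace same-named input columns."""
    merged = dict(row)            # copy the input columns
    merged.update(enrich_values)  # a same-named key takes the enrich value
    return merged


# Hypothetical input row that already has a language_name column.
row = {"language_code": "1", "language_name": "unknown"}
merged = add_enrich_columns(row, {"language_name": "English"})
print(merged)  # {'language_code': '1', 'language_name': 'English'}
```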
Update an enrich index
Once created, you cannot update or index documents to an enrich index. Instead, update your source indices and execute the enrich policy again. This creates a new enrich index from your updated source indices. The previous enrich index is deleted by a delayed maintenance job, which runs every 15 minutes by default.
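The update cycle can be sketched as a toy Python model in which each execution produces a new read-only snapshot and superseded snapshots are removed later. `EnrichIndexHistory` and its `cleanup` method are hypothetical. `cleanup` stands in for the delayed maintenance job, and the model says nothing about how Elasticsearch actually stores or deletes enrich indices.

```python
from types import MappingProxyType


class EnrichIndexHistory:
    """Toy model of re-executing a policy (not how Elasticsearch stores indices).

    Each execution builds a new read-only snapshot from the source documents.
    Queries use the newest snapshot; superseded snapshots linger until
    cleanup() runs, standing in for the delayed maintenance job.
    """

    def __init__(self, match_field, enrich_fields):
        self.match_field = match_field
        self.enrich_fields = enrich_fields
        self.snapshots = []  # oldest first

    def execute(self, source_docs):
        data = {
            doc[self.match_field]: {f: doc.get(f) for f in self.enrich_fields}
            for doc in source_docs
        }
        self.snapshots.append(MappingProxyType(data))  # top level is read-only
        return self.current

    @property
    def current(self):
        return self.snapshots[-1]

    def cleanup(self):
        self.snapshots = self.snapshots[-1:]  # drop every superseded snapshot


history = EnrichIndexHistory("language_code", ["language_name"])
source = [{"language_code": "1", "language_name": "English"}]
old = history.execute(source)

source.append({"language_code": "2", "language_name": "French"})  # update the source index
new = history.execute(source)                                      # execute the policy again

try:
    old["3"] = {"language_name": "German"}  # direct writes are rejected
except TypeError:
    print("snapshots are read-only")

history.cleanup()
print(sorted(new), len(history.snapshots))  # ['1', '2'] 1
```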
Update an enrich policy
Once created, you can’t update or change an enrich policy. Instead, you can:
- Create and execute a new enrich policy.
- Replace the previous enrich policy with the new enrich policy in any in-use enrich processors or ES|QL queries.
- Use the delete enrich policy API or Index Management in Kibana to delete the previous enrich policy.
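The three steps above can be walked through with a toy Python model of immutable policies. `PolicyStore`, the name `languages_policy_v2`, and the `language_family` field are all hypothetical. Executing the new policy is left out of the model for brevity.

```python
class PolicyStore:
    """Toy model of immutable enrich policies (hypothetical, not an Elasticsearch API)."""

    def __init__(self):
        self._policies = {}

    def create(self, name, definition):
        if name in self._policies:
            raise ValueError(f"enrich policy {name!r} already exists and cannot be changed")
        self._policies[name] = dict(definition)

    def delete(self, name):
        del self._policies[name]

    def names(self):
        return sorted(self._policies)


store = PolicyStore()
store.create("languages_policy", {"match_field": "language_code", "enrich_fields": ["language_name"]})

# Step 1: create the replacement under a new name; reusing the old name fails.
new_definition = {"match_field": "language_code", "enrich_fields": ["language_name", "language_family"]}
try:
    store.create("languages_policy", new_definition)
except ValueError as err:
    print(err)
store.create("languages_policy_v2", new_definition)

# Step 2: point in-use queries at the new policy.
query = 'ROW language_code = "1" | ENRICH languages_policy'
query = query.replace("ENRICH languages_policy", "ENRICH languages_policy_v2")

# Step 3: delete the previous policy.
store.delete("languages_policy")
print(store.names(), query)
```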
Enrich policy types and limitations
The ES|QL ENRICH command supports all three enrich policy types:
geo_match
Matches enrich data to incoming documents based on a geo_shape query. For an example, see Example: Enrich your data based on geolocation.
match
Matches enrich data to incoming documents based on a term query. For an example, see Example: Enrich your data based on exact values.
range
Matches a number, date, or IP address in incoming documents to a range in the enrich index based on a term query. For an example, see Example: Enrich your data by matching a value to a range.
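Range matching differs from match because a row matches when its value falls inside a stored range, not when it equals a stored value. The toy Python model below illustrates this for IP addresses using the standard `ipaddress` module. The network ranges and the `network_name` field are hypothetical. Returning the first containing range is a simplification of this model.

```python
import ipaddress

# Hypothetical range-policy enrich data: an IP range field (written here as a
# CIDR block) plus the enrich fields to copy onto matching rows.
NETWORK_RANGES = [
    (ipaddress.ip_network("10.100.0.0/16"), {"network_name": "production"}),
    (ipaddress.ip_network("10.101.0.0/16"), {"network_name": "qa"}),
]


def range_lookup(ip_string):
    """Toy model of a range policy: return the enrich fields of the first range
    that contains the IP address, or None when no range matches."""
    ip = ipaddress.ip_address(ip_string)
    for network, fields in NETWORK_RANGES:
        if ip in network:
            return fields
    return None


print(range_lookup("10.100.0.5"))   # {'network_name': 'production'}
print(range_lookup("192.168.1.1"))  # None
```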
While all three enrich policy types are supported, there are some limitations to be aware of:
- The geo_match enrich policy type only supports the intersects spatial relation.
- The match_field in the ENRICH command must be of the correct type. For example, if the enrich policy is of type geo_match, the match_field in the ENRICH command must be of type geo_point or geo_shape. Likewise, a range enrich policy requires a match_field of type integer, long, date, or ip, depending on the type of the range field in the original enrich index.
- However, this constraint is relaxed for range policies when the match_field is of type KEYWORD. In this case, the field values are parsed during query execution, row by row. If any value fails to parse, the output values for that row are set to null, an appropriate warning is produced, and the query continues to execute.
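The row-by-row parsing described in the last item can be modeled in Python as follows. This is a toy sketch: `enrich_keyword_against_ranges`, the age ranges, and the warning text are hypothetical, and treating both range bounds as inclusive is an assumption of the model. A value that parses but falls outside every range, such as "70", gets no enrich value and no warning.

```python
def enrich_keyword_against_ranges(values, ranges):
    """Toy model of a range policy used with a KEYWORD match field.

    values: the keyword column, one string per row.
    ranges: (low, high, enrich_value) tuples; bounds are treated as inclusive,
            an assumption of this model.
    Each value is parsed row by row. A value that fails to parse yields None
    for that row plus a warning, and processing continues with the next row.
    """
    outputs, warns = [], []
    for row_number, raw in enumerate(values, start=1):
        try:
            number = int(raw)
        except (TypeError, ValueError):
            outputs.append(None)
            warns.append(f"row {row_number}: cannot parse {raw!r} as an integer")
            continue
        outputs.append(next((v for low, high, v in ranges if low <= number <= high), None))
    return outputs, warns


# Hypothetical integer ranges from a range enrich index.
age_ranges = [(0, 17, "minor"), (18, 64, "adult")]
outputs, warns = enrich_keyword_against_ranges(["10", "abc", "30", "70"], age_ranges)
print(outputs)  # ['minor', None, 'adult', None]
print(warns)    # ["row 2: cannot parse 'abc' as an integer"]
```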