Arrays
In Elasticsearch, there is no dedicated array data type. Any field can contain zero or more values by default; however, all values in the array must be of the same data type. For instance:
- an array of strings: [ "one", "two" ]
- an array of integers: [ 1, 2 ]
- an array of arrays: [ 1, [ 2, 3 ]], which is the equivalent of [ 1, 2, 3 ]
- an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 } ]
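The equivalence between an array of arrays and its flat form can be illustrated with a short sketch. This is a simplified Python model of the flattening described above, not Elasticsearch's actual implementation; the function name flatten is made up for the example.

```python
def flatten(values):
    """Recursively flatten nested lists into a single flat list of values."""
    flat = []
    for value in values:
        if isinstance(value, list):
            flat.extend(flatten(value))  # inner arrays are merged into the outer one
        else:
            flat.append(value)
    return flat


print(flatten([1, [2, 3]]))  # [1, 2, 3]
```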
Arrays of objects
Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to do this, use the nested data type instead of the object data type.
This is explained in more detail in Nested.
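One way to picture the limitation is to assume that an array of objects is stored as one multi-value list per inner field, so the link between values from the same object is lost. The sketch below is a simplified Python model under that assumption, not Elasticsearch code; the field name people and the helper flatten_objects are hypothetical.

```python
from collections import defaultdict


def flatten_objects(field, objects):
    """Collapse a list of objects into one multi-value list per dotted field path."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


# "people" is a hypothetical field name used only for this illustration.
doc = flatten_objects("people", [
    {"name": "Mary", "age": 12},
    {"name": "John", "age": 10},
])
print(doc)  # {'people.name': ['Mary', 'John'], 'people.age': [12, 10]}

# A check for "name is Mary AND age is 10" succeeds against the flattened
# form, even though no single object in the array has both values.
cross_match = "Mary" in doc["people.name"] and 10 in doc["people.age"]
print(cross_match)  # True
```

In this model, keeping each object separate (which is what the nested data type is for) would make the combined check fail, as it should.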
When adding a field dynamically, the first value in the array determines the field type. All subsequent values must be of the same data type, or it must at least be possible to coerce them to that data type.
Arrays with a mixture of data types are not supported: [ 10, "some string" ]
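The rule that the first value fixes the type can be sketched as follows. This is a rough Python model only: Python's own conversion rules stand in for Elasticsearch's coercion, which differs in detail and depends on the mapping, and the helper coerce_array is hypothetical.

```python
def coerce_array(values):
    """Use the first value's type for the whole array and coerce the rest to it."""
    if not values:
        return []
    target = type(values[0])  # the first value determines the type
    coerced = []
    for value in values:
        try:
            coerced.append(value if isinstance(value, target) else target(value))
        except (TypeError, ValueError) as err:
            raise ValueError(f"cannot coerce {value!r} to {target.__name__}") from err
    return coerced


print(coerce_array([10, "20"]))  # [10, 20]

try:
    coerce_array([10, "some string"])
except ValueError as err:
    print("rejected:", err)
```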
An array may contain null values, which are either replaced by the configured null_value or skipped entirely. An empty array [] is treated as a missing field, that is, a field with no values.
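The handling of null values and empty arrays can be summarized in a small sketch. It is a simplified Python model of the behavior described above, not Elasticsearch code; the function indexed_values is hypothetical, and its null_value parameter mirrors the mapping parameter of the same name.

```python
def indexed_values(values, null_value=None):
    """Return the values that would be indexed, or None if the field counts as missing."""
    result = []
    for value in values:
        if value is not None:
            result.append(value)
        elif null_value is not None:
            result.append(null_value)  # explicit null replaced by the configured substitute
        # otherwise the null is skipped entirely
    return result or None  # no remaining values means a missing field


print(indexed_values(["a", None, "b"]))                     # ['a', 'b']
print(indexed_values(["a", None, "b"], null_value="NULL"))  # ['a', 'NULL', 'b']
print(indexed_values([]))                                   # None
```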
Nothing needs to be pre-configured in order to use arrays in documents; they are supported out of the box:
PUT my-index-000001/_doc/1
{
  "message": "some arrays in this document...",
  "tags": [ "elasticsearch", "wow" ],
  "lists": [
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ]
}

PUT my-index-000001/_doc/2
{
  "message": "no arrays in this document...",
  "tags": "elasticsearch",
  "lists": {
    "name": "prog_list",
    "description": "programming list"
  }
}

GET my-index-000001/_search
{
  "query": {
    "match": {
      "tags": "elasticsearch"
    }
  }
}
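In the first document, the tags field is dynamically added as a string field and the lists field is dynamically added as an object field.
The second document contains no arrays, but can be indexed into the same fields.
The query looks for elasticsearch in the tags field, and matches both documents.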
Multi-value fields and the inverted index
The fact that all field types support multi-value fields out of the box is a consequence of the origins of Lucene. Lucene was designed to be a full text search engine. In order to be able to search for individual words within a big block of text, Lucene tokenizes the text into individual terms, and adds each term to the inverted index separately.
This means that even a simple text field must be able to support multiple values by default. When other data types were added, such as numbers and dates, they used the same data structure as strings, and so got multi-values for free.
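To make the idea concrete, here is a toy inverted index in Python. It is a minimal sketch that assumes a naive lowercase whitespace tokenizer and bears no relation to Lucene's real data structures; the helper build_inverted_index is hypothetical. It treats a single value as a one-element array, which is why multi-value fields need no special handling.

```python
from collections import defaultdict


def build_inverted_index(docs, field):
    """Map each lowercase term to the set of document IDs whose field contains it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # a single value is just a one-element array
        for value in values:
            for term in value.lower().split():  # naive whitespace tokenizer
                index[term].add(doc_id)
    return index


docs = {
    1: {"tags": ["elasticsearch", "wow"]},
    2: {"tags": "elasticsearch"},
}
index = build_inverted_index(docs, "tags")
print(sorted(index["elasticsearch"]))  # [1, 2]
print(sorted(index["wow"]))            # [1]
```

Both documents appear under the term elasticsearch, whether the value arrived inside an array or on its own.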