Text field type

Introduced 1.0

A text field type contains a string that is analyzed. It is used for full-text search because it allows partial matches. Searches for multiple terms can match some but not all of them. Depending on the analyzer, results can be case insensitive, stemmed, have stopwords removed, have synonyms applied, and so on.
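
To see how analysis shapes what becomes searchable, you can call the analyze API directly. The following is a minimal sketch that assumes the default standard analyzer; the sample sentence is made up for illustration.

```json
# Illustrative input; "standard" is the default analyzer for text fields
GET _analyze
{
  "analyzer": "standard",
  "text": "The Quick Brown Foxes"
}
```

With the standard analyzer, the returned tokens should be lowercased (quick rather than Quick) but not stemmed, so foxes stays as is. Stemming, stopword removal, and synonyms require a different or custom analyzer.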

If you need to use a field for exact-value search, map it as a keyword instead.

The match_only_text field is a space-optimized version of the text field. If you don’t need to query phrases or use positional queries, map the field as match_only_text instead of text. Positional queries are queries in which the position of the term in the phrase is important, such as interval or span queries.
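
As a minimal sketch of this choice, the following mapping uses match_only_text for a field that only needs plain term matching. The index name logs and the field name message are hypothetical, and the example assumes that match_only_text is available in your OpenSearch version.

```json
# Hypothetical index and field names, for illustration only
PUT logs
{
  "mappings": {
    "properties": {
      "message": {
        "type": "match_only_text"
      }
    }
  }
}
```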

Example

Create a mapping with a text field:

    PUT movies
    {
      "mappings" : {
        "properties" : {
          "title" : {
            "type" : "text"
          }
        }
      }
    }
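
To try the mapping out, you can index a document and run a match query against the title field. The document ID, the title value, and the query text in this sketch are made up for illustration.

```json
# Sample document; refresh=true makes it searchable immediately
PUT movies/_doc/1?refresh=true
{
  "title": "The Lord of the Rings"
}

# By default, a match query combines terms with OR, so a partial match is enough
GET movies/_search
{
  "query": {
    "match": {
      "title": "lord of the flies"
    }
  }
}
```

Because the default operator is or, the document is expected to match even though flies does not appear in the title. Using the full form of the match query with "operator": "and" would require every term to be present.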

Parameters

The following table lists the parameters accepted by text field types. All parameters are optional.

| Parameter | Description |
| --- | --- |
| analyzer | The analyzer to be used for this field. By default, it is used at both index time and search time. To override it at search time, set the search_analyzer parameter. Default is the standard analyzer, which uses grammar-based tokenization and is based on the Unicode Text Segmentation algorithm. |
| boost | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. |
| eager_global_ordinals | Specifies whether global ordinals should be loaded eagerly on refresh. If the field is often used for aggregations, this parameter should be set to true. Default is false. |
| fielddata | A Boolean value that specifies whether to access analyzed tokens for this field for sorting, aggregation, and scripting. Default is false. |
| fielddata_frequency_filter | A JSON object that specifies to load into memory only those analyzed tokens whose document frequency is between the min and max values (provided as either an absolute number or a percentage). Frequency is computed per segment. Parameters: min, max, min_segment_size. Default is to load all analyzed tokens. |
| fields | To index the same string in several ways (for example, as a keyword and as text), provide the fields parameter. You can specify one version of the field to be used for search and another to be used for sorting and aggregations. |
| index | A Boolean value that specifies whether the field should be searchable. Default is true. |
| index_options | Specifies the information to be stored in the index for search and highlighting. Valid values: docs (doc number only), freqs (doc number and term frequencies), positions (doc number, term frequencies, and term positions), offsets (doc number, term frequencies, term positions, and start and end character offsets). Default is positions. |
| index_phrases | A Boolean value that specifies whether to index 2-grams separately. 2-grams are combinations of two consecutive words in this field’s string. Leads to faster exact phrase queries with no slop but a larger index. Works best when stopwords are not removed. Default is false. |
| index_prefixes | A JSON object that specifies to index term prefixes separately. The number of characters in the prefix is between min_chars and max_chars, inclusive. Leads to faster prefix searches but a larger index. Optional parameters: min_chars, max_chars. Default min_chars is 2, max_chars is 5. |
| meta | Accepts metadata for this field. |
| norms | A Boolean value that specifies whether the field length should be used when calculating relevance scores. Default is true. |
| position_increment_gap | When text fields are analyzed, their terms are assigned positions. If a field contains an array of strings and these positions were consecutive, a phrase query could match across different array elements. To prevent this, an artificial gap is inserted between consecutive array elements. You can change the size of this gap by specifying an integer position_increment_gap. Note: If slop is greater than position_increment_gap, matching across different array elements may occur. Default is 100. |
| similarity | The ranking algorithm for calculating relevance scores. Default is BM25. |
| term_vector | Specifies whether a term vector is stored for this field and what it contains. For valid values, see the Term vector parameter section. Default is no. |
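
The following sketch combines a few of these parameters in one mapping and then exercises position_increment_gap with a phrase query. The index name articles, the field names, and the chosen values are assumptions for illustration, not recommended settings.

```json
# Hypothetical index; parameter values are examples only
PUT articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "index_prefixes": {
          "min_chars": 2,
          "max_chars": 5
        },
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      },
      "authors": {
        "type": "text",
        "position_increment_gap": 100
      }
    }
  }
}

# Two array elements; the gap keeps a plain phrase query from spanning both
PUT articles/_doc/1?refresh=true
{
  "title": "Searching text fields",
  "authors": ["John Abraham", "Lincoln Smith"]
}

# slop (101) is greater than position_increment_gap (100),
# so this phrase may match across the two array elements
GET articles/_search
{
  "query": {
    "match_phrase": {
      "authors": {
        "query": "Abraham Lincoln",
        "slop": 101
      }
    }
  }
}
```

In this sketch, title.raw can be used for sorting and aggregations while title serves full-text queries. Without slop, the same match_phrase query is not expected to match, because Abraham and Lincoln belong to different array elements.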

Term vector parameter

A term vector is produced during analysis. It contains:

  • A list of terms.
  • The ordinal position of each term.
  • The start and end character offsets of each term within the field.
  • Payloads (if available). Each term can have custom binary data associated with the term’s position.

The term_vector parameter accepts one of the following values:

| Value | Stored values |
| --- | --- |
| no | None. This is the default. |
| yes | Terms in the field. |
| with_offsets | Terms and character offsets. |
| with_positions_offsets | Terms, positions, and character offsets. |
| with_positions_offsets_payloads | Terms, positions, character offsets, and payloads. |
| with_positions | Terms and positions. |
| with_positions_payloads | Terms, positions, and payloads. |

Storing positions is useful for proximity queries. Storing character offsets is useful for highlighting.

Term vector parameter example

Create a mapping with a text field that stores character offsets in a term vector:

    PUT testindex
    {
      "mappings" : {
        "properties" : {
          "dob" : {
            "type" : "text",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }

Index a document with a text field:

    PUT testindex/_doc/1
    {
      "dob" : "The patient's date of birth."
    }
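
To check what was stored for the dob field, one option is the term vectors API. This sketch assumes that the _termvectors endpoint is available in your OpenSearch version; the request options shown are only a small subset.

```json
# Assumes the _termvectors endpoint is available in your version
GET testindex/_termvectors/1
{
  "fields": ["dob"],
  "positions": true,
  "offsets": true
}
```

Because the field is mapped with with_positions_offsets, each term in the response should include its position and its start and end character offsets.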

Query for “date of birth” and highlight it in the original field:

    GET testindex/_search
    {
      "query": {
        "match": {
          "dob": "date of birth"
        }
      },
      "highlight": {
        "fields": {
          "dob": {}
        }
      }
    }

The words “date of birth” are highlighted in the response:

    {
      "took" : 854,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 0.8630463,
        "hits" : [
          {
            "_index" : "testindex",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.8630463,
            "_source" : {
              "dob" : "The patient's date of birth."
            },
            "highlight" : {
              "dob" : [
                "The patient's <em>date</em> <em>of</em> <em>birth</em>."
              ]
            }
          }
        ]
      }
    }
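
Because the dob field stores positions and offsets in its term vector, it can also be highlighted with the fast vector highlighter, which relies on that data. The following is a sketch of such a request, assuming that the fvh highlighter type is supported in your OpenSearch version.

```json
# fvh relies on term_vector being set to with_positions_offsets for the field
GET testindex/_search
{
  "query": {
    "match": {
      "dob": "date of birth"
    }
  },
  "highlight": {
    "fields": {
      "dob": {
        "type": "fvh"
      }
    }
  }
}
```

The exact highlight fragments may differ from those of the default highlighter, so compare both responses before relying on one format.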