Keep types token filter


Keeps or removes tokens of a specific type. For example, you can use this filter to change 3 quick foxes to quick foxes by keeping only <ALPHANUM> (alphanumeric) tokens.

Token types

Token types are set by the tokenizer when converting characters to tokens. Token types can vary between tokenizers.

For example, the standard tokenizer can produce a variety of token types, including <ALPHANUM>, <HANGUL>, and <NUM>. Simpler tokenizers, like the lowercase tokenizer, only produce the word token type.

Certain token filters can also add token types. For example, the synonym filter can add the <SYNONYM> token type.
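To make the idea of a typed token concrete, the following Python sketch models each token as a (text, type) pair. The toy_tokenize function is hypothetical and only an approximation: it splits on whitespace and labels all-digit tokens <NUM> and everything else <ALPHANUM>, whereas the real standard tokenizer uses its own segmentation rules and produces more types.

```python
from typing import List, Tuple

# A token is modeled here as (token text, token type).
Token = Tuple[str, str]


def toy_tokenize(text: str) -> List[Token]:
    """Toy stand-in for a tokenizer (hypothetical, not Lucene's algorithm).

    Assumption: whitespace separates tokens, all-digit tokens are <NUM>,
    and every other token is <ALPHANUM>.
    """
    tokens: List[Token] = []
    for word in text.split():
        token_type = "<NUM>" if word.isdigit() else "<ALPHANUM>"
        tokens.append((word, token_type))
    return tokens


# Print each token next to the type assigned by the toy rule above.
for token_text, token_type in toy_tokenize("3 quick foxes"):
    print(f"{token_text}\t{token_type}")
```

Under these assumptions, 3 is labeled <NUM> while quick and foxes are labeled <ALPHANUM>, which is the distinction the keep_types filter acts on.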

This filter uses Lucene’s TypeTokenFilter.

Include example

The following analyze API request uses the keep_types filter to keep only <NUM> (numeric) tokens from 1 quick fox 2 lazy dogs.

  GET _analyze
  {
    "tokenizer": "standard",
    "filter": [
      {
        "type": "keep_types",
        "types": [ "<NUM>" ]
      }
    ],
    "text": "1 quick fox 2 lazy dogs"
  }

The filter produces the following tokens:

  [ 1, 2 ]

Exclude example

The following analyze API request uses the keep_types filter to remove <NUM> tokens from 1 quick fox 2 lazy dogs. Note the mode parameter is set to exclude.

  GET _analyze
  {
    "tokenizer": "standard",
    "filter": [
      {
        "type": "keep_types",
        "types": [ "<NUM>" ],
        "mode": "exclude"
      }
    ],
    "text": "1 quick fox 2 lazy dogs"
  }

The filter produces the following tokens:

  [ quick, fox, lazy, dogs ]

Configurable parameters

types

(Required, array of strings) List of token types to keep or remove.

mode

(Optional, string) Indicates whether to keep or remove the specified token types. Valid values are:

  • include

    (Default) Keep only the specified token types.

  • exclude

    Remove the specified token types.
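The effect of the types and mode parameters can be sketched in a few lines of Python. This is only an illustrative model, not Lucene's TypeTokenFilter: the keep_types function below is hypothetical, and the token types in TOKENS are labeled by hand to match the include and exclude examples.

```python
from typing import Iterable, List, Tuple

# A token is modeled here as (token text, token type).
Token = Tuple[str, str]


def keep_types(tokens: Iterable[Token], types: Iterable[str],
               mode: str = "include") -> List[Token]:
    """Illustrative model of the filter's include/exclude behavior."""
    if mode not in ("include", "exclude"):
        raise ValueError(f"mode must be 'include' or 'exclude', got {mode!r}")
    listed = set(types)
    if mode == "include":
        # Keep only tokens whose type is listed.
        return [t for t in tokens if t[1] in listed]
    # Remove tokens whose type is listed.
    return [t for t in tokens if t[1] not in listed]


# Hand-labeled tokens for "1 quick fox 2 lazy dogs" (assumed types).
TOKENS: List[Token] = [
    ("1", "<NUM>"), ("quick", "<ALPHANUM>"), ("fox", "<ALPHANUM>"),
    ("2", "<NUM>"), ("lazy", "<ALPHANUM>"), ("dogs", "<ALPHANUM>"),
]

kept = [text for text, _ in keep_types(TOKENS, ["<NUM>"])]
removed = [text for text, _ in keep_types(TOKENS, ["<NUM>"], mode="exclude")]
print(kept)     # ['1', '2']
print(removed)  # ['quick', 'fox', 'lazy', 'dogs']
```

Omitting mode behaves like include, mirroring the documented default.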

Customize and add to an analyzer

To customize the keep_types filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

For example, the following create index API request uses a custom keep_types filter to configure a new custom analyzer. The custom keep_types filter keeps only <ALPHANUM> (alphanumeric) tokens.

  PUT keep_types_example
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [ "extract_alpha" ]
          }
        },
        "filter": {
          "extract_alpha": {
            "type": "keep_types",
            "types": [ "<ALPHANUM>" ]
          }
        }
      }
    }
  }
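Once the index exists, one way to check the custom analyzer is to send text through the index's analyze API. The Python sketch below builds such a request with the standard library only. The helper names (build_analyze_request, send, token_texts) are hypothetical, and the base URL http://localhost:9200 assumes a local, unsecured cluster; adjust both for your environment.

```python
import json
import urllib.request
from typing import List


def build_analyze_request(base_url: str, index: str, analyzer: str,
                          text: str) -> urllib.request.Request:
    """Build a POST <index>/_analyze request for a named analyzer."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(response_body: dict) -> List[str]:
    """Extract the token strings from an analyze API response body."""
    return [t["token"] for t in response_body.get("tokens", [])]


def send(request: urllib.request.Request) -> List[str]:
    """Send the request; requires a reachable cluster (assumption)."""
    with urllib.request.urlopen(request) as response:
        return token_texts(json.load(response))


# Assumed local cluster URL; nothing is sent until send() is called.
req = build_analyze_request(
    "http://localhost:9200", "keep_types_example", "my_analyzer", "3 quick foxes"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Calling send(req) against a running cluster should return only the alphanumeric tokens if the analyzer is configured as above, so the numeric token 3 would be absent from the result.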