Letter

Source: elastic on GitHub, 2025-01-13 22:24:16

Letter tokenizer

The letter tokenizer breaks text into terms whenever it encounters a character that is not a letter, so each term is a maximal run of adjacent letters. It does a reasonable job for most European languages, but performs poorly for some Asian languages, where words are not separated by spaces: an unbroken run of letters in such text comes out as a single term.
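The splitting rule can be illustrated with a minimal Python sketch. It is not Elasticsearch's or Lucene's implementation: it assumes Python's `str.isalpha()` (Unicode letter categories Lu, Ll, Lt, Lm, Lo) is a close enough stand-in for the tokenizer's notion of "letter", and it ignores implementation details such as limits on token length. The function name `letter_tokenize` is hypothetical.

```python
def letter_tokenize(text: str) -> list[str]:
    """Hypothetical sketch: split text into maximal runs of Unicode letters."""
    terms: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalpha():      # letter: extend the current term
            current.append(ch)
        elif current:         # non-letter: close the current term, if any
            terms.append("".join(current))
            current = []
    if current:               # flush a term that runs to the end of the text
        terms.append("".join(current))
    return terms


print(letter_tokenize("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# ['The', 'QUICK', 'Brown', 'Foxes', 'jumped', 'over', 'the', 'lazy', 'dog', 's', 'bone']
print(letter_tokenize("我喜欢吃苹果"))
# ['我喜欢吃苹果']
```

The second call shows the weakness described above: the Chinese sentence ("I like eating apples") contains no non-letter characters, so it is returned as one term instead of separate words.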

Example output

POST _analyze
{
  "tokenizer": "letter",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

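The above request produces the following terms. The digit, the hyphen, the apostrophe, the spaces, and the final period all act as separators and are discarded: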

[ The, QUICK, Brown, Foxes, jumped, over, the, lazy, dog, s, bone ]
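The `_analyze` API reports each term together with its character offsets and its position. The following Python sketch, with the hypothetical name `analyze_letter`, mimics that shape for the letter rule, assuming the fields `token`, `start_offset`, `end_offset`, and `position`. The offsets here are Python string indices (code points). Elasticsearch may report different offsets for characters outside the Basic Multilingual Plane, and the real response contains additional fields that are omitted here.

```python
def _token(text: str, start: int, end: int, position: int) -> dict:
    """Build one token entry covering text[start:end]."""
    return {"token": text[start:end], "start_offset": start,
            "end_offset": end, "position": position}


def analyze_letter(text: str) -> list[dict]:
    """Hypothetical sketch: letter-tokenize text, recording offsets and positions."""
    tokens: list[dict] = []
    start = None  # index where the current run of letters began
    for i, ch in enumerate(text):
        if ch.isalpha():
            if start is None:
                start = i
        elif start is not None:  # a non-letter ends the current term
            tokens.append(_token(text, start, i, len(tokens)))
            start = None
    if start is not None:        # term that runs to the end of the text
        tokens.append(_token(text, start, len(text), len(tokens)))
    return tokens


for t in analyze_letter("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")[:2]:
    print(t)
# {'token': 'The', 'start_offset': 0, 'end_offset': 3, 'position': 0}
# {'token': 'QUICK', 'start_offset': 6, 'end_offset': 11, 'position': 1}
```

Positions count only the emitted terms. The discarded digit `2` therefore leaves no gap: `QUICK` is at position 1 even though its offset (6) reflects the skipped characters.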

Configuration

The letter tokenizer is not configurable.

Copyright of this content belongs to elastic or its affiliates. To follow or support the content or its related open-source projects, visit elastic.