Character Filters
Regular Expression
The regular expression character filter is configured with a regular expression and a replacement array of bytes. All sequences of characters matching the regular expression are replaced with the replacement bytes.
Typically, characters that are undesirable for indexing are replaced with whitespace. As long as the replacement occupies the same number of bytes as the text it replaces, the byte offsets of the remaining text still match the original input.
HTML
The html character filter attempts to identify HTML tags in the input text and replace them with spaces. The current implementation is an instance of the Regular Expression character filter.
Zero-width Non-Joiner
The zero-width non-joiner character filter replaces zero-width non-joiner characters (U+200C) with a space.
Copyright of this content belongs to blevesearch or its affiliates. To follow or sponsor this content or the related open source project, visit blevesearch.