similarity

Elasticsearch allows you to configure a text scoring algorithm, or similarity, per field. The similarity setting provides a simple way of choosing a text similarity algorithm other than the default BM25, such as boolean.

Only text-based field types like text and keyword support this configuration.

Custom similarities can be configured by tuning the parameters of the built-in similarities. For more details about these expert options, see the similarity module.
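As a rough illustration of tuning a built-in similarity, the sketch below builds an index-creation body that registers a BM25 variant under a custom name and points a field at it. The names my_bm25, tuned_field, and my-index-000002, as well as the chosen parameter values, are hypothetical and picked only for this example; the similarity module remains the authoritative reference for the available options.

```python
from typing import Any


def build_index_body(k1: float = 1.2, b: float = 0.5) -> dict[str, Any]:
    """Build an index body that registers a tuned BM25 similarity.

    "my_bm25" and "tuned_field" are hypothetical names used for illustration.
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    # Custom similarity derived from the built-in BM25 type.
                    "my_bm25": {"type": "BM25", "k1": k1, "b": b}
                }
            }
        },
        "mappings": {
            "properties": {
                # The field refers to the custom similarity by its name.
                "tuned_field": {"type": "text", "similarity": "my_bm25"}
            }
        },
    }


body = build_index_body()

# With an already configured client (assumed, not created here),
# the body could be sent along these lines:
# client.indices.create(
#     index="my-index-000002",
#     settings=body["settings"],
#     mappings=body["mappings"],
# )
```

Keeping the body construction separate from the client call makes it possible to inspect or unit-test the configuration without a running cluster.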

The only similarities that can be used out of the box, without any further configuration, are:

BM25

The Okapi BM25 algorithm, used by default in Elasticsearch and Lucene.

boolean

A simple boolean similarity, which is used when full-text ranking is not needed and the score should only be based on whether the query terms match or not. Boolean similarity gives terms a score equal to their query boost.
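To make the contrast between the two similarities concrete, here is a simplified per-term scoring sketch. It uses one common formulation of BM25 and is not Lucene's exact computation (constant factors and length-norm handling vary across formulations); the function names and arguments are hypothetical. The point it illustrates is that a BM25 score depends on term frequency, document length, and term rarity, whereas the boolean score is just the query boost whenever the term matches.

```python
import math


def bm25_term_score(freq: int, doc_len: float, avg_doc_len: float,
                    doc_count: int, docs_with_term: int,
                    boost: float = 1.0, k1: float = 1.2, b: float = 0.75) -> float:
    """Simplified BM25 score for one term in one document (illustrative only)."""
    if freq <= 0:
        return 0.0
    # Rarer terms (small docs_with_term) get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term frequency saturates with k1 and is normalized by document length via b.
    tf = freq / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return boost * idf * tf


def boolean_term_score(freq: int, boost: float = 1.0) -> float:
    """Boolean similarity: a matching term scores exactly its query boost."""
    return boost if freq > 0 else 0.0


# Same term, same corpus statistics, different term frequencies.
stats = dict(doc_len=10, avg_doc_len=10, doc_count=100, docs_with_term=10)
bm25_once = bm25_term_score(freq=1, **stats)
bm25_five = bm25_term_score(freq=5, **stats)
bool_once = boolean_term_score(freq=1, boost=2.0)
bool_five = boolean_term_score(freq=5, boost=2.0)
```

In this sketch bm25_five is larger than bm25_once, while bool_once and bool_five are both equal to the boost, which is why boolean similarity suits fields where only matching matters.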

The similarity can be set on the field level when a field is first created, as follows:

Python:

    resp = client.indices.create(
        index="my-index-000001",
        mappings={
            "properties": {
                "default_field": {
                    "type": "text"
                },
                "boolean_sim_field": {
                    "type": "text",
                    "similarity": "boolean"
                }
            }
        },
    )
    print(resp)

Ruby:

    response = client.indices.create(
      index: 'my-index-000001',
      body: {
        mappings: {
          properties: {
            default_field: {
              type: 'text'
            },
            boolean_sim_field: {
              type: 'text',
              similarity: 'boolean'
            }
          }
        }
      }
    )
    puts response

JavaScript:

    const response = await client.indices.create({
      index: "my-index-000001",
      mappings: {
        properties: {
          default_field: {
            type: "text",
          },
          boolean_sim_field: {
            type: "text",
            similarity: "boolean",
          },
        },
      },
    });
    console.log(response);

Console:

    PUT my-index-000001
    {
      "mappings": {
        "properties": {
          "default_field": {
            "type": "text"
          },
          "boolean_sim_field": {
            "type": "text",
            "similarity": "boolean"
          }
        }
      }
    }

The default_field uses the BM25 similarity.

The boolean_sim_field uses the boolean similarity.