Tutorial: semantic search with a deployed model

  • For the easiest way to perform semantic search in the Elastic Stack, refer to the semantic_text end-to-end tutorial.
  • This tutorial was written before the inference endpoint and the semantic_text field type were introduced. Today there are simpler options for performing semantic search.

This guide shows you how to implement semantic search with models deployed in Elasticsearch: from selecting an NLP model, to writing queries.

Select an NLP model

Elasticsearch supports a wide range of NLP models, including both dense and sparse vector models. Your choice of language model is critical for implementing semantic search successfully.

While it is possible to bring your own text embedding model, achieving good search results through model tuning is challenging. Selecting an appropriate model from our third-party model list is the first step. Training the model on your own data is essential to ensure better search results than using only BM25. However, the model training process requires a team of data scientists and ML experts, making it expensive and time-consuming.

To address this issue, Elastic provides a pre-trained representational model called Elastic Learned Sparse EncodeR (ELSER). ELSER, currently available only for English, is an out-of-domain sparse vector model that does not require fine-tuning. This adaptability makes it suitable for various NLP use cases out of the box. Unless you have a team of ML specialists, it is highly recommended to use the ELSER model.

In the case of sparse vector representation, the vectors mostly consist of zero values, with only a small subset containing non-zero values. This representation is commonly used for textual data. In the case of ELSER, each document in an index and the query text itself are represented by high-dimensional sparse vectors. Each non-zero element of the vector corresponds to a term in the model vocabulary. The ELSER vocabulary contains around 30,000 terms, so the sparse vectors created by ELSER contain about 30,000 values, the majority of which are zero. Effectively, the ELSER model replaces the terms in the original query with other terms that have been learnt to exist in the documents that best match the original search terms in a training dataset, together with weights that control how important each one is.
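
To make the idea concrete, the short sketch below shows what such a token-weight expansion might look like for one piece of text. The tokens and weights are invented for illustration only; the real output depends on the ELSER model version and your input text.

# Hypothetical ELSER-style expansion of the text
# "comfortable furniture for a small apartment".
# Each key is a term from the model vocabulary and each value is a learned
# relevance weight; all other vocabulary terms (roughly 30,000 of them)
# implicitly have a weight of zero, which is why the representation is sparse.
expansion = {
    "furniture": 2.1,
    "sofa": 1.4,
    "couch": 1.2,
    "apartment": 1.1,
    "compact": 0.7,
    "cozy": 0.5,
}

# Only the non-zero entries are stored in the sparse_vector field.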

Deploy the model

After you decide which model you want to use for implementing semantic search, you need to deploy the model in Elasticsearch.

ELSER

To deploy ELSER, refer to Download and deploy ELSER.

Dense vector models

To deploy a third-party text embedding model, refer to Deploy a text embedding model.

Map a field for the text embeddings

Before you start using the deployed model to generate embeddings based on your input text, you need to prepare your index mapping first. The mapping of the index depends on the type of model.

ELSER

ELSER produces token-weight pairs as output from the input text and the query. The Elasticsearch sparse_vector field type can store these token-weight pairs as numeric feature vectors. The index must have a field with the sparse_vector field type to index the tokens that ELSER generates.

To create a mapping for your ELSER index, refer to the Create the index mapping section of the tutorial. The example shows how to create an index mapping for my-index that defines the my_tokens field, which will contain the ELSER output, as a sparse_vector field.

Python:

resp = client.indices.create(
    index="my-index",
    mappings={
        "properties": {
            "my_tokens": {
                "type": "sparse_vector"
            },
            "my_text_field": {
                "type": "text"
            }
        }
    },
)
print(resp)

Ruby:

response = client.indices.create(
  index: 'my-index',
  body: {
    mappings: {
      properties: {
        my_tokens: {
          type: 'sparse_vector'
        },
        my_text_field: {
          type: 'text'
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.indices.create({
  index: "my-index",
  mappings: {
    properties: {
      my_tokens: {
        type: "sparse_vector",
      },
      my_text_field: {
        type: "text",
      },
    },
  },
});
console.log(response);

Console:

PUT my-index
{
  "mappings": {
    "properties": {
      "my_tokens": {
        "type": "sparse_vector"
      },
      "my_text_field": {
        "type": "text"
      }
    }
  }
}

In this mapping:

  • my_tokens is the field that will contain the tokens generated by ELSER. It must use the sparse_vector field type.
  • my_text_field is the field from which the sparse vector representation is created. In this example, its field type is text.

Dense vector models

The models compatible with Elasticsearch NLP generate dense vectors as output. The dense_vector field type is suitable for storing dense vectors of numeric values. The index must have a field with the dense_vector field type to index the embeddings generated by the third-party model that you selected. Keep in mind that the model produces embeddings with a certain number of dimensions. The dense_vector field must be configured with the same number of dimensions using the dims option. Refer to the respective model documentation for the number of dimensions of the embeddings.
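
If you are unsure how many dimensions your deployed model produces, one way to check is to retrieve the model's configuration with the Python client, as sketched below. The connection details and the response path under inference_config are assumptions; depending on the Elasticsearch version the embedding size may not be reported there, in which case fall back to the model's documentation.

from elasticsearch import Elasticsearch

# Adjust the connection details for your own cluster (placeholder URL).
client = Elasticsearch("http://localhost:9200")

# Fetch the configuration of the deployed text embedding model.
resp = client.ml.get_trained_models(
    model_id="sentence-transformers__msmarco-minilm-l-12-v3"
)
config = resp["trained_model_configs"][0]

# The embedding size is often exposed in the text_embedding inference config
# (path assumed here); use that value for the dims option of your dense_vector field.
print(config["inference_config"]["text_embedding"].get("embedding_size"))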

To review a mapping of an index for an NLP model, refer to the mapping code snippet in the Add the text embedding model to an inference ingest pipeline section of the tutorial. The example shows how to create an index mapping that defines the my_embeddings.predicted_value field, which will contain the model output, as a dense_vector field.

Python:

resp = client.indices.create(
    index="my-index",
    mappings={
        "properties": {
            "my_embeddings.predicted_value": {
                "type": "dense_vector",
                "dims": 384
            },
            "my_text_field": {
                "type": "text"
            }
        }
    },
)
print(resp)

Ruby:

response = client.indices.create(
  index: 'my-index',
  body: {
    mappings: {
      properties: {
        'my_embeddings.predicted_value' => {
          type: 'dense_vector',
          dims: 384
        },
        my_text_field: {
          type: 'text'
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.indices.create({
  index: "my-index",
  mappings: {
    properties: {
      "my_embeddings.predicted_value": {
        type: "dense_vector",
        dims: 384,
      },
      my_text_field: {
        type: "text",
      },
    },
  },
});
console.log(response);

Console:

PUT my-index
{
  "mappings": {
    "properties": {
      "my_embeddings.predicted_value": {
        "type": "dense_vector",
        "dims": 384
      },
      "my_text_field": {
        "type": "text"
      }
    }
  }
}

In this mapping:

  • my_embeddings.predicted_value is the field that will contain the embeddings generated by the model. It must use the dense_vector field type.
  • The model produces embeddings with a certain number of dimensions. The dense_vector field must be configured with the same number of dimensions using the dims option. Refer to the respective model documentation for the number of dimensions of the embeddings.
  • my_text_field is the field from which the dense vector representation is created. In this example, its field type is text.

Generate text embeddings

Once you have created the mappings for the index, you can generate text embeddings from your input text. This can be done by using an ingest pipeline with an inference processor. The ingest pipeline processes the input data and indexes it into the destination index. At index time, the inference ingest processor uses the trained model to infer against the data ingested through the pipeline. After you have created the ingest pipeline with the inference processor, you can ingest your data through it to generate the model output.

ELSER

This is how an ingest pipeline that uses the ELSER model is created:

Python:

resp = client.ingest.put_pipeline(
    id="my-text-embeddings-pipeline",
    description="Text embedding pipeline",
    processors=[
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {
                        "input_field": "my_text_field",
                        "output_field": "my_tokens"
                    }
                ]
            }
        }
    ],
)
print(resp)

Ruby:

response = client.ingest.put_pipeline(
  id: 'my-text-embeddings-pipeline',
  body: {
    description: 'Text embedding pipeline',
    processors: [
      {
        inference: {
          model_id: '.elser_model_2',
          input_output: [
            {
              input_field: 'my_text_field',
              output_field: 'my_tokens'
            }
          ]
        }
      }
    ]
  }
)
puts response

JavaScript:

const response = await client.ingest.putPipeline({
  id: "my-text-embeddings-pipeline",
  description: "Text embedding pipeline",
  processors: [
    {
      inference: {
        model_id: ".elser_model_2",
        input_output: [
          {
            input_field: "my_text_field",
            output_field: "my_tokens",
          },
        ],
      },
    },
  ],
});
console.log(response);

Console:

PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [
          {
            "input_field": "my_text_field",
            "output_field": "my_tokens"
          }
        ]
      }
    }
  ]
}

The input_output configuration object defines the input_field for the inference process and the output_field that will contain the inference results.

To ingest data through the pipeline to generate tokens with ELSER, refer to the Ingest the data through the inference ingest pipeline section of the tutorial. After you have successfully ingested documents by using the pipeline, your index will contain the tokens generated by ELSER. Tokens are learned associations capturing relevance; they are not synonyms. To learn more about what tokens are, refer to this page.
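
As a quick illustration of that ingestion step, the sketch below indexes a single document through the pipeline with the Python client. The document content is invented for the example, and client is the same Elasticsearch Python client instance used in the examples above; in practice you would ingest or reindex your real data instead.

# Index one example document through the inference pipeline so that ELSER
# generates the my_tokens field at ingest time.
# The document body is made up for illustration.
resp = client.index(
    index="my-index",
    id="1",
    pipeline="my-text-embeddings-pipeline",
    document={
        "my_text_field": "How do I renew my passport online?"
    },
)
print(resp["result"])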

Dense vector models

This is how an ingest pipeline that uses a text embedding model is created:

Python:

resp = client.ingest.put_pipeline(
    id="my-text-embeddings-pipeline",
    description="Text embedding pipeline",
    processors=[
        {
            "inference": {
                "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
                "target_field": "my_embeddings",
                "field_map": {
                    "my_text_field": "text_field"
                }
            }
        }
    ],
)
print(resp)

Ruby:

response = client.ingest.put_pipeline(
  id: 'my-text-embeddings-pipeline',
  body: {
    description: 'Text embedding pipeline',
    processors: [
      {
        inference: {
          model_id: 'sentence-transformers__msmarco-minilm-l-12-v3',
          target_field: 'my_embeddings',
          field_map: {
            my_text_field: 'text_field'
          }
        }
      }
    ]
  }
)
puts response

JavaScript:

const response = await client.ingest.putPipeline({
  id: "my-text-embeddings-pipeline",
  description: "Text embedding pipeline",
  processors: [
    {
      inference: {
        model_id: "sentence-transformers__msmarco-minilm-l-12-v3",
        target_field: "my_embeddings",
        field_map: {
          my_text_field: "text_field",
        },
      },
    },
  ],
});
console.log(response);

Console:

PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
        "target_field": "my_embeddings",
        "field_map": {
          "my_text_field": "text_field"
        }
      }
    }
  ]
}

In this pipeline:

  • model_id is the ID of the text embedding model you want to use.
  • The field_map object maps the input document field name (my_text_field in this example) to the name of the field that the model expects (which is always text_field).

To ingest data through the pipeline to generate text embeddings with your chosen model, refer to the Add the text embedding model to an inference ingest pipeline section. The example shows how to create the pipeline with the inference processor and reindex your data through the pipeline. After you have successfully ingested documents by using the pipeline, your index will contain the text embeddings generated by the model.
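
For reference, a minimal sketch of that reindex step with the Python client is shown below. It assumes your source documents live in a hypothetical index named my-source-index, and client is the same Elasticsearch Python client instance used in the examples above.

# Reindex existing documents through the inference pipeline so that the model
# adds the my_embeddings.predicted_value field to each document.
# "my-source-index" is a hypothetical name for the index holding your raw documents.
resp = client.reindex(
    source={"index": "my-source-index"},
    dest={"index": "my-index", "pipeline": "my-text-embeddings-pipeline"},
    wait_for_completion=False,  # run as a background task for large datasets
)
print(resp)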

Now it is time to perform semantic search!

Search the data

Depending on the type of model you have deployed, you can query the sparse vector field with a sparse vector query, or the dense vector field with a kNN search.

ELSER

ELSER text embeddings can be queried using a sparse vector query. The sparse vector query enables you to query a sparse vector field, by providing the inference ID associated with the NLP model you want to use, and the query text:

Python:

resp = client.search(
    index="my-index",
    query={
        "sparse_vector": {
            "field": "my_tokens",
            "inference_id": "my-elser-endpoint",
            "query": "the query string"
        }
    },
)
print(resp)

JavaScript:

const response = await client.search({
  index: "my-index",
  query: {
    sparse_vector: {
      field: "my_tokens",
      inference_id: "my-elser-endpoint",
      query: "the query string",
    },
  },
});
console.log(response);

Console:

GET my-index/_search
{
  "query": {
    "sparse_vector": {
      "field": "my_tokens",
      "inference_id": "my-elser-endpoint",
      "query": "the query string"
    }
  }
}

Dense vector models

Text embeddings produced by dense vector models can be queried using a kNN search. In the knn clause, provide the name of the dense vector field, and a query_vector_builder clause with the model ID and the query text.

Python:

resp = client.search(
    index="my-index",
    knn={
        "field": "my_embeddings.predicted_value",
        "k": 10,
        "num_candidates": 100,
        "query_vector_builder": {
            "text_embedding": {
                "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
                "model_text": "the query string"
            }
        }
    },
)
print(resp)

Ruby:

response = client.search(
  index: 'my-index',
  body: {
    knn: {
      field: 'my_embeddings.predicted_value',
      k: 10,
      num_candidates: 100,
      query_vector_builder: {
        text_embedding: {
          model_id: 'sentence-transformers__msmarco-minilm-l-12-v3',
          model_text: 'the query string'
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.search({
  index: "my-index",
  knn: {
    field: "my_embeddings.predicted_value",
    k: 10,
    num_candidates: 100,
    query_vector_builder: {
      text_embedding: {
        model_id: "sentence-transformers__msmarco-minilm-l-12-v3",
        model_text: "the query string",
      },
    },
  },
});
console.log(response);

Console:

GET my-index/_search
{
  "knn": {
    "field": "my_embeddings.predicted_value",
    "k": 10,
    "num_candidates": 100,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
        "model_text": "the query string"
      }
    }
  }
}

In some situations, lexical search may perform better than semantic search, for example when searching for single words or IDs such as product numbers.

Combining semantic and lexical search into one hybrid search request using reciprocal rank fusion provides the best of both worlds. Not only that, but hybrid search using reciprocal rank fusion has been shown to perform better in general.
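
To give a feel for how reciprocal rank fusion combines the two result lists, here is a small, self-contained Python sketch of the scoring idea. It illustrates the general formula (score = sum over retrievers of 1 / (rank_constant + rank)), not Elasticsearch's implementation; the example document IDs and the rank constant of 60 are assumptions for the demo.

# Toy illustration of reciprocal rank fusion (RRF).
# Each list holds document IDs in ranked order (first element = best hit).
lexical_hits = ["doc-3", "doc-1", "doc-7"]    # ranked results from the full-text query
semantic_hits = ["doc-1", "doc-9", "doc-3"]   # ranked results from the semantic query

def rrf_scores(result_lists, rank_constant=60):
    """Combine ranked lists: score(doc) = sum of 1 / (rank_constant + rank)."""
    scores = {}
    for hits in result_lists:
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# doc-1 and doc-3 appear in both lists, so they rise to the top of the fused ranking.
print(rrf_scores([lexical_hits, semantic_hits]))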

ELSER

Hybrid search between a semantic and lexical query can be achieved by using an rrf retriever as part of your search request. Provide a sparse_vector query and a full-text query as standard retrievers for the rrf retriever. The rrf retriever uses reciprocal rank fusion to rank the top documents.

Python:

resp = client.search(
    index="my-index",
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "match": {
                                "my_text_field": "the query string"
                            }
                        }
                    }
                },
                {
                    "standard": {
                        "query": {
                            "sparse_vector": {
                                "field": "my_tokens",
                                "inference_id": "my-elser-endpoint",
                                "query": "the query string"
                            }
                        }
                    }
                }
            ]
        }
    },
)
print(resp)

JavaScript:

const response = await client.search({
  index: "my-index",
  retriever: {
    rrf: {
      retrievers: [
        {
          standard: {
            query: {
              match: {
                my_text_field: "the query string",
              },
            },
          },
        },
        {
          standard: {
            query: {
              sparse_vector: {
                field: "my_tokens",
                inference_id: "my-elser-endpoint",
                query: "the query string",
              },
            },
          },
        },
      ],
    },
  },
});
console.log(response);

Console:

GET my-index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "match": {
                "my_text_field": "the query string"
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "sparse_vector": {
                "field": "my_tokens",
                "inference_id": "my-elser-endpoint",
                "query": "the query string"
              }
            }
          }
        }
      ]
    }
  }
}

Dense vector models

Hybrid search between a semantic and lexical query can be achieved by providing:

  • an rrf retriever to rank top documents using reciprocal rank fusion
  • a standard retriever as a child retriever with a query clause for the full-text query
  • a knn retriever as a child retriever with the kNN search that queries the dense vector field

Python:

resp = client.search(
    index="my-index",
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "match": {
                                "my_text_field": "the query string"
                            }
                        }
                    }
                },
                {
                    "knn": {
                        "field": "my_embeddings.predicted_value",
                        "k": 10,
                        "num_candidates": 100,
                        "query_vector_builder": {
                            "text_embedding": {
                                "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
                                "model_text": "the query string"
                            }
                        }
                    }
                }
            ]
        }
    },
)
print(resp)

JavaScript:

const response = await client.search({
  index: "my-index",
  retriever: {
    rrf: {
      retrievers: [
        {
          standard: {
            query: {
              match: {
                my_text_field: "the query string",
              },
            },
          },
        },
        {
          knn: {
            field: "my_embeddings.predicted_value",
            k: 10,
            num_candidates: 100,
            query_vector_builder: {
              text_embedding: {
                model_id: "sentence-transformers__msmarco-minilm-l-12-v3",
                model_text: "the query string",
              },
            },
          },
        },
      ],
    },
  },
});
console.log(response);

Console:

GET my-index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "match": {
                "my_text_field": "the query string"
              }
            }
          }
        },
        {
          "knn": {
            "field": "my_embeddings.predicted_value",
            "k": 10,
            "num_candidates": 100,
            "query_vector_builder": {
              "text_embedding": {
                "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
                "model_text": "the query string"
              }
            }
          }
        }
      ]
    }
  }
}