ML inference search response processor

Introduced 2.16

The ml_inference search response processor invokes registered machine learning (ML) models and adds their outputs as new fields to the documents returned in search results.

PREREQUISITE
Before using the ml_inference search response processor, you must have either a local ML model hosted on your OpenSearch cluster or an externally hosted model connected to your OpenSearch cluster through the ML Commons plugin. For more information about local models, see Using ML models within OpenSearch. For more information about externally hosted models, see Connecting to externally hosted models.

Syntax

The following is the syntax for the ml_inference search response processor:

  {
    "ml_inference": {
      "model_id": "<model_id>",
      "function_name": "<function_name>",
      "full_response_path": "<full_response_path>",
      "model_config": {
        "<model_config_field>": "<config_value>"
      },
      "model_input": "<model_input>",
      "input_map": [
        {
          "<model_input_field>": "<document_field>"
        }
      ],
      "output_map": [
        {
          "<new_document_field>": "<model_output_field>"
        }
      ],
      "override": "<override>",
      "one_to_one": false
    }
  }


Request fields

The following table lists the required and optional parameters for the ml_inference search response processor.

Parameter | Data type | Required/Optional | Description
--- | --- | --- | ---
model_id | String | Required | The ID of the ML model used by the processor.
function_name | String | Optional for externally hosted models; required for local models | The function name of the ML model configured in the processor. For local models, valid values are sparse_encoding, sparse_tokenize, text_embedding, and text_similarity. For externally hosted models, the only valid value is remote. Default is remote.
model_config | Object | Optional | Custom configuration options for the ML model. For more information, see The model_config object.
model_input | String | Optional for externally hosted models; required for local models | A template that defines the input field format expected by the model. Each local model type might use a different set of inputs. For externally hosted models, the default is "{ \"parameters\": ${ml_inference.parameters} }".
input_map | Array | Optional for externally hosted models; required for local models | An array specifying how to map document fields in the search response to model input fields. Each element of the array is a map in the "<model_input_field>": "<document_field>" format and corresponds to one model invocation for a document field. If no input mapping is specified for an externally hosted model, then all document fields are passed directly to the model as input. The size of input_map determines the number of times the model is invoked (the number of Predict API requests).
<model_input_field> | String | Optional for externally hosted models; required for local models | The model input field name.
<document_field> | String | Optional for externally hosted models; required for local models | The name or JSON path of the document field in the search response used as the model input.
output_map | Array | Optional for externally hosted models; required for local models | An array specifying how to map model output fields to new fields in the search response document. Each element of the array is a map in the "<new_document_field>": "<model_output_field>" format.
<new_document_field> | String | Optional for externally hosted models; required for local models | The name of the new document field in which the model output (specified by <model_output_field>) is stored. If no output mapping is specified for an externally hosted model, then all fields from the model output are added to the new document field.
<model_output_field> | String | Optional for externally hosted models; required for local models | The name or JSON path of the field in the model output to be stored in <new_document_field>.
full_response_path | Boolean | Optional | Set this parameter to true if <model_output_field> contains a full JSON path to the field instead of the field name. The model output is then fully parsed to obtain the value of the field. Default is true for local models and false for externally hosted models.
ignore_missing | Boolean | Optional | If true and any of the fields defined in input_map or output_map are missing, then the missing fields are ignored. Otherwise, a missing field causes a failure. Default is false.
ignore_failure | Boolean | Optional | Specifies whether the processor continues execution if it encounters an error. If true, then any failure is ignored and the search continues. If false, then any failure causes the search to be canceled. Default is false.
override | Boolean | Optional | Relevant if a document in the response already contains a field with the name specified in <new_document_field>. If false, then the processor skips that field and the existing value is not changed. If true, then the existing field value is overwritten by the new model output. Default is false.
max_prediction_tasks | Integer | Optional | The maximum number of concurrent model invocations that can run during document search. Default is 10.
one_to_one | Boolean | Optional | Set this parameter to true to invoke the model once (one Predict API request) for each document. The default value (false) invokes the model once with all documents from the search response (one Predict API request).
description | String | Optional | A brief description of the processor.
tag | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type.
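The input_map size and the one_to_one flag together determine how many Predict API requests the processor sends for one search response. The following Python sketch restates that rule; predict_request_count is a hypothetical helper, and it assumes (not confirmed by this page) that each input_map entry is invoked once per document when one_to_one is true and once for the whole batch when it is false, and that a missing input_map still results in one invocation per batch.

```python
def predict_request_count(num_documents: int, input_map_size: int, one_to_one: bool) -> int:
    """Estimate the number of Predict API requests for one search response.

    Assumptions (sketch only): each input_map entry is one model invocation,
    repeated per document when one_to_one is true and issued once for all
    documents when one_to_one is false. An empty input_map is treated as a
    single invocation, because an externally hosted model then receives all
    document fields as input.
    """
    if num_documents <= 0:
        return 0
    invocations_per_batch = max(input_map_size, 1)
    batches = num_documents if one_to_one else 1
    return invocations_per_batch * batches


print(predict_request_count(10, 1, one_to_one=False))  # prints 1
print(predict_request_count(10, 1, one_to_one=True))   # prints 10
```

Note that max_prediction_tasks (default 10) limits how many of these invocations run concurrently, not how many are sent in total.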

The input_map and output_map mappings support standard JSON path notation for specifying complex data structures.
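To illustrate how a path such as $.inference_results.*.output.*.data selects values, the following Python sketch implements a small subset of JSON path notation: the $ root, dot-separated keys, and the * wildcard. resolve_json_path is a hypothetical helper for explanation only; OpenSearch uses its own JSON path handling, which supports the full notation and may shape the result differently (for example, when unwrapping single matches).

```python
from typing import Any, List


def resolve_json_path(document: Any, path: str) -> List[Any]:
    """Return every value matched by a simple JSON path.

    Supported subset (illustration only): an optional leading "$",
    dot-separated object keys, and "*" to iterate over all list items
    or object values. Keys that contain dots are not supported.
    """
    segments = path.split(".")
    if segments and segments[0] == "$":
        segments = segments[1:]
    matches = [document]
    for segment in segments:
        next_matches: List[Any] = []
        for node in matches:
            if segment == "*":
                if isinstance(node, list):
                    next_matches.extend(node)
                elif isinstance(node, dict):
                    next_matches.extend(node.values())
            elif isinstance(node, dict) and segment in node:
                next_matches.append(node[segment])
        matches = next_matches
    return matches


# A shortened form of the local model Predict API response used later on this page.
predict_response = {
    "inference_results": [
        {"output": [{"name": "sentence_embedding", "data": [0.25517133, -0.28009856, 0.48519906]}]}
    ]
}
print(resolve_json_path(predict_response, "$.inference_results.*.output.*.data"))
# prints [[0.25517133, -0.28009856, 0.48519906]]
```

A plain field name such as passage_text is also a valid path in this sketch, which mirrors how <document_field> accepts either a name or a JSON path.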

Setup

Create an index named my_index and index one document to use in the following examples:

  POST /my_index/_doc/1
  {
    "passage_text": "hello world"
  }


Using the processor

Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. Before testing a pipeline using the processor, make sure that the model is successfully deployed. You can check the model state using the Get Model API.

For local models, you must provide a model_input field that specifies the model input format. Any input fields defined in model_config must also be referenced in model_input.

For remote models, the model_input field is optional, and its default value is "{ \"parameters\": ${ml_inference.parameters} }".

Example: Externally hosted model

The following example shows you how to configure an ml_inference search response processor with an externally hosted model.

Step 1: Create a pipeline

The following example shows you how to create a search pipeline for an externally hosted text embedding model. The model requires an input field and generates results in a data field. It converts the text in the passage_text field into text embeddings and stores the embeddings in the passage_embedding field. The function_name is not explicitly specified in the processor configuration, so it defaults to remote, signifying an externally hosted model:

  PUT /_search/pipeline/ml_inference_pipeline
  {
    "description": "Generate passage_embedding when searching documents",
    "processors": [
      {
        "ml_inference": {
          "model_id": "<your model id>",
          "input_map": [
            {
              "input": "passage_text"
            }
          ],
          "output_map": [
            {
              "passage_embedding": "data"
            }
          ]
        }
      }
    ]
  }


When making a Predict API request to an externally hosted model, all necessary fields and parameters are usually contained within a parameters object:

  POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
  {
    "parameters": {
      "input": [
        {
          ...
        }
      ]
    }
  }

When specifying the input_map for an externally hosted model, you can reference the input field directly instead of providing its dot path, parameters.input:

  "input_map": [
    {
      "input": "passage_text"
    }
  ]
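For an externally hosted model, the keys in an input_map element name fields inside the parameters object of the Predict request. The following Python sketch shows that relationship for a single document and a single input_map element; build_remote_predict_body is a hypothetical helper, and it copies the raw field value. The processor may instead send a list of values (as in the Predict API example above, where input is an array), especially when one_to_one is false.

```python
import json
from typing import Any, Dict


def build_remote_predict_body(document: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build a Predict request body for one input_map element and one document (sketch)."""
    parameters: Dict[str, Any] = {}
    for model_input_field, document_field in mapping.items():
        # The key is written as "input", not "parameters.input": the default
        # model_input template places the mapped fields inside "parameters".
        parameters[model_input_field] = document[document_field]
    return {"parameters": parameters}


body = build_remote_predict_body({"passage_text": "hello world"}, {"input": "passage_text"})
print(json.dumps(body))  # prints {"parameters": {"input": "hello world"}}
```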

Step 2: Run the pipeline

Run the following query, providing the pipeline name in the request:

  GET /my_index/_search?search_pipeline=ml_inference_pipeline
  {
    "query": {
      "match_all": {}
    }
  }

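The pipeline is selected for each request through the search_pipeline query parameter. As a sketch, the following Python code builds the same search request with the standard library without sending it. The host http://localhost:9200 is an assumption, authentication and TLS are omitted, and the request uses POST, which the _search endpoint also accepts and which is easier to send with a body from most HTTP clients.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_search_request(host: str, index: str, pipeline: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a search request that runs the named search pipeline."""
    url = f"{host}/{index}/_search?{urlencode({'search_pipeline': pipeline})}"
    return urllib.request.Request(
        url,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("http://localhost:9200", "my_index", "ml_inference_pipeline", {"match_all": {}})
print(request.full_url)  # prints http://localhost:9200/my_index/_search?search_pipeline=ml_inference_pipeline
```

Against a running cluster, urllib.request.urlopen(request) would send it.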

The response confirms that the processor has generated text embeddings in the passage_embedding field. The document within _source now contains both the passage_text and passage_embedding fields:

  {
    "took": 288,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 1,
        "relation": "eq"
      },
      "max_score": 0.00009405752,
      "hits": [
        {
          "_index": "my_index",
          "_id": "1",
          "_score": 0.00009405752,
          "_source": {
            "passage_text": "hello world",
            "passage_embedding": [
              0.017304314,
              -0.021530833,
              0.050184276,
              0.08962978,
              ...
            ]
          }
        }
      ]
    }
  }
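A client typically reads the generated field from each hit's _source. The following Python sketch collects passage_embedding values by document ID from a response shaped like the one above; embeddings_by_id is a hypothetical helper, and the sample vector is truncated.

```python
from typing import Any, Dict, List


def embeddings_by_id(response: Dict[str, Any], field: str = "passage_embedding") -> Dict[str, List[float]]:
    """Map each hit's _id to the vector the processor wrote into _source."""
    result: Dict[str, List[float]] = {}
    for hit in response.get("hits", {}).get("hits", []):
        vector = hit.get("_source", {}).get(field)
        # Hits without the field (for example, if ignore_failure allowed an
        # inference error to pass) are skipped instead of raising an error.
        if vector is not None:
            result[hit["_id"]] = vector
    return result


sample = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"passage_text": "hello world", "passage_embedding": [0.017304314, -0.021530833]}}
        ]
    }
}
print(embeddings_by_id(sample))  # prints {'1': [0.017304314, -0.021530833]}
```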

Example: Local model

The following example shows you how to configure an ml_inference search response processor with a local model.

Step 1: Create a pipeline

The following example shows you how to create a search pipeline for the huggingface/sentence-transformers/all-distilroberta-v1 local model. The model is a pretrained sentence transformer model hosted in your OpenSearch cluster.

If you invoke the model using the Predict API, then the request appears as follows:

  POST /_plugins/_ml/_predict/text_embedding/cleMb4kBJ1eYAeTMFFg4
  {
    "text_docs": ["today is sunny"],
    "return_number": true,
    "target_response": ["sentence_embedding"]
  }

Using this schema, specify the model_input as follows:

  "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }"
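The ${...} placeholders in model_input are filled from input_map and model_config before the request is sent. The following Python sketch mimics that substitution with a regular expression, serializing each value as JSON and parsing the result. render_model_input is a hypothetical helper, not the ML Commons implementation, and the assumption that the mapped passage_text value arrives as a list mirrors the text_docs array in the Predict request above. The template is written with plain quotes because the \" escapes in the pipeline definition belong to the enclosing JSON string.

```python
import json
import re
from typing import Any, Dict

PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def render_model_input(template: str, values: Dict[str, Any]) -> Dict[str, Any]:
    """Replace ${name} placeholders with JSON-serialized values and parse the result."""

    def substitute(match: re.Match) -> str:
        return json.dumps(values[match.group(1)])

    return json.loads(PLACEHOLDER.sub(substitute, template))


template = (
    '{ "text_docs": ${input_map.text_docs}, '
    '"return_number": ${model_config.return_number}, '
    '"target_response": ${model_config.target_response} }'
)
values = {
    "input_map.text_docs": ["hello world"],  # assumed shape of the mapped passage_text value
    "model_config.return_number": True,
    "model_config.target_response": ["sentence_embedding"],
}
print(render_model_input(template, values))
# prints {'text_docs': ['hello world'], 'return_number': True, 'target_response': ['sentence_embedding']}
```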

In the input_map, map the passage_text document field to the text_docs field expected by the model:

  "input_map": [
    {
      "text_docs": "passage_text"
    }
  ]

Because the output_map in this example specifies the model output field as a full JSON path rather than a field name, set full_response_path to true (this is also the default for local models). The processor then parses the full model output to obtain the value of that field:

  "full_response_path": true

The text in the passage_text field will be used to generate embeddings:

  {
    "passage_text": "hello world"
  }

The Predict API request returns the following response:

  {
    "inference_results": [
      {
        "output": [
          {
            "name": "sentence_embedding",
            "data_type": "FLOAT32",
            "shape": [
              768
            ],
            "data": [
              0.25517133,
              -0.28009856,
              0.48519906,
              ...
            ]
          }
        ]
      }
    ]
  }

The model generates embeddings in the $.inference_results.*.output.*.data field. The output_map maps this field to the newly created passage_embedding field in the search response document:

  "output_map": [
    {
      "passage_embedding": "$.inference_results.*.output.*.data"
    }
  ]
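When the processor writes the extracted value into a hit, the override parameter decides what happens if the target field already exists. The following Python sketch applies one output_map entry to a _source dictionary under that rule; apply_output is a hypothetical helper, and the value is assumed to have already been extracted from the model output using the JSON path in the output_map above.

```python
from typing import Any, Dict


def apply_output(source: Dict[str, Any], new_field: str, value: Any, override: bool = False) -> Dict[str, Any]:
    """Return a copy of source with new_field set, honoring the override flag (default false)."""
    updated = dict(source)
    if new_field in updated and not override:
        # override is false and the field already exists: keep the existing value.
        return updated
    updated[new_field] = value
    return updated


doc = {"passage_text": "hello world"}
print(apply_output(doc, "passage_embedding", [0.25517133, -0.28009856]))
# prints {'passage_text': 'hello world', 'passage_embedding': [0.25517133, -0.28009856]}
```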

To configure an ml_inference search response processor with a local model, specify the function_name explicitly. In this example, the function_name is text_embedding. For information about valid function_name values, see Request fields.

The following is the final configuration of the ml_inference search response processor with the local model:

  PUT /_search/pipeline/ml_inference_pipeline_local
  {
    "description": "Search passages and generate embeddings",
    "processors": [
      {
        "ml_inference": {
          "function_name": "text_embedding",
          "full_response_path": true,
          "model_id": "<your model id>",
          "model_config": {
            "return_number": true,
            "target_response": ["sentence_embedding"]
          },
          "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }",
          "input_map": [
            {
              "text_docs": "passage_text"
            }
          ],
          "output_map": [
            {
              "passage_embedding": "$.inference_results.*.output.*.data"
            }
          ],
          "ignore_missing": true,
          "ignore_failure": true
        }
      }
    ]
  }

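Because local models require more fields than externally hosted ones, a quick client-side check can catch an incomplete processor definition before you create the pipeline. The following Python sketch checks for the fields the Request fields table marks as required for local models and for a valid local function_name. local_processor_problems is a hypothetical helper; OpenSearch performs its own validation when the pipeline is created.

```python
from typing import Any, Dict, List

LOCAL_FUNCTION_NAMES = {"sparse_encoding", "sparse_tokenize", "text_embedding", "text_similarity"}
LOCAL_REQUIRED_FIELDS = ("model_id", "function_name", "model_input", "input_map", "output_map")


def local_processor_problems(processor: Dict[str, Any]) -> List[str]:
    """List problems with an ml_inference processor definition intended for a local model."""
    problems = [f"missing {name}" for name in LOCAL_REQUIRED_FIELDS if name not in processor]
    function_name = processor.get("function_name")
    if function_name is not None and function_name not in LOCAL_FUNCTION_NAMES:
        problems.append(f"unsupported function_name for a local model: {function_name}")
    return problems


local_processor = {
    "function_name": "text_embedding",
    "full_response_path": True,
    "model_id": "<your model id>",
    "model_config": {"return_number": True, "target_response": ["sentence_embedding"]},
    "model_input": '{ "text_docs": ${input_map.text_docs}, "return_number": ${model_config.return_number}, '
                   '"target_response": ${model_config.target_response} }',
    "input_map": [{"text_docs": "passage_text"}],
    "output_map": [{"passage_embedding": "$.inference_results.*.output.*.data"}],
}
print(local_processor_problems(local_processor))  # prints []
```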

Step 2: Run the pipeline

Run the following query, providing the pipeline name in the request:

  GET /my_index/_search?search_pipeline=ml_inference_pipeline_local
  {
    "query": {
      "term": {
        "passage_text": {
          "value": "hello"
        }
      }
    }
  }


Response

The response confirms that the processor has generated text embeddings in the passage_embedding field:

  {
    "took": 288,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 1,
        "relation": "eq"
      },
      "max_score": 0.00009405752,
      "hits": [
        {
          "_index": "my_index",
          "_id": "1",
          "_score": 0.00009405752,
          "_source": {
            "passage_text": "hello world",
            "passage_embedding": [
              0.017304314,
              -0.021530833,
              0.050184276,
              0.08962978,
              ...
            ]
          }
        }
      ]
    }
  }