Batch predict

Batch predict

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated GitHub issue.

ML Commons can perform inference on large datasets in an offline asynchronous mode using a model deployed on external model servers. To use the Batch Predict API, you must provide the model_id for an externally hosted model. Amazon SageMaker, Cohere, and OpenAI are currently the only verified external servers that support this API.

For information about user access for this API, see Model access control considerations.

For information about externally hosted models, see Connecting to externally hosted models.

For instructions on how set up batch inference and connector blueprints, see the following:

Path and HTTP methods

POST /_plugins/_ml/models/<model_id>/_batch_predict

Prerequisites

Before using the Batch Predict API, you need to create a connector to the externally hosted model. For each action, specify the action_type parameter that describes the action:

batch_predict: Runs the batch predict operation.
batch_predict_status: Checks the batch predict operation status.
cancel_batch_predict: Cancels the batch predict operation.

For example, to create a connector to an OpenAI text-embedding-ada-002 model, send the following request. The cancel_batch_predict action is optional and supports canceling the batch job running on OpenAI:

POST /_plugins/_ml/connectors/_create
{
  "name": "OpenAI Embedding model",
  "description": "OpenAI embedding model for testing offline batch",
  "version": "1",
  "protocol": "http",
  "parameters": {
    "model": "text-embedding-ada-002",
    "input_file_id": "<your input file id in OpenAI>",
    "endpoint": "/v1/embeddings"
  },
  "credential": {
    "openAI_key": "<your openAI key>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://api.openai.com/v1/embeddings",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"input\": ${parameters.input}, \"model\": \"${parameters.model}\" }",
      "pre_process_function": "connector.pre_process.openai.embedding",
      "post_process_function": "connector.post_process.openai.embedding"
    },
    {
      "action_type": "batch_predict",
      "method": "POST",
      "url": "https://api.openai.com/v1/batches",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"input_file_id\": \"${parameters.input_file_id}\", \"endpoint\": \"${parameters.endpoint}\", \"completion_window\": \"24h\" }"
    },
    {
      "action_type": "batch_predict_status",
      "method": "GET",
      "url": "https://api.openai.com/v1/batches/${parameters.id}",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      }
    },
    {
      "action_type": "cancel_batch_predict",
      "method": "POST",
      "url": "https://api.openai.com/v1/batches/${parameters.id}/cancel",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      }
    }
  ]
}

copy

The response contains a connector ID that you’ll use in the next steps:

{
  "connector_id": "XU5UiokBpXT9icfOM0vt"
}

Next, register an externally hosted model and provide the connector ID of the created connector:

POST /_plugins/_ml/models/_register?deploy=true
{
    "name": "OpenAI model for realtime embedding and offline batch inference",
    "function_name": "remote",
    "description": "OpenAI text embedding model",
    "connector_id": "XU5UiokBpXT9icfOM0vt"
}

copy

The response contains the task ID for the register operation:

{
  "task_id": "rMormY8B8aiZvtEZIO_j",
  "status": "CREATED",
  "model_id": "lyjxwZABNrAVdFa9zrcZ"
}

To check the status of the operation, provide the task ID to the Tasks API. Once the registration is complete, the task state changes to COMPLETED.

Example request

Once you have completed the prerequisite steps, you can call the Batch Predict API. The parameters in the batch predict request override those defined in the connector:

POST /_plugins/_ml/models/lyjxwZABNrAVdFa9zrcZ/_batch_predict
{
  "parameters": {
    "model": "text-embedding-3-large"
  }
}

copy

Example response

The response contains the task ID for the batch predict operation:

{
  "task_id": "KYZSv5EBqL2d0mFvs80C",
  "status": "CREATED"
}

To check the status of the batch predict job, provide the task ID to the Tasks API. You can find the job details in the remote_job field in the task. Once the prediction is complete, the task state changes to COMPLETED.

Example request

GET /_plugins/_ml/tasks/KYZSv5EBqL2d0mFvs80C

copy

Example response

The response contains the batch predict operation details in the remote_job field:

{
  "model_id": "JYZRv5EBqL2d0mFvKs1E",
  "task_type": "BATCH_PREDICTION",
  "function_name": "REMOTE",
  "state": "RUNNING",
  "input_type": "REMOTE",
  "worker_node": [
    "Ee5OCIq0RAy05hqQsNI1rg"
  ],
  "create_time": 1725491751455,
  "last_update_time": 1725491751455,
  "is_async": false,
  "remote_job": {
    "cancelled_at": null,
    "metadata": null,
    "request_counts": {
      "total": 3,
      "completed": 3,
      "failed": 0
    },
    "input_file_id": "file-XXXXXXXXXXXX",
    "output_file_id": "file-XXXXXXXXXXXXX",
    "error_file_id": null,
    "created_at": 1725491753,
    "in_progress_at": 1725491753,
    "expired_at": null,
    "finalizing_at": 1725491757,
    "completed_at": null,
    "endpoint": "/v1/embeddings",
    "expires_at": 1725578153,
    "cancelling_at": null,
    "completion_window": "24h",
    "id": "batch_XXXXXXXXXXXXXXX",
    "failed_at": null,
    "errors": null,
    "object": "batch",
    "status": "in_progress"
  }
}

For the definition of each field in the result, see OpenAI Batch API. Once the batch inference is complete, you can download the output by calling the OpenAI Files API and providing the file name specified in the id field of the response.

Canceling a batch predict job

You can also cancel the batch predict operation running on the remote platform using the task ID returned by the batch predict request. To add this capability, set the action_type to cancel_batch_predict in the connector configuration when creating the connector.

Example request

POST /_plugins/_ml/tasks/KYZSv5EBqL2d0mFvs80C/_cancel_batch

copy

Example response

{
  "status": "OK"
}