Asynchronous batch ingestion

Introduced 2.17

Use the Asynchronous Batch Ingestion API to ingest data into your OpenSearch cluster from files stored on remote servers or services, such as Amazon Simple Storage Service (Amazon S3) or OpenAI. For detailed configuration steps, see Asynchronous batch ingestion.

Path and HTTP methods

POST /_plugins/_ml/_batch_ingestion

Request body fields

The following table lists the available request fields.

Field | Data type | Required/Optional | Description
----- | --------- | ----------------- | -----------
index_name | String | Required | The index name.
field_map | Object | Required | Maps fields from the source file to specific fields in an OpenSearch index for ingestion.
ingest_fields | Array | Optional | Lists fields from the source file that should be ingested directly into the OpenSearch index without any additional mapping.
credential | Object | Required | Contains the authentication information for accessing external data sources, such as Amazon S3 or OpenAI.
data_source | Object | Required | Specifies the type and location of the external file(s) from which the data is ingested.
data_source.type | String | Required | Specifies the type of the external data source. Valid values are s3 and openAI.
data_source.source | Array | Required | Specifies one or more file locations from which the data is ingested. For s3, specify the file path to the Amazon S3 bucket (for example, ["s3://offlinebatch/output/sagemaker_batch.json.out"]). For openAI, specify the file IDs for input or output files (for example, ["file-<your output file id>", "file-<your input file id>", "file-<your other file>"]).
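
Each value in field_map is a JSONPath-style expression evaluated against each record (line) of the source file, and the matched value is written to the index field named by the key. As a rough illustration (the record layout below is hypothetical and not the exact output format of any particular model), assume each line of an S3 batch output file looks like the following:

{
  "id": "doc-1",
  "content": ["Chapter text goes here.", "Chapter title"],
  "SageMakerOutput": [[0.12, 0.08, 0.53], [0.45, 0.33, 0.11]]
}

With the field_map shown in the first example below, "$.content[0]" would populate the chapter field, "$.content[1]" the title field, the two "$.SageMakerOutput[...]" expressions the corresponding embedding fields, and "$.id" the document _id.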

Example request: Ingesting a single file

POST /_plugins/_ml/_batch_ingestion
{
  "index_name": "my-nlp-index",
  "field_map": {
    "chapter": "$.content[0]",
    "title": "$.content[1]",
    "chapter_embedding": "$.SageMakerOutput[0]",
    "title_embedding": "$.SageMakerOutput[1]",
    "_id": "$.id"
  },
  "ingest_fields": ["$.id"],
  "credential": {
    "region": "us-east-1",
    "access_key": "<your access key>",
    "secret_key": "<your secret key>",
    "session_token": "<your session token>"
  },
  "data_source": {
    "type": "s3",
    "source": ["s3://offlinebatch/output/sagemaker_batch.json.out"]
  }
}

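After the ingestion task completes, you can verify that documents were written to the target index with a standard search request, for example (assuming the my-nlp-index index from the request above):

GET /my-nlp-index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 1
}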

Example request: Ingesting multiple files

POST /_plugins/_ml/_batch_ingestion
{
  "index_name": "my-nlp-index-openai",
  "field_map": {
    "question": "source[1].$.body.input[0]",
    "answer": "source[1].$.body.input[1]",
    "question_embedding": "source[0].$.response.body.data[0].embedding",
    "answer_embedding": "source[0].$.response.body.data[1].embedding",
    "_id": ["source[0].$.custom_id", "source[1].$.custom_id"]
  },
  "ingest_fields": ["source[2].$.custom_field1", "source[2].$.custom_field2"],
  "credential": {
    "openAI_key": "<your openAI key>"
  },
  "data_source": {
    "type": "openAI",
    "source": ["file-<your output file id>", "file-<your input file id>", "file-<your other file>"]
  }
}

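In this example, the source[N] prefix refers to the file at position N of the data_source.source array, and the JSONPath expression after $ is applied to each record of that file. As a hedged sketch (the record layouts shown are hypothetical), a record from the OpenAI output file (source[0]) might look like this:

{
  "custom_id": "request-1",
  "response": {
    "body": {
      "data": [
        { "embedding": [0.11, 0.27, 0.63] },
        { "embedding": [0.05, 0.42, 0.19] }
      ]
    }
  }
}

while the matching record in the input file (source[1]) might look like this:

{
  "custom_id": "request-1",
  "body": {
    "input": ["What is OpenSearch?", "OpenSearch is an open-source search and analytics suite."]
  }
}

Mapping _id from custom_id in both files is what allows the ingestion to associate each question/answer pair from the input file with its embeddings from the output file.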

Example response

{
  "task_id": "cbsPlpEBMHcagzGbOQOx",
  "task_type": "BATCH_INGEST",
  "status": "CREATED"
}
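
Batch ingestion runs asynchronously, so the response returns immediately with the task in the CREATED state. To monitor progress, you can poll the ML Commons Tasks API with the returned task ID (a minimal sketch; the exact response fields depend on your cluster and the task state):

GET /_plugins/_ml/tasks/cbsPlpEBMHcagzGbOQOx

The task's state changes to COMPLETED once all records have been ingested, or to FAILED if ingestion encounters an error.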