Create trained model vocabulary API

New API reference

For the most up-to-date API details, refer to Machine learning trained model APIs.

Creates a trained model vocabulary. This is supported only for natural language processing (NLP) models.

Request

PUT _ml/trained_models/<model_id>/vocabulary/

Prerequisites

Requires the manage_ml cluster privilege. This privilege is included in the machine_learning_admin built-in role.

Description

The vocabulary is stored in the index as described in inference_config.*.vocabulary of the trained model definition.

Path parameters

<model_id>

(Required, string) The unique identifier of the trained model.

Request body

vocabulary

(Required, array) The model vocabulary. Must not be empty.

merges

(Optional, array) The model merges used in byte-pair encoding. The merges must be sub-token pairs, space delimited, and in order of preference. Example: ["f o", "fo o"]. Must be provided for RoBERTa and BART style models.

scores

(Optional, array) Vocabulary value scores used by sentence-piece tokenization. Must have the same length as vocabulary. Required for unigram sentence-piece tokenized models like XLMRoberta and T5.
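The constraints on the request body can be checked on the client before the request is sent. The following is a minimal sketch, not part of any Elasticsearch client: `build_vocabulary_body` is a hypothetical helper, and it assumes that "space delimited" means exactly one space between the two sub-tokens. The server performs its own validation regardless.

```python
from __future__ import annotations

from typing import Optional


def build_vocabulary_body(
    vocabulary: list[str],
    merges: Optional[list[str]] = None,
    scores: Optional[list[float]] = None,
) -> dict:
    """Build a request body for the vocabulary API (hypothetical helper)."""
    # The vocabulary must not be empty.
    if not vocabulary:
        raise ValueError("vocabulary must not be empty")
    body: dict = {"vocabulary": list(vocabulary)}

    if merges is not None:
        for merge in merges:
            # Assumption: each merge is two non-empty sub-tokens
            # separated by exactly one space, for example "f o".
            parts = merge.split(" ")
            if len(parts) != 2 or not all(parts):
                raise ValueError(f"merge is not a sub-token pair: {merge!r}")
        # Order is preserved because merges are listed in order of preference.
        body["merges"] = list(merges)

    if scores is not None:
        # Scores must have the same length as the vocabulary.
        if len(scores) != len(vocabulary):
            raise ValueError("scores must have the same length as vocabulary")
        body["scores"] = list(scores)

    return body


# Illustrative values only; a real vocabulary comes from the model's tokenizer.
example_body = build_vocabulary_body(
    ["[PAD]", "f", "o", "fo", "foo"],
    merges=["f o", "fo o"],
)
```

Omitting `merges` or `scores` leaves the corresponding field out of the body, which matches their optional status.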

Examples

The following example shows how to create a model vocabulary for a previously stored trained model configuration.

    PUT _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/vocabulary
    {
      "vocabulary": [
        "[PAD]",
        "[unused0]",
        ...
      ]
    }

The API returns the following results:

    {
      "acknowledged": true
    }
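The same request can be assembled from a script. The sketch below uses only the Python standard library and builds the `PUT` request without sending it. The base URL `http://localhost:9200`, the absence of authentication, and the helper name `make_vocabulary_request` are assumptions made for illustration. The two-token vocabulary is a truncated placeholder, since a real vocabulary contains every token the model uses.

```python
import json
import urllib.request
from urllib.parse import quote


def make_vocabulary_request(
    base_url: str, model_id: str, body: dict
) -> urllib.request.Request:
    """Build (but do not send) a PUT request for the vocabulary API."""
    # Percent-encode the model ID so it is safe to place in the URL path.
    url = (
        f"{base_url.rstrip('/')}/_ml/trained_models/"
        f"{quote(model_id, safe='')}/vocabulary"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = make_vocabulary_request(
    "http://localhost:9200",  # assumed local cluster address
    "elastic__distilbert-base-uncased-finetuned-conll03-english",
    {"vocabulary": ["[PAD]", "[unused0]"]},  # truncated placeholder
)

# To send the request against a running cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

A secured cluster would also need credentials, for example an `Authorization` header, for a user that holds the `manage_ml` cluster privilege.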