Azure AI Studio inference service
New API reference
For the most up-to-date API details, refer to Inference APIs.
Creates an inference endpoint to perform an inference task with the azureaistudio service.
Request
PUT /_inference/<task_type>/<inference_id>
Path parameters
<inference_id>
(Required, string) The unique identifier of the inference endpoint.
<task_type>
(Required, string) The type of the inference task that the model will perform.
Available task types: completion, text_embedding.
Request body
chunking_settings
(Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.
max_chunking_size
(Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).
overlap
(Optional, integer) Only for word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunking_size.
sentence_overlap
(Optional, integer) Only for sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
strategy
(Optional, string) Specifies the chunking strategy. It can be either sentence or word.
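The limits described for chunking_settings can be checked on the client side before the request is sent. The following is a minimal sketch, not part of the Elasticsearch client: the helper name validate_chunking_settings is hypothetical, the field names and bounds are transcribed from the descriptions above, and the sketch requires strategy to be set explicitly because no default is stated here. The server performs its own validation regardless.

```python
def validate_chunking_settings(settings: dict) -> None:
    """Raise ValueError if `settings` violates the limits documented above.

    Hypothetical helper for illustration only.
    """
    strategy = settings.get("strategy")
    if strategy not in ("sentence", "word"):
        raise ValueError("strategy must be 'sentence' or 'word'")

    # Documented default is 250; the lower bound depends on the strategy.
    max_size = settings.get("max_chunking_size", 250)
    lower_bound = 20 if strategy == "sentence" else 10
    if not lower_bound <= max_size <= 300:
        raise ValueError(
            f"max_chunking_size must be between {lower_bound} and 300 for '{strategy}'"
        )

    if strategy == "word":
        # Documented default is 100; at most half of max_chunking_size.
        overlap = settings.get("overlap", 100)
        if overlap * 2 > max_size:
            raise ValueError("overlap cannot exceed half of max_chunking_size")
    else:
        # Documented default is 1; only 0 or 1 is accepted.
        sentence_overlap = settings.get("sentence_overlap", 1)
        if sentence_overlap not in (0, 1):
            raise ValueError("sentence_overlap must be 0 or 1")


validate_chunking_settings(
    {"strategy": "word", "max_chunking_size": 200, "overlap": 100}
)
```

Note that under these rules a word strategy with max_chunking_size below 200 also needs an explicit overlap, since the default of 100 would exceed half of the chunk size.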
service
(Required, string) The type of service supported for the specified task type. In this case, azureaistudio.
service_settings
(Required, object) Settings used to install the inference model.
These settings are specific to the azureaistudio service.
api_key
(Required, string) A valid API key of your Azure AI Studio model deployment. This key can be found on the overview page for your deployment in the management section of your Azure AI Studio account.
You need to provide the API key only once, during the inference model creation. The Get inference API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.
target
(Required, string) The target URL of your Azure AI Studio model deployment. This can be found on the overview page for your deployment in the management section of your Azure AI Studio account.
provider
(Required, string) The model provider for your deployment. Note that some providers may support only certain task types. Supported providers include:
cohere - available for text_embedding and completion task types
databricks - available for completion task type only
meta - available for completion task type only
microsoft_phi - available for completion task type only
mistral - available for completion task type only
openai - available for text_embedding and completion task types
endpoint_type
(Required, string) One of token or realtime. Specifies the type of endpoint that is used in your model deployment. There are two endpoint types available for deployment through Azure AI Studio. “Pay as you go” endpoints are billed per token. For these, you must specify token for your endpoint_type. For “real-time” endpoints, which are billed per hour of usage, specify realtime.
rate_limit
(Optional, object) By default, the azureaistudio service sets the number of requests allowed per minute to 240. This helps to minimize the number of rate limit errors returned from Azure AI Studio. To modify this, set the requests_per_minute setting of this object in your service settings:
"rate_limit": {
    "requests_per_minute": <<number_of_requests>>
}
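To show how the service_settings fields fit together, here is a minimal sketch that assembles the object and rejects combinations the descriptions above rule out. The helper build_service_settings and the SUPPORTED_TASKS table are hypothetical names introduced for illustration. The table mirrors only the providers listed above, which may not be exhaustive, so treat the provider check as an assumption rather than the service's actual validation.

```python
# Provider -> task types, transcribed from the provider list above.
SUPPORTED_TASKS = {
    "cohere": {"text_embedding", "completion"},
    "databricks": {"completion"},
    "meta": {"completion"},
    "microsoft_phi": {"completion"},
    "mistral": {"completion"},
    "openai": {"text_embedding", "completion"},
}


def build_service_settings(task_type, api_key, target, provider,
                           endpoint_type, requests_per_minute=None):
    """Return a service_settings dict for the azureaistudio service.

    Hypothetical helper; it only encodes the rules stated in this reference.
    """
    if provider not in SUPPORTED_TASKS:
        raise ValueError(f"unknown provider: {provider}")
    if task_type not in SUPPORTED_TASKS[provider]:
        raise ValueError(f"{provider} is not listed for task type {task_type}")
    if endpoint_type not in ("token", "realtime"):
        raise ValueError("endpoint_type must be 'token' or 'realtime'")

    settings = {
        "api_key": api_key,
        "target": target,
        "provider": provider,
        "endpoint_type": endpoint_type,
    }
    # Only override the documented default of 240 requests per minute on request.
    if requests_per_minute is not None:
        settings["rate_limit"] = {"requests_per_minute": requests_per_minute}
    return settings


settings = build_service_settings(
    task_type="text_embedding",
    api_key="<api_key>",
    target="<target_uri>",
    provider="openai",
    endpoint_type="token",
    requests_per_minute=120,
)
```

The returned dictionary can be passed as the service_settings value of the request body.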
task_settings
(Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.
task_settings for the completion task type
do_sample
(Optional, float) Instructs the inference process to perform sampling or not. Has no effect unless temperature or top_p is specified.
max_new_tokens
(Optional, integer) Provides a hint for the maximum number of output tokens to be generated. Defaults to 64.
temperature
(Optional, float) A number in the range of 0.0 to 2.0 that specifies the sampling temperature, which controls the apparent creativity of generated completions. Should not be used if top_p is specified.
top_p
(Optional, float) A number in the range of 0.0 to 2.0 that is an alternative to temperature and causes the model to consider the results of the tokens with nucleus sampling probability. Should not be used if temperature is specified.
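Because temperature and top_p should not be combined, it can help to build the completion task_settings through a small guard. The following is a minimal sketch under that assumption; build_completion_task_settings is a hypothetical helper, and the ranges are the ones stated above.

```python
def build_completion_task_settings(temperature=None, top_p=None,
                                   max_new_tokens=None, do_sample=None):
    """Return a task_settings dict for the completion task type.

    Hypothetical helper; unset options are omitted so service defaults apply.
    """
    if temperature is not None and top_p is not None:
        raise ValueError("specify either temperature or top_p, not both")

    # Both sampling controls are documented with a range of 0.0 to 2.0.
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is not None and not 0.0 <= value <= 2.0:
            raise ValueError(f"{name} must be in the range 0.0 to 2.0")

    candidates = {
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
        "do_sample": do_sample,
    }
    return {key: value for key, value in candidates.items() if value is not None}


task_settings = build_completion_task_settings(temperature=0.7, max_new_tokens=128)
```

The result can be supplied as the task_settings value alongside service and service_settings in the request body.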
task_settings for the text_embedding task type
user
(Optional, string) Specifies the user issuing the request, which can be used for abuse detection.
Azure AI Studio service example
The following example shows how to create an inference endpoint called azure_ai_studio_embeddings to perform a text_embedding task type. Note that we do not specify a model here, as it is defined already via our Azure AI Studio deployment.
The list of embeddings models that you can choose from in your deployment can be found in the Azure AI Studio model explorer.
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="azure_ai_studio_embeddings",
    inference_config={
        "service": "azureaistudio",
        "service_settings": {
            "api_key": "<api_key>",
            "target": "<target_uri>",
            "provider": "<model_provider>",
            "endpoint_type": "<endpoint_type>"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "azure_ai_studio_embeddings",
  inference_config: {
    service: "azureaistudio",
    service_settings: {
      api_key: "<api_key>",
      target: "<target_uri>",
      provider: "<model_provider>",
      endpoint_type: "<endpoint_type>",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/azure_ai_studio_embeddings
{
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "<api_key>",
        "target": "<target_uri>",
        "provider": "<model_provider>",
        "endpoint_type": "<endpoint_type>"
    }
}
The next example shows how to create an inference endpoint called azure_ai_studio_completion to perform a completion task type.
resp = client.inference.put(
    task_type="completion",
    inference_id="azure_ai_studio_completion",
    inference_config={
        "service": "azureaistudio",
        "service_settings": {
            "api_key": "<api_key>",
            "target": "<target_uri>",
            "provider": "<model_provider>",
            "endpoint_type": "<endpoint_type>"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "completion",
  inference_id: "azure_ai_studio_completion",
  inference_config: {
    service: "azureaistudio",
    service_settings: {
      api_key: "<api_key>",
      target: "<target_uri>",
      provider: "<model_provider>",
      endpoint_type: "<endpoint_type>",
    },
  },
});
console.log(response);

PUT _inference/completion/azure_ai_studio_completion
{
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "<api_key>",
        "target": "<target_uri>",
        "provider": "<model_provider>",
        "endpoint_type": "<endpoint_type>"
    }
}
The list of chat completion models that you can choose from in your deployment can be found in the Azure AI Studio model explorer.
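As noted in the api_key description, the key of an existing endpoint cannot be changed. To use a different key, the endpoint has to be deleted and recreated under the same name. The following is a minimal sketch of that procedure, assuming client is an Elasticsearch Python client like the one used in the examples above. The helper name recreate_with_new_key is hypothetical, and inference_config is the same configuration that was used to create the endpoint.

```python
import copy


def recreate_with_new_key(client, task_type, inference_id,
                          inference_config, new_api_key):
    """Delete an inference endpoint and recreate it with a different API key.

    Hypothetical helper; `client` is assumed to expose `inference.delete`
    and `inference.put` as the Elasticsearch Python client does.
    """
    # Work on a copy so the caller's configuration is left untouched.
    updated = copy.deepcopy(inference_config)
    updated["service_settings"]["api_key"] = new_api_key

    client.inference.delete(task_type=task_type, inference_id=inference_id)
    return client.inference.put(
        task_type=task_type,
        inference_id=inference_id,
        inference_config=updated,
    )
```

The endpoint does not exist between the delete and the put, so requests that reference it during that window will fail. Schedule the change accordingly.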