Amazon Bedrock inference service
New API reference
For the most up-to-date API details, refer to Inference APIs.
Creates an inference endpoint to perform an inference task with the amazonbedrock service.
Request
PUT /_inference/<task_type>/<inference_id>
Path parameters
<inference_id>
(Required, string) The unique identifier of the inference endpoint.
<task_type>
(Required, string) The type of the inference task that the model will perform.
Available task types: completion, text_embedding.
Request body
chunking_settings
(Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.
max_chunking_size
(Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).
overlap
(Optional, integer) Only for word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunking_size.
sentence_overlap
(Optional, integer) Only for sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
strategy
(Optional, string) Specifies the chunking strategy. It can be either sentence or word.
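The chunking limits above can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: validate_chunking_settings is a hypothetical helper, the key names follow the spelling used on this page, and the fallback to the sentence strategy when strategy is omitted is an assumption that this page does not state.

```python
MAX_CHUNK_WORDS = 300
# Lower bound on the chunk size, per strategy, as documented above.
MIN_CHUNK_WORDS = {"sentence": 20, "word": 10}


def validate_chunking_settings(settings):
    """Return a list of problems; an empty list means no documented limit is violated."""
    # Assumption: treat a missing strategy as "sentence".
    strategy = settings.get("strategy", "sentence")
    if strategy not in MIN_CHUNK_WORDS:
        return [f"strategy must be 'sentence' or 'word', got {strategy!r}"]

    problems = []
    size = settings.get("max_chunking_size", 250)  # documented default
    low = MIN_CHUNK_WORDS[strategy]
    if not low <= size <= MAX_CHUNK_WORDS:
        problems.append(
            f"max_chunking_size must be between {low} and {MAX_CHUNK_WORDS} "
            f"for the {strategy} strategy"
        )

    if strategy == "word":
        overlap = settings.get("overlap", 100)  # documented default
        if overlap > size / 2:
            problems.append("overlap cannot exceed half of max_chunking_size")
    else:
        if settings.get("sentence_overlap", 1) not in (0, 1):
            problems.append("sentence_overlap must be 0 or 1")
    return problems
```

For example, a word strategy with a chunk size of 100 and an overlap of 60 is reported as a problem because 60 is more than half of 100.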
service
(Required, string) The type of service supported for the specified task type. In this case, amazonbedrock.
service_settings
(Required, object) Settings used to install the inference model. These settings are specific to the amazonbedrock service.
access_key
(Required, string) A valid AWS access key that has permissions to use Amazon Bedrock and access to models for inference requests.
secret_key
(Required, string) A valid AWS secret key that is paired with the access_key. To create or manage access and secret keys, see Managing access keys for IAM users in the AWS documentation.
You need to provide the access and secret keys only once, during the inference model creation. The Get inference API does not retrieve your access or secret keys. After creating the inference model, you cannot change the associated key pairs. If you want to use a different access and secret key pair, delete the inference model and recreate it with the same name and the updated keys.
provider
(Required, string) The model provider for your deployment. Note that some providers may support only certain task types. Supported providers include:
amazontitan - available for text_embedding and completion task types
anthropic - available for completion task type only
ai21labs - available for completion task type only
cohere - available for text_embedding and completion task types
meta - available for completion task type only
mistral - available for completion task type only
model
(Required, string) The base model ID or an ARN to a custom model based on a foundational model. The base model IDs can be found in the Amazon Bedrock model IDs documentation. Note that the model ID must be available for the provider chosen, and your IAM user must have access to the model.
region
(Required, string) The region that your model or ARN is deployed in. The list of available regions per model can be found in the Model support by AWS region documentation.
rate_limit
(Optional, object) By default, the amazonbedrock service sets the number of requests allowed per minute to 240. This helps to minimize the number of rate limit errors returned from Amazon Bedrock. To modify this, set the requests_per_minute setting of this object in your service settings:
"rate_limit": {
  "requests_per_minute": <<number_of_requests>>
}
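The service settings above can be assembled and checked in one place. The following is a minimal sketch under stated assumptions: build_service_settings and PROVIDER_TASK_TYPES are hypothetical names introduced for illustration, and the provider table simply restates the list on this page. The rate_limit object is added only when a value is passed, so the service default stays in effect otherwise.

```python
# Task types supported by each provider, as listed on this page.
PROVIDER_TASK_TYPES = {
    "amazontitan": {"text_embedding", "completion"},
    "anthropic": {"completion"},
    "ai21labs": {"completion"},
    "cohere": {"text_embedding", "completion"},
    "meta": {"completion"},
    "mistral": {"completion"},
}


def build_service_settings(task_type, provider, model, region,
                           access_key, secret_key, requests_per_minute=None):
    """Build a service_settings object, rejecting unsupported provider/task pairs."""
    supported = PROVIDER_TASK_TYPES.get(provider)
    if supported is None:
        raise ValueError(f"unknown provider: {provider!r}")
    if task_type not in supported:
        raise ValueError(f"{provider} does not support the {task_type} task type")

    settings = {
        "access_key": access_key,
        "secret_key": secret_key,
        "region": region,
        "provider": provider,
        "model": model,
    }
    # Override the default requests-per-minute limit only when asked to.
    if requests_per_minute is not None:
        settings["rate_limit"] = {"requests_per_minute": requests_per_minute}
    return settings
```

The returned dictionary can be used as the service_settings value of the request body.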
task_settings
(Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.
task_settings for the completion task type
max_new_tokens
(Optional, integer) Sets the maximum number for the output tokens to be generated. Defaults to 64.
temperature
(Optional, float) A number between 0.0 and 1.0 that controls the apparent creativity of the results. At temperature 0.0 the model is most deterministic, at temperature 1.0 most random. Should not be used if top_p or top_k is specified.
top_p
(Optional, float) Alternative to temperature. A number in the range of 0.0 to 1.0, to eliminate low-probability tokens. Top-p uses nucleus sampling to select top tokens whose sum of likelihoods does not exceed a certain value, ensuring both variety and coherence. Should not be used if temperature is specified.
top_k
(Optional, float) Only available for anthropic, cohere, and mistral providers. Alternative to temperature. Limits samples to the top-K most likely words, balancing coherence and variability. Should not be used if temperature is specified.
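The rules for the sampling parameters above can be sketched as a small client-side check. This is an illustration only: check_completion_task_settings is a hypothetical helper, and it encodes just what this page states (the 0.0 to 1.0 ranges, the advice not to combine temperature with top_p or top_k, and the providers that accept top_k).

```python
# Providers for which top_k is documented as available.
TOP_K_PROVIDERS = {"anthropic", "cohere", "mistral"}


def check_completion_task_settings(provider, task_settings):
    """Return a list of problems found in completion task_settings."""
    problems = []
    temperature = task_settings.get("temperature")
    top_p = task_settings.get("top_p")
    top_k = task_settings.get("top_k")

    if temperature is not None and not 0.0 <= temperature <= 1.0:
        problems.append("temperature must be between 0.0 and 1.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        problems.append("top_p must be between 0.0 and 1.0")
    # temperature is documented as an alternative to top_p and top_k.
    if temperature is not None and (top_p is not None or top_k is not None):
        problems.append("temperature should not be combined with top_p or top_k")
    if top_k is not None and provider not in TOP_K_PROVIDERS:
        problems.append(f"top_k is not available for the {provider} provider")
    return problems
```

An empty list means the settings are consistent with the descriptions above; it does not guarantee that Amazon Bedrock accepts them for a given model.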
Amazon Bedrock service example
The following example shows how to create an inference endpoint called amazon_bedrock_embeddings to perform a text_embedding task type.
Choose chat completion and embeddings models that you have access to from the Amazon Bedrock base models.
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="amazon_bedrock_embeddings",
    inference_config={
        "service": "amazonbedrock",
        "service_settings": {
            "access_key": "<aws_access_key>",
            "secret_key": "<aws_secret_key>",
            "region": "us-east-1",
            "provider": "amazontitan",
            "model": "amazon.titan-embed-text-v2:0"
        }
    },
)
print(resp)
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "amazon_bedrock_embeddings",
  inference_config: {
    service: "amazonbedrock",
    service_settings: {
      access_key: "<aws_access_key>",
      secret_key: "<aws_secret_key>",
      region: "us-east-1",
      provider: "amazontitan",
      model: "amazon.titan-embed-text-v2:0",
    },
  },
});
console.log(response);
PUT _inference/text_embedding/amazon_bedrock_embeddings
{
  "service": "amazonbedrock",
  "service_settings": {
    "access_key": "<aws_access_key>",
    "secret_key": "<aws_secret_key>",
    "region": "us-east-1",
    "provider": "amazontitan",
    "model": "amazon.titan-embed-text-v2:0"
  }
}
The next example shows how to create an inference endpoint called amazon_bedrock_completion to perform a completion task type.
resp = client.inference.put(
    task_type="completion",
    inference_id="amazon_bedrock_completion",
    inference_config={
        "service": "amazonbedrock",
        "service_settings": {
            "access_key": "<aws_access_key>",
            "secret_key": "<aws_secret_key>",
            "region": "us-east-1",
            "provider": "amazontitan",
            "model": "amazon.titan-text-premier-v1:0"
        }
    },
)
print(resp)
const response = await client.inference.put({
  task_type: "completion",
  inference_id: "amazon_bedrock_completion",
  inference_config: {
    service: "amazonbedrock",
    service_settings: {
      access_key: "<aws_access_key>",
      secret_key: "<aws_secret_key>",
      region: "us-east-1",
      provider: "amazontitan",
      model: "amazon.titan-text-premier-v1:0",
    },
  },
});
console.log(response);
PUT _inference/completion/amazon_bedrock_completion
{
  "service": "amazonbedrock",
  "service_settings": {
    "access_key": "<aws_access_key>",
    "secret_key": "<aws_secret_key>",
    "region": "us-east-1",
    "provider": "amazontitan",
    "model": "amazon.titan-text-premier-v1:0"
  }
}
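As noted under secret_key, the key pair of an existing endpoint cannot be changed; the endpoint has to be deleted and created again under the same name. The following is a minimal sketch of that sequence. recreate_endpoint is a hypothetical helper, and it assumes that the Python client exposes client.inference.delete with the same task_type and inference_id keyword arguments as the put call shown above; verify this against the client version you use.

```python
def recreate_endpoint(client, task_type, inference_id, inference_config):
    """Delete an inference endpoint, then create it again with a new config.

    The endpoint does not exist between the two calls, so requests that
    reference it during that window will fail.
    """
    # Assumption: delete takes the same identifying keyword arguments as put.
    client.inference.delete(task_type=task_type, inference_id=inference_id)
    return client.inference.put(
        task_type=task_type,
        inference_id=inference_id,
        inference_config=inference_config,
    )
```

The inference_config passed in would be the same object as in the examples above, with the updated access_key and secret_key values.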