Configuring model guardrails
Introduced 2.13
Guardrails can guide a large language model (LLM) toward desired behavior. They act as a filter, preventing the LLM from generating output that is harmful or violates ethical principles, and thus facilitate safer use of AI. Guardrails also help the LLM produce more focused and relevant output.
To configure guardrails for your LLM, you can provide a list of words to be prohibited in the input or output of the model. Alternatively, you can provide a regular expression against which the model input or output will be matched.
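For example, the guardrails object that you attach when registering a model (shown in full in Step 5) can combine both mechanisms. The following excerpt of the registration request body is illustrative; the index name and regular expression are placeholders that you replace with your own values:
"guardrails": {
  "type": "local_regex",
  "input_guardrail": {
    "stop_words": [
      {
        "index_name": "words0",
        "source_fields": ["title"]
      }
    ],
    "regex": [".*abort.*"]
  }
}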
Prerequisites
Before you start, make sure you have fulfilled the prerequisites for connecting to an externally hosted model.
Step 1: Create a guardrail index
To start, create an index that will store the excluded words (stopwords). In the index mappings, specify a title field, which will contain the excluded words, and a query field of the percolator type. The percolator query will be used to match the LLM input or output:
PUT /words0
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "query": {
        "type": "percolator"
      }
    }
  }
}
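Optionally, you can verify that the index was created with the expected mapping. This check is not required for the guardrail configuration:
GET /words0/_mapping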
Step 2: Index excluded words or phrases
Next, index a query string query that will be used to match excluded words in the model input or output:
PUT /words0/_doc/1?refresh
{
  "query": {
    "query_string": {
      "query": "title: blacklist"
    }
  }
}
PUT /words0/_doc/2?refresh
{
  "query": {
    "query_string": {
      "query": "title: \"Master slave architecture\""
    }
  }
}
For more query string options, see Query string query.
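To see how the stored queries behave before attaching them to a model, you can optionally match a sample sentence against the index using a percolate query. This check is not part of the guardrail configuration; the sample text below should match document 2:
GET /words0/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "title": "This is a test of Master slave architecture"
      }
    }
  }
}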
Step 3: Register a model group
To register a model group, send the following request:
POST /_plugins/_ml/model_groups/_register
{
  "name": "bedrock",
  "description": "This is a public model group."
}
The response contains the model group ID that you’ll use to register a model to this model group:
{
  "model_group_id": "wlcnb4kBJ1eYAeTMHlV6",
  "status": "CREATED"
}
To learn more about model groups, see Model access control.
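If you need to look up the model group later, you can optionally search for it by ID using the Search Model Group API:
POST /_plugins/_ml/model_groups/_search
{
  "query": {
    "ids": {
      "values": ["wlcnb4kBJ1eYAeTMHlV6"]
    }
  }
}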
Step 4: Create a connector
Now you can create a connector for the model. In this example, you’ll create a connector to the Anthropic Claude model hosted on Amazon Bedrock:
POST /_plugins/_ml/connectors/_create
{
  "name": "BedRock test claude Connector",
  "description": "The connector to BedRock service for claude model",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock",
    "anthropic_version": "bedrock-2023-05-31",
    "endpoint": "bedrock.us-east-1.amazonaws.com",
    "auth": "Sig_V4",
    "content_type": "application/json",
    "max_tokens_to_sample": 8000,
    "temperature": 0.0001,
    "response_filter": "$.completion"
  },
  "credential": {
    "access_key": "<YOUR_ACCESS_KEY>",
    "secret_key": "<YOUR_SECRET_KEY>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-v2/invoke",
      "headers": {
        "content-type": "application/json",
        "x-amz-content-sha256": "required"
      },
      "request_body": "{\"prompt\":\"${parameters.prompt}\", \"max_tokens_to_sample\":${parameters.max_tokens_to_sample}, \"temperature\":${parameters.temperature}, \"anthropic_version\":\"${parameters.anthropic_version}\" }"
    }
  ]
}
The response contains the connector ID for the newly created connector:
{
  "connector_id": "a1eMb4kBJ1eYAeTMAljY"
}
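To review the connector configuration at any time, you can optionally retrieve it with the Get Connector API, substituting your own connector ID:
GET /_plugins/_ml/connectors/a1eMb4kBJ1eYAeTMAljY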
Step 5: Register and deploy the model with guardrails
To register an externally hosted model, provide the model group ID from step 3 and the connector ID from step 4 in the following request. To configure guardrails, include the guardrails object:
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "Bedrock Claude V2 model",
  "function_name": "remote",
  "model_group_id": "wlcnb4kBJ1eYAeTMHlV6",
  "description": "test model",
  "connector_id": "a1eMb4kBJ1eYAeTMAljY",
  "guardrails": {
    "type": "local_regex",
    "input_guardrail": {
      "stop_words": [
        {
          "index_name": "words0",
          "source_fields": [
            "title"
          ]
        }
      ],
      "regex": [
        ".*abort.*",
        ".*kill.*"
      ]
    },
    "output_guardrail": {
      "stop_words": [
        {
          "index_name": "words0",
          "source_fields": [
            "title"
          ]
        }
      ],
      "regex": [
        ".*abort.*",
        ".*kill.*"
      ]
    }
  }
}
For more information, see The guardrails parameter.
OpenSearch returns the task ID of the register operation:
{
  "task_id": "cVeMb4kBJ1eYAeTMFFgj",
  "status": "CREATED"
}
To check the status of the operation, provide the task ID to the Tasks API:
GET /_plugins/_ml/tasks/cVeMb4kBJ1eYAeTMFFgj
When the operation is complete, the state changes to COMPLETED:
{
  "model_id": "cleMb4kBJ1eYAeTMFFg4",
  "task_type": "DEPLOY_MODEL",
  "function_name": "REMOTE",
  "state": "COMPLETED",
  "worker_node": [
    "n-72khvBTBi3bnIIR8FTTw"
  ],
  "create_time": 1689793851077,
  "last_update_time": 1689793851101,
  "is_async": true
}
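The model_id in this response is the ID you'll use for inference. To confirm that the guardrails were registered with the model, you can optionally retrieve the model with the Get Model API; the response contains the model's configuration:
GET /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4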
Step 6 (Optional): Test the model
To demonstrate how guardrails are applied, first run a predict operation with a prompt that does not contain any excluded words:
POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
{
  "parameters": {
    "prompt": "\n\nHuman:this is a test\n\nAssistant:"
  }
}
The response contains inference results:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "response": " Thank you for the test, I appreciate you taking the time to interact with me. I'm an AI assistant created by Anthropic to be helpful, harmless, and honest."
          }
        }
      ],
      "status_code": 200
    }
  ]
}
Then run a predict operation with a prompt that contains an excluded phrase:
POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
{
  "parameters": {
    "prompt": "\n\nHuman:this is a test of Master slave architecture\n\nAssistant:"
  }
}
The response contains an error message because guardrails were triggered:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "guardrails triggered for user input"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "guardrails triggered for user input"
  },
  "status": 400
}
Guardrails are also triggered when a prompt matches the supplied regular expression.
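For example, because the guardrails registered in Step 5 include the .*abort.* pattern, a request like the following (an illustrative prompt, not part of the original walkthrough) is expected to be rejected with the same error:
POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
{
  "parameters": {
    "prompt": "\n\nHuman:abort the current process\n\nAssistant:"
  }
}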
Next steps
- For more information about configuring guardrails, see The guardrails parameter.