Using schemas to enhance the Rego type checker
You can provide one or more input schema files and/or data schema files to opa eval
to improve static type checking and get more precise error reports as you develop Rego code.
The -s
flag can be used to upload schemas for input and data documents in JSON Schema format. You can either load a single JSON schema file for the input document or directory of schema files.
-s, --schema string set schema file path or directory path
Passing a single file with -s
When a single file is passed, it is a schema file associated with the input document globally. This means that for all rules in all packages, the input
has a type derived from that schema. There is no constraint on the name of the file, it could be anything.
Example:
opa eval data.envoy.authz.allow -i opa-schema-examples/envoy/input.json -d opa-schema-examples/envoy/policy.rego -s opa-schema-examples/envoy/schemas/my-schema.json
Passing a directory with -s
When a directory path is passed, annotations will be used in the code to indicate what expressions map to what schemas (see below). Both input schema files and data schema files can be provided in the same directory, with different names. The directory of schemas may have any sub-directories. Notice that when a directory is passed the input document does not have a schema associated with it globally. This must also be indicated via an annotation.
Example:
opa eval data.kubernetes.admission -i opa-schema-examples/kubernetes/input.json -d opa-schema-examples/kubernetes/policy.rego -s opa-schema-examples/kubernetes/schemas
Schemas can also be provided for policy and data files loaded via opa eval --bundle
Example:
opa eval data.kubernetes.admission -i opa-schema-examples/kubernetes/input.json -b opa-schema-examples/bundle.tar.gz -s opa-schema-examples/kubernetes/schemas
Samples provided at: https://github.com/aavarghese/opa-schema-examples/
Usage scenario with a single schema file
Consider the following Rego code, which assumes as input a Kubernetes admission review. For resources that are Pods, it checks that the image name starts with a specific prefix.
pod.rego
package kubernetes.admission
deny[msg] {
input.request.kind.kinds == "Pod"
image := input.request.object.spec.containers[_].image
not startswith(image, "hooli.com/")
msg := sprintf("image '%v' comes from untrusted registry", [image])
}
Notice that this code has a typo in it: input.request.kind.kinds
is undefined and should have been input.request.kind.kind
.
Consider the following input document:
input.json
{
"kind": "AdmissionReview",
"request": {
"kind": {
"kind": "Pod",
"version": "v1"
},
"object": {
"metadata": {
"name": "myapp"
},
"spec": {
"containers": [
{
"image": "nginx",
"name": "nginx-frontend"
},
{
"image": "mysql",
"name": "mysql-backend"
}
]
}
}
}
}
Clearly there are 2 image names that are in violation of the policy. However, when we evaluate the erroneous Rego code against this input we obtain:
% opa eval data.kubernetes.admission --format pretty -i opa-schema-examples/kubernetes/input.json -d opa-schema-examples/kubernetes/policy.rego
[]
The empty value returned is indistinguishable from a situation where the input did not violate the policy. This error is therefore causing the policy not to catch violating inputs appropriately.
If we fix the Rego code and change input.request.kind.kinds
to input.request.kind.kind
, then we obtain the expected result:
[
"image 'nginx' comes from untrusted registry",
"image 'mysql' comes from untrusted registry"
]
With this feature, it is possible to pass a schema to opa eval
, written in JSON Schema. Consider the admission review schema provided at: https://github.com/aavarghese/opa-schema-examples/blob/main/kubernetes/schemas/input.json
We can pass this schema to the evaluator as follows:
% opa eval data.kubernetes.admission --format pretty -i opa-schema-examples/kubernetes/input.json -d opa-schema-examples/kubernetes/policy.rego -s opa-schema-examples/kubernetes/schemas/input.json
With the erroneous Rego code, we now obtain the following type error:
1 error occurred: ../../aavarghese/opa-schema-examples/kubernetes/policy.rego:5: rego_type_error: undefined ref: input.request.kind.kinds
input.request.kind.kinds
^
have: "kinds"
want (one of): ["kind" "version"]
This indicates the error to the Rego developer right away, without having the need to observe the results of runs on actual data, thereby improving productivity.
Schema annotations
When passing a directory of schemas to opa eval
, schema annotations become handy to associate a Rego expression with a corresponding schema within a given scope:
# METADATA
# schemas:
# - <path-to-value>:<path-to-schema>
# ...
# - <path-to-value>:<path-to-schema>
allow {
...
}
See the annotations documentation for general information relating to annotations.
The schemas
field specifies an array associating schemas to data values. Paths must start with input
or data
(i.e., they must be fully-qualified.)
The type checker derives a Rego Object type for the schema and an appropriate entry is added to the type environment before type checking the rule. This entry is removed upon exit from the rule.
Example:
Consider the following Rego code which checks if an operation is allowed by a user, given an ACL data document:
package policy
import data.acl
default allow := false
# METADATA
# schemas:
# - input: schema.input
# - data.acl: schema["acl-schema"]
allow {
access := data.acl["alice"]
access[_] == input.operation
}
allow {
access := data.acl["bob"]
access[_] == input.operation
}
Consider a directory named mySchemasDir
with the following structure, provided via opa eval --schema opa-schema-examples/mySchemasDir
mySchemasDir/
├── input.json
└── acl-schema.json
For actual code samples, see https://github.com/aavarghese/opa-schema-examples/tree/main/acl.
In the first allow
rule above, the input document has the schema input.json
, and data.acl
has the schema acl-schema.json
. Note that we use the relative path inside the mySchemasDir
directory to identify a schema, omit the .json
suffix, and use the global variable schema
to stand for the top-level of the directory. Schemas in annotations are proper Rego references. So schema.input
is also valid, but schema.acl-schema
is not.
If we had the expression data.acl.foo
in this rule, it would result in a type error because the schema contained in acl-schema.json
only defines object properties "alice"
and "bob"
in the ACL data document.
On the other hand, this annotation does not constrain other paths under data
. What it says is that we know the type of data.acl
statically, but not that of other paths. So for example, data.foo
is not a type error and gets assigned the type Any
.
Note that the second allow
rule doesn’t have a METADATA comment block attached to it, and hence will not be type checked with any schemas.
On a different note, schema annotations can also be added to policy files part of a bundle package loaded via opa eval --bundle
alongwith the --schema
parameter for type checking a set of *.rego
policy files.
The scope of the schema
annotation can be controlled through the scope annotation
In case of overlap, schema annotations override each other as follows:
rule overrides document
document overrides package
package overrides subpackages
The following sections explain how the different scopes affect schema
annotation overriding for type checking.
Rule and Document Scopes
In the example above, the second rule does not include an annotation so type checking of the second rule would not take schemas into account. To enable type checking on the second (or other rules in the same file) we could specify the annotation multiple times:
# METADATA
# scope: rule
# schemas:
# - input: schema.input
# - data.acl: schema["acl-schema"]
allow {
access := data.acl["alice"]
access[_] == input.operation
}
# METADATA
# scope: rule
# schemas:
# - input: schema.input
# - data.acl: schema["acl-schema"]
allow {
access := data.acl["bob"]
access[_] == input.operation
}
This is obviously redundant and error-prone. To avoid this problem, we can define the annotation once on a rule with scope document
:
# METADATA
# scope: document
# schemas:
# - input: schema.input
# - data.acl: schema["acl-schema"]
allow {
access := data.acl["alice"]
access[_] == input.operation
}
allow {
access := data.acl["bob"]
access[_] == input.operation
}
In this example, the annotation with document
scope has the same affect as the two rule
scoped annotations in the previous example.
Package and Subpackage Scopes
Annotations can be defined at the package
level and then applied to all rules within the package:
# METADATA
# scope: package
# schemas:
# - input: schema.input
# - data.acl: schema["acl-schema"]
package example
allow {
access := data.acl["alice"]
access[_] == input.operation
}
allow {
access := data.acl["bob"]
access[_] == input.operation
}
package
scoped schema annotations are useful when all rules in the same package operate on the same input structure. In some cases, when policies are organized into many sub-packages, it is useful to declare schemas recursively for them using the subpackages
scope. For example:
# METADTA
# scope: subpackages
# schemas:
# - input: schema.input
package kubernetes.admission
This snippet would declare the top-level schema for input
for the kubernetes.admission
package as well as all subpackages. If admission control rules were defined inside packages like kubernetes.admission.workloads.pods
, they would be able to pick up that one schema declaration.
Overriding
JSON Schemas are often incomplete specifications of the format of data. For example, a Kubernetes Admission Review resource has a field object
which can contain any other Kubernetes resource. A schema for Admission Review has a generic type object
for that field that has no further specification. To allow more precise type checking in such cases, we support overriding existing schemas.
Consider the following example:
package kubernetes.admission
# METADATA
# scope: rule
# schemas:
# - input: schema.input
# - input.request.object: schema.kubernetes.pod
deny[msg] {
input.request.kind.kind == "Pod"
image := input.request.object.spec.containers[_].image
not startswith(image, "hooli.com/")
msg := sprintf("image '%v' comes from untrusted registry", [image])
}
In this example, the input
is associated with an Admission Review schema, and furthermore input.request.object
is set to have the schema of a Kubernetes Pod. In effect, the second schema annotation overrides the first one. Overriding is a schema transformation feature and combines existing schemas. In this case, we are combining the Admission Review schema with that of a Pod.
Notice that the order of schema annotations matter for overriding to work correctly.
Given a schema annotation, if a prefix of the path already has a type in the environment, then the annotation has the effect of merging and overriding the existing type with the type derived from the schema. In the example above, the prefix input
already has a type in the type environment, so the second annotation overrides this existing type. Overriding affects the type of the longest prefix that already has a type. If no such prefix exists, the new path and type are added to the type environment for the scope of the rule.
In general, consider the existing Rego type:
object{a: object{b: object{c: C, d: D, e: E}}}
If we override this type with the following type (derived from a schema annotation of the form a.b.e: schema-for-E1
):
object{a: object{b: object{e: E1}}}
It results in the following type:
object{a: object{b: object{c: C, d: D, e: E1}}}
Notice that b
still has its fields c
and d
, so overriding has a merging effect as well. Moreover, the type of expression a.b.e
is now E1
instead of E
.
We can also use overriding to add new paths to an existing type, so if we override the initial type with the following:
object{a: object{b: object{f: F}}}
we obtain the following type:
object{a: object{b: object{c: C, d: D, e: E, f: F}}}
We use schemas to enhance the type checking capability of OPA, and not to validate the input and data documents against desired schemas. This burden is still on the user and care must be taken when using overriding to ensure that the input and data provided are sensible and validated against the transformed schemas.
Multiple input schemas
It is sometimes useful to have different input schemas for different rules in the same package. This can be achieved as illustrated by the following example:
package policy
import data.acl
default allow := false
# METADATA
# scope: rule
# schemas:
# - input: schema["input"]
# - data.acl: schema["acl-schema"]
allow {
access := data.acl[input.user]
access[_] == input.operation
}
# METADATA for whocan rule
# scope: rule
# schemas:
# - input: schema["whocan-input-schema"]
# - data.acl: schema["acl-schema"]
whocan[user] {
access := acl[user]
access[_] == input.operation
}
The directory that is passed to opa eval
is the following:
mySchemasDir/
├── input.json
└── acl-schema.json
└── whocan-input-schema.json
In this example, we associate the schema input.json
with the input document in the rule allow
, and the schema whocan-input-schema.json
with the input document for the rule whocan
.
Translating schemas to Rego types and dynamicity
Rego has a gradual type system meaning that types can be partially known statically. For example, an object could have certain fields whose types are known and others that are unknown statically. OPA type checks what it knows statically and leaves the unknown parts to be type checked at runtime. An OPA object type has two parts: the static part with the type information known statically, and a dynamic part, which can be nil (meaning everything is known statically) or non-nil and indicating what is unknown.
When we derive a type from a schema, we try to match what is known and unknown in the schema. For example, an object
that has no specified fields becomes the Rego type Object{Any: Any}
. However, currently additionalProperties
and additionalItems
are ignored. When a schema is fully specified, we derive a type with its dynamic part set to nil, meaning that we take a strict interpretation in order to get the most out of static type checking. This is the case even if additionalProperties
is set to true
in the schema. In the future, we will take this feature into account when deriving Rego types.
When overriding existing types, the dynamicity of the overridden prefix is preserved.
Supporting JSON Schema composition keywords
JSON Schema provides keywords such as anyOf
and allOf
to structure a complex schema. For anyOf
, at least one of the subschemas must be true, and for allOf
, all subschemas must be true. The type checker is able to identify such keywords and derive a more robust Rego type through more complex schemas.
anyOf
Specifically, anyOf
acts as an Rego Or type where at least one (can be more than one) of the subschemas is true. Consider the following Rego and schema file containing anyOf
:
policy-anyOf.rego
package kubernetes.admission
# METADATA
# scope: rule
# schemas:
# - input: schema["input-anyOf"]
deny {
input.request.servers.versions == "Pod"
}
input-anyOf.json
{
"$schema": "http://json-schema.org/draft-07/schema",
"type": "object",
"properties": {
"kind": {"type": "string"},
"request": {
"type": "object",
"anyOf": [
{
"properties": {
"kind": {
"type": "object",
"properties": {
"kind": {"type": "string"},
"version": {"type": "string" }
}
}
}
},
{
"properties": {
"server": {
"type": "object",
"properties": {
"accessNum": {"type": "integer"},
"version": {"type": "string"}
}
}
}
}
]
}
}
}
We can see that request
is an object with two options as indicated by the choices under anyOf
:
- contains property
kind
, which has propertieskind
andversion
- contains property
server
, which has propertiesaccessNum
andversion
The type checker finds the first error in the Rego code, suggesting that servers
should be either kind
or server
.
input.request.servers.versions
^
have: "servers"
want (one of): ["kind" "server"]
Once this is fixed, the second typo is highlighted, prompting the user to choose between accessNum
and version
.
input.request.server.versions
^
have: "versions"
want (one of): ["accessNum" "version"]
allOf
Specifically, allOf
keyword implies that all conditions under allOf
within a schema must be met by the given data. allOf
is implemented through merging the types from all of the JSON subSchemas listed under allOf
before parsing the result to convert it to a Rego type. Merging of the JSON subSchemas essentially combines the passed in subSchemas based on what types they contain. Consider the following Rego and schema file containing allOf
:
policy-allOf.rego
package kubernetes.admission
# METADATA
# scope: rule
# schemas:
# - input: schema["input-allof"]
deny {
input.request.servers.versions == "Pod"
}
input-allof.json
{
"$schema": "http://json-schema.org/draft-07/schema",
"type": "object",
"properties": {
"kind": {"type": "string"},
"request": {
"type": "object",
"allOf": [
{
"properties": {
"kind": {
"type": "object",
"properties": {
"kind": {"type": "string"},
"version": {"type": "string" }
}
}
}
},
{
"properties": {
"server": {
"type": "object",
"properties": {
"accessNum": {"type": "integer"},
"version": {"type": "string"}
}
}
}
}
]
}
}
}
We can see that request
is an object with properties as indicated by the elements listed under allOf
:
- contains property
kind
, which has propertieskind
andversion
- contains property
server
, which has propertiesaccessNum
andversion
The type checker finds the first error in the Rego code, suggesting that servers
should be server
.
input.request.servers.versions
^
have: "servers"
want (one of): ["kind" "server"]
Once this is fixed, the second typo is highlighted, informing the user that versions
should be one of accessNum
or version
.
input.request.server.versions
^
have: "versions"
want (one of): ["accessNum" "version"]
Because the properties kind
, version
, and accessNum
are all under the allOf
keyword, the resulting schema that the given data must be validated against will contain the types contained in these properties children (string and integer).
Remote references in JSON schemas
It is valid for JSON schemas to reference other JSON schemas via URLs, like this:
{
"description": "Pod is a collection of containers that can run on a host.",
"type": "object",
"properties": {
"metadata": {
"$ref": "https://kubernetesjsonschema.dev/v1.14.0/_definitions.json#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta",
"description": "Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata"
}
}
}
OPA’s type checker will fetch these remote references by default. To control the remote hosts schemas will be fetched from, pass a capabilities file to your opa eval
or opa check
call.
Starting from the capabilities.json of your OPA version (which can be found in the repository), add an allow_net
key to it: its values are the IP addresses or host names that OPA is supposed to connect to for retrieving remote schemas.
{
"builtins": [ ... ],
"allow_net": [ "kubernetesjsonschema.dev" ]
}
Note
To forbid all network access in schema checking, set
allow_net
to[]
Host names are checked against the list as-is, so adding
127.0.0.1
toallow_net
, and referencing a schema fromhttp://localhost/
will fail.Metaschemas for different JSON Schema draft versions are not subject to this constraint, as they are already provided by OPA’s schema checker without requiring network access. These are:
http://json-schema.org/draft-04/schema
http://json-schema.org/draft-06/schema
http://json-schema.org/draft-07/schema
Limitations
Currently this feature admits schemas written in JSON Schema but does not support every feature available in this format. In particular the following features are not yet supported:
- additional properties for objects
- pattern properties for objects
- additional items for arrays
- contains for arrays
- oneOf, not
- enum
- if/then/else
A note of caution: overriding is a powerful capability that must be used carefully. For example, the user is allowed to write:
# METADATA
# scope: rule
# schema:
# - data: schema["some-schema"]
In this case, we are overriding the root of all documents to have some schema. Since all Rego code lives under data
as virtual documents, this in practice renders all of them inaccessible (resulting in type errors). Similarly, assigning a schema to a package name is not a good idea and can cause problems. Care must also be taken when defining overrides so that the transformation of schemas is sensible and data can be validated against the transformed schema.
References
For more examples, please see https://github.com/aavarghese/opa-schema-examples
This contains samples for Envoy, Kubernetes, and Terraform including corresponding JSON Schemas.
For a reference on JSON Schema please see: http://json-schema.org/understanding-json-schema/reference/index.html
For a tool that generates JSON Schema from JSON samples, please see: https://jsonschema.net/home