Secret discovery service (SDS)

TLS certificates (the secrets) can be specified in the secrets of the bootstrap's static_resources, but they can also be fetched remotely by the secret discovery service (SDS).

The most important benefit of SDS is simplified certificate management. Without it, in a Kubernetes (k8s) deployment, certificates must be created as secrets and mounted into the proxy containers; when the certificates expire, the secrets need to be updated and the proxy containers re-deployed. With SDS, a central SDS server pushes certificates to all Envoy instances. When certificates expire, the server simply pushes new ones, and Envoy uses them right away without re-deployment.

If a listener's server certificate needs to be fetched remotely by SDS, the listener is NOT marked active and its port is not opened until the certificates are fetched. If Envoy fails to fetch the certificates due to connection failures or bad response data, the listener is marked active and its port is opened, but connections to the port are reset.

Upstream clusters are handled in a similar way. If a cluster's client certificate needs to be fetched remotely by SDS, the cluster is NOT marked active and is not used until the certificates are fetched. If Envoy fails to fetch the certificates due to connection failures or bad response data, the cluster is marked active and can be used to handle requests, but requests routed to that cluster are rejected.
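The warming rules for listeners and clusters can be summarized as a small state table. The sketch below is an illustrative model only, not Envoy's implementation; the names SecretState, Status and sds_status are hypothetical.

```python
from enum import Enum
from typing import NamedTuple


class SecretState(Enum):
    PENDING = "pending"  # SDS fetch still in flight
    READY = "ready"      # certificates received
    FAILED = "failed"    # connection failure or bad response data


class Status(NamedTuple):
    active: bool      # listener/cluster is marked active
    in_service: bool  # listener port open / cluster usable for routing
    tls_ok: bool      # False: connections reset (listener) or requests rejected (cluster)


def sds_status(state: SecretState) -> Status:
    """Hypothetical model of the rules described above."""
    if state is SecretState.PENDING:
        # Warming: not active, port not opened / cluster not used yet.
        return Status(active=False, in_service=False, tls_ok=False)
    if state is SecretState.FAILED:
        # Activated anyway, but TLS traffic through it fails.
        return Status(active=True, in_service=True, tls_ok=False)
    return Status(active=True, in_service=True, tls_ok=True)


if __name__ == "__main__":
    for s in SecretState:
        print(s.value, sds_status(s))
```

The point of the model is that a failed fetch does not keep the resource warming forever: it becomes active, but TLS traffic through it fails.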

If a static cluster uses SDS and needs an SDS cluster (not required when Google gRPC is used, since Google gRPC doesn't need a cluster), the SDS cluster must be defined before the static clusters that use it.
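The ordering rule can be checked mechanically before starting Envoy. The sketch below assumes the bootstrap has already been parsed into plain dicts and lists (for example with a YAML loader); the helper names envoy_grpc_targets and check_sds_cluster_order are hypothetical. It only follows envoy_grpc.cluster_name references, so Google gRPC (which uses target_uri) is naturally ignored.

```python
from typing import Any, Iterator


def envoy_grpc_targets(node: Any) -> Iterator[str]:
    """Yield every envoy_grpc.cluster_name found anywhere under node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "envoy_grpc" and isinstance(value, dict) and "cluster_name" in value:
                yield value["cluster_name"]
            else:
                yield from envoy_grpc_targets(value)
    elif isinstance(node, list):
        for item in node:
            yield from envoy_grpc_targets(item)


def check_sds_cluster_order(clusters: list) -> list:
    """Return errors for clusters whose SDS cluster is not defined earlier."""
    seen = set()
    errors = []
    for cluster in clusters:
        name = cluster.get("name", "<unnamed>")
        # Only the TLS transport socket can carry SdsSecretConfig references.
        for target in envoy_grpc_targets(cluster.get("transport_socket", {})):
            if target not in seen:
                errors.append(f"{name}: SDS cluster {target!r} must be defined before it")
        seen.add(name)
    return errors
```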

The connection between the Envoy proxy and the SDS server has to be secure. One option is to run the SDS server on the same host and connect over a Unix domain socket. Otherwise, the connection requires TLS with authentication between the proxy and the SDS server. Credential types currently in use for authentication are:

  • mTLS – In this case, the client certificates for the SDS connection must be statically configured.

  • AWS IAM SigV4

SDS server

An SDS server needs to implement the gRPC service SecretDiscoveryService. It follows the same protocol as the other xDS APIs.
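Because SDS uses the common xDS protocol, the client acknowledges each response by echoing its nonce: an ACK carries the new version_info, while a NACK repeats the last accepted version and sets error_detail. The sketch below is a toy, in-memory model of that state-of-the-world exchange (as used by the StreamSecrets RPC); it is not a gRPC client, the dataclasses only mirror a subset of the real DiscoveryRequest/DiscoveryResponse fields, and SdsClientModel and is_valid are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# type_url of the v3 Secret resource (matches the "@type" used in the examples).
SECRET_TYPE_URL = "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"


@dataclass
class DiscoveryRequest:
    version_info: str
    resource_names: list
    type_url: str
    response_nonce: str
    error_detail: Optional[str] = None  # simplified stand-in for google.rpc.Status


@dataclass
class DiscoveryResponse:
    version_info: str
    resources: list  # each resource modeled as a dict with at least a "name"
    nonce: str


class SdsClientModel:
    """Toy model of ACK/NACK handling for secrets (hypothetical)."""

    def __init__(self, secret_names: list, is_valid: Callable[[dict], bool]):
        self.secret_names = secret_names
        self.is_valid = is_valid
        self.accepted_version = ""  # empty until the first ACK
        self.secrets: dict = {}

    def initial_request(self) -> DiscoveryRequest:
        # First request: no version and no nonce yet.
        return DiscoveryRequest("", self.secret_names, SECRET_TYPE_URL, "")

    def on_response(self, resp: DiscoveryResponse) -> DiscoveryRequest:
        if all(self.is_valid(r) for r in resp.resources):
            # ACK: adopt the new secrets, echo the new version and the nonce.
            self.accepted_version = resp.version_info
            self.secrets = {r["name"]: r for r in resp.resources}
            return DiscoveryRequest(resp.version_info, self.secret_names,
                                    SECRET_TYPE_URL, resp.nonce)
        # NACK: keep the previous secrets and version, echo the nonce, report an error.
        return DiscoveryRequest(self.accepted_version, self.secret_names,
                                SECRET_TYPE_URL, resp.nonce,
                                error_detail="rejected invalid secret")
```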

SDS Configuration

SdsSecretConfig is used to specify a secret. Its name field is required. If its sds_config field is empty, name refers to a secret in the bootstrap's static_resources secrets. Otherwise, sds_config specifies the SDS server as a ConfigSource. Only gRPC is supported for the SDS service, so its api_config_source must specify a gRPC service in grpc_services.
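The resolution rules for SdsSecretConfig can be sketched as a function over the parsed configuration. This is an illustration of the rules in the paragraph above, not Envoy's code; the function name resolve_sds_secret_config and its return shape are hypothetical, and ConfigSource variants other than api_config_source (such as the file-based form used later) are simply passed through.

```python
def resolve_sds_secret_config(cfg: dict, static_secrets: list) -> dict:
    """Decide where a secret comes from (hypothetical helper)."""
    name = cfg.get("name")
    if not name:
        raise ValueError("SdsSecretConfig.name is required")
    sds_config = cfg.get("sds_config")
    if not sds_config:
        # Empty sds_config: look the name up in static_resources.secrets.
        for secret in static_secrets:
            if secret.get("name") == name:
                return {"source": "static", "secret": secret}
        raise KeyError(f"no static secret named {name!r}")
    api = sds_config.get("api_config_source")
    if api is not None:
        # Remote SDS: the api_config_source must list gRPC services.
        if not api.get("grpc_services"):
            raise ValueError(f"{name}: api_config_source needs grpc_services")
        return {"source": "grpc", "name": name, "grpc_services": api["grpc_services"]}
    # Other ConfigSource variants are passed through unchanged.
    return {"source": "other", "name": name, "config_source": sds_config}
```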

SdsSecretConfig is used in two fields of CommonTlsContext: tls_certificate_sds_secret_configs, which uses SDS to get a TlsCertificate, and validation_context_sds_secret_config, which uses SDS to get a CertificateValidationContext.

Key rotation

It's usually preferable to perform key rotation via gRPC SDS, but when this is not possible or desired (e.g. while bootstrapping SDS credentials), SDS allows filesystem rotation when secrets refer to filesystem paths. This is currently supported for the following secret types:

  • TlsCertificate

  • CertificateValidationContext

By default, the directories containing secrets are watched for filesystem move events. For example, a key or trusted CA certificate at /foo/bar/baz/cert.pem is watched at /foo/bar/baz. The watched directory can be controlled explicitly by specifying a watched_directory path in TlsCertificate and CertificateValidationContext. This allows watches to be established on ancestor directories, e.g. /foo/bar, which is useful when implementing common key rotation schemes.
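The choice of watched directory can be modeled in a few lines. This is a hypothetical sketch (directory_to_watch and triggers_reload are not Envoy functions) that assumes POSIX paths and treats "a move event inside the watched directory" as the reload trigger described above.

```python
import posixpath
from typing import Optional


def directory_to_watch(secret_path: str, watched_directory: Optional[str] = None) -> str:
    """Directory whose move events trigger a reload (hypothetical model)."""
    if watched_directory:
        # Explicit watched_directory, e.g. an ancestor such as /foo/bar.
        return posixpath.normpath(watched_directory)
    # Default: the directory that directly contains the secret file.
    return posixpath.dirname(posixpath.normpath(secret_path))


def triggers_reload(moved_path: str, secret_path: str,
                    watched_directory: Optional[str] = None) -> bool:
    """True if a move of moved_path happens inside the watched directory."""
    parent = posixpath.dirname(posixpath.normpath(moved_path))
    return parent == directory_to_watch(secret_path, watched_directory)
```

With a symlink such as /certs/current being replaced, the move happens in /certs, so the default watch on /certs/current does not see it, while watched_directory set to /certs does. This is why the symlink scheme needs the grandparent directory.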

An example of key rotation is provided below.

Example one: static_resources

This example shows how to configure secrets in static_resources:

```yaml
static_resources:
  secrets:
    - name: server_cert
      tls_certificate:
        certificate_chain:
          filename: certs/servercert.pem
        private_key:
          filename: certs/serverkey.pem
    - name: client_cert
      tls_certificate:
        certificate_chain:
          filename: certs/clientcert.pem
        private_key:
          filename: certs/clientkey.pem
    - name: validation_context
      validation_context:
        trusted_ca:
          filename: certs/cacert.pem
        verify_certificate_hash:
          - E0:F3:C8:CE:5E:2E:A3:05:F0:70:1F:F5:12:E3:6E:2E:97:92:82:84:A2:28:BC:F7:73:32:D3:39:30:A1:B6:FD
  clusters:
    - connect_timeout: 0.25s
      load_assignment:
        cluster_name: local_service_tls
        ...
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            tls_certificate_sds_secret_configs:
              - name: client_cert
  listeners:
    ....
    filter_chains:
      - transport_socket:
          name: envoy.transport_sockets.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
            common_tls_context:
              tls_certificate_sds_secret_configs:
                - name: server_cert
              validation_context_sds_secret_config:
                name: validation_context
```

In this example, certificates are specified in the bootstrap's static_resources and are not fetched remotely. The static secrets section defines three secrets: client_cert, server_cert and validation_context. In the clusters section, a cluster uses client_cert in its tls_certificate_sds_secret_configs. In the listeners section, a listener uses server_cert in its tls_certificate_sds_secret_configs and validation_context for its validation_context_sds_secret_config.

Example two: SDS server

This example shows how to configure secrets fetched from remote SDS servers:

```yaml
clusters:
  - name: sds_server_mtls
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options:
            connection_keepalive:
              interval: 30s
              timeout: 5s
    load_assignment:
      cluster_name: sds_server_mtls
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 8234
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificates:
            - certificate_chain:
                filename: certs/sds_cert.pem
              private_key:
                filename: certs/sds_key.pem
  - name: sds_server_uds
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: sds_server_uds
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  pipe:
                    path: /tmp/uds_path
  - name: example_cluster
    connect_timeout: 0.25s
    load_assignment:
      cluster_name: local_service_tls
      ...
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificate_sds_secret_configs:
            - name: client_cert
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                    - google_grpc:
                        target_uri: unix:/tmp/uds_path
                        stat_prefix: sds_uds  # hypothetical value; google_grpc requires a stat_prefix
listeners:
  ....
  filter_chains:
    - transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificate_sds_secret_configs:
              - name: server_cert
                sds_config:
                  resource_api_version: V3
                  api_config_source:
                    api_type: GRPC
                    transport_api_version: V3
                    grpc_services:
                      - envoy_grpc:
                          cluster_name: sds_server_mtls
            validation_context_sds_secret_config:
              name: validation_context
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                    - envoy_grpc:
                        cluster_name: sds_server_uds
```

For illustration, the above example uses three methods to access the SDS server. A gRPC SDS server can be reached at the Unix domain socket path /tmp/uds_path and at 127.0.0.1:8234 over mTLS. It provides three secrets: client_cert, server_cert and validation_context. In the config, the client_cert certificate of cluster example_cluster is fetched using Google gRPC over UDS. The listener needs to fetch server_cert and validation_context from the SDS server. server_cert is fetched using Envoy gRPC through cluster sds_server_mtls, which is configured with a client certificate so that it uses mTLS to talk to the SDS server. validation_context is fetched using Envoy gRPC through cluster sds_server_uds, which is configured with the UDS path to talk to the SDS server.

Example three: certificate rotation for xDS gRPC connection

Managing certificates for the xDS gRPC connection between Envoy and the xDS server introduces a bootstrapping problem: the SDS server cannot manage the certificates that are required to connect to it.

This example shows how to set up the xDS connection by sourcing the SDS configuration from the filesystem. The certificate and key files are watched with inotify and reloaded automatically without a restart. In contrast, the setup in Example two: SDS server requires a restart to reload the xDS certificates and key after an update.

```yaml
clusters:
  - name: control_plane
    type: LOGICAL_DNS
    connect_timeout: 1s
    load_assignment:
      cluster_name: control_plane
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: controlplane
                    port_value: 8443
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    transport_socket:
      name: "envoy.transport_sockets.tls"
      typed_config:
        "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext"
        common_tls_context:
          tls_certificate_sds_secret_configs:
            - name: tls_sds
              sds_config:
                path: /etc/envoy/tls_certificate_sds_secret.yaml
          validation_context_sds_secret_config:
            name: validation_context_sds
            sds_config:
              path: /etc/envoy/validation_context_sds_secret.yaml
```

Paths to the client certificate, including the client's certificate chain and private key, are given in the SDS config file /etc/envoy/tls_certificate_sds_secret.yaml:

```yaml
resources:
  - "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"
    name: tls_sds
    tls_certificate:
      certificate_chain:
        filename: /certs/sds_cert.pem
      private_key:
        filename: /certs/sds_key.pem
```

The path to the CA certificate bundle used to validate the xDS server certificate is given in the SDS config file /etc/envoy/validation_context_sds_secret.yaml:

```yaml
resources:
  - "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"
    name: validation_context_sds
    validation_context:
      trusted_ca:
        filename: /certs/cacert.pem
```

In the above example, a watch is established on /certs, and file moves in this directory trigger an update. An alternative common key rotation scheme that provides better atomicity is to keep an active symlink /certs/current and replace it with an atomic move operation. In this case, the watch needs to be on the certificate's grandparent directory. Envoy supports this scheme through watched_directory. Continuing the above examples:

/etc/envoy/tls_certificate_sds_secret.yaml:

```yaml
resources:
  - "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"
    name: tls_sds
    tls_certificate:
      certificate_chain:
        filename: /certs/current/sds_cert.pem
      private_key:
        filename: /certs/current/sds_key.pem
      watched_directory:
        path: /certs
```

/etc/envoy/validation_context_sds_secret.yaml:

```yaml
resources:
  - "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"
    name: validation_context_sds
    validation_context:
      trusted_ca:
        filename: /certs/current/cacert.pem
      watched_directory:
        path: /certs
```

Secret rotation can be performed with:

```shell
ln -s <path to new secrets> /certs/new && mv -Tf /certs/new /certs/current
```
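The same symlink swap can be done from a deployment script. The sketch below is a hypothetical Python analogue of the shell command (rotate_secrets and the ".tmp" staging name are assumptions, POSIX only): it creates a new symlink next to the current one and then renames it over the old link, which replaces the link in a single step, so the watched directory sees one move event.

```python
import os
import tempfile


def rotate_secrets(new_secrets_dir: str, current_link: str) -> None:
    """Atomically repoint the current_link symlink at new_secrets_dir (POSIX).

    Mirrors: ln -s <new> /certs/new && mv -Tf /certs/new /certs/current
    """
    tmp_link = current_link + ".tmp"  # hypothetical staging name, same directory
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # drop a leftover from an interrupted rotation
    os.symlink(new_secrets_dir, tmp_link)
    # rename(2) replaces the link itself; readers see the old or the new target.
    os.replace(tmp_link, current_link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        v1, v2 = os.path.join(root, "v1"), os.path.join(root, "v2")
        os.mkdir(v1)
        os.mkdir(v2)
        current = os.path.join(root, "current")
        os.symlink(v1, current)
        rotate_secrets(v2, current)
        print(os.readlink(current) == v2)  # True
```

Using os.replace rather than moving into the link target corresponds to the -T flag of mv: the link itself is replaced instead of the new link being moved into the directory it points to.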

Statistics

The SSL socket factory outputs the following SDS-related statistics. They are all counters.

For downstream listeners, they are in the listener.<LISTENER_IP>.server_ssl_socket_factory. namespace.

Name | Description
--- | ---
ssl_context_update_by_sds | Total number of times the SSL context has been updated by SDS.
downstream_context_secrets_not_ready | Total number of downstream connections reset due to an empty SSL certificate.

For upstream clusters, they are in the cluster.<CLUSTER_NAME>.client_ssl_socket_factory. namespace.

Name | Description
--- | ---
ssl_context_update_by_sds | Total number of times the SSL context has been updated by SDS.
upstream_context_secrets_not_ready | Total number of upstream connections reset due to an empty SSL certificate.

SDS has a statistics tree rooted in the sds.<SECRET_NAME>. namespace. In addition, the following statistics are tracked in this namespace:

Name | Description
--- | ---
key_rotation_failed | Total number of filesystem key rotations that failed outside of an SDS update.
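To monitor these counters, the plain-text output of Envoy's admin /stats endpoint can be filtered by the counter names listed above. The sketch below assumes that output has one "name: value" pair per line and that the text has already been fetched; the function name sds_counters is hypothetical.

```python
SDS_COUNTERS = (
    "ssl_context_update_by_sds",
    "downstream_context_secrets_not_ready",
    "upstream_context_secrets_not_ready",
    "key_rotation_failed",
)


def sds_counters(stats_text: str) -> dict:
    """Pick SDS-related counters out of plain-text /stats output (hypothetical)."""
    result = {}
    for line in stats_text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a "name: value" line
        # The counter name is the last dot-separated component of the stat.
        if name.rsplit(".", 1)[-1] in SDS_COUNTERS and value.strip().isdigit():
            result[name] = int(value)
    return result
```

A steadily rising downstream_context_secrets_not_ready or upstream_context_secrets_not_ready usually means a listener or cluster was activated after a failed SDS fetch, as described at the top of this section.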