Prometheus metrics

Typha can be configured to report a number of metrics through Prometheus. See the configuration reference for how to enable metrics reporting.

Metric reference

Typha specific

Typha exports a number of Prometheus metrics; the current set is listed below. Since some metrics are tied to particular implementation choices inside Typha, we can’t make any hard guarantees that metrics will persist across releases. However, we aim not to make any spurious changes to existing metrics.

Terminology

Syncer: Many of Typha’s metrics are now parameterised by “syncer type”; Typha runs one “syncer” for each type of client that it supports. The “syncer” is the component that synchronises Typha’s local cache of the datastore with the upstream datastore. The syncer type is attached to the metrics via a Prometheus label syncer="...".
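Because the syncer type is carried as a label, series for different syncers can be separated (or summed) on the client side. The following Python sketch assumes the standard Prometheus text exposition format; the helper name `values_by_syncer` is hypothetical, and the syncer values `felix` and `bgp` in the usage example are illustrative assumptions rather than a statement of which syncers or labelled metrics a given Typha version exposes.

```python
import re

# Matches a sample line such as: typha_cache_size{syncer="felix"} 1200
# (comment lines starting with "#" and blank lines do not match).
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
# Matches one label pair, allowing backslash-escaped characters in the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def values_by_syncer(text, metric):
    """Return {syncer: value} for each sample of `metric` that has a syncer label.

    Hypothetical helper for illustration; it does not handle every corner of
    the exposition format (e.g. a "}" inside a label value).
    """
    result = {}
    for line in text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if not m or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        if "syncer" in labels:
            result[labels["syncer"]] = float(m.group("value"))
    return result


# Illustrative scrape output; the syncer names here are assumptions.
sample = """\
# TYPE typha_cache_size gauge
typha_cache_size{syncer="felix"} 1200
typha_cache_size{syncer="bgp"} 300
"""
per_syncer = values_by_syncer(sample, "typha_cache_size")
print(per_syncer)                # {'felix': 1200.0, 'bgp': 300.0}
print(sum(per_syncer.values()))  # 1500.0
```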

Breadcrumb: Typha’s internal cache stores a series of snapshots of the state of the datastore, each with a list of changes relative to the previous snapshot. We call the combination of a snapshot and its list of changes a “breadcrumb”. Breadcrumbs are joined into a linked list as they are created. When a client connects, Typha sends it the snapshot from the most recent breadcrumb; then it “follows the breadcrumbs” on behalf of that client, sending it the change list from each subsequent breadcrumb.
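The breadcrumb idea can be modelled in a few lines. This is a deliberately simplified, single-threaded Python sketch of the data structure described above, not Typha’s implementation: the names `Breadcrumb`, `Cache`, `Client` and `apply_deltas` are hypothetical, and a delta is assumed to be a `(key, new_value)` pair where `None` means deletion.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A delta is (key, new_value); new_value None means the key was deleted.
Delta = Tuple[str, Optional[str]]


def apply_deltas(state: Dict[str, str], deltas: List[Delta]) -> None:
    """Apply a change list to a key/value map in place."""
    for key, value in deltas:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value


@dataclass
class Breadcrumb:
    seq: int                   # position in the chain
    snapshot: Dict[str, str]   # full state as of this breadcrumb
    deltas: List[Delta]        # changes relative to the previous snapshot
    next: Optional["Breadcrumb"] = None


class Cache:
    """Hypothetical model of the cache: a linked list of breadcrumbs."""

    def __init__(self) -> None:
        self.latest = Breadcrumb(seq=0, snapshot={}, deltas=[])

    def update(self, deltas: List[Delta]) -> None:
        snapshot = dict(self.latest.snapshot)
        apply_deltas(snapshot, deltas)
        crumb = Breadcrumb(self.latest.seq + 1, snapshot, list(deltas))
        self.latest.next = crumb  # link the new breadcrumb onto the chain
        self.latest = crumb


class Client:
    """Hypothetical per-client cursor that follows the breadcrumbs."""

    def __init__(self, cache: Cache) -> None:
        self.crumb = cache.latest
        self.state = dict(self.crumb.snapshot)  # initial snapshot sent on connect

    def catch_up(self) -> List[Delta]:
        """Walk forward to the newest breadcrumb, returning every delta sent.

        A real server would wait here for the next breadcrumb once the client
        is caught up (compare typha_breadcrumb_block); this sketch returns.
        """
        sent: List[Delta] = []
        while self.crumb.next is not None:
            self.crumb = self.crumb.next
            apply_deltas(self.state, self.crumb.deltas)
            sent.extend(self.crumb.deltas)
        return sent


cache = Cache()
cache.update([("a", "1")])
client = Client(cache)                        # receives snapshot {"a": "1"}
cache.update([("b", "2"), ("a", None)])
print(client.catch_up())                      # [('b', '2'), ('a', None)]
print(client.state == cache.latest.snapshot)  # True
```

Keeping a full snapshot in every breadcrumb is what lets a newly connecting client start from the latest state without replaying history, while existing clients only receive the deltas.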

| Name | Description |
|---|---|
| typha_cache_size | The total number of key/value pairs in Typha’s in-memory cache. |
| typha_snapshots_generated | The total number of binary snapshots generated by Typha. Binary snapshots are generated once and then shared between multiple clients for performance. |
| typha_snapshots_reused | The number of binary snapshots that Typha was able to reuse for multiple clients, thus reducing CPU usage. |
| typha_snapshot_raw_bytes | The size of the most recent binary snapshot in bytes, before compression. |
| typha_snapshot_compressed_bytes | The size of the most recent binary snapshot in bytes, after compression. |
| typha_breadcrumb_block | Count of the number of times Typha got the next breadcrumb after blocking. |
| typha_breadcrumb_non_block | Count of the number of times Typha got the next breadcrumb without blocking. |
| typha_breadcrumb_seq_number | Current (server-local) sequence number; the number of snapshot deltas processed. |
| typha_breadcrumb_size | Number of KVs recorded in each breadcrumb. |
| typha_client_latency_secs | Per-client latency, i.e. how far behind the current state each client is. |
| typha_client_snapshot_send_secs | How long it took to send the initial snapshot to each client. |
| typha_client_write_latency_secs | Per-client write latency, i.e. how long each write call is taking. |
| typha_connections_accepted | Total number of connections accepted over time. |
| typha_connections_active | Number of open client connections (including connections that have not completed the handshake). |
| typha_connections_streaming | Number of client connections that are actively streaming (i.e. connections that successfully completed the handshake). |
| typha_connections_dropped | Total number of connections dropped due to rebalancing. |
| typha_kvs_per_msg | Number of KV pairs sent in each message. |
| typha_log_errors | Number of errors encountered while logging. |
| typha_logs_dropped | Number of logs dropped because the output stream was blocked. |
| typha_next_breadcrumb_latency_secs | Time to retrieve the next breadcrumb when already behind. |
| typha_ping_latency | Round-trip ping/pong latency to the client. Typha’s protocol includes a regular ping/pong keepalive to verify that the connection is still up. |
| typha_updates_skipped | Total number of updates skipped because the datastore change was not relevant (for example, an update to a Kubernetes Pod field that Calico does not read). |
| typha_updates_total | Total number of updates received from the datastore. |
| remote_cluster_connection_status | Status of the remote cluster connection in federation, represented as a numeric value: 0 (NotConnecting), 1 (Connecting), 2 (InSync), 3 (ReSyncInProgress), 4 (ConfigChangeRestartRequired), 5 (ConfigInComplete). The remote_cluster_name label identifies the remote cluster in the federation. |
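Two of the metrics above are easier to read after a small transformation: remote_cluster_connection_status is a numeric code, and the two snapshot size gauges combine into a compression ratio. The Python sketch below shows both; the names `REMOTE_CLUSTER_STATUS`, `describe_status` and `compression_ratio` are hypothetical, the status names are copied from the table above, and the sample numbers are made up.

```python
from typing import Optional

# Numeric values of remote_cluster_connection_status, as given in the table above.
REMOTE_CLUSTER_STATUS = {
    0: "NotConnecting",
    1: "Connecting",
    2: "InSync",
    3: "ReSyncInProgress",
    4: "ConfigChangeRestartRequired",
    5: "ConfigInComplete",
}


def describe_status(value: float) -> str:
    """Map a sample value (Prometheus values are floats) to a status name."""
    return REMOTE_CLUSTER_STATUS.get(int(value), f"Unknown({value})")


def compression_ratio(raw_bytes: float, compressed_bytes: float) -> Optional[float]:
    """typha_snapshot_raw_bytes / typha_snapshot_compressed_bytes; None if undefined."""
    if compressed_bytes <= 0:
        return None
    return raw_bytes / compressed_bytes


print(describe_status(2.0))          # InSync
print(compression_ratio(1000, 250))  # 4.0
```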

Prometheus metrics are self-documenting: with metrics turned on, curl can be used to list the metrics along with their help text and type information.

  curl -s http://localhost:9091/metrics | head

Example response:

  # HELP typha_breadcrumb_block Count of the number of times Typha got the next Breadcrumb after blocking.
  # TYPE typha_breadcrumb_block counter
  typha_breadcrumb_block 57
  # HELP typha_breadcrumb_non_block Count of the number of times Typha got the next Breadcrumb without blocking.
  # TYPE typha_breadcrumb_non_block counter
  typha_breadcrumb_non_block 0
  # HELP typha_breadcrumb_seq_number Current (server-local) sequence number; number of snapshot deltas processed.
  # TYPE typha_breadcrumb_seq_number gauge
  typha_breadcrumb_seq_number 22215
  ...
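To consume this output programmatically rather than by eye, the HELP, TYPE and sample lines can be split apart. The Python sketch below is a hypothetical helper (`parse_exposition`) written against the example response above; it only handles unlabelled samples and ignores an optional trailing timestamp.

```python
def parse_exposition(text):
    """Collect HELP text, TYPE and value for each unlabelled metric.

    Minimal sketch: samples with labels ("{...}") are skipped, and an
    optional trailing timestamp is ignored.
    """
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# HELP "):
            name, _, help_text = line[len("# HELP "):].partition(" ")
            metrics.setdefault(name, {})["help"] = help_text
        elif line.startswith("# TYPE "):
            name, _, metric_type = line[len("# TYPE "):].partition(" ")
            metrics.setdefault(name, {})["type"] = metric_type
        elif line and not line.startswith("#") and "{" not in line:
            parts = line.split()
            if len(parts) >= 2:  # skips stray lines such as "..."
                metrics.setdefault(parts[0], {})["value"] = float(parts[1])
    return metrics


sample = """\
# HELP typha_breadcrumb_block Count of the number of times Typha got the next Breadcrumb after blocking.
# TYPE typha_breadcrumb_block counter
typha_breadcrumb_block 57
# TYPE typha_breadcrumb_seq_number gauge
typha_breadcrumb_seq_number 22215
...
"""
m = parse_exposition(sample)
print(m["typha_breadcrumb_block"]["type"], m["typha_breadcrumb_block"]["value"])  # counter 57.0
print(m["typha_breadcrumb_seq_number"]["value"])  # 22215.0
```

For anything beyond a quick check, the official `prometheus_client` Python package ships a complete parser, `prometheus_client.parser.text_string_to_metric_families`, which also handles labels and escaping.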

CPU / memory metrics

Typha also exports the default set of metrics provided by the Prometheus Go client library (Go runtime, process and metrics-handler metrics). Currently, those include:

| Name | Description |
|---|---|
| go_gc_duration_seconds | A summary of the GC invocation durations. |
| go_goroutines | Number of goroutines that currently exist. |
| go_memstats_alloc_bytes | Number of bytes allocated and still in use. |
| go_memstats_alloc_bytes_total | Total number of bytes allocated, even if freed. |
| go_memstats_buck_hash_sys_bytes | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | Total number of frees. |
| go_memstats_gc_sys_bytes | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | Number of heap bytes that are in use. |
| go_memstats_heap_objects | Number of allocated objects. |
| go_memstats_heap_released_bytes_total | Total number of heap bytes released to OS. |
| go_memstats_heap_sys_bytes | Number of heap bytes obtained from system. |
| go_memstats_last_gc_time_seconds | Number of seconds since 1970 of last garbage collection. |
| go_memstats_lookups_total | Total number of pointer lookups. |
| go_memstats_mallocs_total | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | Number of heap bytes when next garbage collection will take place. |
| go_memstats_other_sys_bytes | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | Number of bytes obtained from system for stack allocator. |
| go_memstats_sys_bytes | Number of bytes obtained by system. Sum of all system allocations. |
| process_cpu_seconds_total | Total user and system CPU time spent in seconds. |
| process_max_fds | Maximum number of open file descriptors. |
| process_open_fds | Number of open file descriptors. |
| process_resident_memory_bytes | Resident memory size in bytes. |
| process_start_time_seconds | Start time of the process since unix epoch in seconds. |
| process_virtual_memory_bytes | Virtual memory size in bytes. |
| promhttp_metric_handler_requests_in_flight | Current number of scrapes being served. |
| promhttp_metric_handler_requests_total | Total number of scrapes by HTTP status code. |
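Counters such as process_cpu_seconds_total are most useful as rates, and process_start_time_seconds is most useful as an uptime. In PromQL the CPU figure would normally be written `rate(process_cpu_seconds_total[5m])`; the Python sketch below shows only the underlying arithmetic for two scrapes, with hypothetical function names. Unlike `rate()`, it does not extrapolate to the edges of a time window.

```python
def cpu_cores_used(cpu_t0, cpu_t1, scrape_t0, scrape_t1):
    """Average CPU cores used between two scrapes of process_cpu_seconds_total.

    cpu_*: counter values (seconds); scrape_*: wall-clock times of the scrapes
    (seconds). A decrease is treated as a counter reset (for example, a Typha
    restart), in which case the post-reset value is taken as the increase.
    """
    elapsed = scrape_t1 - scrape_t0
    if elapsed <= 0:
        raise ValueError("scrapes must be in chronological order")
    increase = cpu_t1 - cpu_t0
    if increase < 0:
        increase = cpu_t1
    return increase / elapsed


def uptime_seconds(process_start_time, now):
    """Seconds since the process started, from process_start_time_seconds."""
    return now - process_start_time


print(cpu_cores_used(100.0, 130.0, 0.0, 60.0))           # 0.5
print(uptime_seconds(1_700_000_000.0, 1_700_000_600.0))  # 600.0
```

A result of 0.5 means Typha used, on average, half of one CPU core over the interval between the two scrapes.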