vtorc
VTOrc is the automated fault detection and repair tool of Vitess.
Example Usage
Start VTOrc as follows:
```shell
export TOPOLOGY_FLAGS="--topo_implementation etcd2 --topo_global_server_address localhost:2379 --topo_global_root /vitess/global"
export VTDATAROOT="/tmp"
vtorc \
  $TOPOLOGY_FLAGS \
  --log_dir $VTDATAROOT/tmp \
  --port 15000 \
  --recovery-period-block-duration "10m" \
  --instance-poll-time "1s" \
  --topo-information-refresh-duration "30s" \
  --alsologtostderr
```
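The example above relies on the shell splitting the unquoted `$TOPOLOGY_FLAGS` string into separate arguments. A minimal sketch of the same invocation using bash arrays, which keeps each flag value a single argument even if it ever contains spaces, might look like this. The function name `run_vtorc` and the `DRY_RUN` switch are hypothetical conveniences of this sketch, not part of VTOrc. The flag values are copied from the example, and the `mkdir -p` step is included only so the sketch does not depend on the logging library creating the `--log_dir` directory itself.

```shell
#!/usr/bin/env bash
# Sketch: the example invocation with flags held in bash arrays instead of
# a word-split string. run_vtorc and DRY_RUN are hypothetical helpers of
# this sketch; they are not VTOrc features.

VTDATAROOT="${VTDATAROOT:-/tmp}"

# Topology flags, one array element per word.
topology_flags=(
  --topo_implementation etcd2
  --topo_global_server_address localhost:2379
  --topo_global_root /vitess/global
)

run_vtorc() {
  local log_dir="$VTDATAROOT/tmp"
  local args=(
    "${topology_flags[@]}"
    --log_dir "$log_dir"
    --port 15000
    --recovery-period-block-duration 10m
    --instance-poll-time 1s
    --topo-information-refresh-duration 30s
    --alsologtostderr
  )

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the command one word per line instead of running it.
    printf '%s\n' vtorc "${args[@]}"
    return 0
  fi

  # Create the log directory up front rather than relying on the
  # logging library to do it.
  mkdir -p "$log_dir"
  vtorc "${args[@]}"
}
```

Source the snippet in bash, then run `DRY_RUN=1 run_vtorc` to print the arguments one per line, or `run_vtorc` to start VTOrc (assuming the `vtorc` binary is on `PATH` and etcd is reachable at `localhost:2379`).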
Options
The following command line options apply to VTOrc:
Name | Type | Definition |
---|---|---|
--alsologtostderr | boolean | log to standard error as well as files |
--audit-file-location | string | File location where the audit logs are to be stored |
--audit-purge-duration | duration | Duration for which audit logs are held before being purged. Should be in multiples of days (default 168h0m0s) |
--audit-to-backend | boolean | Whether to store the audit log in the VTOrc database |
--audit-to-syslog | boolean | Whether to store the audit log in the syslog |
--catch-sigpipe | boolean | catch and ignore SIGPIPE on stdout and stderr if specified |
--clusters_to_watch | strings | Comma-separated list of keyspaces or keyspace/shards that this instance will monitor and repair. Defaults to all clusters in the topology. Example: "ks1,ks2/-80" |
--config | string | config file name |
--consul_auth_static_file | string | JSON file to read the topos/tokens from. |
--grpc_auth_static_client_creds | string | When using grpc_static_auth in the server, this file provides the credentials to use to authenticate with the server. |
--grpc_compression | string | Which protocol to use for compressing gRPC. Default: none. Supported: snappy |
--grpc_enable_tracing | boolean | Enable gRPC tracing. |
--grpc_initial_conn_window_size | int | gRPC initial connection window size |
--grpc_initial_window_size | int | gRPC initial window size |
--grpc_keepalive_time | duration | After a duration of this time, if the client doesn't see any activity, it pings the server to see if the transport is still alive. (default 10s) |
--grpc_keepalive_timeout | duration | After pinging for a keepalive check, the client waits for this duration; if no activity is seen by then, the connection is closed. (default 10s) |
--grpc_max_message_size | int | Maximum allowed RPC message size. Larger messages will be rejected by gRPC with the error 'exceeding the max size'. (default 16777216) |
--grpc_prometheus | boolean | Enable gRPC monitoring with Prometheus. |
-h, --help | boolean | display usage and exit |
--instance-poll-time | duration | Timer duration on which VTOrc refreshes MySQL information (default 5s) |
--keep_logs | duration | keep logs for this long (using ctime) (zero to keep forever) |
--keep_logs_by_mtime | duration | keep logs for this long (using mtime) (zero to keep forever) |
--lameduck-period | duration | keep running at least this long after SIGTERM before stopping (default 50ms) |
--lock-timeout | duration | Maximum time for which a shard/keyspace lock can be acquired (default 45s) |
--log_backtrace_at | traceLocation | when logging hits line file:N, emit a stack trace (default :0) |
--log_dir | string | If non-empty, write log files in this directory |
--log_err_stacks | boolean | log stack traces for errors |
--log_rotate_max_size | uint | size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800) |
--logtostderr | boolean | log to standard error instead of files |
--onclose_timeout | duration | wait no more than this for OnClose handlers before stopping (default 10s) |
--onterm_timeout | duration | wait no more than this for OnTermSync handlers before stopping (default 10s) |
--pid_file | string | If set, the process will write its pid to the named file, and delete it on graceful shutdown. |
--port | int | port for the server |
--pprof | strings | enable profiling |
--prevent-cross-cell-failover | boolean | Prevent VTOrc from promoting a primary in a different cell than the current primary in case of a failover |
--purge_logs_interval | duration | how often to try to remove old logs (default 1h0m0s) |
--reasonable-replication-lag | duration | Maximum replication lag on replicas which is deemed to be acceptable (default 10s) |
--recovery-period-block-duration | duration | Duration for which a new recovery is blocked on an instance after running a recovery (default 30s) |
--recovery-poll-duration | duration | Timer duration on which VTOrc polls its database to run a recovery (default 1s) |
--remote_operation_timeout | duration | time to wait for a remote operation (default 15s) |
--security_policy | string | the name of a registered security policy to use for controlling access to URLs; empty means allow all for anyone (built-in policies: deny-all, read-only) |
--shutdown_wait_time | duration | Maximum time to wait for VTOrc to release all the locks that it is holding before shutting down on SIGTERM (default 30s) |
--snapshot-topology-interval | duration | Timer duration on which VTOrc takes a snapshot of the current MySQL information it has in the database. Should be in multiples of hours |
--sqlite-data-file | string | SQLite data file to use as VTOrc's database (default "file::memory:?mode=memory&cache=shared") |
--stderrthreshold | severity | logs at or above this threshold go to stderr (default 1) |
--tablet_manager_grpc_ca | string | the server ca to use to validate servers when connecting |
--tablet_manager_grpc_cert | string | the cert to use to connect |
--tablet_manager_grpc_concurrency | int | concurrency to use to talk to a vttablet server for performance-sensitive RPCs (like ExecuteFetchAs{Dba,AllPrivs,App}) (default 8) |
--tablet_manager_grpc_connpool_size | int | number of tablets to keep tmclient connections open to (default 100) |
--tablet_manager_grpc_crl | string | the server crl to use to validate server certificates when connecting |
--tablet_manager_grpc_key | string | the key to use to connect |
--tablet_manager_grpc_server_name | string | the server name to use to validate server certificate |
--tablet_manager_protocol | string | Protocol to use to make tabletmanager RPCs to vttablets. (default "grpc") |
--topo-information-refresh-duration | duration | Timer duration on which VTOrc refreshes the keyspace and vttablet records from the topology server (default 15s) |
--topo_consul_lock_delay | duration | LockDelay for consul session. (default 15s) |
--topo_consul_lock_session_checks | string | List of checks for consul session. (default "serfHealth") |
--topo_consul_lock_session_ttl | string | TTL for consul session. |
--topo_consul_watch_poll_duration | duration | time of the long poll for watch queries. (default 30s) |
--topo_etcd_lease_ttl | int | Lease TTL for locks and leader election. The client will use KeepAlive to keep the lease going. (default 30) |
--topo_etcd_tls_ca | string | path to the ca to use to validate the server cert when connecting to the etcd topo server |
--topo_etcd_tls_cert | string | path to the client cert to use to connect to the etcd topo server, requires topo_etcd_tls_key, enables TLS |
--topo_etcd_tls_key | string | path to the client key to use to connect to the etcd topo server, enables TLS |
--topo_global_root | string | the path of the global topology data in the global topology server |
--topo_global_server_address | string | the address of the global topology server |
--topo_implementation | string | the topology implementation to use |
--topo_k8s_context | string | The kubeconfig context to use, overrides the 'current-context' from the config |
--topo_k8s_kubeconfig | string | Path to a valid kubeconfig file. When running as a k8s pod inside the same cluster you wish to use as the topo, you may omit this and the below arguments, and Vitess is capable of auto-discovering the correct values. https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod |
--topo_k8s_namespace | string | The kubernetes namespace to use for all objects. Default comes from the context or in-cluster config |
--topo_zk_auth_file | string | auth to use when connecting to the zk topo server, file contents should be |
--topo_zk_base_timeout | duration | zk base timeout (see zk.Connect) (default 30s) |
--topo_zk_max_concurrency | int | maximum number of pending requests to send to a Zookeeper server. (default 64) |
--topo_zk_tls_ca | string | the server ca to use to validate servers when connecting to the zk topo server |
--topo_zk_tls_cert | string | the cert to use to connect to the zk topo server, requires topo_zk_tls_key, enables TLS |
--topo_zk_tls_key | string | the key to use to connect to the zk topo server, enables TLS |
--v | value | log level for V logs |
--version | boolean | print binary version |
--vmodule | value | comma-separated list of pattern=N settings for file-filtered logging |
--wait-replicas-timeout | duration | Duration for which to wait for replicas to respond when issuing RPCs (default 30s) |
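As a further sketch combining a few of the options above, the function below starts VTOrc scoped to a subset of keyspaces/shards with `--clusters_to_watch`, keeps its database in an on-disk SQLite file via `--sqlite-data-file` instead of the in-memory default, and sets `--prevent-cross-cell-failover`. The keyspace names in the usage line, the `$VTDATAROOT/vtorc/vtorc.db` path, the function name `run_scoped_vtorc` and the `DRY_RUN` switch are all hypothetical; the sketch also assumes a plain file path is an acceptable value for `--sqlite-data-file`.

```shell
#!/usr/bin/env bash
# Sketch: a VTOrc instance restricted to part of the topology. Every flag
# comes from the options table; run_scoped_vtorc, DRY_RUN and the SQLite
# path are hypothetical choices of this sketch.

VTDATAROOT="${VTDATAROOT:-/tmp}"

run_scoped_vtorc() {
  # Value for --clusters_to_watch, e.g. "commerce,customer/-80" (required).
  local clusters="${1:?usage: run_scoped_vtorc CLUSTERS}"
  local state_dir="$VTDATAROOT/vtorc"

  # --sqlite-data-file: file-backed database instead of the in-memory default.
  # --prevent-cross-cell-failover: keep promotions in the primary's cell.
  local args=(
    --topo_implementation etcd2
    --topo_global_server_address localhost:2379
    --topo_global_root /vitess/global
    --port 15000
    --clusters_to_watch "$clusters"
    --sqlite-data-file "$state_dir/vtorc.db"
    --prevent-cross-cell-failover
  )

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the command one word per line instead of running it.
    printf '%s\n' vtorc "${args[@]}"
    return 0
  fi

  # Make sure the directory for the SQLite file exists.
  mkdir -p "$state_dir"
  vtorc "${args[@]}"
}
```

For example, `DRY_RUN=1 run_scoped_vtorc "commerce,customer/-80"` prints the arguments without starting anything; dropping `DRY_RUN=1` runs VTOrc for the keyspace `commerce` and the shard `-80` of `customer` only.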