Orca Internals Overview
This article describes the internal workings of Orca, the Spinnaker pipeline orchestration service.
Spinnaker is known and often loved for enabling accessible, highly configurable and composable workflows through its pipelines and tasks. This is largely thanks to the code that makes up Orca.
From Orca’s README :
Orca is the orchestration engine for Spinnaker. It is responsible for taking a pipeline or task definition and managing the stages and tasks, coordinating the other Spinnaker services.
Orca is a stateless, horizontally scalable service. Spinnaker’s execution state is persisted to a backend store and work is distributed evenly through a work queue. So long as your backend persistence layer has capacity, you can freely add or remove Orca servers to match workload demands.
Internals
Domain
The primary domain model is an Execution, of which there are two types: PIPELINE
and ORCHESTRATION
. The PIPELINE
type is for pipelines while ORCHESTRATION
s is unscheduled, ad-hoc API-submitted actions and what you see in the UI under the “Tasks” tab in an Application. At this point in Spinnaker’s life, these two types are nearly programmatically identical, the key difference being that a Pipeline has a predefined persistent configuration, whereas an Orchestration is arbitarily configured at request-time and serial in execution.
For one Execution, there are one or many Stages organized as a DAG (Directed Acyclic Graph). An Execution supports concurrent branching logic of Stages. These Stages define higher-order, user-friendly actions and concepts, such as “Resize Server Group” or “Deploy” and can be chained together to form complex delivery workflows.
Within a Stage, there are one or more Tasks (not to be confused with the Tasks tab in the UI). A Task can be considered an atomic unit of work, focused on a single action. If you are familiar with the Spinnaker UI, you’ll note that for every Stage, there are multiple line items that correlate to a Stage’s runtime: These are often Tasks. Tasks are always executed sequentially and serially.
It’s important to note that a Stage is recursively composable. One stage can have zero to many Synthetic Stages, that is, stages that can occur either BEFORE
or AFTER
its parent stage. This is most easily illustrated in a Canary stage, which deploys multiple Server Groups (a baseline and canary). While to the end-user this appears to be a single stage, it’s actually a composition of multiple Deploy stages with some extra logic on top.
Lastly, an Execution’s stage graph does not need to be known ahead of time. An Execution can lay its tracks down ahead of itself as it is running, which is how some of the more advanced functionality is implemented, like automatic rollbacks or canary behaviors.
Runtime
Orca uses a distributed queue library, Keiko , to manage its work. Keiko uses atomic messages to progress work through its lifecycle and allows Orca to first be resilient to node failure, as well as spread load evenly across the deployment making Orca easier to scale and operate.
Keiko itself is an abstraction around a delay queue, meaning it can deliver Messages immediately or at specific times in the future, as well as reschedule messages that have already been added to the queue for a new delivery time. This delay queue functionality is pivotal to Orca’s performance and operational characteristics when dealing with long running operations.
A Message is a fine-grained unit of work. As an example, a StartStage
message will verify that all upstream branches are complete and then perform any logic associated with preparing for the stage’s execution. It won’t actually perform the stage’s core actions. A RunTask
is really the only message that actually performs stage work that a user would be familiar with.
Some message types are redeliverable and can be duplicated, whereas others are not. For example, in a pipeline that has two concurrent stage branches that do not join at the end, there will be two CompleteExecution
messages sent. This is fine, because the CompleteExecutionHandler
has logic to verify that indeed all stages are in a completed state before marking the entire execution as complete, whether it ultimately be a failure or success.
For every message type, there is a single correlated Handler
that contains all of the logic for processing this Message
type. A Message’s contents are small and the bare minimum of information to process the request: For instance, an Execution is referenced by ID rather than containing its entire state. Orca builds atop these foundational Handlers in the form of StageDefinitionBuilder
and other classes to build the DAG and implement actual Spinnaker orchestration logic.
The queue can be categorized into two parts: The QueueProcessor, and its Handlers. Orca uses a single thread to run QueueProcessor
, which polls the oldest messages “ready” off the queue. A ready message is one whose delivery time is now or in the past: It is normal to have many more messages in the queue than those that are actually ready.
A separate worker thread pool is used for Handlers. Orca depends on threads in that pool in order to scale sufficiently. You can tune threads either by adjusting the pool size or by increasing the number of Orca instances. By ignoring downstream bottlenecks Orca can scale horizontally to meet Spinnaker work demand.
If the queue starts to back up, Executions and their Tasks will begin to take longer to start or complete. Sometimes the queue will back up because of downstream service pressure (for example, Clouddriver), but may also be due to Orca not having enough threads or persistence capacity.
One further note about the QueueProcessor
and its thread pool. The QueueProcessor polls the queue on a regular interval (default 10ms) and tries to fill its thread pool if there’s work ready to process. If the thread pool has a core size of 20, and 5 are already busy, it will attempt to take 15 messages off the queue immediately rather than one. This is because in most cases a message takes less than 1ms to complete and we don’t want Orca to idle unnecessarily.
Last modified June 24, 2021: Redesign Progress (#83) (8231bcf)