Resource Management Primitives

Reserving resources to support multi-tenancy

Resources in DC/OS can be reserved and prioritized using a combination of roles, reservations, quotas, and weights. These features are provided by Apache Mesos, which is at the core of DC/OS, and are referred to as primitives because most of them are accessible only through the API and have not yet been integrated into the DC/OS UI or CLI. When working with quotas, reservations, and weights, you need good monitoring of available and used resources in place.

Resource management in this context refers to concepts such as reservations of resources on agents, resource quotas, and weights (priorities) for frameworks. These are useful in a number of scenarios. The first is a multi-tenant environment, where multiple teams or projects co-exist on the same DC/OS cluster and the available resources (CPU, RAM, disk, and ports) must be carved up and guaranteed for each team or project through quotas. The second is a mixed workload on a single cluster, where one class of frameworks has a higher weight (priority) than another, so that frameworks with a higher weight can deploy faster than frameworks with a lower weight.

This page covers the multi-tenant resource management primitives, two examples of real-world scenarios, implementation instructions, and reference links.

Multi-Tenant Resource Management Primitives

The key concepts of multi-tenancy primitives include the following:

Roles

Roles refer to a resource consumer within the cluster. The resource consumer could represent a user within an organization, but it could also represent a team, a group, or a service. It most commonly refers to a class or type of activity which is running. In DC/OS, schedulers subscribe to one or more roles in order to receive resources and schedule work on behalf of the resource consumer(s) they are servicing. Examples of schedulers include Marathon, Kubernetes, and a number of the certified frameworks in the DC/OS Catalog such as Kafka and Cassandra, which have been built to include their own scheduler.

There are two default roles which frameworks will subscribe to:

  • * on private agents
  • slave_public on public agents

Frameworks from the Catalog deploy with their own roles, and unique roles can be created on demand.

Reservations

Reservations set aside resources on targeted public and private agents for a specific role. Statically reserved resources are applied at agent (public or private) startup and cannot be reassigned to other roles without restarting the agent. Dynamically reserved resources enable operators and authorized frameworks to reserve and unreserve resources on demand after agent startup. All SDK-based frameworks, such as Kafka and Cassandra (certified frameworks listed in the DC/OS Catalog), use dynamic reservations to reserve the resources they intend to use for a deployment.

Quotas

Quotas are a mechanism for guaranteeing that a role will receive a specific amount of resources. Today, quotas are set aside in full: if a quota is defined and a task for the role is deployed, those resources are reserved immediately, whether or not the task scales up to use them. Other tasks cannot use those resources, even while the task they are held for leaves them idle. Dynamic quotas (where a task uses only what it needs at the time but is guaranteed to reach its quota), revocable resources, and over-subscription are planned for a future release.
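The effect of holding back a whole quota can be made concrete with a little arithmetic. The sketch below is not a DC/OS or Mesos command; quota_headroom is a hypothetical helper, and the CPU counts are invented for illustration. It assumes the behavior described above, where the full quota is withheld from other roles as soon as a task for the role is deployed.

```shell
# quota_headroom TOTAL QUOTA USED
# Hypothetical helper: all three arguments are illustrative CPU counts.
quota_headroom() {
    total=$1 quota=$2 used=$3
    # The whole quota is held back for the role, not just the part in use.
    echo "held for role: $quota, idle inside quota: $((quota - used)), left for other roles: $((total - quota))"
}

# A 10-CPU cluster, a quota of 4 CPUs, and tasks currently using 1 CPU.
quota_headroom 10 4 1
```

In this made-up case, three CPUs sit idle inside the quota while other roles can be offered only six, which is why monitoring of used versus reserved resources matters when quotas are in place.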

Weights

Weights are a mechanism for prioritizing one role over another, so that all tasks assigned to that role receive more offers (of resources) than roles with a lower weight. This can speed up deployment, scaling, and replacement of tasks.

Examples

These concepts are illustrated with two real-world scenarios based on existing customer use cases.

Analytics platform with weighted Spark roles

This example is based on a customer’s use case of an analytics pipeline. The primary workload is Spark, with three tiers of Spark jobs tagged with roles: “low” (weight 1), “medium” (weight 2), and “high” (weight 3), representing their priority and weight.

In practice, the “high” role is allocated three times the fair share of offers (resources) of “low”, and “medium” is allocated twice the fair share of “low”. Alongside weights, the “high” priority Spark role is given a quota of x CPU shares and y RAM.
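To see what the weights 1, 2, and 3 mean as proportions, the sketch below computes each role's share under the simplifying assumption that offers are split strictly in proportion to weight. The real allocator also takes demand and quotas into account, so treat the numbers as a rough guide. weighted_shares is a hypothetical helper, not part of the DC/OS CLI.

```shell
# weighted_shares ROLE=WEIGHT...
# Hypothetical helper: prints each role's share of offers, assuming offers
# are divided strictly in proportion to weight (a simplification).
weighted_shares() {
    for pair in "$@"; do
        echo "$pair"
    done | LC_ALL=C awk -F= '
        { name[NR] = $1; weight[NR] = $2; sum += $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %.1f%%\n", name[i], 100 * weight[i] / sum
        }'
}

weighted_shares low=1 medium=2 high=3
```

Under this model “high” receives half of the offers, which is three times the share of “low” and one and a half times the share of “medium”.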

As Spark jobs are deployed, the “high” priority Spark jobs receive their offers ahead of the “medium” and “low” roles. The “medium” and “low” priority roles do not have a quota applied, so “medium” is provided offers sooner than “low”, but without a guarantee: if “medium” requires z cores and they are not available, it receives however many are available at that time.

Jenkins in Marathon on Marathon

In this example, a customer runs Jenkins (CI/CD pipeline) as a service with hundreds of instances, one for each development team that requires the service.

On the DC/OS cluster, other applications-as-a-service are deployed as Marathon tasks. Each application, including Jenkins, is grouped in its own instance of Marathon, referred to as Marathon on Marathon (MoM), or in DC/OS documentation as non-native Marathon, where native Marathon is the default Marathon that ships with DC/OS. Conceptually, there is one native Marathon and a number of non-native Marathons (MoMs), each dedicated to grouping other tasks.

Each MoM hosts one of the application groups, and each has a role and quota attached. The role and quota guarantee that an application that scales frequently, as Jenkins does when it spins up agents on demand for a new build, can get the resources it requires. If Jenkins requires more resources, the quota can be amended on the fly to provide them. Another common use of MoMs is for grouping environments such as Development, Testing, and Staging on one DC/OS cluster with robust resource and access management.

In summary, Jenkins-as-a-service is a very dynamic workload, with hundreds of Jenkins agents being run on demand. Good visibility of the available resources and knowing when the quota is reached are important inputs for tuning, availability, and growth. In the Spark example, measuring how much sooner the “high” role tasks ran than the “low” role tasks informs the tuning of the weights.

Implementation

You can use the following resources to learn how to implement both Marathon on Marathon and Spark quotas:

For the examples below, it is recommended to run the commands from a host with the DC/OS CLI installed.

Roles

Roles refer to a tag or a label which is assigned to a framework, task, or an agent. The default role is called * and all existing roles in a cluster can be viewed through the Mesos UI: https://<cluster-name-or-IP>/mesos/#/roles.

In the following example, a role called high is assigned to a Spark task at runtime. Multiple instances of the Spark task can be executed, ensuring they all benefit from the resource management associated with high.

  • spark.mesos.role=high

Applications in the DC/OS Catalog, like Kafka and Cassandra, are automatically deployed with a common role name which is not user configurable.

  • confluent_kafka_role

Roles do not require explicit management, such as configuring a new role and assigning it to a task. They are created on demand when deploying a task or configuring a weight or quota. Likewise, you should not delete roles; they exist for the duration of the cluster.

Reservations

Reservations can be configured manually and are also used by SDK frameworks. In both cases, an authorized user, referred to as the principal, must be declared; this is either a framework or an operator. In the case of SDK frameworks in DC/OS, the principal is also known as the service account.

Adding

The following example reserves resources on a specific agent with ID 312dc1dc-9b39-474f-8295-87fc43872e7c-S0 for the role confluent-kafka-role, guaranteeing four CPU shares, 512MB of RAM, and ports 8112 through 8114. When a task with that role requests resources that match what this agent has reserved, the task is guaranteed those resources on the agent.

NOTE: The principal differs for each user. In this example, the principal bootstrapuser is a superuser account.

You must change the agent_id to the agent ID on your cluster. Use dcos node to find the agent ID.

NOTE: When copying and pasting the JSON examples below into an editor or a terminal, the double quotes may need sanitizing so that they are plain double quotes.

    # Write the request body to a file.
    tee add-reservation.json << EOF
    {
      "type": "RESERVE_RESOURCES",
      "reserve_resources": {
        "agent_id": {
          "value": "312dc1dc-9b39-474f-8295-87fc43872e7c-S0"
        },
        "resources": [
          {
            "type": "SCALAR",
            "name": "cpus",
            "reservation": {
              "principal": "bootstrapuser"
            },
            "role": "confluent-kafka-role",
            "scalar": {
              "value": 4.0
            }
          },
          {
            "type": "SCALAR",
            "name": "mem",
            "reservation": {
              "principal": "bootstrapuser"
            },
            "role": "confluent-kafka-role",
            "scalar": {
              "value": 512.0
            }
          },
          {
            "type": "RANGES",
            "name": "ports",
            "reservation": {
              "principal": "bootstrapuser"
            },
            "role": "confluent-kafka-role",
            "ranges": {
              "range": [
                {
                  "begin": 8112,
                  "end": 8114
                }
              ]
            }
          }
        ]
      }
    }
    EOF
    # POST the request to the Mesos API endpoint of the cluster.
    curl -i -k \
      -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json" \
      -d @add-reservation.json \
      -X POST "$(dcos config show core.dcos_url)/mesos/api/v1"

If successful, an HTTP 202 response is expected.

If resources are not available for reservation, an HTTP 409 response is expected and the reservation cannot be made on that agent. There may already be tasks running that have consumed those resources.
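When the reservation call is scripted, it can help to branch on the status code rather than read the headers by eye. The sketch below covers only the two outcomes described above (202 and 409); explain_reserve_status is a hypothetical helper name, and any other code is simply reported as unexpected.

```shell
# explain_reserve_status CODE
# Hypothetical helper: maps the HTTP status of a RESERVE_RESOURCES call to a
# short message. Only 202 and 409 are interpreted, as described above.
explain_reserve_status() {
    case "$1" in
        202) echo "accepted: the reservation request was accepted" ;;
        409) echo "conflict: the resources are not available on that agent" ;;
        *)   echo "unexpected: HTTP $1, inspect the response body" ;;
    esac
}

# Possible usage: replace -i in the curl call above with
#   -s -o /dev/null -w '%{http_code}'
# so that curl prints only the status code, then pass that code in:
#   explain_reserve_status "$code"
explain_reserve_status 409
```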

Reviewing

Reviewing is best achieved through the Mesos UI, against the specific agent to which you applied the reservation, or by parsing state.json with jq.

    https://<cluster-URL>/mesos/#/agents/<agent-id>

Removing

Removing a reservation requires amending the input JSON so that it references only the resources to unreserve, in the following format:

NOTE: Change the agent_id to match the agent ID on your cluster (as in the previous example).

    # Write the request body to a file.
    tee remove-reservation.json << EOF
    {
      "type": "UNRESERVE_RESOURCES",
      "unreserve_resources": {
        "agent_id": {
          "value": "312dc1dc-9b39-474f-8295-87fc43872e7c-S0"
        },
        "resources": [
          {
            "type": "SCALAR",
            "name": "cpus",
            "reservation": {
              "principal": "bootstrapuser"
            },
            "role": "confluent-kafka-role",
            "scalar": {
              "value": 4.0
            }
          },
          {
            "type": "SCALAR",
            "name": "mem",
            "reservation": {
              "principal": "bootstrapuser"
            },
            "role": "confluent-kafka-role",
            "scalar": {
              "value": 512.0
            }
          },
          {
            "type": "RANGES",
            "name": "ports",
            "reservation": {
              "principal": "bootstrapuser"
            },
            "role": "confluent-kafka-role",
            "ranges": {
              "range": [
                {
                  "begin": 8112,
                  "end": 8114
                }
              ]
            }
          }
        ]
      }
    }
    EOF
    # POST the request to the Mesos API endpoint of the cluster.
    curl -i -k \
      -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json" \
      -d @remove-reservation.json \
      -X POST "$(dcos config show core.dcos_url)/mesos/api/v1"
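The add and remove request bodies above differ only in the call type and the name of the wrapper object, so the second can be derived from the first rather than written by hand. The following is a minimal sketch under that assumption; to_unreserve is a hypothetical helper, and it relies on the file using exactly the key spellings shown in the examples.

```shell
# to_unreserve FILE
# Hypothetical helper: prints FILE with the RESERVE_RESOURCES call rewritten
# as UNRESERVE_RESOURCES. The leading double quote in each pattern keeps the
# substitution from matching text that has already been converted.
to_unreserve() {
    sed -e 's/"RESERVE_RESOURCES"/"UNRESERVE_RESOURCES"/' \
        -e 's/"reserve_resources"/"unreserve_resources"/' "$1"
}

# Possible usage:
#   to_unreserve add-reservation.json > remove-reservation.json
```

Deriving the file this way keeps the agent ID, principal, role, and resource values identical to those used when the reservation was made.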

Further options for dynamic and static operations, and for amending existing reservations, can be found in the reference links.

Quotas

Quotas specify a minimum amount of resources that the role is guaranteed to receive (unless the total resources in the cluster are less than the configured quota resources, which often indicates a misconfiguration).

Adding

Quotas cannot be updated once applied; they must be removed and added again. The following example applies a quota of two CPU shares and 4GB of RAM to a role called high.

    # Write the request body to a file.
    tee set-quota.json << EOF
    {
      "type": "SET_QUOTA",
      "set_quota": {
        "quota_request": {
          "force": true,
          "guarantee": [
            {
              "name": "cpus",
              "role": "*",
              "scalar": {
                "value": 2.0
              },
              "type": "SCALAR"
            },
            {
              "name": "mem",
              "role": "*",
              "scalar": {
                "value": 4096.0
              },
              "type": "SCALAR"
            }
          ],
          "role": "high"
        }
      }
    }
    EOF
    # POST the request to the Mesos API endpoint of the cluster.
    curl -i -k \
      -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json" \
      -d @set-quota.json \
      -X POST "$(dcos config show core.dcos_url)/mesos/api/v1"

If successful, expect an HTTP/1.1 200 OK response.
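Because a quota has to be removed and added again to change it, the request body is likely to be regenerated with new values from time to time. The sketch below wraps the same JSON structure as the example above in a small function; quota_payload is a hypothetical helper, and the role and values passed to it are illustrative.

```shell
# quota_payload ROLE CPUS MEM_MB
# Hypothetical helper: prints a SET_QUOTA request body with the same
# structure as the example above, for the given role and guarantees.
quota_payload() {
    cat << EOF
{
  "type": "SET_QUOTA",
  "set_quota": {
    "quota_request": {
      "force": true,
      "guarantee": [
        { "name": "cpus", "role": "*", "scalar": { "value": $2 }, "type": "SCALAR" },
        { "name": "mem", "role": "*", "scalar": { "value": $3 }, "type": "SCALAR" }
      ],
      "role": "$1"
    }
  }
}
EOF
}

# Illustrative values: four CPU shares and 8GB of RAM for the role high.
quota_payload high 4.0 8192.0
```

To change an existing quota with this helper, first remove the current quota for the role, then redirect the output to set-quota.json and POST it as in the example above.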

Reviewing

    # Write the request body to a file.
    tee get-quota.json << EOF
    {
      "type": "GET_QUOTA"
    }
    EOF
    # POST the request to the Mesos API endpoint of the cluster.
    curl -i -k \
      -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json" \
      -d @get-quota.json \
      -X POST "$(dcos config show core.dcos_url)/mesos/api/v1"

    # Example response:
    HTTP/1.1 200 OK
    Server: openresty
    Date: Fri, 21 Sep 2018 15:35:09 GMT
    Content-Type: application/json
    Content-Length: 224
    Connection: keep-alive

    {"type":"GET_QUOTA","get_quota":{"status":{"infos":[{"role":"high","principal":"bootstrapuser","guarantee":[{"name":"cpus","type":"SCALAR","scalar":{"value":2.0}},{"name":"mem","type":"SCALAR","scalar":{"value":128.0}}]}]}}}

Removing

    # Write the request body to a file.
    tee remove-quota.json << EOF
    {
      "type": "REMOVE_QUOTA",
      "remove_quota": {
        "role": "high"
      }
    }
    EOF
    # POST the request to the Mesos API endpoint of the cluster.
    curl -i -k \
      -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json" \
      -d @remove-quota.json \
      -X POST "$(dcos config show core.dcos_url)/mesos/api/v1"

    # Example response:
    HTTP/1.1 200 OK
    Server: openresty
    Date: Fri, 21 Sep 2018 15:38:15 GMT
    Content-Length: 0
    Connection: keep-alive

If successful, expect an HTTP/1.1 200 OK response.

Weights

Weights are used to control the relative share of cluster resources that is offered to different roles.

Applying

The following example applies a weight of five to the role perf.

    # Write the request body to a file.
    tee set-weight.json << EOF
    {
      "type": "UPDATE_WEIGHTS",
      "update_weights": {
        "weight_infos": [
          {
            "role": "perf",
            "weight": 5.0
          }
        ]
      }
    }
    EOF
    # POST the request to the Mesos API endpoint of the cluster.
    curl -i -k \
      -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json" \
      -d @set-weight.json \
      -X POST "$(dcos config show core.dcos_url)/mesos/api/v1"

If successful, expect an HTTP/1.1 200 OK response.

Reviewing

    # Write the request body to a file.
    tee get-weight.json << EOF
    {
      "type": "GET_WEIGHTS"
    }
    EOF
    # POST the request to the Mesos API endpoint of the cluster.
    curl -i -k \
      -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json" \
      -d @get-weight.json \
      -X POST "$(dcos config show core.dcos_url)/mesos/api/v1"

    # Example response:
    HTTP/1.1 200 OK
    Server: openresty
    Date: Fri, 21 Sep 2018 15:25:25 GMT
    Content-Type: application/json
    Content-Length: 84
    Connection: keep-alive

    {"type":"GET_WEIGHTS","get_weights":{"weight_infos":[{"weight":5.0,"role":"perf"}]}}

Removing

Weights cannot be removed once set; they can only be amended, using the same method as applying them. To reset the weight for a role, set it back to 1, which is the same weight as the default role *.

Marathon on Marathon

The DC/OS Catalog includes Marathon, which can be used to deploy a MoM. Note that this is suitable only for DC/OS OSS installations, as it does not support Strict mode, Secrets, or ACLs. See the Marathon on Marathon documentation.

To install DC/OS Enterprise MoM, you must contact Mesosphere Support for the Enterprise MoM tarball, then deploy it using the root Marathon. See the custom non-native Marathon documentation.

Additional Resources

You can use the following additional resources to learn more about: