Multi-Region DC/OS on AWS using the Universal Installer

ENTERPRISE

Guide for adding a remote region to DC/OS on AWS using the Universal Installer.

This guide expects that you already have a running DC/OS cluster based on Universal Installer 0.3. To learn more about running DC/OS with the Universal Installer, have a look at the Guide for DC/OS on AWS using the Universal Installer.

You will learn how to place additional infrastructure into an AWS remote region. Remote regions are connected to each other using the AWS VPC peering feature.

Prerequisites

  • A running DC/OS Enterprise cluster set up with Universal Installer 0.3 modules
  • A subnet range for your remote region
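The examples in this guide use Terraform 0.12 expression syntax (for example values() and bare module references), so pinning a minimum Terraform version can save confusion later. A minimal sketch, assuming you keep version constraints in your main.tf:

  terraform {
    # values() and un-interpolated references like module.dcos-usw2.vpc_id
    # require Terraform 0.12 or newer
    required_version = ">= 0.12"
  }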

Getting started with a remote region

We expect your already running DC/OS cluster's main.tf to look similar to this example. To deploy a remote region we have to make some changes to your main.tf:

  provider "aws" {
    # Change your default region here
    region = "us-east-1"
  }

  # Used to determine your public IP for forwarding rules
  data "http" "whatismyip" {
    url = "http://whatismyip.akamai.com/"
  }

  module "dcos" {
    source  = "dcos-terraform/dcos/aws"
    version = "~> 0.3.0"

    providers = {
      aws = aws
    }

    cluster_name        = "my-dcos-demo"
    ssh_public_key_file = "<path-to-public-key-file>"
    admin_ips           = ["${data.http.whatismyip.body}/32"]

    num_masters        = "3"
    num_private_agents = "2"
    num_public_agents  = "1"

    dcos_version              = "2.1"
    dcos_variant              = "ee"
    dcos_license_key_contents = "${file("./license.txt")}"
    # Make sure to set your credentials if you do not want the default EE
    # dcos_superuser_username      = "superuser-name"
    # dcos_superuser_password_hash = "${file("./dcos_superuser_password_hash.sha512")}"

    dcos_instance_os             = "centos_7.5"
    bootstrap_instance_type      = "m5.large"
    masters_instance_type        = "m5.2xlarge"
    private_agents_instance_type = "m5.xlarge"
    public_agents_instance_type  = "m5.xlarge"
  }

  output "masters-ips" {
    value = module.dcos.masters-ips
  }

  output "cluster-address" {
    value = module.dcos.masters-loadbalancer
  }

  output "public-agents-loadbalancer" {
    value = module.dcos.public-agents-loadbalancer
  }

Remote region provider

The first change to apply to your main.tf is adding a specific provider statement for the remote region. In this example we will use us-west-2, with the alias usw2, as our remote region. We also add an alias statement to the provider used to deploy the region holding the master instances.

This needs to be done so our modules know which account credentials to use.

Note: Some resources have name length limitations which is the reason we shorten our region name.

  provider "aws" {
    # Change your default region here
    region = "us-east-1"
    alias  = "master"
  }

  provider "aws" {
    # Change your remote region here
    region = "us-west-2"
    alias  = "usw2"
  }

  # ...
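By default both providers resolve credentials the same way. If the remote region should be deployed under different credentials, the aliased provider can carry its own settings. A hedged sketch, assuming a hypothetical AWS CLI profile named remote-account:

  provider "aws" {
    region  = "us-west-2"
    alias   = "usw2"
    profile = "remote-account" # hypothetical profile name from your credentials file
  }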

Shared config options

To create the remote region and its infrastructure we will use the same underlying modules as in our master region. This also means some information is needed by both infrastructures, such as cluster_name, admin_ips, and ssh_public_key_file. To make the operation easier, you should define local variables in your main.tf that will be used in every module.

  # ...

  // let's define variables which are shared between all regions
  locals {
    ssh_public_key_file = "~/.ssh/id_rsa.pub"
    cluster_name        = "my-dcos-demo"
    admin_ips           = ["${data.http.whatismyip.body}/32"]
  }

  # ...

Internal subnetworks

Part of the shared information is which internal subnets are used in your infrastructure. This needs to be known by all parts of the DC/OS cluster so traffic can be routed and allowed by the security groups. If you did not specify subnet_range, Terraform uses the default, which is 172.16.0.0/16. The remote region we want to add needs its own subnet.

IMPORTANT: Do not take the next free network in 172.16.0.0/12, as 172.17.0.0/16 is Docker's default internal network and using it will lead to problems.

To have a clear separation between our master and our remote regions, we will take 10.128.0.0/16 as our remote region's subnet. Also, we will use a map variable to assign the networks to regions. This will make it easier to add additional regions in the future.

The locals section will now look like this:

  # ...

  // let's define variables which are shared between all regions
  locals {
    ssh_public_key_file = "~/.ssh/id_rsa.pub"
    cluster_name        = "my-dcos-demo"
    admin_ips           = ["${data.http.whatismyip.body}/32"]

    region_networks = {
      // don't use 172.17.0.0/16 as it is used by Docker
      "master" = "172.16.0.0/16" // this is the default
      "usw2"   = "10.128.0.0/16"
    }
  }

  # ...

Allowed internal networks

To let our main region allow traffic from the remote region and vice versa, we have to specify the accepted_internal_networks variable in both. This variable informs the security group that allows the agents and masters to communicate with each other. accepted_internal_networks may contain the region's own network, which makes it extremely easy to let Terraform calculate the value for this variable. We will use the values() function to retrieve the subnets from local.region_networks, which we previously defined.

The locals section will now look like this:

  # ...

  // let's define variables which are shared between all regions
  locals {
    ssh_public_key_file = "~/.ssh/id_rsa.pub"
    cluster_name        = "my-dcos-demo"
    admin_ips           = ["${data.http.whatismyip.body}/32"]

    region_networks = {
      // don't use 172.17.0.0/16 as it is used by Docker
      "master" = "172.16.0.0/16" // this is the default
      "usw2"   = "10.128.0.0/16"
    }

    accepted_internal_networks = values(local.region_networks)
  }

  # ...
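Because accepted_internal_networks is derived from the map, adding another region later only takes one new map entry. A hedged sketch of how the map would grow, assuming a hypothetical third region in eu-central-1 aliased euc1:

  // inside the existing locals block: one new entry per additional region
  region_networks = {
    "master" = "172.16.0.0/16"
    "usw2"   = "10.128.0.0/16"
    "euc1"   = "10.129.0.0/16" // hypothetical third region, picked up by values() automatically
  }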

The remote region

Before we start changing values within the dcos module, we will append the infrastructure definition of the remote region to your main.tf. In our example we only want private agents in the remote region, although both private and public agents can be placed in a remote region.

IMPORTANT: Running master instances in remote regions is not supported.

To start only private agents we set num_masters = 0 and num_public_agents = 0. Due to an internal limitation we also have to tell the infrastructure module not to create load balancers for masters and public agents, using lb_disable_masters and lb_disable_public_agents.

Another important topic is naming. To distinguish between instances of your main and your remote region, we introduced the name_prefix variable, which allows you to add a prefix to the name of every resource. In this example we set name_prefix to the short name of the remote region.

In the following example you will also find the shared config options being used in the module call (referenced by, e.g., local.admin_ips), along with the provider we specified for the region:

  # ...

  module "dcos-usw2" {
    source  = "dcos-terraform/infrastructure/aws"
    version = "~> 0.3.0"

    admin_ips    = local.admin_ips
    name_prefix  = "usw2"
    cluster_name = local.cluster_name

    accepted_internal_networks = values(local.region_networks)

    num_masters        = 0
    num_private_agents = 1
    num_public_agents  = 0

    lb_disable_public_agents = true
    lb_disable_masters       = true

    ssh_public_key_file = local.ssh_public_key_file
    subnet_range        = local.region_networks["usw2"]

    providers = {
      aws = aws.usw2
    }
  }
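If you want to inspect the remote agents after an apply, you can expose the infrastructure module's outputs. A minimal sketch built on the private_agents_private_ips output that the dcos module consumes later in this guide:

  output "usw2-private-agent-ips" {
    value = module.dcos-usw2.private_agents_private_ips
  }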

Peering to the main DC/OS region

We now need to establish a connection between the two infrastructures. The Universal Installer provides a module for this task. In this module we reference data from both infrastructures: the main region holding the DC/OS masters and the remote region holding your remote private agents. Because VPC peering creates resources on both sides of the connection, the module receives both aliased providers.

The only information this module needs is the outputs of our dcos and dcos-usw2 modules. We will append this module to the end of your main.tf.

Here is the example VPC peering section:

  # ...

  module "vpc-connection-master-usw2" {
    source  = "dcos-terraform/vpc-peering/aws" // module initializing the peering
    version = "~> 0.3.0"

    providers = {
      aws.local  = aws.master
      aws.remote = aws.usw2
    }

    local_vpc_id       = module.dcos.infrastructure_vpc_id
    local_subnet_range = local.region_networks["master"]

    remote_vpc_id       = module.dcos-usw2.vpc_id
    remote_subnet_range = local.region_networks["usw2"]
  }

Changes to dcos module

At this point it's time to make changes to your dcos module so it knows about the remote region and is able to install the remote agents.

  1. Choose a subnet range. In general this change is not needed, but we want to make the example specific.

     subnet_range = local.region_networks["master"]

  2. Add the accepted internal networks. As in the remote region, we need to specify the internal networks to allow internal traffic to flow.

     accepted_internal_networks = values(local.region_networks)

  3. Change the cluster_name. As this is a shared resource, we make use of the local variable.

     cluster_name = local.cluster_name

  4. List the SSH key. This is also a shared resource, and we can make use of the local variable.

     ssh_public_key_file = local.ssh_public_key_file

  5. Add the admin IPs, following the same pattern.

     admin_ips = local.admin_ips

  6. Add the remote private agents. This is arguably the most important new variable: it tells the DC/OS installation module which additional agents need to be installed.

     additional_private_agent_ips = module.dcos-usw2.private_agents_private_ips

  7. Update the providers section. As we changed to explicitly aliased providers, we have to point the dcos module to its specific provider.

     providers = {
       aws = aws.master
     }

Example dcos module

After the changes above have been applied, your dcos module should look like this:

  module "dcos" {
    source  = "dcos-terraform/dcos/aws"
    version = "~> 0.3.0"

    cluster_name        = local.cluster_name
    ssh_public_key_file = local.ssh_public_key_file
    admin_ips           = local.admin_ips
    subnet_range        = local.region_networks["master"]

    num_masters        = "3"
    num_private_agents = "2"
    num_public_agents  = "1"

    dcos_version     = "2.1"
    dcos_instance_os = "centos_7.5"

    bootstrap_instance_type      = "m5.large"
    masters_instance_type        = "m5.2xlarge"
    private_agents_instance_type = "m5.xlarge"
    public_agents_instance_type  = "m5.xlarge"

    accepted_internal_networks   = values(local.region_networks)
    additional_private_agent_ips = module.dcos-usw2.private_agents_private_ips

    providers = {
      aws = aws.master
    }

    dcos_variant              = "ee"
    dcos_license_key_contents = "${file("./license.txt")}"
  }

Full main.tf example

Here is the complete main.tf you should see once you have completed this guide:

  provider "aws" {
    # Change your default region here
    region = "us-east-1"
    alias  = "master"
  }

  provider "aws" {
    # Change your remote region here
    region = "us-west-2"
    alias  = "usw2"
  }

  // let's define variables which are shared between all regions
  locals {
    ssh_public_key_file = "~/.ssh/id_rsa.pub"
    cluster_name        = "my-dcos-demo"
    admin_ips           = ["${data.http.whatismyip.body}/32"]

    region_networks = {
      // don't use 172.17.0.0/16 as it is used by Docker
      "master" = "172.16.0.0/16" // this is the default
      "usw2"   = "10.128.0.0/16"
    }
  }

  module "dcos" {
    source  = "dcos-terraform/dcos/aws"
    version = "~> 0.3.0"

    cluster_name        = local.cluster_name
    ssh_public_key_file = local.ssh_public_key_file
    admin_ips           = local.admin_ips
    subnet_range        = local.region_networks["master"]

    num_masters        = "3"
    num_private_agents = "2"
    num_public_agents  = "1"

    dcos_version     = "2.1"
    dcos_instance_os = "centos_7.5"

    bootstrap_instance_type      = "m5.large"
    masters_instance_type        = "m5.2xlarge"
    private_agents_instance_type = "m5.xlarge"
    public_agents_instance_type  = "m5.xlarge"

    accepted_internal_networks   = values(local.region_networks)
    additional_private_agent_ips = module.dcos-usw2.private_agents_private_ips

    providers = {
      aws = aws.master
    }

    dcos_variant              = "ee"
    dcos_license_key_contents = "${file("./license.txt")}"
  }

  # Used to determine your public IP for forwarding rules
  data "http" "whatismyip" {
    url = "http://whatismyip.akamai.com/"
  }

  output "masters-ips" {
    value = module.dcos.masters-ips
  }

  output "cluster-address" {
    value = module.dcos.masters-loadbalancer
  }

  output "public-agents-loadbalancer" {
    value = module.dcos.public-agents-loadbalancer
  }

  module "dcos-usw2" {
    source  = "dcos-terraform/infrastructure/aws"
    version = "~> 0.3.0"

    admin_ips    = local.admin_ips
    name_prefix  = "usw2"
    cluster_name = local.cluster_name

    accepted_internal_networks = values(local.region_networks)

    num_masters        = 0
    num_private_agents = 1
    num_public_agents  = 0

    lb_disable_public_agents = true
    lb_disable_masters       = true

    ssh_public_key_file = local.ssh_public_key_file
    subnet_range        = local.region_networks["usw2"]

    providers = {
      aws = aws.usw2
    }
  }

  module "vpc-connection-master-usw2" {
    source  = "dcos-terraform/vpc-peering/aws" // module initializing the peering
    version = "~> 0.3.0"

    providers = {
      aws.local  = aws.master
      aws.remote = aws.usw2
    }

    local_vpc_id       = module.dcos.infrastructure_vpc_id
    local_subnet_range = local.region_networks["master"]

    remote_vpc_id       = module.dcos-usw2.vpc_id
    remote_subnet_range = local.region_networks["usw2"]
  }
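With these changes in place, run terraform init to fetch the newly referenced modules, then terraform plan and terraform apply to create the remote infrastructure, the VPC peering connection, and the additional private agents.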