NOTE: This topic’s examples are for a CentOS 7 platform. For a full list of supported platforms, see the Supported Platforms topic.
With this option, you create a base virtual machine from an existing CentOS 7 virtual machine, use Terraform from the jumpbox virtual machine to generate copies of the base virtual machine that make up the Greenplum Database cluster, and then deploy Greenplum Database on the cluster.
Creating the Base Virtual Machine
In this section, you clone a virtual machine from an existing CentOS 7 virtual machine, perform a series of configuration changes, and create a base virtual machine from it. Finally, you verify that it was configured correctly.
Preparing the Virtual Machine
Create a base virtual machine from an existing virtual machine. You must have a running CentOS 7 virtual machine in the datastore and cluster where you deploy the Greenplum environment.
- Log in to vCenter and navigate to Hosts and Clusters.
- Right-click your existing CentOS 7 virtual machine.
- Select Clone → Clone to Virtual Machine.
- Enter greenplum-db-base-vm as the virtual machine name, then click Next.
- Select your cluster, then click Next.
- Select the vSAN datastore and select Keep existing VM storage policies for VM Storage Policy, then click Next.
- Under Select clone options, check the boxes Power on virtual machine after creation and Customize this virtual machine’s hardware and click Next.
- Under Customize hardware, check the number of hard disks configured for this virtual machine. If there is only one, add a second one by clicking Add new device → Hard Disk.
- Edit the existing network adapter New Network so it connects to the gp-virtual-external port group.
  - If you are using DHCP, a new IP address will be assigned to this interface. If you are using static IP assignment, you must manually set up the IP address in a later step.
- Review your configuration, then click Finish.
Once the virtual machine is powered on, launch the Web Console and log in as root. Check the virtual machine IP address by running ip a. If you are using static IP assignment, you must manually set it up:
Edit the file /etc/sysconfig/network-scripts/ifcfg-<interface-name>.
Enter the network information provided by your network administrator for the gp-virtual-external network. For example:
BOOTPROTO=none
IPADDR=10.202.89.10
NETMASK=255.255.255.0
GATEWAY=10.202.89.1
DNS1=1.0.0.1
DNS2=1.1.1.1
Performing System Configuration
Configure the newly cloned virtual machine in order to support a Greenplum Database system.
Log in to the cloned virtual machine greenplum-db-base-vm as user root.
Verify that VMware Tools is installed. Refer to Installing VMware Tools for instructions.
Disable the following services:
Disable SELinux by editing the /etc/selinux/config file. Change the value of the SELINUX parameter in the configuration file as follows:
SELINUX=disabled
Check that the System Security Services Daemon (SSSD) is installed:
$ yum list sssd | grep -i "Installed Packages"
If SSSD is installed, edit the SSSD configuration file /etc/sssd/sssd.conf and set the selinux_provider parameter to none by adding the following line. This prevents SELinux-related SSH authentication denials, which can occur even if SELinux is disabled. If SSSD is not installed, skip this step.
selinux_provider=none
Disable the Firewall service:
$ systemctl stop firewalld
$ systemctl disable firewalld
$ systemctl mask --now firewalld
Disable the Tuned daemon:
$ systemctl stop tuned
$ systemctl disable tuned
$ systemctl mask --now tuned
Disable Chrony:
$ systemctl stop chronyd
$ systemctl disable chronyd
$ systemctl mask --now chronyd
Back up the boot files:
$ cp /etc/default/grub /etc/default/grub-backup
$ cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg-backup
Add the following boot parameters:
Disable Transparent Huge Page (THP):
$ grubby --update-kernel=ALL --args="transparent_hugepage=never"
Add the parameter elevator=deadline:
$ grubby --update-kernel=ALL --args="elevator=deadline"
Install and enable the ntp daemon:
$ yum install -y ntp
$ systemctl enable ntpd
Configure the NTP servers:
Remove all unwanted servers from /etc/ntp.conf. For example:
...
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.centos.pool.ntp.org iburst
...
Add an entry for each server to /etc/ntp.conf:
server <data center's NTP time server 1>
server <data center's NTP time server 2>
...
server <data center's NTP time server N>
Add the master and standby to the list of servers, after the data center NTP servers, in /etc/ntp.conf:
server <data center's NTP time server N>
...
server mdw
server smdw
Configure kernel settings so the system is optimized for Greenplum Database.
Create the configuration file /etc/sysctl.d/10-gpdb.conf and paste in the following kernel optimization parameters:
kernel.msgmax = 65536
kernel.msgmnb = 65536
kernel.msgmni = 2048
kernel.sem = 500 2048000 200 40960
kernel.shmmni = 1024
kernel.sysrq = 1
net.core.netdev_max_backlog = 2000
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
net.core.rmem_default = 4194304
net.core.wmem_default = 4194304
net.ipv4.tcp_rmem = 4096 4224000 16777216
net.ipv4.tcp_wmem = 4096 4224000 16777216
net.core.optmem_max = 4194304
net.core.somaxconn = 10000
net.ipv4.ip_forward = 0
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_tw_recycle = 0
net.core.default_qdisc = fq_codel
net.ipv4.tcp_mtu_probing = 0
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.ip_local_port_range = 10000 65535
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_syncookies = 1
vm.overcommit_memory = 2
vm.overcommit_ratio = 95
vm.swappiness = 10
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.zone_reclaim_mode = 0
Add the following parameters. Some of the values depend on the virtual machine settings calculated in the Sizing section.
Determine the value of the RAM in bytes by creating the variable $RAM_IN_BYTES. For example, for a virtual machine with 30 GB of RAM, run the following:
$ RAM_IN_BYTES=$((30 * 1024 * 1024 * 1024))
Define the following parameters that depend on the variable $RAM_IN_BYTES that you just created, and append them to the file /etc/sysctl.d/10-gpdb.conf by running the following commands:
$ echo "vm.min_free_kbytes = $(($RAM_IN_BYTES * 3 / 100 / 1024))" >> /etc/sysctl.d/10-gpdb.conf
$ echo "kernel.shmall = $(($RAM_IN_BYTES / 2 / 4096))" >> /etc/sysctl.d/10-gpdb.conf
$ echo "kernel.shmmax = $(($RAM_IN_BYTES / 2))" >> /etc/sysctl.d/10-gpdb.conf
If your virtual machine RAM is less than or equal to 64 GB, run the following commands:
$ echo "vm.dirty_background_ratio = 3" >> /etc/sysctl.d/10-gpdb.conf
$ echo "vm.dirty_ratio = 10" >> /etc/sysctl.d/10-gpdb.conf
If your virtual machine RAM is greater than 64 GB, run the following commands:
$ echo "vm.dirty_background_ratio = 0" >> /etc/sysctl.d/10-gpdb.conf
$ echo "vm.dirty_ratio = 0" >> /etc/sysctl.d/10-gpdb.conf
$ echo "vm.dirty_background_bytes = 1610612736 # 1.5GB" >> /etc/sysctl.d/10-gpdb.conf
$ echo "vm.dirty_bytes = 4294967296 # 4GB" >> /etc/sysctl.d/10-gpdb.conf
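As a minimal sketch of the arithmetic above, the RAM-dependent settings can be previewed before anything is appended to /etc/sysctl.d/10-gpdb.conf. The function name gpdb_sysctl_lines is hypothetical and not part of Greenplum. It assumes the RAM size is given in whole gigabytes, and it only prints to standard output.

```shell
#!/bin/bash
# Hypothetical helper: print the RAM-dependent sysctl lines for a VM
# with the given amount of RAM in GB. Nothing is written to disk.
gpdb_sysctl_lines() {
    local ram_gb="$1"
    local ram_bytes=$(( ram_gb * 1024 * 1024 * 1024 ))
    echo "vm.min_free_kbytes = $(( ram_bytes * 3 / 100 / 1024 ))"
    echo "kernel.shmall = $(( ram_bytes / 2 / 4096 ))"
    echo "kernel.shmmax = $(( ram_bytes / 2 ))"
    if [ "${ram_gb}" -le 64 ]; then
        # 64 GB of RAM or less: ratio-based dirty page limits
        echo "vm.dirty_background_ratio = 3"
        echo "vm.dirty_ratio = 10"
    else
        # More than 64 GB of RAM: byte-based dirty page limits
        echo "vm.dirty_background_ratio = 0"
        echo "vm.dirty_ratio = 0"
        echo "vm.dirty_background_bytes = 1610612736"
        echo "vm.dirty_bytes = 4294967296"
    fi
}

# Preview the values for a 30 GB virtual machine.
gpdb_sysctl_lines 30
```

Reviewing the printed lines first makes it easier to spot a wrong RAM size before the values reach the configuration file.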
Configure ssh to allow passwordless login.
Edit the /etc/ssh/sshd_config file and update the following options:
PasswordAuthentication yes
ChallengeResponseAuthentication yes
UsePAM yes
MaxStartups 100
MaxSessions 100
Create ssh keys to allow passwordless login as root by running the following commands:
# make sure to generate ssh keys without password. Press Enter for defaults
$ ssh-keygen
$ chmod 700 /root/.ssh
# copy public key to authorized_keys
$ cd /root/.ssh/
$ cat id_rsa.pub > authorized_keys
$ chmod 600 authorized_keys
# it will add host signature to known_hosts
$ ssh-keyscan -t rsa localhost > known_hosts
# duplicate host signature for all hosts in the cluster
$ key=$(cat known_hosts)
# Replace `64` with your number of total segment virtual machines as necessary.
$ for i in mdw $(seq -f "sdw%g" 1 64); do
    echo ${key} | sed -e "s/localhost/${i}/" >> known_hosts
done
$ echo ${key} | sed -e "s/localhost/smdw/" >> known_hosts
$ chmod 644 known_hosts
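The duplication step above can be sketched as a small function that prints the expanded entries instead of appending them, so the result can be inspected first. The name expand_known_hosts and the sample key string are hypothetical. Unlike the commands above, this sketch anchors the substitution to the start of the line, which is an assumption about where the hostname appears in the ssh-keyscan output.

```shell
#!/bin/bash
# Hypothetical helper: given one known_hosts entry for "localhost" and a
# segment count, print a copy of the entry for mdw, smdw, and each sdwN.
expand_known_hosts() {
    local key_line="$1"
    local segment_count="$2"
    local host
    for host in mdw smdw $(seq -f "sdw%g" 1 "${segment_count}"); do
        # Replace only the leading hostname field of the entry.
        echo "${key_line}" | sed -e "s/^localhost/${host}/"
    done
}

# Example with a placeholder key (not a real host key).
expand_known_hosts "localhost ssh-rsa AAAAexample" 2
```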
Configure the system resource limits to control the amount of resources used by Greenplum by creating the file /etc/security/limits.d/20-nproc.conf.
Ensure that the directory exists before creating the file:
$ mkdir -p /etc/security/limits.d
Append the following contents to the end of /etc/security/limits.d/20-nproc.conf:
* soft nofile 524288
* hard nofile 524288
* soft nproc 131072
* hard nproc 131072
Create the base mount point /gpdata for the virtual machine data drive:
$ mkdir -p /gpdata
$ mkfs.xfs /dev/sdb
$ mount -t xfs -o rw,noatime,nodev,inode64 /dev/sdb /gpdata/
$ df -kh
$ echo /dev/sdb /gpdata/ xfs rw,nodev,noatime,inode64 0 0 >> /etc/fstab
$ mkdir -p /gpdata/primary
$ mkdir -p /gpdata/mirror
$ mkdir -p /gpdata/master
Configure the file /etc/rc.d/rc.local to make the following settings persistent:
Update the file content:
# Configure readahead for the `/dev/sdb` to 16384 512-byte sectors, i.e. 8MiB
/sbin/blockdev --setra 16384 /dev/sdb
# Configure gp-virtual-internal network settings with MTU 9000
/sbin/ip link set ens192 mtu 9000
# Configure jumbo frame RX ring buffer to 4096
/sbin/ethtool --set-ring ens192 rx-jumbo 4096
Make the file executable:
$ chmod +x /etc/rc.d/rc.local
Create the group and user gpadmin:gpadmin required by Greenplum Database.
Execute the following steps to create the user gpadmin in the group gpadmin:
$ groupadd gpadmin
$ useradd -g gpadmin -m gpadmin
$ passwd gpadmin
# Enter the desired password at the prompt
(Optional) Change the root password to a preferred password:
$ passwd root
# Enter the desired password at the prompt
Create the file /home/gpadmin/.bashrc for gpadmin with the following content:
### .bashrc
### Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
### User specific aliases and functions
### If Greenplum has been installed, then add Greenplum-specific commands to the path
if [ -f /usr/local/greenplum-db/greenplum_path.sh ]; then
source /usr/local/greenplum-db/greenplum_path.sh
fi
Change the ownership of /home/gpadmin/.bashrc to gpadmin:gpadmin:
$ chown gpadmin:gpadmin /home/gpadmin/.bashrc
Change the ownership of the /gpdata directory to gpadmin:gpadmin:
$ chown -R gpadmin:gpadmin /gpdata
Create ssh keys for passwordless login as the gpadmin user:
$ su - gpadmin
# make sure to generate ssh keys without password. Press Enter for defaults
$ ssh-keygen
$ chmod 700 /home/gpadmin/.ssh
# copy public key to authorized_keys
$ cd /home/gpadmin/.ssh/
$ cat id_rsa.pub > authorized_keys
$ chmod 600 authorized_keys
# it will add host signature to known_hosts
$ ssh-keyscan -t rsa localhost > known_hosts
# duplicate host signature for all hosts in the cluster
$ key=$(cat known_hosts)
# Replace `64` with your number of total segment virtual machines as necessary.
$ for i in mdw $(seq -f "sdw%g" 1 64); do
    echo ${key} | sed -e "s/localhost/${i}/" >> known_hosts
done
$ echo ${key} | sed -e "s/localhost/smdw/" >> known_hosts
$ chmod 644 known_hosts
Log out of gpadmin to go back to root before you proceed to the next step.
Configure cgroups for Greenplum.
For security and resource management, Greenplum Database makes use of Linux cgroups.
Install the cgroup configuration package:
$ yum install -y libcgroup-tools
Ensure that the directory /etc/cgconfig.d exists:
$ mkdir -p /etc/cgconfig.d
Create the cgroups configuration file /etc/cgconfig.d/10-gpdb.conf for Greenplum:
group gpdb {
perm {
task {
uid = gpadmin;
gid = gpadmin;
}
admin {
uid = gpadmin;
gid = gpadmin;
}
}
cpu {
}
cpuacct {
}
cpuset {
}
memory {
}
}
Prepare the configuration file and enable cgconfig via systemctl:
$ cgconfigparser -l /etc/cgconfig.d/10-gpdb.conf
$ systemctl enable cgconfig.service
Update the /etc/hosts file with all of the IP addresses and hostnames in the network gp-virtual-internal.
Verify that you have the following parameters defined:
- Total number of segment virtual machines you wish to deploy. The default is 64.
- The starting IP address of the master virtual machine in the gp-virtual-internal port group. The default is 250.
- The leading octets for the gp-virtual-internal network IP range. The default is 192.168.1.
  - The segment IP addresses start from 192.168.1.2.
  - The master IP address is 192.168.1.250.
  - The standby master IP address is 192.168.1.251.
Create the file /root/update-etc-hosts.sh and insert the following commands:
if [ $# -ne 2 ] ; then
echo "Usage: $0 internal_cidr segment_count"
exit 1
fi
if [ ! -f /etc/hosts.bak ]; then
cp /etc/hosts /etc/hosts.bak
else
cp /etc/hosts.bak /etc/hosts
fi
internal_ip_cidr=${1}
segment_host_count=${2}
internal_network_ip=$(echo ${internal_ip_cidr} | cut -d"/" -f1)
internal_netmask=$(echo ${internal_ip_cidr} | cut -d"/" -f2)
if [ ${internal_netmask} -lt 20 ] || [ ${internal_netmask} -gt 24 ]; then
echo "The CIDR should contain a netmask between 20 and 24."
exit 1
fi
max_segment_hosts=$(( 2**(32 - internal_netmask) - 8 ))
if [ ${max_segment_hosts} -lt ${segment_host_count} ]; then
echo "ERROR: The CIDR does not have enough IPs available (${max_segment_hosts}) to meet the VM count (${segment_host_count})."
exit 1
fi
octet3=$(echo ${internal_ip_cidr} | cut -d"." -f3)
ip_prefix=$(echo ${internal_ip_cidr} | cut -d"." -f1-2)
octet3_mask=$(( 256-2**(24 - internal_netmask) ))
octet3_base=$(( octet3_mask&octet3 ))
master_octet3=$(( octet3_base + 2**(24 - internal_netmask) - 1 ))
master_ip="${ip_prefix}.${master_octet3}.250"
standby_ip="${ip_prefix}.${master_octet3}.251"
printf "\n${master_ip}\tmdw\n${standby_ip}\tsmdw\n" >> /etc/hosts
i=2
for hostname in $(seq -f "sdw%g" 1 ${segment_host_count}); do
segment_internal_ip="${ip_prefix}.$(( octet3_base + i / 256 )).$(( i % 256 ))"
printf "${segment_internal_ip}\t${hostname}\n" >> /etc/hosts
let i=i+1
done
Run the script passing in two parameters, internal CIDR and segment host count. For example:
bash /root/update-etc-hosts.sh 192.168.1.1/24 64
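Before running the script against /etc/hosts, the address arithmetic can be checked with a dry run. The following sketch repeats the same calculation but prints the entries to standard output. The function name preview_internal_hosts is hypothetical, it assumes a netmask between 20 and 24 as the script requires, and it uses a bit shift in place of the script's 2** expression.

```shell
#!/bin/bash
# Hypothetical dry-run helper: same arithmetic as update-etc-hosts.sh,
# but it prints the host entries instead of editing /etc/hosts.
preview_internal_hosts() {
    local cidr="$1"
    local count="$2"
    local netmask octet3 prefix block base i
    netmask=$(echo "${cidr}" | cut -d"/" -f2)
    octet3=$(echo "${cidr}" | cut -d"." -f3)
    prefix=$(echo "${cidr}" | cut -d"." -f1-2)
    block=$(( 1 << (24 - netmask) ))    # number of /24 blocks in the range
    base=$(( octet3 & (256 - block) ))  # first third-octet value of the range
    # Master and standby use .250 and .251 in the last /24 block.
    printf '%s.%s.250\tmdw\n' "${prefix}" "$(( base + block - 1 ))"
    printf '%s.%s.251\tsmdw\n' "${prefix}" "$(( base + block - 1 ))"
    # Segment hosts start at host offset 2 and count upward.
    i=2
    while [ "${i}" -lt $(( count + 2 )) ]; do
        printf '%s.%s.%s\tsdw%s\n' "${prefix}" "$(( base + i / 256 ))" "$(( i % 256 ))" "$(( i - 1 ))"
        i=$(( i + 1 ))
    done
}

# Preview the entries for a /24 network with three segment hosts.
preview_internal_hosts 192.168.1.1/24 3
```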
Create two files, hosts-all and hosts-segments, under /home/gpadmin. Replace 64 with your number of primary segment virtual machines as necessary.
$ echo mdw > /home/gpadmin/hosts-all
$ echo smdw >> /home/gpadmin/hosts-all
$ > /home/gpadmin/hosts-segments
$ for i in {1..64}; do
echo "sdw${i}" >> /home/gpadmin/hosts-all
echo "sdw${i}" >> /home/gpadmin/hosts-segments
done
$ chown gpadmin:gpadmin /home/gpadmin/hosts*
Installing the Greenplum Database Software
Download the latest version of the Greenplum Database Server 6 for RHEL 7 from VMware Tanzu Network.
Copy the downloaded package to the virtual machine and install Greenplum:
$ scp greenplum-db-6.*.rpm root@greenplum-db-base-vm:/tmp
$ ssh root@greenplum-db-base-vm
$ yum install -y /tmp/greenplum-db-6.*.rpm
Install the following yum packages for better supportability:
- dstat to monitor system statistics, like network and I/O performance.
- sos to generate an sosreport, a best practice to collect system information for support purposes.
- tree to visualize folder structure.
- wget to easily get artifacts from the Internet.
$ yum install -y dstat
$ yum install -y sos
$ yum install -y tree
$ yum install -y wget
Power down the virtual machine:
$ shutdown now
Enable vApp options in vCenter:
- Select the VM greenplum-db-base-vm.
- In the VM view, click the Configure tab at the top of the page.
- If vApp Options is disabled, click EDIT…
- Click Enable vApp options.
- Click OK.
Add vApp option guestinfo.primary_segment_count:
- Select Settings → vApp Options
- Under Properties, click ADD
- In the General tab, enter the following:
- For Category, enter Greenplum
- For Label, enter Number of Primary Segments
- For Key ID, enter guestinfo.primary_segment_count
- In the Type tab, enter the following:
- For Type, select Integer
- For Range, enter range 1-1000
- Click on Save
- Select the new property
- Click Set Value, and enter an appropriate value, for example: 32
Add vApp option guestinfo.internal_ip_cidr:
- Under Properties, click ADD again
- In the General tab, enter the following:
- For Category, enter Internal Network
- For Label, enter Internal Network CIDR (with netmask /24)
- For Key ID, enter guestinfo.internal_ip_cidr
- In the Type tab, enter the following:
- For Type, select String
- For Length, enter range 12-18
- Click on Save
- Select the new property
- Click Set Value, and enter an appropriate value, for example: 192.168.10.1/24
Add vApp option guestinfo.deployment_type:
- Under Properties, click ADD again
- In the General tab, enter the following:
- For Category, enter Greenplum
- For Label, enter Deployment type
- For Key ID, enter guestinfo.deployment_type
- In the Type tab, enter the following:
- For Type, select String
- Click on Save
- Select the new property
- Click Set Value, and enter mirrored
Validating the Base Virtual Machine
Validate that the newly created base virtual machine is configured correctly.
Verifying the Base Virtual Machine Settings
Reboot the base virtual machine.
Log in to the virtual machine as root.
Verify that the following services are disabled:
SELinux
$ sestatus
SELinux status: disabled
Firewall
$ systemctl status firewalld
firewalld.service
Loaded: masked (/dev/null; bad)
Active: inactive (dead)
Tuned
$ systemctl status tuned
tuned.service
Loaded: masked (/dev/null; bad)
Active: inactive (dead)
Chrony
$ systemctl status chronyd
chronyd.service
Loaded: masked (/dev/null; bad)
Active: inactive (dead)
Verify that ntpd is installed and enabled:
$ systemctl status ntpd
ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2021-05-04 18:47:25 EDT; 4s ago
Verify that the NTP servers are configured correctly and the remote servers are ordered properly:
$ ntpq -pn
remote refid st t when poll reach delay offset jitter
=================================================================================
-xx.xxx.xxx.xxx xx.xxx.xxx.xxx 3 u 246 256 377 0.186 2.700 0.993
+xx.xxx.xxx.xxx xx.xxx.xxx.xxx 3 u 223 256 377 26.508 0.247 0.397
Verify that the filesystem configuration is correct:
$ lsblk /dev/sdb
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 250G 0 disk /gpdata/
$ grep sdb /etc/fstab
/dev/sdb /gpdata/ xfs rw,nodev,noatime,inode64 0 0
$ df -Th | grep sdb
/dev/sdb xfs 250G 167M 250G 1% /gpdata
$ ls -l /gpdata
total 0
drwxrwxr-x 2 gpadmin gpadmin 6 Jun 10 15:20 master
drwxrwxr-x 2 gpadmin gpadmin 6 Jun 10 15:20 mirror
drwxrwxr-x 2 gpadmin gpadmin 6 Jun 10 15:20 primary
Verify that the parameters transparent_hugepage=never and elevator=deadline exist:
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1160.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 transparent_hugepage=never elevator=deadline
Verify that the ulimit settings match your specification by running the following command:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 119889
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 524288
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 131072
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Verify that the necessary yum packages are installed by running rpm -qa:
$ rpm -qa | grep apr
$ rpm -qa | grep apr-util
$ rpm -qa | grep dstat
$ rpm -qa | grep greenplum-db-6
$ rpm -qa | grep krb5-devel
$ rpm -qa | grep libcgroup-tools
$ rpm -qa | grep libevent
$ rpm -qa | grep libyaml
$ rpm -qa | grep net-tools
$ rpm -qa | grep ntp
$ rpm -qa | grep perl
$ rpm -qa | grep rsync
$ rpm -qa | grep sos
$ rpm -qa | grep tree
$ rpm -qa | grep wget
$ rpm -qa | grep which
$ rpm -qa | grep zip
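Because rpm -qa piped into grep matches substrings (for example, grep zip also matches gzip), an exact-name query can be easier to read. The following is only a sketch: the function name missing_packages and the RPM_QUERY override are hypothetical names introduced here, and the packages in the usage comment are taken from the commands above.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every package that is NOT installed.
# RPM_QUERY can be overridden (for example, in tests); it defaults to "rpm -q",
# which exits non-zero when the named package is not installed.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        ${RPM_QUERY:-rpm -q} "${pkg}" >/dev/null 2>&1 || echo "${pkg}"
    done
}

# Example usage on the base virtual machine (no output means nothing is missing):
#   missing_packages dstat sos tree wget ntp libcgroup-tools
```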
Verify that you configured the Greenplum Database cgroups correctly by running the commands below.
Identify the cgroup directory mount point:
$ grep cgroup /proc/mounts
The first line from the above output identifies the cgroup mount point. For example, /sys/fs/cgroup.
Run the following commands, replacing <cgroup_mount_point> with the mount point which you identified in the previous step:
$ ls -l <cgroup_mount_point>/cpu/gpdb
$ ls -l <cgroup_mount_point>/cpuacct/gpdb
$ ls -l <cgroup_mount_point>/cpuset/gpdb
$ ls -l <cgroup_mount_point>/memory/gpdb
The above directories must exist and must be owned by gpadmin:gpadmin.
Verify that the cgconfig service is running by executing the following command:
$ systemctl status cgconfig.service
Verify that the sysctl settings have been applied correctly based on your virtual machine settings.
First define the variable $RAM_IN_BYTES again on this virtual machine. For example, for a virtual machine with 30 GB of RAM:
$ RAM_IN_BYTES=$((30 * 1024 * 1024 * 1024))
Retrieve the values listed below by running sysctl <kernel setting> and confirm that each value matches the expected value for that setting.
Kernel Setting                  Value
vm.min_free_kbytes              $(($RAM_IN_BYTES * 3 / 100 / 1024))
vm.overcommit_memory            2
vm.overcommit_ratio             95
net.ipv4.ip_local_port_range    10000 65535
kernel.shmall                   $(($RAM_IN_BYTES / 2 / 4096))
kernel.shmmax                   $(($RAM_IN_BYTES / 2))
For a virtual machine with 64 GB of RAM or less:
Kernel Setting                  Value
vm.dirty_background_ratio       3
vm.dirty_ratio                  10
For a virtual machine with more than 64 GB of RAM:
Kernel Setting                  Value
vm.dirty_background_ratio       0
vm.dirty_ratio                  0
vm.dirty_background_bytes       1610612736
vm.dirty_bytes                  4294967296
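The comparison of kernel settings against their expected values can be scripted. This is a minimal sketch, not part of the Greenplum tooling: check_sysctl and the SYSCTL_GET override are hypothetical names. Whitespace is normalized on the assumption that sysctl separates multi-value settings with tabs.

```shell
#!/bin/bash
# Hypothetical helper: compare one kernel setting with its expected value.
# SYSCTL_GET can be overridden (for example, in tests); it defaults to
# "sysctl -n", which prints only the value of the named setting.
check_sysctl() {
    local name="$1"
    local expected="$2"
    local actual
    # Collapse tabs and repeated spaces so "10000<TAB>65535" equals "10000 65535".
    actual=$(${SYSCTL_GET:-sysctl -n} "${name}" | tr -s '[:space:]' ' ' | sed -e 's/ $//')
    if [ "${actual}" = "${expected}" ]; then
        echo "OK   ${name} = ${actual}"
    else
        echo "FAIL ${name}: expected ${expected}, got ${actual}"
        return 1
    fi
}

# Example usage on the base virtual machine with 30 GB of RAM:
#   RAM_IN_BYTES=$((30 * 1024 * 1024 * 1024))
#   check_sysctl vm.overcommit_memory 2
#   check_sysctl net.ipv4.ip_local_port_range "10000 65535"
#   check_sysctl kernel.shmmax "$(($RAM_IN_BYTES / 2))"
```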
Verify that ssh allows login as the gpadmin user without prompting for a password:
$ su - gpadmin
$ ssh localhost
$ exit
$ exit
Verify the readahead value:
$ /sbin/blockdev --getra /dev/sdb
16384
Verify the RX Jumbo buffer ring setting:
$ /sbin/ethtool -g ens192 | grep Jumbo
RX Jumbo: 4096
RX Jumbo: 4096
Verify the MTU size:
$ /sbin/ip a | grep 9000
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
Power off the VM.
Allocating the Virtual Machines with Terraform
Provisioning the Virtual Machines
Use the Terraform software you installed in Creating the Jumpbox Virtual Machine to generate copies of the base virtual machine you just created. Next, configure the copies based on the number of virtual machines in your environment, IP address ranges, and other settings you specify in the installation script.
Create a file named main.tf and copy in the contents as described below:
- To deploy on a vSAN datastore (vSAN storage), copy the contents from OVA Script.
- To deploy on a Datastore Cluster (PowerFlex or any other storage provisioner), copy the contents from Datastore Cluster OVA Script.
  Note: We suggest turning on vSphere Storage DRS on the Datastore Cluster and setting the Cluster automation level to No Automation (Manual Mode).
Log in to the jumpbox virtual machine as root.
Update the following variables under the Terraform variables section of the main.tf script with the correct values for your environment. You collected the required information in the Prerequisites section.
Variable                              Description
vsphere_user                          Name of the VMware vSphere administrator level user.
vsphere_password                      Password of the VMware vSphere administrator level user.
vsphere_server                        The IP address or, preferably, the Fully-Qualified Domain Name (FQDN) of your vCenter server.
vsphere_datacenter                    The name of the data center for Greenplum in your vCenter environment.
vsphere_compute_cluster               The name of the compute cluster for Greenplum in your data center.
vsphere_datastore                     The name of the vSAN datastore which will contain your Greenplum data (e.g. vSAN).
vsphere_datastore_cluster             The name of the PowerFlex datastore cluster which will contain your Greenplum data (e.g. PowerFlex).
vsphere_storage_policy                The name of the storage policy defined during Setting Up VMware vSphere Storage or Setting Up VMware vSphere Encryption (e.g. vSAN).
prefix                                A customizable prefix name for the resource pool, Greenplum VMs, and DRS affinity rules which will be created by Terraform.
gp_virtual_external_ipv4_addresses    The routable IP addresses for mdw and smdw, in that order; for example: ["10.0.0.111", "10.0.0.112"].
gp_virtual_external_ipv4_netmask      The number of bits in the netmask for gp-virtual-external; for example: 24.
gp_virtual_external_gateway           The gateway IP address for the gp-virtual-external network.
dns_servers                           The DNS servers for the gp-virtual-external network, listed as an array; for example: ["8.8.8.8", "8.8.4.4"].
gp_virtual_etl_bar_ipv4_cidr          The leading octets for the ETL, backup and restore network, non-routable network gp-virtual-etl-bar; for example: 192.168.2.0/24.
Initialize Terraform:
$ terraform init
You should get the following output:
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
re-run this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Verify that your Terraform configuration is correct by running the following command:
$ terraform plan
Deploy the cluster:
$ terraform apply
Answer yes to the following prompt:
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
The virtual machines will be created and configured to deploy your Greenplum cluster. You can check the progress under the Recent Tasks panel on your VMware vSphere client.
Once Terraform has completed, it generates a file named terraform.tfstate. This file must not be deleted, as it keeps a record of all the virtual machines and their states. Terraform also uses this file when modifying any virtual machines. We also recommend that you retain a snapshot of the jumpbox virtual machine.
Terraform timeout
Occasionally, Terraform may time out when deploying the virtual machines. If a virtual machine cannot be cloned within the timeout value, by default 30 minutes, Terraform will fail and the cluster setup will be incomplete. Terraform will report the following error:
error cloning virtual machine: timeout waiting for clone to complete
The root cause of the issue resides within the vCenter environment, so you must investigate it there: check host and storage performance to find out why a virtual machine takes over 30 minutes to clone. There are two ways to work around this issue by editing Terraform settings:
Reduce the parallelism of Terraform from 10 to 5 and redeploy the cluster by running the following command:
terraform apply --parallelism 5
Increase the Terraform timeout property, set in minutes. See more about this property in the Terraform documentation.
Modify the main.tf script in two places, one for the segment_hosts and another one for the master_hosts, adding the property timeout under the clone section:
...
resource "vsphere_virtual_machine" "segment_hosts" {
...
clone {
...
timeout = 40
...
}
}
resource "vsphere_virtual_machine" "master_hosts" {
...
clone {
...
timeout = 40
...
}
}
After saving the changes, rerun terraform apply to redeploy the cluster.
Validating the Deployment
Once Terraform has provisioned the virtual machines, perform the following validation steps:
Validate the Resource Pool for the Greenplum cluster.
Log in to vCenter and navigate to Hosts and Clusters.
Select the newly created resource pool and verify that the Resource Settings are as below:
Note that the Worst Case Allocation fields will differ depending on what is currently running in your environment.
Click the expanding arrow next to the resource pool name. You should see all the newly created virtual machines: gp-1-mdw, gp-1-smdw, gp-1-sdw1, etc.
Validate that the gp-virtual-internal network is working.
Log in to the master node as root.
Switch to the gpadmin user.
$ su - gpadmin
Make sure that the file /home/gpadmin/hosts-all exists.
Use the gpssh command to verify connectivity to all nodes in the gp-virtual-internal network.
$ gpssh -f hosts-all -e hostname
Validate the MTU settings on all virtual machines.
Log in to the master node as root.
Use the gpssh command to verify the value of the MTU.
$ source /usr/local/greenplum-db/greenplum_path.sh
$ gpssh -f /home/gpadmin/hosts-all -e "ifconfig ens192 | grep -i mtu"
Clean Up the Temporary VMware vSphere Admin Account
If you created a temporary VMware vSphere administrator level user such as greenplum, it is safe to remove it now.
Deploying Greenplum
You are now ready to deploy Greenplum Database on the newly deployed cluster. Perform the steps below from the Greenplum master node.
Deploying a Greenplum Database Cluster
Initialize the Greenplum cluster.
Log in to the Greenplum master node as the gpadmin user.
Create the Greenplum GUC (Grand Unified Configuration) file gp_guc_config and paste in the following contents:
### Interconnect Settings
gp_interconnect_queue_depth=16
gp_interconnect_snd_queue_depth=16
# Since you have one segment per VM and less competing workloads per VM,
# you can set the memory limit for resource group higher than the default
gp_resource_group_memory_limit=0.85
# This value should be 5% of the total RAM on the VM
statement_mem=1536MB
# This value should be set to 25% of the total RAM on the VM
max_statement_mem=7680MB
# This value should be set to 85% of the total RAM on the VM
gp_vmem_protect_limit=26112
# Since you have less I/O bandwidth, you can turn this parameter on
gp_workfile_compression=on
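The memory values in the file above correspond to a virtual machine with 30 GB of RAM. As a sketch of how the percentages in the comments translate into values for another RAM size, the following function prints the three RAM-dependent settings. The name gp_memory_gucs is hypothetical, and the function assumes the RAM size is given in whole gigabytes.

```shell
#!/bin/bash
# Hypothetical helper: derive the RAM-dependent GUC values from the VM RAM
# size in GB, using the percentages stated in the gp_guc_config comments.
gp_memory_gucs() {
    local ram_gb="$1"
    local ram_mb=$(( ram_gb * 1024 ))
    echo "statement_mem=$(( ram_mb * 5 / 100 ))MB"        # 5% of total RAM
    echo "max_statement_mem=$(( ram_mb * 25 / 100 ))MB"   # 25% of total RAM
    echo "gp_vmem_protect_limit=$(( ram_mb * 85 / 100 ))" # 85% of total RAM, in MB
}

# For 30 GB this reproduces the values shown in gp_guc_config.
gp_memory_gucs 30
```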
Create the Greenplum configuration script create_gpinitsystem_config.sh and paste in the following contents:
#!/bin/bash
# setup the gpinitsystem config
primary_array() {
num_primary_segments=$1
array=""
newline=$'\n'
# master has db_id 0, primary starts with db_id 1, primaries are always odd
for i in $( seq 0 $(( num_primary_segments - 1 )) ); do
content_id=${i}
db_id=$(( 2 * i + 1 ))
array+="sdw${db_id}~sdw${db_id}~6000~/gpdata/primary/gpseg${content_id}~${db_id}~${content_id}${newline}"
done
echo "${array}"
}
mirror_array() {
num_primary_segments=$1
array=""
newline=$'\n'
# mirror starts with db_id 2, mirrors are always even
for i in $( seq 0 $(( num_primary_segments - 1 )) ); do
content_id=${i}
db_id=$(( 2 * i + 2 ))
array+="sdw${db_id}~sdw${db_id}~7000~/gpdata/mirror/gpseg${content_id}~${db_id}~${content_id}${newline}"
done
echo "${array}"
}
create_gpinitsystem_config() {
num_primary_segments=$1
echo "Generate gpinitsystem"
cat <<EOF> ./gpinitsystem_config
ARRAY_NAME="Greenplum Data Platform"
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
SEG_PREFIX=gpseg
HEAP_CHECKSUM=on
HBA_HOSTNAMES=0
QD_PRIMARY_ARRAY=mdw~mdw~5432~/gpdata/master/gpseg-1~0~-1
declare -a PRIMARY_ARRAY=(
$( primary_array ${num_primary_segments} )
)
declare -a MIRROR_ARRAY=(
$( mirror_array ${num_primary_segments} )
)
EOF
}
num_primary_segments=$1
if [ -z "$num_primary_segments" ]; then
echo "Usage: bash create_gpinitsystem_config.sh <num_primary_segments>"
else
create_gpinitsystem_config ${num_primary_segments}
fi
Run the script to generate the configuration file for gpinitsystem. Replace 32 with the number of primary segments as necessary.
$ bash create_gpinitsystem_config.sh 32
You should now see a file called gpinitsystem_config.
Run the following command to initialize the Greenplum Database:
$ gpinitsystem -a -I gpinitsystem_config -p gp_guc_config
$ gpinitstandby -s smdw
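To see which hosts create_gpinitsystem_config.sh assigns to each segment, the layout can be previewed with a short sketch that applies the same numbering as the script: primaries on odd-numbered hosts with port 6000, mirrors on even-numbered hosts with port 7000. The function name segment_layout and its output format are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the primary and mirror host for each content ID,
# using the same numbering as primary_array and mirror_array in
# create_gpinitsystem_config.sh.
segment_layout() {
    local num_primary_segments="$1"
    local i=0
    while [ "${i}" -lt "${num_primary_segments}" ]; do
        echo "content ${i}: primary sdw$(( 2 * i + 1 )):6000 mirror sdw$(( 2 * i + 2 )):7000"
        i=$(( i + 1 ))
    done
}

# Preview the layout for two primary segments.
segment_layout 2
```

With this numbering, N primary segments use hosts sdw1 through sdw2N, so 32 primary segments correspond to 64 segment virtual machines.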
Configure the Greenplum master and standby master environment variables, and load the master variables:
$ echo export MASTER_DATA_DIRECTORY=/gpdata/master/gpseg-1 >> ~/.bashrc
$ ssh smdw 'echo export MASTER_DATA_DIRECTORY=/gpdata/master/gpseg-1 >> ~/.bashrc'
$ source ~/.bashrc
Restart the Greenplum cluster for the newly configured settings to take effect:
$ gpstop -r
Next Steps
Now that the Greenplum Database has been deployed, follow the steps provided in Validating the Greenplum Installation to ensure Greenplum Database has been installed correctly.