Validating Your Systems

Validate your operating system settings, hardware, and network.

Greenplum provides the following utilities to validate the configuration and performance of your systems:

  • gpcheck
  • gpcheckperf

These utilities can be found in $GPHOME/bin of your Greenplum installation.

The following tests should be run prior to initializing your Greenplum Database system.

Parent topic: Greenplum Database Installation Guide

Validating OS Settings

Greenplum provides a utility called gpcheck that can be used to verify that all hosts in your array have the recommended OS settings for running a production Greenplum Database system. To run gpcheck:

  1. Log in on the master host as the gpadmin user.

  2. Source the greenplum_path.sh path file from your Greenplum installation. For example:

    $ source /usr/local/greenplum-db/greenplum_path.sh

  3. Create a file called hostfile_gpcheck that contains the machine-configured host name of each Greenplum host (master, standby master, and segments), one host name per line. List each host exactly once, and make sure there are no blank lines or extra spaces. For example:

    mdw
    smdw
    sdw1
    sdw2
    sdw3

  4. Run the gpcheck utility using the host file you just created. For example:

    $ gpcheck -f hostfile_gpcheck -m mdw -s smdw

  5. After gpcheck finishes verifying OS parameters on all hosts (masters and segments), you might be prompted to modify certain OS parameters before initializing your Greenplum Database system.
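
The host file rules in step 3 (one host name per line, no blank lines, no extra spaces, each host listed once) can be checked before gpcheck is run. The following is a minimal sketch only: check_hostfile is a hypothetical helper name and is not part of Greenplum.

```shell
# Hypothetical helper (not a Greenplum utility): check a host file for
# blank lines, whitespace, and duplicate host names.
check_hostfile() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    # Flag empty lines and any line that contains a space or tab.
    if grep -n -E '^$|[[:space:]]' "$file" >&2; then
        echo "$file: blank lines or extra whitespace found" >&2
        return 1
    fi
    # Each host should appear exactly once.
    dups=$(sort "$file" | uniq -d)
    if [ -n "$dups" ]; then
        echo "$file: duplicate host names: $dups" >&2
        return 1
    fi
    return 0
}
```

One possible use is to gate the gpcheck run on the result, for example: check_hostfile hostfile_gpcheck && gpcheck -f hostfile_gpcheck -m mdw -s smdw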


Validating Hardware Performance

Greenplum provides a management utility called gpcheckperf, which can be used to identify hardware and system-level issues on the machines in your Greenplum Database array. gpcheckperf starts a session on the specified hosts and runs the following performance tests:

  • Network Performance (gpnetbench*)
  • Disk I/O Performance (dd test)
  • Memory Bandwidth (stream test)

Before using gpcheckperf, you must have a trusted host setup between the hosts involved in the performance test. If you have not already done so, you can use the gpssh-exkeys utility to update the known host files and exchange public keys between hosts. Note that gpcheckperf calls gpssh and gpscp, so these Greenplum utilities must be in your $PATH.
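
As a rough illustration of these prerequisites, the sketch below reports any required utility that cannot be found on $PATH. The function name missing_tools and the host file name hostfile_exkeys are assumptions made for this example; they do not come from this guide.

```shell
# Hypothetical helper: print the name of each listed utility that is
# not found on $PATH (prints nothing when all are present).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Possible usage, after sourcing greenplum_path.sh:
#   missing=$(missing_tools gpssh gpscp gpcheckperf)
#   [ -z "$missing" ] || echo "not on PATH: $missing" >&2
#
# If keys have not been exchanged yet (hostfile_exkeys is an assumed
# file listing the hosts that take part in the test):
#   gpssh-exkeys -f hostfile_exkeys
```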


Validating Network Performance

To test network performance, run gpcheckperf with one of the network test run options: parallel pair test (-r N), serial pair test (-r n), or full matrix test (-r M). The utility runs a network benchmark program that transfers a 5-second stream of data from the current host to each remote host included in the test. By default, the data is transferred in parallel to each remote host, and the minimum, maximum, average, and median network transfer rates are reported in megabytes per second (MB/s). If the summary transfer rate is slower than expected (less than 100 MB/s), run the network test serially using the -r n option to obtain per-host results. To run a full-matrix bandwidth test, specify -r M, which causes every host to send data to and receive data from every other host specified. This test is best used to verify that the switch fabric can tolerate a full-matrix workload.

Most systems in a Greenplum Database array are configured with multiple network interface cards (NICs), each NIC on its own subnet. When testing network performance, it is important to test each subnet individually. For example, consider the following network configuration of two NICs per host:

Greenplum Host    Subnet1 NICs    Subnet2 NICs
Segment 1         sdw1-1          sdw1-2
Segment 2         sdw2-1          sdw2-2
Segment 3         sdw3-1          sdw3-2

You would create two distinct host files, one per subnet, for use with the gpcheckperf network test:

hostfile_gpchecknet_ic1    hostfile_gpchecknet_ic2
sdw1-1                     sdw1-2
sdw2-1                     sdw2-2
sdw3-1                     sdw3-2
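
The per-subnet host files listed above could also be generated rather than typed by hand. This is a sketch under two assumptions: the interface names follow the <host>-<subnet number> pattern used in the example, and there are exactly two subnets. The function name make_net_hostfiles is hypothetical.

```shell
# Hypothetical helper: write hostfile_gpchecknet_ic1 and
# hostfile_gpchecknet_ic2 into the given directory, one interface
# name per line, assuming the <host>-<subnet number> naming pattern.
make_net_hostfiles() {
    outdir=$1
    shift
    for subnet in 1 2; do
        out="$outdir/hostfile_gpchecknet_ic$subnet"
        : > "$out"                      # create or truncate the file
        for host in "$@"; do
            echo "$host-$subnet" >> "$out"
        done
    done
}

# Possible usage, writing into the current directory:
#   make_net_hostfiles . sdw1 sdw2 sdw3
```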

You would then run gpcheckperf once per subnet. For example (if testing an even number of hosts, run in parallel pairs test mode):

    $ gpcheckperf -f hostfile_gpchecknet_ic1 -r N -d /tmp > subnet1.out
    $ gpcheckperf -f hostfile_gpchecknet_ic2 -r N -d /tmp > subnet2.out

If you have an odd number of hosts to test, you can run in serial test mode (-r n).
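
The choice between parallel pair mode and serial pair mode can be derived from the number of hosts in each file. The sketch below is one way to do that; net_test_mode is a hypothetical helper name, and the loop in the comments assumes the two host files from the example above.

```shell
# Hypothetical helper: print N (parallel pair test) when the host file
# lists an even number of hosts, or n (serial pair test) when it lists
# an odd number. Empty lines are not counted.
net_test_mode() {
    count=$(grep -c . "$1" || true)
    if [ $((count % 2)) -eq 0 ]; then
        echo N
    else
        echo n
    fi
}

# Possible usage, one gpcheckperf run per subnet:
#   for ic in 1 2; do
#       f=hostfile_gpchecknet_ic$ic
#       gpcheckperf -f "$f" -r "$(net_test_mode "$f")" -d /tmp > "subnet$ic.out"
#   done
```

With the three-host files shown earlier, this helper selects serial pair mode (-r n).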


Validating Disk I/O and Memory Bandwidth

To test disk and memory bandwidth performance, run gpcheckperf with the disk and stream test run options (-r ds). The disk test uses the dd command (a standard UNIX utility) to test the sequential throughput performance of a logical disk or file system. The memory test uses the STREAM benchmark program to measure sustainable memory bandwidth. Results are reported in MB per second (MB/s).

To run the disk and stream tests:

  1. Log in on the master host as the gpadmin user.

  2. Source the greenplum_path.sh path file from your Greenplum installation. For example:

    $ source /usr/local/greenplum-db/greenplum_path.sh

  3. Create a host file named hostfile_gpcheckperf that has one host name per segment host. Do not include the master host. For example:

    sdw1
    sdw2
    sdw3
    sdw4

  4. Run the gpcheckperf utility using the hostfile_gpcheckperf file you just created. Use the -d option to specify the file systems you want to test on each host (you must have write access to these directories). Test all primary and mirror segment data directory locations. For example:

    $ gpcheckperf -f hostfile_gpcheckperf -r ds -D \
      -d /data1/primary -d /data2/primary \
      -d /data1/mirror -d /data2/mirror

  5. The utility may take a while to perform the tests, as it is copying very large files between the hosts. When it is finished, you will see the summary results for the Disk Write, Disk Read, and Stream tests.
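
When there are many data directories, the command line in step 4 can be assembled from a list so that no primary or mirror location is left out. This is a minimal sketch: disk_test_command is a hypothetical helper that only prints the command, and it assumes directory names without spaces.

```shell
# Hypothetical helper: print a gpcheckperf disk and stream test command
# with one -d option per directory. It does not run the command, and it
# assumes the host file name and directories contain no spaces.
disk_test_command() {
    hostfile=$1
    shift
    cmd="gpcheckperf -f $hostfile -r ds -D"
    for dir in "$@"; do
        cmd="$cmd -d $dir"
    done
    echo "$cmd"
}

# Possible usage:
#   disk_test_command hostfile_gpcheckperf \
#       /data1/primary /data2/primary /data1/mirror /data2/mirror
```

Printing the command first makes it easy to review the directory list before starting a long-running test.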
