Overview of Greenplum Database High Availability
A Greenplum Database cluster can be made highly available by providing a fault-tolerant hardware platform, by enabling Greenplum Database high-availability features, and by performing regular monitoring and maintenance procedures to ensure the health of all system components.
Hardware components will eventually fail, whether due to normal wear or an unexpected circumstance. Loss of power can lead to temporarily unavailable components. A system can be made highly available by providing redundant standbys for components that can fail, so that services continue uninterrupted when a failure does occur. In some cases, the cost of redundancy is higher than users’ tolerance for interruption in service. When this is the case, the goal is instead to ensure that full service can be restored within an expected timeframe.
With Greenplum Database, fault tolerance and data availability are achieved with:
- Hardware level RAID storage protection
- Data storage checksums
- Greenplum segment mirroring
- Master mirroring
- Dual clusters
- Database backup and restore
Hardware level RAID
A best-practice Greenplum Database deployment uses hardware-level RAID to provide high-performance redundancy against single-disk failures without resorting to database-level fault tolerance. RAID provides this redundancy at the disk level, beneath the database.
Data storage checksums
Greenplum Database uses checksums to verify that data loaded from disk to memory has not been corrupted on the file system.
Greenplum Database has two kinds of storage for user data: heap and append-optimized. Both storage models use checksums to verify data read from the file system and, with the default settings, they handle checksum verification errors in a similar way.
Greenplum Database master and segment database processes update data on pages in the memory they manage. When a memory page is updated and flushed to disk, checksums are computed and saved with the page. When a page is later retrieved from disk, the checksums are verified and the page is permitted to enter managed memory only if the verification succeeds. A failed checksum verification indicates corruption in the file system and causes Greenplum Database to generate an error, aborting the transaction.
The default checksum settings provide the best level of protection from undetected disk corruption propagating into the database and to mirror segments.
Heap checksum support is enabled by default when the Greenplum Database cluster is initialized with the gpinitsystem management utility. Although it is strongly discouraged, a cluster can be initialized without heap checksum support by setting the HEAP_CHECKSUM parameter to off in the gpinitsystem cluster configuration file. See gpinitsystem.
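For illustration only, deactivating heap checksums at initialization time would require a line such as the following in the gpinitsystem cluster configuration file (commonly named gpinitsystem_config); leaving the parameter at its default of on is strongly recommended:

# Deactivates heap checksums for the new cluster (strongly discouraged)
HEAP_CHECKSUM=off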
Once initialized, it is not possible to change heap checksum support for a cluster without reinitializing the system and reloading databases.
You can check the read-only server configuration parameter data_checksums to see if heap checksums are enabled in a cluster:
$ gpconfig -s data_checksums
When a Greenplum Database cluster starts up, the gpstart utility checks that heap checksums are consistently activated or deactivated on the master and all segments. If there are any differences, the cluster fails to start. See gpstart.
In cases where it is necessary to ignore heap checksum verification errors so that data can be recovered, setting the ignore_checksum_failure server configuration parameter to on causes Greenplum Database to issue a warning when a heap checksum verification fails, but the page is still permitted to load into managed memory. If the page is then updated and saved to disk, the corrupted data could be replicated to the mirror segment. Because this can lead to data loss, set ignore_checksum_failure to on only to enable data recovery.
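A minimal sketch of setting the parameter for a recovery window, assuming the gpconfig and gpstop utilities are run on the master host; set the parameter back to off as soon as recovery is complete:

$ gpconfig -c ignore_checksum_failure -v on    # set the parameter on the master and all segments
$ gpstop -u                                    # reload configuration without restarting the cluster
$ gpconfig -c ignore_checksum_failure -v off   # restore the default once data is recovered
$ gpstop -u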
For append-optimized storage, checksum support is one of several storage options set at the time an append-optimized table is created with the CREATE TABLE statement. The default storage options are specified in the gp_default_storage_options server configuration parameter. The checksum storage option is enabled by default and deactivating it is strongly discouraged.
If you choose to deactivate checksums for an append-optimized table, you can either:
- change the gp_default_storage_options configuration parameter to include checksum=false before creating the table, or
- add the checksum=false option to the WITH storage_options clause of the CREATE TABLE statement.
Note that the CREATE TABLE statement allows you to set storage options, including checksums, for individual partition files.
See the CREATE TABLE command reference and the gp_default_storage_options configuration parameter reference for syntax and examples.
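For illustration only, the following sketch shows both approaches with a hypothetical table; deactivating checksums remains strongly discouraged:

-- Create an append-optimized table without checksums (hypothetical table name and columns).
CREATE TABLE sales_ao (id int, amount numeric)
WITH (appendonly=true, checksum=false)
DISTRIBUTED BY (id);

-- Session-level alternative: include checksum=false in the default storage
-- options so that append-optimized tables created afterward in this session
-- omit checksums.
SET gp_default_storage_options = 'appendonly=true, checksum=false';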
Segment Mirroring
Greenplum Database stores data in multiple segment instances, each of which is a Greenplum Database Postgres instance. The data for each table is distributed across the segments based on the distribution policy that is defined for the table when it is created. When segment mirroring is enabled, each segment has a primary and mirror pair. The primary and mirror perform the same IO operations and store copies of the same data.
The mirror instance for each segment is usually initialized with the gpinitsystem utility or the gpexpand utility. The mirror runs on a different host than the primary instance to protect from a single machine failure. There are different strategies for assigning mirrors to hosts. When choosing the layout of the primaries and mirrors, it is important to consider the failure scenarios to ensure that processing skew is minimized in the case of a single machine failure.
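One way to review the resulting primary/mirror layout and confirm that the mirrors are synchronized is the gpstate utility; for example (the exact output columns vary by version):

$ gpstate -m    # list mirror segment instances, their hosts, and their status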
Master Mirroring
There are two masters in a highly available cluster, a primary and a standby. As with segments, the master and standby should be deployed on different hosts so that the cluster can tolerate a single host failure. Clients connect to the primary master and queries can be executed only on the primary master. The standby master is kept up to date by replicating the write-ahead log (WAL) from the primary to the standby.
If the master fails, the administrator runs the gpactivatestandby utility to have the standby master take over as the new primary master. You can configure a virtual IP address for the master and standby so that client programs do not have to switch to a different network address when the current master changes. If the master host fails, the virtual IP address can be moved to the host of the acting master.
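A minimal failover sketch, assuming a hypothetical standby master data directory of /data/master/gpseg-1 and a hypothetical replacement standby host named smdw2:

$ gpactivatestandby -d /data/master/gpseg-1   # run on the standby master host to promote it
$ gpinitstandby -s smdw2                      # optionally configure a new standby afterward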
Dual Clusters
An additional level of redundancy can be provided by maintaining two Greenplum Database clusters, both storing the same data.
Two methods for keeping data synchronized on dual clusters are “dual ETL” and “backup/restore.”
Dual ETL provides a complete standby cluster with the same data as the primary cluster. ETL (extract, transform, and load) refers to the process of cleansing, transforming, validating, and loading incoming data into a data warehouse. With dual ETL, this process is executed twice in parallel, once on each cluster, and is validated each time. It also allows data to be queried on both clusters, doubling the query throughput. Applications can take advantage of both clusters and also ensure that the ETL is successful and validated on both clusters.
To maintain a dual cluster with the backup/restore method, create backups of the primary cluster and restore them on the secondary cluster. This method takes longer to synchronize data on the secondary cluster than the dual ETL strategy, but requires less application logic to be developed. Populating a second cluster with backups is ideal in use cases where data modifications and ETL are performed daily or less frequently.
Backup and Restore
Making regular backups of the databases is recommended except in cases where the database can be easily regenerated from the source data. Backups should be taken to protect from operational, software, and hardware errors.
Use the gpcrondump utility to back up Greenplum databases. gpcrondump performs the backup in parallel across segments, so backup performance scales up as hardware is added to the cluster.
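For example, a full backup of a hypothetical database named sales could be taken non-interactively as follows:

$ gpcrondump -x sales -a    # back up the sales database in parallel, without prompting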
When designing a backup strategy, a primary concern is where to store the backup data. The data each segment manages can be backed up on the segment’s local storage, but should not be stored there permanently—the backup reduces disk space available to the segment and, more importantly, a hardware failure could simultaneously destroy the segment’s live data and the backup. After performing a backup, the backup files should be moved from the primary cluster to separate, safe storage. Alternatively, the backup can be made directly to separate storage.
Additional options are available to back up databases, including the following:
Data Domain
Through native API integration, backups can be streamed to a Dell EMC Data Domain appliance.
NetBackup
Through native API integration, backups can be streamed to a Veritas NetBackup cluster.
NFS
If an NFS mount is created on each Greenplum Database host in the cluster, backups can be written directly to the NFS mount. A scale-out NFS solution is recommended to ensure that backups do not bottleneck on the IO throughput of the NFS device. Dell EMC Isilon is one example that can scale out alongside the Greenplum cluster.
Incremental Backups
Greenplum Database allows incremental backup at the partition level for append-optimized and column-oriented tables. When you perform an incremental backup, only the partitions for append-optimized and column-oriented tables that have changed since the previous backup are backed up. (Heap tables are always backed up.) Restoring an incremental backup requires restoring the previous full backup and subsequent incremental backups.
Incremental backup is beneficial only when the database contains large, partitioned tables where all but one or a few partitions remain unchanged between backups. An incremental backup saves just the changed partitions and the heap tables. By not backing up the unchanged partitions, the backup size and time can be significantly reduced.
If a large fact table is not partitioned, and a single row is added or changed, the entire table is backed up, and there is no savings in backup size or time. Therefore, incremental backup is only recommended for databases with large, partitioned tables and relatively small dimension tables.
If you maintain dual clusters and use incremental backup, you can populate the second cluster with the incremental backups. Use the --noplan option to achieve this, allowing backups from the primary site to be applied faster.
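A sketch of this workflow, assuming a hypothetical database named sales and a hypothetical backup timestamp; the incremental backup is taken on the primary cluster with gpcrondump and applied on the second cluster with the gpdbrestore --noplan option:

$ gpcrondump -x sales --incremental -a        # run on the primary cluster
$ gpdbrestore -t 20180101010101 --noplan -a   # run on the second cluster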