gpdbrestore

Restores a database from a set of dump files generated by gpcrondump.

Note: This utility is deprecated and will not be supported after the end of Greenplum Database 5.x Support Life.

Synopsis

  1. gpdbrestore { -t <timestamp_key> { [-L] |
  2. [--netbackup-service-host <netbackup_server>
  3. [--netbackup-block-size <size>] ] }
  4. -b <YYYYMMDD> | -R <hostname>:<path_to_dumpset> | -s <database_name> }
  5. [--noplan] [--noanalyze] [-u backup_directory] [--list-backup]
  6. [--prefix prefix_string] [--report-status-dir <report_directory>]
  7. [-S <schema_name >]
  8. [-T <schema.table >] [--table-file <file_name>] [--truncate] [-e]
  9. [-m]
  10. [-restore-stats [include | only]]
  11. [-G [include | only]] [--change-schema <schema_name>]
  12. [-B <parallel_processes>] [-d <master_data_directory>] [-a] [-q]
  13. [-l <logfile_directory>] [-v]
  14. [--ddboost [--ddboost-storage-unit=<unit-ID>] ]
  15. [--redirect <database_name> ]
  16. gpdbrestore -?
  17. gpdbrestore --version

Description

The gpdbrestore utility recreates the data definitions (schema) and user data in a Greenplum database using the script files created by gpcrondump operations.

When you restore from an incremental backup, the gpdbrestore utility assumes the complete backup set is available. The complete backup set includes the following backup files:

  • The last full backup before the specified incremental backup
  • All incremental backups created between the time of the full backup the specified incremental backup

The gpdbrestore utility provides the following functionality:

  • Automatically reconfigures for compression.

  • Validates the number of dump files are correct (for primary only, mirror only, primary and mirror, or a subset consisting of some mirror and primary segment dump files).

  • If a failed segment is detected, restores to active segment instances.

  • Except when restoring data from a NetBackup server, you do not need to know the complete timestamp key (-t) of the backup set to restore. Additional options are provided to instead give just a date (-b), backup set directory location (-R), or database name (-s) to restore.

  • The -R option allows the ability to restore from a backup set located on a host outside of the Greenplum Database array (archive host). Ensures that the correct dump file goes to the correct segment instance.

  • Identifies the database name automatically from the backup set.

  • Allows you to restore particular tables only (-T option) instead of the entire database. Note that single tables are not automatically dropped or truncated prior to restore.

    Performs an ANALYZE operation on the tables that are restored. You can deactivate the ANALYZE operation by specifying the option --noanalyze.

  • Can restore global objects such as roles and tablespaces (-G option).

  • Detects if the backup set is primary segments only or primary and mirror segments and performs the appropriate restore operation.

  • Allows you to drop the target database before a restore in a single operation.

The backups contain the database-level settings for the server configuration parameters gp_default_storage_options, optimizer, and search_path, the settings are restored when you perform a restore operation and specify the -e option to create an empty target database before performing a restore operation.

Important: When restoring table data to an existing table, the utility assumes that the database table definition is the same as the table that was backed up. The utility does not check the table definitions.

Database, Schema, and Table Names

You can specify names of databases, schemas, and tables that contain these special characters.

" ' ~ # $ % ^ & * ( ) _ - + [ ] > < \ | ; : / ?` and the space character.

Note: The characters !, comma (,), and period (.) are not supported. Also, the tab (\t) and newline (\n) characters are not supported.

When the name contains special characters and is specified on the command line, the name must be enclosed in double quotes ("). Double quotes are optional for names that do not contain special characters. For example, either use of quotes is valid on the command line "my#1schema".mytable or "my#1schema"."mytable". Within the name, these special characters must be escaped with a backslash (\) : " $ ` .

When the name is specified in an input file, the name must not be enclosed in double quotes. Special characters do not require escaping.

Restoring from a Data Domain System with DD Boost

When you create a backup with gpcrondump using DD Boost, the backup is stored on a Data Domain system storage unit. When restore the backup, you must use the same storage unit ID that was used when you backed up the data. You can use the gpcrondump option --ddboost-show-config to display the current DD Boost configuration information that includes the storage unit ID.

For information about using DD Boost and Data Domain systems with Greenplum Database, see “Backing Up and Restoring Databases” in the Greenplum Database Administrator Guide.

NetBackup is not compatible with DD Boost. Both NetBackup and DD Boost cannot be used in a single back up operation.

Restoring a Database from NetBackup

Greenplum Database must be configured to communicate with the Veritas NetBackup master server that is used to restore database data. See the Greenplum Database System Administrator Guide for information about configuring Greenplum Database and NetBackup.

When restoring from NetBackup server, you must specify the timestamp of the backup with the -t option.

When restoring a large amount of data, set the NetBackup CLIENT_READ_TIMEOUT option to a value that is at least twice the expected duration of the operation (in seconds). The CLIENT_READ_TIMEOUT default value is 300 seconds (5 minutes).

NetBackup is not compatible with DD Boost. Both NetBackup and DD Boost cannot be used in a single back up operation.

Restoring a Database with Named Pipes

If you used named pipes when you backed up a database with gpcrondump, named pipes with the backup data must be available when restoring the database from the backup.

Error Reporting

After a restore operation completes, the utility checks the gpdbrestore status file for SQL execution errors and displays a warning if an error is found. The default location of the restore status files are in the db_dumps/date/ directory.

Be sure to examine the utility output messages and the restore status files when the operation completes. If gpdbrestore output displays any warnings, or you notice errors in the restore status files, you must execute the restore process again.

Options

-a (do not prompt)

Do not prompt the user for confirmation.

-b YYYYMMDD

Looks for dump files in the segment data directories on the Greenplum Database array of hosts in db_dumps/YYYYMMDD. If --ddboost is specified, the systems looks for dump files on the Data Domain Boost host.

-B parallel_processes

The number of segments to check in parallel for pre/post-restore validation. If not specified, the utility will start up to 60 parallel processes depending on how many segment instances it needs to restore.

--change-schema=schema_name

Optional. Restores tables from a backup created with gpcrondump to a different schema. The schema_name must exist in the database. If the schema does not exist, the utility returns an error. System catalog schemas are not supported.

You must specify tables to restore with the -T and --table-file options. If a table that is being restored exists in schema-name, the utility returns a warning and attempts to append the data to the table from the backup. You can specify the --trunctate option to truncate table data before restoring data to the table from the backup.

This option is not supported if -S is specified.

-d master_data_directory

Optional. The master host data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY will be used.

--ddboost [—ddboost-storage-unit=unit-ID ]

Use Data Domain Boost for this restore operation, if the --ddboost option was specified when the data was backed up with the gpcrondump utility. Before using Data Domain Boost, make sure the one-time Data Domain Boost credential setup is complete.

The --ddboost-storage-unit option is optional. When restoring a backup from a Data Domain server, you must use the same storage unit that was used when you backed up the data.

See Restoring from a Data Domain System with DD Boost.

If you backed up Greenplum Database configuration files with the gpcrondump option -g and specified the --ddboost option, you must manually restore the backup from the Data Domain system. The configuration files must be restored for the Greenplum Database master and all the hosts and segments. The backup location on the Data Domain system is the directory unit\_ID/``backup_directory``/date. The backup_directory is set when you specify the Data Domain credentials with gpcrondump. The unit_ID is the Data Domain system storage unit that was used when you backed up the database.

This option is not supported if --netbackup-service-host is specified.

-e (create target database before restore)

Creates the target database and then performs a restore operation. If the target database exists, drops the existing database before creating the database and performing a restore operation.

The utility restores database-level settings for the server configuration parameters gp_default_storage_options, optimizer, and search_path.

-G [include | only] (restore global objects)

Restores database metadata information that is not associated with a specific schema or table, such as roles and tablespaces, if the global object dump file db_dumps/date/prefix\_string\_gp_global_-1_1_timestamp is found in the master data directory. The global object file is created with the gpcrondump option -G.

  • The keyword include restores global objects in addition to performing a restore. This is the default if no keyword is specified.
  • The keyword only restores only global objects. No other database objects or database table data are restored.

The -m option restores metadata associated with schemas or tables.

-l logfile_directory

The directory to write the log file. Defaults to ~/gpAdminLogs.

--list-backup

Lists the set of full and incremental backup sets required to perform a restore based on the timestamp_key specified with the -t option and the location of the backup set.

This option is supported only if the timestamp_key is for an incremental backup.

-L (list tablenames in backup set)

When used with the -t option, lists the table names that exist in the named backup set and exits. Does not perform a restore.

-m (restore metadata only)

Restores database metadata information such schema and table definitions and information created by SET statements. This option does not restore database table data. All table and schema metadata is restored unless options are specified that include or exclude tables or schemas. If table or schema filters are specified, the utility restores the schema and table metadata only for the schemas and tables that are specified to be restored.

Database information that is not associated with a specific schema or table, such as roles and tablespaces, is not restored. You can specify the -G option with this option to restore global metadata that was backed up with the gpcrondump utility.

Database statistics are not restored. You can specify the --restore-stats option to restore statistics that were backed up with the gpcrondump utility.

Not supported with the --noplan or --noanalyze options.

--netbackup-block-size size

Specify the block size, in bytes, of data being transferred from the Veritas NetBackup server. The default is 512 bytes.

NetBackup options are not supported if DDBoost backup options are specified.

--netbackup-service-host netbackup_server

The NetBackup master server that Greenplum Database connects to when backing up to NetBackup. If you specify this option, you must specify the timestamp of the backup with the -t option.

The maximum length for the this parameter is 127 characters.

This option is not supported with any of the these options: -R, -s, -b, -L, or --ddboost.

NetBackup options are not supported if DDBoost backup options are specified.

--noanalyze

The ANALYZE command is not run after a successful restore. The default is to run the ANALYZE command on restored tables. This option is useful if running ANALYZE on tables in your database requires a significant amount of time.

If this option is specified, you should run ANALYZE manually on restored tables. Failure to run ANALYZE following a restore might result in poor database performance.

Not supported with the -m option.

--noplan

Restores only the data backed up during the incremental backup specified by the timestamp_key. No other data from the complete backup set are restored. The full backup set containing the incremental backup must be available.

The --noplan option does not truncate existing tables prior to restoration. To avoid loading duplicate data into existing tables, you can truncate the tables before performing an incremental restore with the --noplan option.

If the timestamp_key specified with the -t option does not reference an incremental backup, an error is returned.

Not supported with the -m option.

--prefix prefix_string

If you specified the gpcrondump option --prefix prefix\_string to create the backup, you must specify this option with the prefix_string when restoring the backup.

If you created a full backup of a set of tables with gpcrondump and specified a prefix, you can use gpcrondump with the options --list-filter-tables and --prefix prefix\_string to list the tables that were included or excluded for the backup.

-q (no screen output)

Run in quiet mode. Command output is not displayed on the screen, but is still written to the log file.

-R hostname:path_to_dumpset

Allows you to provide a hostname and full path to a set of dump files. The host does not have to be in the Greenplum Database array of hosts, but must be accessible from the Greenplum master.

--redirect database_name

Specify the name of the database where the data is restored. Specify this option to restore data to a database that is different than the database specified during back up. If database_name does not exist, it is created.

--report-status-dir report_directory

Specifies the absolute path to the directory on the each Greenplum Database host (master and segment hosts) where gpdbrestore writes report status files for a restore operation. If report_directory does not exist or is not writable, gpdbrestore returns an error and stops.

If this option is not specified and the -u option is specified, report status files are written to the location specified by the -u option if the -u location is writable. If the location specified by -u option is not writable, the report status files are written to segment data directories.

--restore-stats [include | only]

Specify this option to restore database statistics that were backed up with the gpcrondump utility option --dump-stats.

  • The keyword include restores the statistics that were backed up in addition to performing a restore. This is the default if no keyword is specified.
  • The keyword only restores only the statistics that were backed up. No other database objects or database data are restored.

If this option is specified with other options that include or exclude tables or schemas to restore, the utility restores statistics only for the tables specified to be restored.

If statistics would be restored for a table that does not exist in the database, the utility displays a warning. The statistics are not restored.

-s database_name

Looks for latest set of dump files for the given database name in the segment data directories db_dumps directory on the Greenplum Database array of hosts.

-S schema_name

Restore all the tables, views, indexes, functions, and sequences qualified by the specified schema from the backup. The -S option can be specified multiple times. System catalog schemas are not supported. The schema name must exist in the backup set of the database being restored. To replace the data in the schema tables with the data from backup, you can specify the --truncate option. The schema tables are truncated before the data is restored.

To support functions that might be restored, all procedural languages that are in the backup are also restored even though they are not schema specific. External items such as shared libraries that are used by a language are not included in a backup.

The -S option cannot be specified with the --change-schema option.

-t timestamp_key

The 14 digit timestamp key that uniquely identifies a backup set of data to restore. It is of the form YYYYMMDDHHMMSS. Looks for dump files matching this timestamp key in the segment data directories db_dumps directory on the Greenplum Database array of hosts.

-T schema.table_name

The name of a table to restore. The -T option can be specified multiple times. The named table(s) must exist in the backup set of the database being restored. Existing tables are not automatically truncated before data is restored from backup. To replace existing data in the table from backup, you can specify the --truncate option.

Wildcard characters are not supported.

--table-file file_name

Specify a file file_name that contains a list of table names to restore. The file contains any number of table names, listed one per line. See the -T option for information about restoring specific tables.

--truncate

Truncate table data before restoring data to the table from the backup. If this option is not specified, existing table data is not removed before data is restored to the table.

This option is supported only when restoring a set of tables with the option -S, -T or --table-file. If a table to be restored does not exist in the database, the table is restored and the utility returns a warning message stating that the table did not exist in the database.

This option is not supported with the -e option.

-u backup_directory

Specifies the absolute path to the directory containing the db_dumps directory on each host. If not specified, defaults to the data directory of each instance to be backed up. Specify this option if you specified a backup directory with the gpcrondump option -u when creating a backup set.

If backup_directory is not writable, backup operation report status files are written to segment data directories. You can specify a different location where report status files are written with the --report-status-dir option.

Note: This option is not supported if --ddboost is specified.

-v | —verbose

Specifies verbose mode.

--version (show utility version)

Displays the version of this utility.

-? (help)

Displays the online help.

Examples

Restore the sales database from the latest backup files generated by gpcrondump (assumes backup files are in the segment data directories in db_dumps):

  1. gpdbrestore -s sales

Restore a database from backup files that reside on an archive host outside the Greenplum Database array (command issued on the Greenplum master host):

  1. gpdbrestore -R archivehostname:/data_p1/db_dumps/20160214

Restore global objects only (roles and tablespaces):

  1. gpdbrestore -G

Note: The -R option is not supported when restoring a backup set that includes incremental backups.

If you restore from a backup set that contains an incremental backup, all the files in the backup set must be available to gpdbrestore. For example, the following timestamp keys specify a backup set. 20160514054532 is the full backup and the others are incremental.

  1. 20160514054532
  2. 20160714095512
  3. 20150914081205
  4. 20161114064330
  5. 20170114051246

The following gbdbrestore command specifies the timestamp key 20161114064330. The incremental backup with the timestamps 20160714095512 and 20160914081205 and the full backup must be available to perform a restore.

  1. gpdbrestore -t 20161114064330

The following gbdbrestore command uses the --noplan option to restore only the data that was backed up during the incremental backup with the timestamp key 20161114064330. Data in the previous incremental backups and the data in the full backup are not restored. The --noplan option does not truncate existing tables prior to restoration. To avoid loading duplicate data into existing tables, you can truncate the tables before performing an incremental restore with the --noplan option.

  1. gpdbrestore -t 20161114064330 --noplan

This gpdbrestore command restores Greenplum Database data from the data managed by NetBackup master server nbu_server1. The option -t 20170530090000 specifies the timestamp generated by gpcrondump when the backup was created. The -e option specifies that the target database is dropped before it is restored.

  1. gpdbrestore -t 20170530090000 -e --netbackup-service-host=nbu_server1

See Also

gpcrondump