analyzedb
A utility that performs ANALYZE
operations on tables incrementally and concurrently.
Synopsis
analyzedb -d <dbname> -s <schema>
[ --full ]
[ -l | --list ]
[ -p <parallel-level> ]
[ -v | --verbose ]
[ -a ]
analyzedb -d <dbname> -t <schema>.<table>
[ -i col1[, col2, ...] | -x col1[, col2, ...] ]
[ --full ]
[ -l | --list ]
[ -p <parallel-level> ]
[ -v | --verbose ]
[ -a ]
analyzedb -d <dbname> -f <config-file> | --file <config-file>
[ --full ]
[ -l | --list ]
[ -p <parallel-level> ]
[ -v | --verbose ]
[ -a ]
analyzedb -d <dbname> --clean_last | --clean_all
analyzedb --version
analyzedb -? | -h | --help
Description
The analyzedb
utility updates statistics on table data for the specified tables in a HAWQ database incrementally and concurrently.
While performing ANALYZE
operations, analyzedb
creates a snapshot of the table metadata and stores it on disk on the master host. An ANALYZE
operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, analyzedb
automatically skips the table or partition because it already contains up-to-date statistics.
For a partitioned table analyzedb
analyzes only those partitions that have no statistics, or that have stale statistics. analyzedb
also refreshes the statistics on the root partition.
By default, analyzedb
creates a maximum of 5 concurrent sessions to analyze tables in parallel. For each session, analyzedb
issues an ANALYZE
command to the database and specifies different table names. The -p
option controls the maximum number of concurrent sessions.
Notes
The utility determines if a table has been modified by comparing catalog metadata of tables with the snapshot of metadata taken during a previous analyzedb
operation. The snapshots of table metadata are stored as state files in the directory db_analyze
in the HAWQ master data directory. You can specify the --clean_last
or --clean_all
option to remove state files generated by analyzedb
.
If you do not specify a table, set of tables, or schema, the analyzedb
utility collects the statistics as needed on all system catalog tables and user-defined tables in the database.
External tables are not affected by analyzedb
.
Table names that contain spaces are not supported.
Arguments
-d
Specifies the name of the database that contains the tables to be analyzed. If this option is not specified, the database name is read from the environment variable PGDATABASE
. If PGDATABASE
is not set, the user name specified for the connection is used.
-s
Specify a schema to analyze. All tables in the schema will be analyzed. Only a single schema name can be specified on the command line.
Only one of the options can be used to specify the files to be analyzed: -f
or --file
, -t
, or -s
.
-t
Collect statistics only on
-f
option to specify multiple tables in a file or the -s
option to specify all the tables in a schema.Only one of these options can be used to specify the files to be analyzed: -f
or --file
, -t
, or -s
.
-f, —file
Text file that contains a list of tables to be analyzed. A relative file path from current directory can be specified.
The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using the -i
or -x
. No other options are allowed in the file. Other options such as --full
must be specified on the command line.
Only one of the options can be used to specify the files to be analyzed: -f
or --file
, -t
, or -s
.
When performing ANALYZE
operations on multiple tables, analyzedb
creates concurrent sessions to analyze tables in parallel. The -p
option controls the maximum number of concurrent sessions.
In the following example, the first line performs an ANALYZE
operation on the table public.nation
, the second line performs an ANALYZE
operation only on the columns l_shipdate
and l_receiptdate
in the table public.lineitem
.
public.nation
public.lineitem -i l_shipdate, l_receiptdate
Options
-x
Optional. Must be specified with the -t
option. For the table specified with the -t
option, exclude statistics collection for the specified columns. Statistics are collected only on the columns that are not listed.
Only -i
, or -x
can be specified. Both options cannot be specified.
-i
Optional. Must be specified with the -t
option. For the table specified with the -t
option, collect statistics only for the specified columns.
Only -i
, or -x
can be specified. Both options cannot be specified.
--full
Perform an ANALYZE
operation on all the specified tables. The operation is performed even if the statistics are up to date.
-l, —list
Lists the tables that would have been analyzed with the specified options. The ANALYZE
operations are not performed.
-p
The number of tables that are analyzed in parallel. The value for can be an integer between 1 and 10, inclusive. Default value is 5.
-a
Quiet mode. Do not prompt for user confirmation.
-v, —verbose
If specified, sets the logging level to verbose. Additional log information is written to the log file and the command line during command execution.
--clean_last
Remove the state files generated by last analyzedb
operation. All other options except -d
are ignored.
--clean_all
Remove all the state files generated by analyzedb
. All other options except-d
are ignored.
-h, -?, —help
Displays the online help.
--version
Displays the version of this utility.
Examples
An example that collects statistics only on a set of table columns. In the database mytest
, collect statistics on the columns shipdate
and receiptdate
in the table public.orders
:
$ analyzedb -d mytest -t public.orders -i shipdate, receiptdate
An example that collects statistics on a table and exclude a set of columns. In the database mytest
, collect statistics on the table public.foo
, and do not collect statistics on the columns bar
and test2
.
$ analyzedb -d mytest -t public.foo -x bar, test2
An example that specifies a file that contains a list of tables. This command collect statistics on the tables listed in the file analyze-tables
in the database named mytest
.
$ analyzedb -d mytest -f analyze-tables
If you do not specify a table, set of tables, or schema, the analyzedb
utility collects the statistics as needed on all catalog tables and user-defined tables in the specified database. This command refreshes table statistics on the system catalog tables and user-defined tables in the database mytest
.
$ analyzedb -d mytest