analyzedb
A utility that performs ANALYZE
operations on tables incrementally and concurrently. For append optimized tables, analyzedb updates statistics only if the statistics are not current.
Synopsis
analyzedb -d <dbname>
{ -s <schema> |
{ -t <schema>.<table>
[ -i <col1>[,<col2>, ...] |
-x <col1>[,<col2>, ...] ] } |
{ -f | --file} <config-file> }
[ -l | --list ]
[ --gen_profile_only ]
[ -p <parallel-level> ]
[ --full ]
[ --skip_root_stats ]
[ --skip_orca_root_stats ]
[ -v | --verbose ]
[ -a ]
analyzedb { --clean_last | --clean_all }
analyzedb --version
analyzedb { -? | -h | --help }
Description
The analyzedb utility updates statistics on table data for the specified tables in a Greenplum database incrementally and concurrently.
While performing ANALYZE operations, analyzedb creates a snapshot of the table metadata and stores it on disk on the master host. An ANALYZE
operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, analyzedb automatically skips the table or partition because it already contains up-to-date statistics.
- For append optimized tables, analyzedb updates statistics incrementally, if the statistics are not current. For example, if table data is changed after statistics were collected for the table. If there are no statistics for the table, statistics are collected.
- For heap tables, statistics are always updated.
Specify the --full
option to update append-optimized table statistics even if the table statistics are current.
By default, analyzedb creates a maximum of 5 concurrent sessions to analyze tables in parallel. For each session, analyzedb issues an ANALYZE command to the database and specifies different table names. The -p
option controls the maximum number of concurrent sessions.
Partitioned Append-Optimized Tables
For a partitioned, append-optimized table, analyzedb checks the partitioned table root partition and leaf partitions. If needed, the utility updates statistics for non-current partitions and the root partition. For information about how statistics are collected for partitioned tables, see ANALYZE.
analyzedb
must sample additional partitions within a partitioned table when it encounters a stale partition, even when statistics are already collected. Consider it a best practice to run analyzedb
on the root partition any time that you add a new partition(s) to a partitioned table. This operation both analyzes the child leaf partitions in parallel and merges any updated statistics into the root partition.
Notes
The analyzedb utility updates append optimized table statistics if the table has been modified by DML or DDL commands, including INSERT, DELETE, UPDATE, CREATE TABLE, ALTER TABLE and TRUNCATE. The utility determines if a table has been modified by comparing catalog metadata of tables with the previous snapshot of metadata taken during a previous analyzedb operation. The snapshots of table metadata are stored as state files in the directory db_analyze/<db_name>/<timestamp>
in the Greenplum Database master data directory.
The utility preserves old snapshot information from the past 8 days, and the 3 most recent state directories regardless of age, while all other directories are automatically removed. You can also specify the --clean_last
or --clean_all
option to remove state files generated by analyzedb.
If you do not specify a table, set of tables, or schema, the analyzedb utility collects the statistics as needed on all system catalog tables and user-defined tables in the database.
External tables are not affected by analyzedb.
Table names that contain spaces are not supported.
Running the ANALYZE
command on a table, not using the analyzedb
utility, does not update the table metadata that the analyzedb
utility uses to determine whether table statistics are up to date.
Options
--clean_last
Remove the state files generated by last analyzedb operation. All other options except -d are ignored.
--clean_all
Remove all the state files generated by analyzedb. All other options except -d are ignored.
-d dbname
Specifies the name of the database that contains the tables to be analyzed. If this option is not specified, the database name is read from the environment variable PGDATABASE
. If PGDATABASE
is not set, the user name specified for the connection is used.
-f config-file | —file config-file
Text file that contains a list of tables to be analyzed. A relative file path from current directory can be specified.
The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using the -i or -x. No other options are allowed in the file. Other options such as --full
must be specified on the command line.
Only one of the options can be used to specify the files to be analyzed: -f
or --file
, -t
, or -s
.
When performing ANALYZE operations on multiple tables, analyzedb creates concurrent sessions to analyze tables in parallel. The -p
option controls the maximum number of concurrent sessions.
In the following example, the first line performs an ANALYZE operation on the table public.nation
, the second line performs an ANALYZE operation only on the columns l_shipdate
and l_receiptdate
in the table public.lineitem
.
public.nation
public.lineitem -i l_shipdate,l_receiptdate
--full
Perform an ANALYZE operation on all the specified tables. The operation is performed even if the statistics are up to date.
--gen_profile_only
Update the analyzedb
snapshot of table statistics information without performing any ANALYZE
operations. If other options specify tables or a schema, the utility updates the snapshot information only for the specified tables.
Specify this option if the ANALYZE
command was run on database tables and you want to update the analyzedb
snapshot for the tables.
-i col1,col2, …
Optional. Must be specified with the -t option. For the table specified with the -t option, collect statistics only for the specified columns.
Only -i, or -x can be specified. Both options cannot be specified.
-l | —list
Lists the tables that would have been analyzed with the specified options. The ANALYZE operations are not performed.
-p parallel-level
The number of tables that are analyzed in parallel. parallel level can be an integer between 1 and 10, inclusive. Default value is 5.
--skip_root_stats
This option is no longer used, you may remove it from your scripts.
--skip_orca_root_stats
Note
Do not use this option if GPORCA is enabled.
Use this option if you find that ANALYZE ROOTPARTITION
commands take a very long time to complete.
Warning: After you run analyzedb
with this option, subsequent analyzedb
executions will not update root partition statistics except when changes have been made to the table.
-s schema
Specify a schema to analyze. All tables in the schema will be analyzed. Only a single schema name can be specified on the command line.
Only one of the options can be used to specify the files to be analyzed: -f
or --file
, -t
, or -s
.
-t schema.table
Collect statistics only on schema.table. The table name must be qualified with a schema name. Only a single table name can be specified on the command line. You can specify the -f
option to specify multiple tables in a file or the -s
option to specify all the tables in a schema.
Only one of these options can be used to specify the files to be analyzed: -f
or --file
, -t
, or -s
.
-x col1,col2, …
Optional. Must be specified with the -t option. For the table specified with the -t option, exclude statistics collection for the specified columns. Statistics are collected only on the columns that are not listed.
Only -i, or -x can be specified. Both options cannot be specified.
-a
Quiet mode. Do not prompt for user confirmation.
-h | -? | —help
Displays the online help.
-v | —verbose
If specified, sets the logging level to verbose to write additional information the log file and to the command line during command execution. The information includes a list of all the tables to be analyzed (including child leaf partitions of partitioned tables). Output also includes the duration of each ANALYZE
operation.
--version
Displays the version of this utility.
Examples
An example that collects statistics only on a set of table columns. In the database mytest
, collect statistics on the columns shipdate
and receiptdate
in the table public.orders
:
analyzedb -d mytest -t public.orders -i shipdate,receiptdate
An example that collects statistics on a table and exclude a set of columns. In the database mytest
, collect statistics on the table public.foo
, and do not collect statistics on the columns bar
and test2
.
analyzedb -d mytest -t public.foo -x bar,test2
An example that specifies a file that contains a list of tables. This command collect statistics on the tables listed in the file analyze-tables
in the database named mytest
.
analyzedb -d mytest -f analyze-tables
If you do not specify a table, set of tables, or schema, the analyzedb utility collects the statistics as needed on all catalog tables and user-defined tables in the specified database. This command refreshes table statistics on the system catalog tables and user-defined tables in the database mytest
.
analyzedb -d mytest
You can create a PL/Python function to run the analyzedb
utility as a Greenplum Database function. This example CREATE FUNCTION
command creates a user defined PL/Python function that runs the analyzedb
utility and displays output on the command line. Specify analyzedb
options as the function parameter.
CREATE OR REPLACE FUNCTION analyzedb(params TEXT)
RETURNS VOID AS
$BODY$
import subprocess
cmd = ['analyzedb', '-a' ] + params.split()
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# verbose output of process
for line in iter(p.stdout.readline, ''):
plpy.info(line);
p.wait()
$BODY$
LANGUAGE plpythonu VOLATILE;
When this SELECT
command is run by the gpadmin user, the analyzedb
utility performs an analyze operation on the table public.mytable
that is in the database mytest
.
SELECT analyzedb('-d mytest -t public.mytable') ;
Note
To create a PL/Python function, the PL/Python procedural language must be registered as a language in the database. For example, this
CREATE LANGUAGE
command run as gpadmin registers PL/Python as an untrusted language:
CREATE LANGUAGE plpythonu;