hawq extract
Extracts the metadata of a specified table into a YAML file.
Synopsis
hawq extract [<connection_options>] [-o <output_file>] <tablename>
hawq extract -?
hawq extract --version
where:
<connection_options> =
[-h <host>]
[-p <port>]
[-U <username>]
[-d <database>]
[-W]
Description
hawq extract is a utility that extracts a table’s metadata into a YAML-formatted file. HAWQ’s InputFormat uses this YAML file to read a HAWQ file stored on HDFS directly into a MapReduce program. The YAML configuration file can also be used to provide the metadata for registering files into HAWQ with the hawq register command.
Note: hawq extract is bound by the following rules:
- You must start up HAWQ to use hawq extract.
- hawq extract supports only AO and Parquet tables.
- hawq extract supports partitioned tables, but does not support sub-partitions.
Arguments
<tablename>
Name of the table whose metadata you want to extract. You can use the format namespace_name.table_name.
Options
-o <output_file>
The name of the file to which hawq extract writes the metadata. If you do not specify a name, hawq extract writes to stdout.
-v (verbose mode)
Displays the verbose output of the extraction process.
-? (help)
Displays the online help.
--version
Displays the version of this utility.
<connection_options>
-h <host>
Specifies the host name of the machine on which the HAWQ master database server is running. If not specified, the value is read from the environment variable $PGHOST or defaults to localhost.
-p <port>
Specifies the TCP port on which the HAWQ master database server is listening for connections. If not specified, the value is read from the environment variable $PGPORT or defaults to 5432.
-U <username>
The database role name to connect as. If not specified, the value is read from the environment variable $PGUSER or defaults to the current system user name.
-d <database>
The database to connect to. If not specified, the value is read from the environment variable $PGDATABASE or defaults to template1.
-W (force password prompt)
Forces a password prompt. If not specified, the password is read from the environment variable $PGPASSWORD, from a password file specified by $PGPASSFILE, or from ~/.pgpass.
Metadata File Format
hawq extract exports the table metadata into a file using the YAML 1.1 document format. The file contains key information about the table, such as the table schema, data file locations and sizes, and partition constraints.
The basic structure of the metadata file is as follows:
Version: string (1.0.0)
DBVersion: string
FileFormat: string (AO/Parquet)
TableName: string (schemaname.tablename)
DFS_URL: string (hdfs://127.0.0.1:9000)
Encoding: UTF8
AO_Schema:
- name: string
  type: string
Bucketnum: 6
Distribution_policy: DISTRIBUTED RANDOMLY
AO_FileLocations:
  Blocksize: int
  Checksum: boolean
  CompressionType: string
  CompressionLevel: int
  PartitionBy: string ('PARTITION BY ...')
  Files:
  - path: string (/gpseg0/16385/35469/35470.1)
    size: long
  Partitions:
  - Blocksize: int
    Checksum: boolean
    CompressionType: string
    CompressionLevel: int
    Name: string
    Constraint: string (PARTITION Jan08 START (date '2008-01-01') INCLUSIVE)
    Files:
    - path: string
      size: long
Parquet_Schema:
- name: string
  type: string
Parquet_FileLocations:
  RowGroupSize: long
  PageSize: long
  CompressionType: string
  CompressionLevel: int
  Checksum: boolean
  EnableDictionary: boolean
  PartitionBy: string
  Files:
  - path: string
    size: long
  Partitions:
  - Name: string
    RowGroupSize: long
    PageSize: long
    CompressionType: string
    CompressionLevel: int
    Checksum: boolean
    EnableDictionary: boolean
    Constraint: string
    Files:
    - path: string
      size: long
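The file paths in the Files lists are relative to DFS_URL. Assuming the YAML above has been parsed into a Python dict (for example with a YAML library), a minimal sketch of resolving them into absolute HDFS URLs might look like this; the hdfs_urls helper is hypothetical, not part of HAWQ:

```python
def hdfs_urls(meta):
    """Yield absolute HDFS URLs for every data file listed in the
    metadata, including files of each partition. Keys follow the
    structure documented above; 'meta' is the parsed YAML as a dict."""
    key = "AO_FileLocations" if meta["FileFormat"] == "AO" else "Parquet_FileLocations"
    locs = meta[key]
    # Top-level files first, then the files of each partition.
    for group in [locs] + locs.get("Partitions", []):
        for f in group.get("Files", []):
            yield meta["DFS_URL"] + f["path"]

# Example with a hand-built dict mirroring the structure above:
meta = {
    "FileFormat": "AO",
    "DFS_URL": "hdfs://127.0.0.1:9000",
    "AO_FileLocations": {
        "Files": [{"path": "/gpseg0/16385/35469/35470.1", "size": 0}],
        "Partitions": [
            {"Files": [{"path": "/gpseg1/16385/35469/35471.1", "size": 0}]},
        ],
    },
}
print(list(hdfs_urls(meta)))
# ['hdfs://127.0.0.1:9000/gpseg0/16385/35469/35470.1',
#  'hdfs://127.0.0.1:9000/gpseg1/16385/35469/35471.1']
```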
Example - Extracting an AO table
Extract the rank table’s metadata into a file named rank_table.yaml:
$ hawq extract -o rank_table.yaml -d postgres rank
Output content in rank_table.yaml:
AO_FileLocations:
  Blocksize: 32768
  Checksum: false
  CompressionLevel: 0
  CompressionType: null
  Files:
  - path: /gpseg0/16385/35469/35692.1
    size: 0
  - path: /gpseg1/16385/35469/35692.1
    size: 0
  PartitionBy: PARTITION BY list (gender)
  Partitions:
  - Blocksize: 32768
    Checksum: false
    CompressionLevel: 0
    CompressionType: null
    Constraint: PARTITION girls VALUES('F') WITH (appendonly=true)
    Files:
    - path: /gpseg0/16385/35469/35697.1
      size: 0
    - path: /gpseg1/16385/35469/35697.1
      size: 0
    Name: girls
  - Blocksize: 32768
    Checksum: false
    CompressionLevel: 0
    CompressionType: null
    Constraint: PARTITION boys VALUES('M') WITH (appendonly=true)
    Files:
    - path: /gpseg0/16385/35469/35703.1
      size: 0
    - path: /gpseg1/16385/35469/35703.1
      size: 0
    Name: boys
  - Blocksize: 32768
    Checksum: false
    CompressionLevel: 0
    CompressionType: null
    Constraint: DEFAULT PARTITION other WITH (appendonly=true)
    Files:
    - path: /gpseg0/16385/35469/35709.1
      size: 90071728
    - path: /gpseg1/16385/35469/35709.1
      size: 90071512
    Name: other
AO_Schema:
- name: id
  type: int4
- name: rank
  type: int4
- name: year
  type: int4
- name: gender
  type: bpchar
- name: count
  type: int4
DFS_URL: hdfs://127.0.0.1:9000
Distribution_policy: DISTRIBUTED RANDOMLY
Encoding: UTF8
FileFormat: AO
TableName: public.rank
Version: 1.0.0
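The size fields can be aggregated to see how much data each partition holds. As a quick illustration using the file sizes from the rank_table.yaml example above (the partition_files dict here is hand-built, not produced by hawq extract):

```python
# File sizes per partition, copied from the rank_table.yaml example above.
partition_files = {
    "girls": [0, 0],
    "boys": [0, 0],
    "other": [90071728, 90071512],
}

# Total bytes per partition across all segment files.
totals = {name: sum(sizes) for name, sizes in partition_files.items()}
print(totals["other"])  # 180143240 bytes across both segment files
```

Here all the data landed in the default partition, which is consistent with the zero-length files listed for the girls and boys partitions.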
Example - Extracting a Parquet table
Extract the orders table’s metadata into a file named orders.yaml:
$ hawq extract -o orders.yaml -d postgres orders
Output content in orders.yaml:
DFS_URL: hdfs://127.0.0.1:9000
Encoding: UTF8
FileFormat: Parquet
TableName: public.orders
Version: 1.0.0
Parquet_FileLocations:
  Checksum: false
  CompressionLevel: 0
  CompressionType: none
  EnableDictionary: false
  Files:
  - path: /hawq-data/gpseg0/16385/16626/16657.1
    size: 0
  - path: /hawq-data/gpseg1/16385/16626/16657.1
    size: 0
  PageSize: 1048576
  PartitionBy: PARTITION BY range (o_orderdate)
  Partitions:
  - Checksum: false
    CompressionLevel: 0
    CompressionType: none
    Constraint: PARTITION p1_1 START ('1992-01-01'::date) END ('1994-12-31'::date)
      EVERY ('3 years'::interval) WITH (appendonly=true, orientation=parquet, pagesize=1048576,
      rowgroupsize=8388608, compresstype=none, compresslevel=0)
    EnableDictionary: false
    Files:
    - path: /hawq-data/gpseg0/16385/16626/16662.1
      size: 8140599
    - path: /hawq-data/gpseg1/16385/16626/16662.1
      size: 8099760
    Name: orders_1_prt_p1_1
    PageSize: 1048576
    RowGroupSize: 8388608
  - Checksum: false
    CompressionLevel: 0
    CompressionType: none
    Constraint: PARTITION p1_11 START ('1995-01-01'::date) END ('1997-12-31'::date)
      EVERY ('3 years'::interval) WITH (appendonly=true, orientation=parquet, pagesize=1048576,
      rowgroupsize=8388608, compresstype=none, compresslevel=0)
    EnableDictionary: false
    Files:
    - path: /hawq-data/gpseg0/16385/16626/16668.1
      size: 8088559
    - path: /hawq-data/gpseg1/16385/16626/16668.1
      size: 8075056
    Name: orders_1_prt_p1_11
    PageSize: 1048576
    RowGroupSize: 8388608
  - Checksum: false
    CompressionLevel: 0
    CompressionType: none
    Constraint: PARTITION p1_21 START ('1998-01-01'::date) END ('2000-12-31'::date)
      EVERY ('3 years'::interval) WITH (appendonly=true, orientation=parquet, pagesize=1048576,
      rowgroupsize=8388608, compresstype=none, compresslevel=0)
    EnableDictionary: false
    Files:
    - path: /hawq-data/gpseg0/16385/16626/16674.1
      size: 8065770
    - path: /hawq-data/gpseg1/16385/16626/16674.1
      size: 8126669
    Name: orders_1_prt_p1_21
    PageSize: 1048576
    RowGroupSize: 8388608
  RowGroupSize: 8388608
Parquet_Schema:
- name: o_orderkey
  type: int8
- name: o_custkey
  type: int4
- name: o_orderstatus
  type: bpchar
- name: o_totalprice
  type: numeric
- name: o_orderdate
  type: date
- name: o_orderpriority
  type: bpchar
- name: o_clerk
  type: bpchar
- name: o_shippriority
  type: int4
- name: o_comment
  type: varchar
Distribution_policy: DISTRIBUTED RANDOMLY
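Because the orders table is distributed randomly, the bytes listed in the metadata should be spread roughly evenly across segment directories. A small sketch that tallies the partition file sizes from the orders.yaml example by segment (the path list is hand-copied from the output above, and the gpseg naming pattern is an assumption based on that output):

```python
import re
from collections import defaultdict

# Partition file paths and sizes copied from the orders.yaml example above.
files = [
    ("/hawq-data/gpseg0/16385/16626/16662.1", 8140599),
    ("/hawq-data/gpseg1/16385/16626/16662.1", 8099760),
    ("/hawq-data/gpseg0/16385/16626/16668.1", 8088559),
    ("/hawq-data/gpseg1/16385/16626/16668.1", 8075056),
    ("/hawq-data/gpseg0/16385/16626/16674.1", 8065770),
    ("/hawq-data/gpseg1/16385/16626/16674.1", 8126669),
]

# Tally bytes per segment directory to gauge how evenly the randomly
# distributed data is spread across segments.
by_segment = defaultdict(int)
for path, size in files:
    segment = re.search(r"gpseg\d+", path).group()
    by_segment[segment] += size

print(dict(by_segment))
```

The two segment totals come out within a fraction of a percent of each other, as expected for DISTRIBUTED RANDOMLY.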