hawq extract

Extracts the metadata of a specified table into a YAML file.

Synopsis

  hawq extract [<connection_options>] [-o <output_file>] <tablename>
  hawq extract -?
  hawq extract --version

where:

  <connection_options> =
      [-h <host>]
      [-p <port>]
      [-U <username>]
      [-d <database>]
      [-W]

Description

hawq extract is a utility that extracts a table’s metadata into a YAML-formatted file. HAWQ’s InputFormat uses this YAML file to read HAWQ data files stored on HDFS directly into a MapReduce program. The YAML file can also be used to provide the metadata for registering files into HAWQ with the hawq register command. A short end-to-end sketch follows the notes below.

Note: hawq extract is bound by the following rules:

  • You must start up HAWQ to use hawq extract.
  • hawq extract only supports AO and Parquet tables.
  • hawq extract supports partitioned tables, but does not support sub-partitions.
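
For orientation, the following is a minimal sketch of the extract-then-inspect workflow, assuming a running HAWQ cluster, hawq extract on the PATH, the PyYAML package, and a hypothetical table named sales in the postgres database; it is illustrative only and not part of the utility. The keys printed at the end correspond to the top-level fields described under Metadata File Format below.

  import subprocess

  import yaml  # PyYAML, assumed to be installed

  # Extract metadata for a hypothetical table into a YAML file, using only
  # the -o and -d options documented on this page.
  subprocess.run(
      ["hawq", "extract", "-o", "sales.yaml", "-d", "postgres", "sales"],
      check=True,
  )

  # Load the file the way a downstream consumer (for example, a MapReduce
  # driver or a pre-registration check) might inspect it.
  with open("sales.yaml") as f:
      metadata = yaml.safe_load(f)

  print(metadata["TableName"], metadata["FileFormat"], metadata["DFS_URL"])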

Arguments

<tablename>

The name of the table whose metadata you want to extract. You can use the format namespace_name.table_name.

Options

-o <output_file>

The name of the file to which hawq extract writes the metadata. If you do not specify a name, hawq extract writes to stdout.

-v (verbose mode)

Displays the verbose output of the extraction process.

-? (help)

Displays the online help.

--version

Displays the version of this utility.

<connection_options>

-h <host>

Specifies the host name of the machine on which the HAWQ master database server is running. If not specified, it reads from the environment variable $PGHOST or defaults to localhost.

-p <port>

Specifies the TCP port on which the HAWQ master database server is listening for connections. If not specified, reads from the environment variable $PGPORT or defaults to 5432.

-U <username>

The database role name to connect as. If not specified, reads from the environment variable $PGUSER or defaults to the current system user name.

-d <database>

The database to connect to. If not specified, it reads from the environment variable $PGDATABASE or defaults to template1.

-W (force password prompt)

Force a password prompt. If not specified, reads the password from the environment variable $PGPASSWORD or from a password file specified by $PGPASSFILE or in ~/.pgpass.
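
The fallback order described above (explicit option, then PG* environment variable, then built-in default) can be mirrored in a small illustrative sketch; the resolve_connection helper below is hypothetical and not part of the utility.

  import getpass
  import os

  def resolve_connection(host=None, port=None, user=None, database=None):
      """Mirror the documented fallback order for the connection options:
      explicit value -> PG* environment variable -> built-in default."""
      return {
          "host": host or os.environ.get("PGHOST", "localhost"),
          "port": int(port or os.environ.get("PGPORT", "5432")),
          "user": user or os.environ.get("PGUSER", getpass.getuser()),
          "database": database or os.environ.get("PGDATABASE", "template1"),
      }

  # With no arguments and no PG* variables set, this yields the defaults
  # listed above: localhost, 5432, the current system user, and template1.
  print(resolve_connection())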

Metadata File Format

hawq extract exports the table metadata into a file using the YAML 1.1 document format. The file contains key information about the table, such as the table schema, data file locations and sizes, and partition constraints.

The basic structure of the metadata file is shown below; a short parsing sketch follows the listing:

  Version: string (1.0.0)
  DBVersion: string
  FileFormat: string (AO/Parquet)
  TableName: string (schemaname.tablename)
  DFS_URL: string (hdfs://127.0.0.1:9000)
  Encoding: UTF8
  AO_Schema:
    - name: string
      type: string
  Bucketnum: 6
  Distribution_policy: DISTRIBUTED RANDOMLY
  AO_FileLocations:
    Blocksize: int
    Checksum: boolean
    CompressionType: string
    CompressionLevel: int
    PartitionBy: string ('PARTITION BY ...')
    Files:
      - path: string (/gpseg0/16385/35469/35470.1)
        size: long
    Partitions:
      - Blocksize: int
        Checksum: boolean
        CompressionType: string
        CompressionLevel: int
        Name: string
        Constraint: string (PARTITION Jan08 START (date '2008-01-01') INCLUSIVE)
        Files:
          - path: string
            size: long
  Parquet_Schema:
    - name: string
      type: string
  Parquet_FileLocations:
    RowGroupSize: long
    PageSize: long
    CompressionType: string
    CompressionLevel: int
    Checksum: boolean
    EnableDictionary: boolean
    PartitionBy: string
    Files:
      - path: string
        size: long
    Partitions:
      - Name: string
        RowGroupSize: long
        PageSize: long
        CompressionType: string
        CompressionLevel: int
        Checksum: boolean
        EnableDictionary: boolean
        Constraint: string
        Files:
          - path: string
            size: long
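
To illustrate how the Files and Partitions keys nest, the following is a minimal sketch, assuming the PyYAML package and a metadata file produced by hawq extract (here rank_table.yaml, generated in the first example below); the data_files helper is illustrative only.

  import yaml  # PyYAML, assumed to be installed

  def data_files(metadata):
      """Yield (path, size) for every data file in the metadata: the top-level
      Files list plus the Files list inside each partition, for both the AO
      and the Parquet layouts."""
      locations = metadata.get("AO_FileLocations") or metadata["Parquet_FileLocations"]
      for f in locations.get("Files", []):
          yield f["path"], f["size"]
      for part in locations.get("Partitions", []):
          for f in part.get("Files", []):
              yield f["path"], f["size"]

  with open("rank_table.yaml") as fh:
      meta = yaml.safe_load(fh)

  total = sum(size for _, size in data_files(meta))
  print(f"{meta['TableName']}: {total} bytes across data files on {meta['DFS_URL']}")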

Example - Extracting an AO table

Extract the rank table’s metadata into a file named rank_table.yaml:

  $ hawq extract -o rank_table.yaml -d postgres rank

Output content in rank_table.yaml

  AO_FileLocations:
    Blocksize: 32768
    Checksum: false
    CompressionLevel: 0
    CompressionType: null
    Files:
      - path: /gpseg0/16385/35469/35692.1
        size: 0
      - path: /gpseg1/16385/35469/35692.1
        size: 0
    PartitionBy: PARTITION BY list (gender)
    Partitions:
      - Blocksize: 32768
        Checksum: false
        CompressionLevel: 0
        CompressionType: null
        Constraint: PARTITION girls VALUES('F') WITH (appendonly=true)
        Files:
          - path: /gpseg0/16385/35469/35697.1
            size: 0
          - path: /gpseg1/16385/35469/35697.1
            size: 0
        Name: girls
      - Blocksize: 32768
        Checksum: false
        CompressionLevel: 0
        CompressionType: null
        Constraint: PARTITION boys VALUES('M') WITH (appendonly=true)
        Files:
          - path: /gpseg0/16385/35469/35703.1
            size: 0
          - path: /gpseg1/16385/35469/35703.1
            size: 0
        Name: boys
      - Blocksize: 32768
        Checksum: false
        CompressionLevel: 0
        CompressionType: null
        Constraint: DEFAULT PARTITION other WITH (appendonly=true)
        Files:
          - path: /gpseg0/16385/35469/35709.1
            size: 90071728
          - path: /gpseg1/16385/35469/35709.1
            size: 90071512
        Name: other
  AO_Schema:
    - name: id
      type: int4
    - name: rank
      type: int4
    - name: year
      type: int4
    - name: gender
      type: bpchar
    - name: count
      type: int4
  DFS_URL: hdfs://127.0.0.1:9000
  Distribution_policy: DISTRIBUTED RANDOMLY
  Encoding: UTF8
  FileFormat: AO
  TableName: public.rank
  Version: 1.0.0

Example - Extracting a Parquet table

Extract the orders table’s metadata into a file named orders.yaml:

  $ hawq extract -o orders.yaml -d postgres orders

Output content in orders.yaml

  DFS_URL: hdfs://127.0.0.1:9000
  Encoding: UTF8
  FileFormat: Parquet
  TableName: public.orders
  Version: 1.0.0
  Parquet_FileLocations:
    Checksum: false
    CompressionLevel: 0
    CompressionType: none
    EnableDictionary: false
    Files:
      - path: /hawq-data/gpseg0/16385/16626/16657.1
        size: 0
      - path: /hawq-data/gpseg1/16385/16626/16657.1
        size: 0
    PageSize: 1048576
    PartitionBy: PARTITION BY range (o_orderdate)
    Partitions:
      - Checksum: false
        CompressionLevel: 0
        CompressionType: none
        Constraint: PARTITION p1_1 START ('1992-01-01'::date) END ('1994-12-31'::date)
          EVERY ('3 years'::interval) WITH (appendonly=true, orientation=parquet, pagesize=1048576,
          rowgroupsize=8388608, compresstype=none, compresslevel=0)
        EnableDictionary: false
        Files:
          - path: /hawq-data/gpseg0/16385/16626/16662.1
            size: 8140599
          - path: /hawq-data/gpseg1/16385/16626/16662.1
            size: 8099760
        Name: orders_1_prt_p1_1
        PageSize: 1048576
        RowGroupSize: 8388608
      - Checksum: false
        CompressionLevel: 0
        CompressionType: none
        Constraint: PARTITION p1_11 START ('1995-01-01'::date) END ('1997-12-31'::date)
          EVERY ('3 years'::interval) WITH (appendonly=true, orientation=parquet, pagesize=1048576,
          rowgroupsize=8388608, compresstype=none, compresslevel=0)
        EnableDictionary: false
        Files:
          - path: /hawq-data/gpseg0/16385/16626/16668.1
            size: 8088559
          - path: /hawq-data/gpseg1/16385/16626/16668.1
            size: 8075056
        Name: orders_1_prt_p1_11
        PageSize: 1048576
        RowGroupSize: 8388608
      - Checksum: false
        CompressionLevel: 0
        CompressionType: none
        Constraint: PARTITION p1_21 START ('1998-01-01'::date) END ('2000-12-31'::date)
          EVERY ('3 years'::interval) WITH (appendonly=true, orientation=parquet, pagesize=1048576,
          rowgroupsize=8388608, compresstype=none, compresslevel=0)
        EnableDictionary: false
        Files:
          - path: /hawq-data/gpseg0/16385/16626/16674.1
            size: 8065770
          - path: /hawq-data/gpseg1/16385/16626/16674.1
            size: 8126669
        Name: orders_1_prt_p1_21
        PageSize: 1048576
        RowGroupSize: 8388608
    RowGroupSize: 8388608
  Parquet_Schema:
    - name: o_orderkey
      type: int8
    - name: o_custkey
      type: int4
    - name: o_orderstatus
      type: bpchar
    - name: o_totalprice
      type: numeric
    - name: o_orderdate
      type: date
    - name: o_orderpriority
      type: bpchar
    - name: o_clerk
      type: bpchar
    - name: o_shippriority
      type: int4
    - name: o_comment
      type: varchar
  Distribution_policy: DISTRIBUTED RANDOMLY

See Also

hawq load, hawq register