Loading Data with hawq load
The HAWQ hawq load utility loads data using readable external tables and the HAWQ parallel file server (gpfdist or gpfdists). It handles parallel file-based external table setup and allows users to configure their data format, external table definition, and gpfdist or gpfdists setup in a single configuration file.
To use hawq load:
- Ensure that your environment is set up to run hawq load. Some dependent files from your HAWQ installation are required, such as gpfdist and Python, as well as network access to the HAWQ segment hosts.
- Create your load control file. This is a YAML-formatted file that specifies the HAWQ connection information, gpfdist configuration information, external table options, and data format. For example:
---
VERSION: 1.0.0.1
DATABASE: ops
USER: gpadmin
HOST: mdw-1
PORT: 5432
GPLOAD:
  INPUT:
    - SOURCE:
        LOCAL_HOSTNAME:
          - etl1-1
          - etl1-2
          - etl1-3
          - etl1-4
        PORT: 8081
        FILE:
          - /var/load/data/*
    - COLUMNS:
        - name: text
        - amount: float4
        - category: text
        - description: text
        - date: date
    - FORMAT: text
    - DELIMITER: '|'
    - ERROR_LIMIT: 25
    - ERROR_TABLE: payables.err_expenses
  OUTPUT:
    - TABLE: payables.expenses
    - MODE: INSERT
  SQL:
    - BEFORE: "INSERT INTO audit VALUES('start', current_timestamp)"
    - AFTER: "INSERT INTO audit VALUES('end', current_timestamp)"
- Run hawq load, passing in the load control file. For example:
$ hawq load -f my_load.yml
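
The environment prerequisites named in the first step can be checked before a load is attempted. The following is a minimal sketch, not part of HAWQ: check_tool is a hypothetical helper that only reports whether a command is on the PATH, and it assumes the Python interpreter is installed under the name python. It does not test network access to the HAWQ segment hosts.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not shipped with HAWQ):
# it reports whether the named command can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Tools the text lists as prerequisites for hawq load.
# Assumption: the interpreter is available as "python".
for tool in hawq gpfdist python; do
    check_tool "$tool"
done
```

Any line reported as missing usually means the HAWQ environment has not been loaded into the current shell, or the tool is not installed on this host.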