gpfdist Protocol
The gpfdist:// protocol is used in a URI to reference a running gpfdist instance. The gpfdist utility serves external data files from a directory on a file host to all HAWQ segments in parallel. gpfdist is located in the $GPHOME/bin directory on your HAWQ master host and on each segment host.
Run gpfdist on the host where the external data files reside. gpfdist uncompresses gzip (.gz) and bzip2 (.bz2) files automatically. You can use the wildcard character (*) or other C-style pattern matching to denote multiple files to read. The files specified are assumed to be relative to the directory that you specified when you started the gpfdist instance.
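As a minimal sketch of the steps above, the following script starts a gpfdist instance and builds a matching gpfdist:// URI. The host name etlhost-1, the directory /var/load_files, port 8081, and the log path are hypothetical values chosen for illustration; -d, -p, and -l are the gpfdist options for the served directory, the HTTP port, and the log file. The gpfdist call is guarded so the script still runs on a machine where the utility is not installed.

```shell
#!/bin/sh
# Hypothetical values; adjust for your own file host.
FILE_HOST="etlhost-1"          # host where the external data files reside
DATA_DIR="/var/load_files"     # directory that gpfdist serves files from
PORT="8081"                    # HTTP port that gpfdist listens on
LOG_FILE="/tmp/gpfdist.log"    # where gpfdist writes its log

# Start gpfdist in the background, but only if the binary is on PATH.
if command -v gpfdist >/dev/null 2>&1; then
    gpfdist -d "$DATA_DIR" -p "$PORT" -l "$LOG_FILE" &
fi

# File paths in the URI are relative to DATA_DIR. The wildcard matches
# every gzip-compressed CSV file in that directory; gpfdist uncompresses
# the .gz files as it serves them.
LOCATION="gpfdist://${FILE_HOST}:${PORT}/*.csv.gz"
echo "$LOCATION"
```

The resulting URI is what you would place in the LOCATION clause of an external table definition.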
All virtual segments access the external file(s) in parallel, subject to the number of segments set in the gp_external_max_segments parameter, the length of the gpfdist location list, and the limits specified by the hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit parameters. Use multiple gpfdist data sources in a CREATE EXTERNAL TABLE statement to scale the external table’s scan performance. For more information about configuring gpfdist, see Using the HAWQ File Server (gpfdist).
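To illustrate the use of multiple gpfdist data sources, the sketch below writes a CREATE EXTERNAL TABLE statement whose LOCATION clause lists two gpfdist instances. The host names (etlhost-1, etlhost-2), the port, the table name, the column list, and the delimiter are all assumptions made for the example, not values taken from this documentation.

```shell
#!/bin/sh
# Write a CREATE EXTERNAL TABLE statement that reads from two gpfdist
# instances. All host, table, and column names here are hypothetical.
SQL_FILE="${TMPDIR:-/tmp}/ext_expenses.sql"

cat > "$SQL_FILE" <<'SQL'
CREATE EXTERNAL TABLE ext_expenses (
    name text,
    expense_date date,
    amount float4,
    category text
)
LOCATION (
    'gpfdist://etlhost-1:8081/*.txt',
    'gpfdist://etlhost-2:8081/*.txt'
)
FORMAT 'TEXT' (DELIMITER '|');
SQL

# Count the gpfdist sources listed in the LOCATION clause
# (one URI per line in the file written above).
N_SOURCES=$(grep -c "gpfdist://" "$SQL_FILE")
echo "gpfdist sources: $N_SOURCES"
```

You could then run the generated file with psql -f against your HAWQ database. Each additional entry in the LOCATION list lengthens the gpfdist location list, which is one of the factors that governs how many virtual segments scan the table in parallel.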
See the gpfdist reference documentation for more information about using gpfdist with external tables.