About gpfdist Setup and Performance
Consider the following scenarios for optimizing your ETL network performance.
- Allow network traffic to use all ETL host Network Interface Cards (NICs) simultaneously. Run one instance of gpfdist on the ETL host, then declare the host name of each NIC in the LOCATION clause of your external table definition (see Creating External Tables - Examples).
Figure: External Table Using Single gpfdist Instance with Multiple NICs
- Divide external table data equally among multiple gpfdist instances on the ETL host. For example, on an ETL system with two NICs, run two gpfdist instances (one on each NIC) to optimize data load performance, and divide the external table data files evenly between the two gpfdist instances.
Figure: External Tables Using Multiple gpfdist Instances with Multiple NICs
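The two scenarios above can be sketched with a small helper script. This is a minimal illustration, not part of HAWQ: the function names, the host names etl1-nic1 and etl1-nic2, the port 8081, and the file names are all hypothetical. The first two functions build one gpfdist:// URI per NIC host name and join them into a LOCATION clause, as in the single-instance scenario. The third splits a list of data files round-robin, so that each of several gpfdist instances serves a roughly equal number of files. Splitting by file count is only a proxy for equal data volume when the files are of similar size.

```python
from typing import List, Sequence


def gpfdist_uris(hosts: Sequence[str], port: int, pattern: str) -> List[str]:
    """Build one gpfdist:// URI per NIC host name (all names are hypothetical)."""
    return [f"gpfdist://{host}:{port}/{pattern}" for host in hosts]


def location_clause(uris: Sequence[str]) -> str:
    """Join URIs into the LOCATION clause of an external table definition."""
    quoted = ", ".join(f"'{uri}'" for uri in uris)
    return f"LOCATION ({quoted})"


def split_evenly(files: Sequence[str], instances: int) -> List[List[str]]:
    """Assign files round-robin so each gpfdist instance gets a similar count."""
    if instances <= 0:
        raise ValueError("instances must be a positive integer")
    groups: List[List[str]] = [[] for _ in range(instances)]
    # Sort first so the assignment is deterministic across runs.
    for index, name in enumerate(sorted(files)):
        groups[index % instances].append(name)
    return groups


# Scenario 1: one gpfdist instance, reached through two NIC host names.
uris = gpfdist_uris(["etl1-nic1", "etl1-nic2"], 8081, "*.txt")
print(location_clause(uris))

# Scenario 2: two gpfdist instances, each serving half of the data files.
for group in split_evenly(["a.txt", "b.txt", "c.txt", "d.txt"], 2):
    print(group)
```

In the second scenario, each group of files would be placed in the directory served by one gpfdist instance, and the LOCATION clause would list one URI per instance.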
Note: Use pipes (|) to separate formatted text when you submit files to gpfdist. HAWQ encloses comma-separated text strings in single or double quotes. gpfdist has to remove the quotes to parse the strings. Using pipes to separate formatted text avoids the extra step and improves performance.