HDFS Configuration Reference
This reference page describes HDFS configuration values that are configured for HAWQ within hdfs-site.xml, core-site.xml, or hdfs-client.xml.
HDFS Site Configuration (hdfs-site.xml and core-site.xml)
This topic provides a reference of the HDFS site configuration values recommended for HAWQ installations. These parameters are located in either hdfs-site.xml or core-site.xml of your HDFS deployment.
This table describes the configuration parameters and values that are recommended for HAWQ installations. Only HDFS parameters that need to be modified or customized for HAWQ are listed.
Parameter | Description | Recommended Value for HAWQ Installs | Comments |
---|---|---|---|
dfs.allow.truncate | Allows truncate. | true | HAWQ requires that you enable dfs.allow.truncate . The HAWQ service will fail to start if dfs.allow.truncate is not set to true . |
dfs.block.access.token.enable | If true , access tokens are used as capabilities for accessing DataNodes. If false , no access tokens are checked on accessing DataNodes. | false for an unsecured HDFS cluster, or true for a secure cluster | |
dfs.block.local-path-access.user | Comma separated list of the users allowed to open block files on legacy short-circuit local read. | gpadmin | |
dfs.client.read.shortcircuit | This configuration parameter turns on short-circuit local reads. | true | In Ambari, this parameter corresponds to HDFS Short-circuit read. The value for this parameter should be the same in hdfs-site.xml and HAWQ’s hdfs-client.xml . |
dfs.client.socket-timeout | The amount of time before a client connection times out when establishing a connection or reading. The value is expressed in milliseconds. | 300000000 | |
dfs.client.use.legacy.blockreader.local | Setting this value to false specifies that the new version of the short-circuit reader is used. Setting this value to true means that the legacy short-circuit reader would be used. | false | |
dfs.datanode.data.dir.perm | Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can be either octal or symbolic. | 750 | In Ambari, this parameter corresponds to DataNode directories permission |
dfs.datanode.handler.count | The number of server threads for the DataNode. | 60 | |
dfs.datanode.max.transfer.threads | Specifies the maximum number of threads to use for transferring data in and out of the DataNode. | 40960 | In Ambari, this parameter corresponds to DataNode max data transfer threads |
dfs.datanode.socket.write.timeout | The amount of time before a write operation times out, expressed in milliseconds. | 7200000 | |
dfs.domain.socket.path | (Optional.) The path to a UNIX domain socket to use for communication between the DataNode and local HDFS clients. If the string “_PORT” is present in this path, it is replaced by the TCP port of the DataNode. | | If set, the value for this parameter should be the same in hdfs-site.xml and HAWQ’s hdfs-client.xml . |
dfs.namenode.accesstime.precision | The access time for HDFS file is precise up to this value. Setting a value of 0 disables access times for HDFS. | 0 | In Ambari, this parameter corresponds to Access time precision |
dfs.namenode.handler.count | The number of server threads for the NameNode. | 600 | |
dfs.support.append | Whether HDFS is allowed to append to files. | true | |
ipc.client.connection.maxidletime | The maximum time in milliseconds after which a client will bring down the connection to the server. | 3600000 | In core-site.xml |
ipc.client.connect.timeout | Indicates the number of milliseconds a client will wait for the socket to establish a server connection. | 300000 | In core-site.xml |
ipc.server.listen.queue.size | Indicates the length of the listen queue for servers accepting client connections. | 3300 | In core-site.xml |
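As an illustration, the recommended values above take the form of standard Hadoop property entries. The following is a partial sketch for hdfs-site.xml showing only a few of the parameters from the table; merge entries like these into your existing configuration rather than replacing the file:

```xml
<!-- hdfs-site.xml: partial sketch of HAWQ-recommended settings -->
<!-- HAWQ fails to start if dfs.allow.truncate is not true -->
<property>
  <name>dfs.allow.truncate</name>
  <value>true</value>
</property>
<!-- User permitted to open block files for legacy short-circuit reads -->
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>gpadmin</value>
</property>
<!-- Must match the value in HAWQ's hdfs-client.xml -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>40960</value>
</property>
```

If your cluster is managed by Ambari, set these values through the Ambari UI rather than editing the files directly; in either case, HDFS services generally must be restarted for the changes to take effect.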
HDFS Client Configuration (hdfs-client.xml)
This topic provides a reference of the HAWQ configuration values located in $GPHOME/etc/hdfs-client.xml.
This table describes the configuration parameters and their default values:
Parameter | Description | Default Value | Comments |
---|---|---|---|
dfs.client.failover.max.attempts | The maximum number of times that the DFS client retries issuing a RPC call when multiple NameNodes are configured. | 15 | |
dfs.client.log.severity | The minimal log severity level. Valid values include: FATAL, ERROR, INFO, DEBUG1, DEBUG2, and DEBUG3. | INFO | |
dfs.client.read.shortcircuit | Determines whether the DataNode is bypassed when reading file blocks, if the block and client are on the same node. The default value, true, bypasses the DataNode. | true | The value for this parameter should be the same in hdfs-site.xml and HAWQ’s hdfs-client.xml . |
dfs.client.use.legacy.blockreader.local | Determines whether the legacy short-circuit reader implementation, based on HDFS-2246, is used. Set this property to true on non-Linux platforms that do not have the new implementation based on HDFS-347. | false | |
dfs.default.blocksize | Default block size, in bytes. | 134217728 | Default is equivalent to 128 MB. |
dfs.default.replica | The default number of replicas. | 3 | |
dfs.domain.socket.path | (Optional.) The path to a UNIX domain socket to use for communication between the DataNode and local HDFS clients. If the string “_PORT” is present in this path, it is replaced by the TCP port of the DataNode. | | If set, the value for this parameter should be the same in hdfs-site.xml and HAWQ’s hdfs-client.xml . |
dfs.prefetchsize | The number of blocks for which information is pre-fetched. | 10 | |
hadoop.security.authentication | Specifies the type of RPC authentication to use. A value of simple indicates no authentication. A value of kerberos enables authentication by Kerberos. | simple | |
input.connect.timeout | The timeout interval, in milliseconds, for when the input stream is setting up a connection to a DataNode. | 600000 | Default is equal to 10 minutes. |
input.localread.blockinfo.cachesize | The size of the file block path information cache, in bytes. | 1000 | |
input.localread.default.buffersize | The size of the buffer, in bytes, used to hold data from the file block and verify the checksum. This value is used only when dfs.client.read.shortcircuit is set to true. | 1048576 | Default is equal to 1 MB. Used only when dfs.client.read.shortcircuit is set to true. |
input.read.getblockinfo.retry | The maximum number of times the client should retry getting block information from the NameNode. | 3 | |
input.read.timeout | The timeout interval, in milliseconds, for when the input stream is reading from a DataNode. | 3600000 | Default is equal to 1 hour. |
input.write.timeout | The timeout interval, in milliseconds, for when the input stream is writing to a DataNode. | 3600000 | Default is equal to 1 hour. |
output.close.timeout | The timeout interval for closing an output stream, in milliseconds. | 900000 | Default is equal to 15 minutes. |
output.connect.timeout | The timeout interval, in milliseconds, for when the output stream is setting up a connection to a DataNode. | 600000 | Default is equal to 10 minutes. |
output.default.chunksize | The chunk size of the pipeline, in bytes. | 512 | |
output.default.packetsize | The packet size of the pipeline, in bytes. | 65536 | Default is equal to 64KB. |
output.default.write.retry | The maximum number of times that the client should reattempt to set up a failed pipeline. | 10 | |
output.packetpool.size | The maximum number of packets in a file’s packet pool. | 1024 | |
output.read.timeout | The timeout interval, in milliseconds, for when the output stream is reading from a DataNode. | 3600000 | Default is equal to 1 hour. |
output.replace-datanode-on-failure | Determines whether the client adds a new DataNode to the pipeline if the number of nodes in the pipeline is less than the specified number of replicas. | false (if # of nodes is less than or equal to 4), otherwise true | When you deploy a HAWQ cluster, the hawq init utility detects the number of nodes in the cluster and updates this configuration parameter accordingly. However, when expanding an existing cluster to 4 or more nodes, you must manually set this value to true. Set this value back to false if removing nodes brings the cluster below 4 nodes. |
output.write.timeout | The timeout interval, in milliseconds, for when the output stream is writing to a DataNode. | 3600000 | Default is equal to 1 hour. |
rpc.client.connect.retry | The maximum number of times to retry a connection if the RPC client fails to connect to the server. | 10 | |
rpc.client.connect.tcpnodelay | Determines whether TCP_NODELAY is used when connecting to the RPC server. | true | |
rpc.client.connect.timeout | The timeout interval for establishing the RPC client connection, in milliseconds. | 600000 | Default equals 10 minutes. |
rpc.client.max.idle | The maximum idle time for an RPC connection, in milliseconds. | 10000 | Default equals 10 seconds. |
rpc.client.ping.interval | The interval, in milliseconds, at which the RPC client sends a heartbeat to the server. A value of 0 disables the heartbeat. | 10000 | |
rpc.client.read.timeout | The timeout interval, in milliseconds, for when the RPC client is reading from the server. | 3600000 | Default equals 1 hour. |
rpc.client.socket.linger.timeout | The value to set for the SO_LINGER socket option when connecting to the RPC server. | -1 | |
rpc.client.timeout | The timeout interval of an RPC invocation, in milliseconds. | 3600000 | Default equals 1 hour. |
rpc.client.write.timeout | The timeout interval, in milliseconds, for when the RPC client is writing to the server. | 3600000 | Default equals 1 hour. |
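Overrides in $GPHOME/etc/hdfs-client.xml use the same property-entry format as the HDFS site files. The following is a partial sketch showing two entries; the defaults listed in the table apply for any parameter you do not set explicitly:

```xml
<!-- $GPHOME/etc/hdfs-client.xml: partial sketch of two explicit settings -->
<!-- Must match dfs.client.read.shortcircuit in hdfs-site.xml -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<!-- RPC invocation timeout; 3600000 ms equals 1 hour -->
<property>
  <name>rpc.client.timeout</name>
  <value>3600000</value>
</property>
```

Note that the parameters shared with hdfs-site.xml (dfs.client.read.shortcircuit, dfs.client.use.legacy.blockreader.local, and dfs.domain.socket.path) should carry the same values in both files.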