List of Configuration Properties
All Alluxio configuration settings fall into one of six categories: Common (shared by Master and Worker), Master specific, Worker specific, User specific, Cluster specific (used for running Alluxio with cluster managers like Mesos and YARN), and Security specific (shared by Master, Worker, and User).
Common Configuration
The common configuration contains constants shared by different components.
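Most of these properties can be set in conf/alluxio-site.properties on each node. A minimal sketch using properties from the table below; the values are illustrative, not recommendations:

```properties
# conf/alluxio-site.properties -- illustrative values only
alluxio.debug=false
alluxio.underfs.hdfs.remote=true
alluxio.tmp.dirs=/tmp
```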
Property Name | Default | Description |
---|---|---|
alluxio.conf.dir | ${alluxio.home}/conf | The directory containing files used to configure Alluxio. Note: overwriting this property only works when it is passed as a JVM system property (e.g., appending -Dalluxio.conf.dir=<NEW_VALUE> to $ALLUXIO_JAVA_OPTS; see the sketch following this table). Setting it in alluxio-site.properties will not work. |
alluxio.debug | false | Set to true to enable debug mode which has additional logging and info in the Web UI. |
alluxio.extensions.dir | ${alluxio.home}/extensions | The directory containing Alluxio extensions. |
alluxio.fuse.cached.paths.max | 500 | Maximum number of Alluxio paths to cache for FUSE conversion. |
alluxio.fuse.debug.enabled | false | Run FUSE in debug mode, and have the fuse process log every FS request. |
alluxio.fuse.fs.name | alluxio-fuse | The FUSE file system name. |
alluxio.fuse.jnifuse.enabled | true | Use JNI-Fuse library for better performance. If disabled, JNR-Fuse will be used. |
alluxio.fuse.logging.threshold | 10s | Log a FUSE API call when it takes more time than this threshold. |
alluxio.fuse.maxwrite.bytes | 128KB | Maximum granularity of write operations, capped by the kernel to 128KB max (as of Linux 3.16.0). |
alluxio.fuse.shared.caching.reader.enabled | false | (Experimental) Use a shared gRPC data reader for better performance on multi-process file reading through Alluxio JNI-Fuse. Block data will be cached on the client side, so more memory is required for the FUSE process. |
alluxio.fuse.user.group.translation.enabled | false | Whether to translate Alluxio users and groups into Unix users and groups when exposing Alluxio files through the FUSE API. When this property is set to false, the user and group for all FUSE files will match the user who started the alluxio-fuse process. |
alluxio.home | /opt/alluxio | Alluxio installation directory. |
alluxio.job.master.bind.host | 0.0.0.0 | The host that the Alluxio job master will bind to. |
alluxio.job.master.client.threads | 1024 | The number of threads the Alluxio master uses to make requests to the job master. |
alluxio.job.master.embedded.journal.addresses | | A comma-separated list of journal addresses for all job masters in the cluster. The format is ‘hostname1:port1,hostname2:port2,…’. Defaults to the journal addresses set for the Alluxio masters (alluxio.master.embedded.journal.addresses), but with the job master embedded journal port. |
alluxio.job.master.embedded.journal.port | 20003 | The port to use for embedded journal communication with other job masters. |
alluxio.job.master.finished.job.purge.count | -1 | The maximum number of jobs to purge at any single time when the job master reaches its maximum capacity. It is recommended to set this value when setting the capacity of the job master to a large (> 10M) value. Default is -1, denoting an unlimited value. |
alluxio.job.master.finished.job.retention.time | 60sec | The length of time the Alluxio Job Master should save information about completed jobs before they are discarded. |
alluxio.job.master.hostname | ${alluxio.master.hostname} | The hostname of the Alluxio job master. |
alluxio.job.master.job.capacity | 100000 | The total possible number of available job statuses in the job master. This value includes running and finished jobs which have completed within alluxio.job.master.finished.job.retention.time. |
alluxio.job.master.lost.worker.interval | 1sec | The time interval the job master waits between checks for lost workers. |
alluxio.job.master.rpc.addresses | | The list of RPC addresses to use for the job service configured in non-ZooKeeper HA mode. If this property is not specifically defined, it will first fall back to using alluxio.master.rpc.addresses, replacing those address ports with the port defined by alluxio.job.master.rpc.port. Otherwise the addresses are inherited from alluxio.job.master.embedded.journal.addresses using the port defined in alluxio.job.master.rpc.port. |
alluxio.job.master.rpc.port | 20001 | The port for Alluxio job master’s RPC service. |
alluxio.job.master.web.bind.host | 0.0.0.0 | The host that the job master web server binds to. |
alluxio.job.master.web.hostname | ${alluxio.job.master.hostname} | The hostname of the job master web server. |
alluxio.job.master.web.port | 20002 | The port the job master web server uses. |
alluxio.job.master.worker.heartbeat.interval | 1sec | The amount of time that the Alluxio job worker should wait in between heartbeats to the Job Master. |
alluxio.job.master.worker.timeout | 60sec | The time period after which the job master will mark a worker as lost without a subsequent heartbeat. |
alluxio.job.worker.bind.host | 0.0.0.0 | The host that the Alluxio job worker will bind to. |
alluxio.job.worker.data.port | 30002 | The port the Alluxio Job worker uses to send data. |
alluxio.job.worker.hostname | ${alluxio.worker.hostname} | The hostname of the Alluxio job worker. |
alluxio.job.worker.rpc.port | 30001 | The port for Alluxio job worker’s RPC service. |
alluxio.job.worker.threadpool.size | 10 | Number of threads in the thread pool for job worker. This may be adjusted to a lower value to alleviate resource saturation on the job worker nodes (CPU + IO). |
alluxio.job.worker.throttling | false | Whether the job worker should throttle itself based on whether the resources are saturated. |
alluxio.job.worker.web.bind.host | 0.0.0.0 | The host the job worker web server binds to. |
alluxio.job.worker.web.port | 30003 | The port the Alluxio job worker web server uses. |
alluxio.jvm.monitor.info.threshold | 1sec | When the JVM pauses for anything longer than this, log an INFO message. |
alluxio.jvm.monitor.sleep.interval | 1sec | The time for the JVM monitor thread to sleep. |
alluxio.jvm.monitor.warn.threshold | 10sec | When the JVM pauses for anything longer than this, log a WARN message. |
alluxio.locality.compare.node.ip | false | Whether to try to resolve the node IP address for locality checking. |
alluxio.locality.node | | Value to use for determining node locality. |
alluxio.locality.order | node,rack | Ordering of locality tiers. |
alluxio.locality.rack | | Value to use for determining rack locality. |
alluxio.locality.script | alluxio-locality.sh | A script to determine tiered identity for locality checking. |
alluxio.logger.type | Console | The type of logger. |
alluxio.logs.dir | ${alluxio.work.dir}/logs | The path under the Alluxio home directory to store log files. It has a corresponding environment variable $ALLUXIO_LOGS_DIR. Note: overwriting this property only works when it is passed as a JVM system property (e.g., appending -Dalluxio.logs.dir=<NEW_VALUE> to $ALLUXIO_JAVA_OPTS). Setting it in alluxio-site.properties will not work. |
alluxio.logserver.hostname | | The hostname of the Alluxio logserver. Note: overwriting this property only works when it is passed as a JVM system property (e.g., appending -Dalluxio.logserver.hostname=<NEW_VALUE> to $ALLUXIO_JAVA_OPTS). Setting it in alluxio-site.properties will not work. |
alluxio.logserver.logs.dir | ${alluxio.work.dir}/logs | Default location for remote log files. Note: overwriting this property only works when it is passed as a JVM system property (e.g., appending -Dalluxio.logserver.logs.dir=<NEW_VALUE> to $ALLUXIO_JAVA_OPTS). Setting it in alluxio-site.properties will not work. |
alluxio.logserver.port | 45600 | Default port of the logserver to receive logs from Alluxio servers. Note: overwriting this property only works when it is passed as a JVM system property (e.g., appending -Dalluxio.logserver.port=<NEW_VALUE> to $ALLUXIO_JAVA_OPTS). Setting it in alluxio-site.properties will not work. |
alluxio.logserver.threads.max | 2048 | The maximum number of threads used by logserver to service logging requests. |
alluxio.logserver.threads.min | 512 | The minimum number of threads used by logserver to service logging requests. |
alluxio.metrics.conf.file | ${alluxio.conf.dir}/metrics.properties | The file path of the metrics system configuration file. By default it is metrics.properties in the conf directory. |
alluxio.network.connection.auth.timeout | 30sec | Maximum time to wait for a connection (gRPC channel) to attempt to receive an authentication response. |
alluxio.network.connection.health.check.timeout | 5sec | Allowed duration for checking health of client connections (gRPC channels) before being assigned to a client. If a connection does not become active within the configured time, it will be shut down and a new connection will be created for the client. |
alluxio.network.connection.server.shutdown.timeout | 60sec | Maximum time to wait for the gRPC server to stop on shutdown. |
alluxio.network.connection.shutdown.graceful.timeout | 45sec | Maximum time to wait for connections (gRPC channels) to stop on shutdown. |
alluxio.network.connection.shutdown.timeout | 15sec | Maximum time to wait for connections (gRPC channels) to stop after graceful shutdown attempt. |
alluxio.network.host.resolution.timeout | 5sec | During startup of the Master and Worker processes Alluxio needs to ensure that they are listening on externally resolvable and reachable host names. To do this, Alluxio will automatically attempt to select an appropriate host name if one was not explicitly specified. This represents the maximum amount of time spent waiting to determine if a candidate host name is resolvable over the network. |
alluxio.network.ip.address.used | false | If true, when alluxio.<service_name>.hostname and alluxio.<service_name>.bind.host of a service are not specified, use the IP address as the connect host of the service. |
alluxio.proxy.s3.deletetype | ALLUXIO_AND_UFS | Delete type when deleting buckets and objects through S3 API. Valid options are ALLUXIO_AND_UFS (delete both in Alluxio and UFS), ALLUXIO_ONLY (delete only the buckets or objects in Alluxio namespace). |
alluxio.proxy.s3.multipart.temporary.dir.suffix | _s3_multipart_tmp | Suffix for the directory which holds parts during a multipart upload. |
alluxio.proxy.s3.writetype | CACHE_THROUGH | Write type when creating buckets and objects through S3 API. Valid options are MUST_CACHE (write will only go to Alluxio and must be stored in Alluxio), CACHE_THROUGH (try to cache, write to UnderFS synchronously), ASYNC_THROUGH (try to cache, write to UnderFS asynchronously), THROUGH (no cache, write to UnderFS synchronously). |
alluxio.proxy.stream.cache.timeout | 1hour | The timeout for the input and output streams cache eviction in the proxy. |
alluxio.proxy.web.bind.host | 0.0.0.0 | The hostname that the Alluxio proxy’s web server runs on. |
alluxio.proxy.web.hostname | | The hostname Alluxio proxy’s web UI binds to. |
alluxio.proxy.web.port | 39999 | The port Alluxio proxy’s web UI runs on. |
alluxio.secondary.master.metastore.dir | ${alluxio.work.dir}/secondary-metastore | The secondary master metastore work directory. Only some metastores need disk. |
alluxio.site.conf.dir | ${alluxio.conf.dir}/,${user.home}/.alluxio/,/etc/alluxio/ | Comma-separated search path for alluxio-site.properties. Note: overwriting this property only works when it is passed as a JVM system property (e.g., appending -Dalluxio.site.conf.dir=<NEW_VALUE> to $ALLUXIO_JAVA_OPTS). Setting it in alluxio-site.properties will not work. |
alluxio.table.catalog.path | /catalog | The Alluxio file path for the table catalog metadata. |
alluxio.table.catalog.udb.sync.timeout | 1h | The timeout period for a db sync to finish in the catalog. If a sync takes longer than this timeout, the sync will be terminated. |
alluxio.table.enabled | true | (Experimental) Enables the table service. |
alluxio.table.journal.partitions.chunk.size | 500 | The maximum number of table partitions in a single journal entry. |
alluxio.table.transform.manager.job.history.retention.time | 300sec | The length of time the Alluxio Table Master should keep information about finished transformation jobs before they are discarded. |
alluxio.table.transform.manager.job.monitor.interval | 10000 | The job monitor is a heartbeat thread in the transform manager. This is the time interval in milliseconds at which the job monitor heartbeat runs to check the status of the transformation jobs and update table and partition locations after transformation. |
alluxio.test.deprecated.key | N/A | |
alluxio.tmp.dirs | /tmp | The path(s) to store Alluxio temporary files, use commas as delimiters. If multiple paths are specified, one will be selected at random per temporary file. Currently, only files to be uploaded to object stores are stored in these paths. |
alluxio.underfs.allow.set.owner.failure | false | Whether to allow setting owner in UFS to fail. When set to true, it is possible file or directory owners diverge between Alluxio and UFS. |
alluxio.underfs.cleanup.enabled | false | Whether or not to clean up under file storage periodically. Some UFS operations may not be completed and cleaned up successfully in normal ways and leave some intermediate data that needs periodical cleanup. If enabled, all the mount points will be cleaned up when a leader master starts or the cleanup interval is reached. This should be used sparingly. |
alluxio.underfs.cleanup.interval | 1day | The interval for periodically cleaning all the mounted under file storages. |
alluxio.underfs.eventual.consistency.retry.base.sleep | 50ms | To handle eventually consistent storage semantics for certain under storages, Alluxio will perform retries when under storage metadata doesn’t match Alluxio’s expectations. These retries use exponential backoff. This property determines the base time for the exponential backoff. |
alluxio.underfs.eventual.consistency.retry.max.num | 20 | To handle eventually consistent storage semantics for certain under storages, Alluxio will perform retries when under storage metadata doesn’t match Alluxio’s expectations. These retries use exponential backoff. This property determines the maximum number of retries. |
alluxio.underfs.eventual.consistency.retry.max.sleep | 30sec | To handle eventually consistent storage semantics for certain under storages, Alluxio will perform retries when under storage metadata doesn’t match Alluxio’s expectations. These retries use exponential backoff. This property determines the maximum wait time in the backoff. |
alluxio.underfs.gcs.default.mode | 0700 | Mode (in octal notation) for GCS objects if mode cannot be discovered. |
alluxio.underfs.gcs.directory.suffix | / | Directories are represented in GCS as zero-byte objects named with the specified suffix. |
alluxio.underfs.gcs.owner.id.to.username.mapping | | Optionally, specify a preset GCS owner id to Alluxio username static mapping in the format “id1=user1;id2=user2”. The Google Cloud Storage IDs can be found at the console address https://console.cloud.google.com/storage/settings . Please use the “Owners” one. This property key is only valid when alluxio.underfs.gcs.version=1. |
alluxio.underfs.gcs.version | 1 | Specify the version of GCS module to use. GCS version “1” builds on top of the jets3t package, which requires fs.gcs.accessKeyId and fs.gcs.secretAccessKey. GCS version “2” builds on top of the Google Cloud API, which requires fs.gcs.credential.path. |
alluxio.underfs.hdfs.configuration | ${alluxio.conf.dir}/core-site.xml:${alluxio.conf.dir}/hdfs-site.xml | Location of the HDFS configuration file to overwrite the default HDFS client configuration. Note that these files must be available on every node. |
alluxio.underfs.hdfs.impl | org.apache.hadoop.hdfs.DistributedFileSystem | The implementation class of the HDFS as the under storage system. |
alluxio.underfs.hdfs.prefixes | hdfs://,glusterfs:/// | Optionally, specify which prefixes should run through the HDFS implementation of UnderFileSystem. The delimiter is any whitespace and/or ‘,’. |
alluxio.underfs.hdfs.remote | true | Boolean indicating whether or not the under storage worker nodes are remote with respect to Alluxio worker nodes. If set to true, Alluxio will not attempt to discover locality information from the under storage because locality is impossible. This will improve performance. The default value is true. |
alluxio.underfs.kodo.connect.timeout | 50sec | The connect timeout of Kodo. |
alluxio.underfs.kodo.downloadhost | | The download domain of Kodo bucket. |
alluxio.underfs.kodo.endpoint | | The endpoint of Kodo bucket. |
alluxio.underfs.kodo.requests.max | 64 | The maximum number of Kodo connections. |
alluxio.underfs.listing.length | 1000 | The maximum number of directory entries to list in a single query to under file system. If the total number of entries is greater than the specified length, multiple queries will be issued. |
alluxio.underfs.object.store.breadcrumbs.enabled | true | Set this to false to prevent Alluxio from creating zero byte objects during read or list operations on object store UFS. Leaving this on enables more efficient listing of prefixes. |
alluxio.underfs.object.store.mount.shared.publicly | false | Whether or not to share an object storage under storage system mount point with all Alluxio users. Note that this configuration has no effect on HDFS or local UFS. |
alluxio.underfs.object.store.multi.range.chunk.size | ${alluxio.user.block.size.bytes.default} | Default chunk size for ranged reads from multi-range object input streams. |
alluxio.underfs.object.store.service.threads | 20 | The number of threads in executor pool for parallel object store UFS operations, such as directory renames and deletes. |
alluxio.underfs.oss.connection.max | 1024 | The maximum number of OSS connections. |
alluxio.underfs.oss.connection.timeout | 50sec | The timeout when connecting to OSS. |
alluxio.underfs.oss.connection.ttl | -1 | The TTL of OSS connections in ms. |
alluxio.underfs.oss.socket.timeout | 50sec | The timeout of OSS socket. |
alluxio.underfs.s3.admin.threads.max | 20 | The maximum number of threads to use for metadata operations when communicating with S3. These operations may be fairly concurrent and frequent but should not take much time to process. |
alluxio.underfs.s3.connection.ttl | -1 | The expiration time of S3 connections in ms. -1 means the connection will never expire. |
alluxio.underfs.s3.default.mode | 0700 | Mode (in octal notation) for S3 objects if mode cannot be discovered. |
alluxio.underfs.s3.directory.suffix | / | Directories are represented in S3 as zero-byte objects named with the specified suffix. |
alluxio.underfs.s3.disable.dns.buckets | false | Optionally, specify to make all S3 requests path style. |
alluxio.underfs.s3.endpoint | | Optionally, to reduce data latency or to access resources in different AWS regions, specify a regional endpoint for AWS requests. An endpoint is a URL that is the entry point for a web service. For example, s3.cn-north-1.amazonaws.com.cn is an entry point for the Amazon S3 service in the Beijing region. |
alluxio.underfs.s3.inherit.acl | true | Set this property to false to disable inheriting bucket ACLs on objects. Note that the translation from bucket ACLs to Alluxio user permissions is best effort, as some S3-like storage services do not implement ACLs fully compatible with S3. |
alluxio.underfs.s3.intermediate.upload.clean.age | 3day | Streaming uploads may not have been completed/aborted correctly and need periodical ufs cleanup. If ufs cleanup is enabled, intermediate multipart uploads in all non-readonly S3 mount points older than this age will be cleaned. This may impact other ongoing upload operations, so a large clean age is encouraged. |
alluxio.underfs.s3.list.objects.v1 | false | Whether to use version 1 of GET Bucket (List Objects) API. |
alluxio.underfs.s3.max.error.retry | | The maximum number of retry attempts for failed retryable requests. Setting this property will override the AWS SDK default. |
alluxio.underfs.s3.owner.id.to.username.mapping | | Optionally, specify a preset S3 canonical id to Alluxio username static mapping, in the format “id1=user1;id2=user2”. The AWS S3 canonical ID can be found at the console address https://console.aws.amazon.com/iam/home?#security_credential . Please expand the “Account Identifiers” tab and refer to “Canonical User ID”. An unspecified owner id will map to a default empty username. |
alluxio.underfs.s3.proxy.host | | Optionally, specify a proxy host for communicating with S3. |
alluxio.underfs.s3.proxy.port | | Optionally, specify a proxy port for communicating with S3. |
alluxio.underfs.s3.request.timeout | 1min | The timeout for a single request to S3. Infinity if set to 0. Setting this property to a non-zero value can improve performance by avoiding the long tail of requests to S3. For very slow connections to S3, consider increasing this value or setting it to 0. |
alluxio.underfs.s3.secure.http.enabled | false | Whether or not to use HTTPS protocol when communicating with S3. |
alluxio.underfs.s3.server.side.encryption.enabled | false | Whether or not to encrypt data stored in S3. |
alluxio.underfs.s3.signer.algorithm | | The signature algorithm which should be used to sign requests to the S3 service. This is optional, and if not set, the client will automatically determine it. For interacting with an S3 endpoint which only supports v2 signatures, set this to “S3SignerType”. |
alluxio.underfs.s3.socket.timeout | 50sec | Length of the socket timeout when communicating with S3. |
alluxio.underfs.s3.streaming.upload.enabled | false | (Experimental) If true, use streaming upload to write to S3. |
alluxio.underfs.s3.streaming.upload.partition.size | 64MB | Maximum allowable size of a single buffer file when using S3A streaming upload. When the buffer file reaches the partition size, it will be uploaded and the upcoming data will be written to other buffer files. If the partition size is too small, S3A upload speed might be affected. |
alluxio.underfs.s3.threads.max | 40 | The maximum number of threads to use for communicating with S3 and the maximum number of concurrent connections to S3. Includes both threads for data upload and metadata operations. This number should be at least as large as the max admin threads plus max upload threads. |
alluxio.underfs.s3.upload.threads.max | 20 | For an Alluxio worker, this is the maximum number of threads to use for uploading data to S3 for multipart uploads. These operations can be fairly expensive, so multiple threads are encouraged. However, this also splits the bandwidth between threads, meaning the overall latency for completing an upload will be higher for more threads. For the Alluxio master, this is the maximum number of threads used for the rename (copy) operation. It is recommended that this value be greater than or equal to alluxio.underfs.object.store.service.threads. |
alluxio.underfs.web.connnection.timeout | 60s | Default timeout for an HTTP connection. |
alluxio.underfs.web.header.last.modified | EEE, dd MMM yyyy HH:mm:ss zzz | Date format of the last modified field in an HTTP response header. |
alluxio.underfs.web.parent.names | Parent Directory,..,../ | The text of the HTTP link for the parent directory. |
alluxio.underfs.web.titles | Index of,Directory listing for | The title of the content for an HTTP URL. |
alluxio.web.cors.enabled | false | Set to true to enable Cross-Origin Resource Sharing for RESTful API endpoints. |
alluxio.web.file.info.enabled | true | Whether detailed file information is enabled for the web UI. |
alluxio.web.refresh.interval | 15s | The amount of time to wait before refreshing the Web UI if it is set to auto refresh. |
alluxio.web.threads | 1 | How many threads to use for serving the Alluxio web UI. |
alluxio.web.ui.enabled | true | Whether the master/worker will have Web UI enabled. If set to false, the master/worker will not have Web UI page, but the RESTful endpoints and metrics will still be available. |
alluxio.work.dir | ${alluxio.home} | The directory to use for Alluxio’s working directory. By default, the journal, logs, and under file storage data (if using local filesystem) are written here. |
alluxio.zookeeper.address | | Address of ZooKeeper. |
alluxio.zookeeper.auth.enabled | true | If true, enable client-side ZooKeeper authentication. |
alluxio.zookeeper.connection.timeout | 15s | Connection timeout for Alluxio (job) masters to select the leading (job) master when connecting to ZooKeeper. |
alluxio.zookeeper.election.path | /alluxio/election | Election directory in ZooKeeper. |
alluxio.zookeeper.enabled | false | If true, setup master fault tolerant mode using ZooKeeper. |
alluxio.zookeeper.job.election.path | /job_election | N/A |
alluxio.zookeeper.job.leader.path | /job_leader | N/A |
alluxio.zookeeper.leader.connection.error.policy | SESSION | Connection error policy defines how errors on ZooKeeper connections are treated in leader election. The STANDARD policy treats every connection event as a failure. The SESSION policy relies on ZooKeeper sessions for judging failures, helping the leader to retain its status as long as its session is protected. |
alluxio.zookeeper.leader.inquiry.retry | 10 | The number of retries to inquire leader from ZooKeeper. |
alluxio.zookeeper.leader.path | /alluxio/leader | Leader directory in ZooKeeper. |
alluxio.zookeeper.session.timeout | 60s | Session timeout to use when connecting to ZooKeeper. |
aws.accessKeyId | | The access key of S3 bucket. |
aws.secretKey | | The secret key of S3 bucket. |
fs.azure.account.oauth2.client.endpoint | | The OAuth endpoint for ABFS. |
fs.azure.account.oauth2.client.id | | The client id for ABFS. |
fs.azure.account.oauth2.client.secret | | The client secret for ABFS. |
fs.cos.access.key | | The access key of COS bucket. |
fs.cos.app.id | | The app id of COS bucket. |
fs.cos.connection.max | 1024 | The maximum number of COS connections. |
fs.cos.connection.timeout | 50sec | The timeout of connecting to COS. |
fs.cos.region | | The region name of COS bucket. |
fs.cos.secret.key | | The secret key of COS bucket. |
fs.cos.socket.timeout | 50sec | The timeout of COS socket. |
fs.gcs.accessKeyId | | The access key of GCS bucket. This property key is only valid when alluxio.underfs.gcs.version=1. |
fs.gcs.credential.path | | The JSON file path of Google application credentials. This property key is only valid when alluxio.underfs.gcs.version=2. |
fs.gcs.secretAccessKey | | The secret key of GCS bucket. This property key is only valid when alluxio.underfs.gcs.version=1. |
fs.kodo.accesskey | | The access key of Kodo bucket. |
fs.kodo.secretkey | | The secret key of Kodo bucket. |
fs.oss.accessKeyId | | The access key of OSS bucket. |
fs.oss.accessKeySecret | | The secret key of OSS bucket. |
fs.oss.endpoint | | The endpoint of OSS bucket. |
fs.swift.auth.method | | Choice of authentication method: [tempauth (default), swiftauth, keystone, keystonev3]. |
fs.swift.auth.url | | Authentication URL for REST server, e.g., http://server:8090/auth/v1.0. |
fs.swift.password | | The password used for user:tenant authentication. |
fs.swift.region | | Service region when using Keystone authentication. |
fs.swift.simulation | | Whether to simulate a single node Swift backend for testing purposes: true or false (default). |
fs.swift.tenant | | Swift tenant for authentication. |
fs.swift.user | | Swift user for authentication. |
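As noted for alluxio.conf.dir, alluxio.logs.dir, alluxio.logserver.*, and alluxio.site.conf.dir above, a handful of properties are only honored as JVM system properties and are ignored in alluxio-site.properties. A minimal sketch, assuming the standard conf/alluxio-env.sh; the paths shown are illustrative:

```shell
# conf/alluxio-env.sh -- these keys are ignored in alluxio-site.properties,
# so append them to ALLUXIO_JAVA_OPTS as JVM system properties instead.
ALLUXIO_JAVA_OPTS+=" -Dalluxio.conf.dir=/etc/alluxio/conf"
ALLUXIO_JAVA_OPTS+=" -Dalluxio.logs.dir=/var/log/alluxio"
```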
Master Configuration
The master configuration specifies information regarding the master node, such as the address and the port number.
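As an illustration, a single-master deployment typically pins the master address in alluxio-site.properties. A minimal sketch; alluxio.master.hostname is referenced by defaults on this page, while alluxio.master.rpc.port is an assumption documented in the full master property table rather than here:

```properties
# conf/alluxio-site.properties -- master address sketch (illustrative values)
alluxio.master.hostname=master-node.example.com
# Assumed key; see the full master property table for the RPC port.
alluxio.master.rpc.port=19998
```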
Worker Configuration
The worker configuration specifies information regarding the worker nodes, such as the address and the port number.
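For instance, a worker's externally visible hostname can be pinned the same way. A minimal sketch; alluxio.worker.hostname is referenced by defaults on this page, while alluxio.worker.rpc.port is an assumption documented in the full worker property table rather than here:

```properties
# conf/alluxio-site.properties -- worker address sketch (illustrative values)
alluxio.worker.hostname=worker-node-1.example.com
# Assumed key; see the full worker property table for the RPC port.
alluxio.worker.rpc.port=29999
```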
User Configuration
The user configuration specifies values regarding file system access.
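Because these properties are client-side, they can be set in alluxio-site.properties on the client or passed to the application as JVM system properties. A minimal sketch using properties from the table below; the values are illustrative:

```properties
# Client-side settings -- alluxio-site.properties or application -D options
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.user.file.readtype.default=CACHE
alluxio.user.block.size.bytes.default=128MB
```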
Property Name | Default | Description |
---|---|---|
alluxio.user.app.id | | The custom id to use for labeling this client’s info, such as metrics. If unset, a random long will be used. This value is displayed in the client logs on initialization. Note that using the same app id will cause client info to be aggregated, so different applications must set their own ids or leave this value unset to use a randomly generated id. |
alluxio.user.block.avoid.eviction.policy.reserved.size.bytes | 0MB | The portion of space reserved in a worker when using the LocalFirstAvoidEvictionPolicy class as block location policy. |
alluxio.user.block.master.client.pool.gc.interval | 120sec | The interval at which block master client GC checks occur. |
alluxio.user.block.master.client.pool.gc.threshold | 120sec | A block master client is closed if it has been idle for more than this threshold. |
alluxio.user.block.master.client.pool.size.max | 10 | The maximum number of block master clients cached in the block master client pool. |
alluxio.user.block.master.client.pool.size.min | 0 | The minimum number of block master clients cached in the block master client pool. For long running processes, this should be set to zero. |
alluxio.user.block.read.retry.max.duration | 2min | N/A |
alluxio.user.block.read.retry.sleep.base | 250ms | N/A |
alluxio.user.block.read.retry.sleep.max | 2sec | N/A |
alluxio.user.block.remote.read.buffer.size.bytes | 8MB | The size of the file buffer to read data from remote Alluxio worker. |
alluxio.user.block.size.bytes.default | 64MB | Default block size for Alluxio files. |
alluxio.user.block.worker.client.pool.gc.threshold | 300sec | A block worker client is closed if it has been idle for more than this threshold. |
alluxio.user.block.worker.client.pool.max | 1024 | The maximum number of block worker clients cached in the block worker client pool. |
alluxio.user.block.write.location.policy.class | alluxio.client.block.policy.LocalFirstPolicy | The default location policy for choosing workers for writing a file’s blocks. |
alluxio.user.client.cache.async.restore.enabled | true | If this is enabled, the cache state is restored asynchronously. |
alluxio.user.client.cache.async.write.enabled | true | If this is enabled, cache data asynchronously. |
alluxio.user.client.cache.async.write.threads | 16 | Number of threads to asynchronously cache data. |
alluxio.user.client.cache.dir | /tmp/alluxio_cache | The directory where client-side cache is stored. |
alluxio.user.client.cache.enabled | false | If this is enabled, data will be cached on the Alluxio client (see the sketch following this table). |
alluxio.user.client.cache.eviction.retries | 10 | Max number of eviction retries. |
alluxio.user.client.cache.evictor.class | alluxio.client.file.cache.evictor.LRUCacheEvictor | The strategy that the client uses to evict local cached pages when running out of space. Currently valid options include alluxio.client.file.cache.evictor.LRUCacheEvictor and alluxio.client.file.cache.evictor.LFUCacheEvictor. |
alluxio.user.client.cache.evictor.lfu.logbase | 2.0 | The log base for client cache LFU evictor bucket index. |
alluxio.user.client.cache.local.store.file.buckets | 1000 | The number of file buckets for the local page store of the client-side cache. It is recommended to set this to a high value if the number of unique files is expected to be high (# files / file buckets <= 100,000). |
alluxio.user.client.cache.page.size | 1MB | Size of each page in client-side cache. |
alluxio.user.client.cache.quota.enabled | false | Whether to support cache quota. |
alluxio.user.client.cache.size | 512MB | The maximum size of the client-side cache. |
alluxio.user.client.cache.store.type | LOCAL | The type of page store to use for client-side cache. Can be either LOCAL or ROCKS. The LOCAL page store stores all pages in a directory; the ROCKS page store utilizes RocksDB to persist the data. |
alluxio.user.client.cache.timeout.duration | -1 | The timeout duration for local cache I/O operations (reading/writing/deleting). When this property is a positive value, local cache operations that time out will fail and fall back to the external file system, transparently to applications; when this property is a negative value, this feature is disabled. |
alluxio.user.client.cache.timeout.threads | 32 | The number of threads to handle cache I/O operation timeout, when alluxio.user.client.cache.timeout.duration is positive. |
alluxio.user.conf.cluster.default.enabled | true | When this property is true, an Alluxio client will load the default values of configuration properties set by Alluxio master. |
alluxio.user.conf.sync.interval | 1min | The time period of the client master heartbeat to update the configuration from the meta master if necessary. |
alluxio.user.date.format.pattern | MM-dd-yyyy HH:mm:ss:SSS | Display formatted date in cli command and web UI by given date format pattern. |
alluxio.user.file.buffer.bytes | 8MB | The size of the file buffer to use for file system reads/writes. |
alluxio.user.file.copyfromlocal.block.location.policy.class | alluxio.client.block.policy.RoundRobinPolicy | The default location policy for choosing workers for writing a file’s blocks using copyFromLocal command. |
alluxio.user.file.create.ttl | -1 | Time to live for files created by a user, no ttl by default. |
alluxio.user.file.create.ttl.action | DELETE | When file’s ttl is expired, the action performs on it. Options: DELETE (default) or FREE |
alluxio.user.file.delete.unchecked | false | Whether to check if the UFS contents are in sync with Alluxio before attempting to delete persisted directories recursively. |
alluxio.user.file.master.client.pool.gc.interval | 120sec | The interval at which file system master client GC checks occur. |
alluxio.user.file.master.client.pool.gc.threshold | 120sec | A fs master client is closed if it has been idle for more than this threshold. |
alluxio.user.file.master.client.pool.size.max | 10 | The maximum number of fs master clients cached in the fs master client pool. |
alluxio.user.file.master.client.pool.size.min | 0 | The minimum number of fs master clients cached in the fs master client pool. For long running processes, this should be set to zero. |
alluxio.user.file.metadata.load.type | ONCE | The behavior of loading metadata from UFS. When information about a path is requested and the path does not exist in Alluxio, metadata can be loaded from the UFS. Valid options are ALWAYS , NEVER , and ONCE . ALWAYS will always access UFS to see if the path exists in the UFS. NEVER will never consult the UFS. ONCE will access the UFS the “first” time (according to a cache), but not after that. This parameter is ignored if a metadata sync is performed, via the parameter “alluxio.user.file.metadata.sync.interval” |
alluxio.user.file.metadata.sync.interval | -1 | The interval for syncing UFS metadata before invoking an operation on a path. -1 means no sync will occur. 0 means Alluxio will always sync the metadata of the path before an operation. If you specify a time interval, Alluxio will (best effort) not re-sync a path within that time interval. Syncing the metadata for a path must interact with the UFS, so it is an expensive operation. If a sync is performed for an operation, the configuration of “alluxio.user.file.metadata.load.type” will be ignored. |
alluxio.user.file.passive.cache.enabled | true | Whether to cache files to local Alluxio workers when the files are read from remote workers (not UFS). |
alluxio.user.file.persist.on.rename | false | Whether or not to asynchronously persist any files which have been renamed. This is helpful when working with compute frameworks which use rename to commit results. |
alluxio.user.file.persistence.initial.wait.time | 0 | Time to wait before starting the persistence job. When the value is set to -1, the file will be persisted by rename operation or persist CLI but will not be automatically persisted in other cases. This is to avoid the heavy object copy in rename operation when alluxio.user.file.writetype.default is set to ASYNC_THROUGH. This value should be smaller than the value of alluxio.master.persistence.max.total.wait.time. |
alluxio.user.file.readtype.default | CACHE | Default read type when creating Alluxio files. Valid options are CACHE_PROMOTE (move data to highest tier if already in Alluxio storage, write data into highest tier of local Alluxio if data needs to be read from under storage), CACHE (write data into highest tier of local Alluxio if data needs to be read from under storage), NO_CACHE (no data interaction with Alluxio, if the read is from Alluxio data migration or eviction will not occur). |
alluxio.user.file.replication.durable | 1 | The target replication level of a file created by ASYNC_THROUGH writes before this file is persisted. |
alluxio.user.file.replication.max | -1 | The target max replication level of a file in Alluxio space. Setting this property to a negative value means no upper limit. |
alluxio.user.file.replication.min | 0 | The target min replication level of a file in Alluxio space. |
alluxio.user.file.reserved.bytes | ${alluxio.user.block.size.bytes.default} | The size to reserve on workers for file system writes. Using a smaller value will improve concurrency for writes smaller than the block size. |
alluxio.user.file.sequential.pread.threshold | 2MB | An upper bound on the client buffer size for positioned read to hint at the sequential nature of reads. For reads with a buffer size greater than this threshold, the read op is treated to be sequential and the worker may handle the read differently. For instance, cold reads from the HDFS ufs may use a different HDFS client API. |
alluxio.user.file.target.media | | Preferred media type while storing file’s blocks. |
alluxio.user.file.ufs.tier.enabled | false | When workers run out of available memory, whether the client can skip writing data to Alluxio but fallback to write to UFS without stopping the application. This property only works when the write type is ASYNC_THROUGH. |
alluxio.user.file.waitcompleted.poll | 1sec | The time interval to poll a file for its completion status when using waitCompleted. |
alluxio.user.file.write.tier.default | 0 | The default tier for choosing where to write a block. Valid option is any integer. Non-negative values identify tiers starting from top going down (0 identifies the first tier, 1 identifies the second tier, and so on). If the provided value is greater than the number of tiers, it identifies the last tier. Negative values identify tiers starting from the bottom going up (-1 identifies the last tier, -2 identifies the second to last tier, and so on). If the absolute value of the provided value is greater than the number of tiers, it identifies the first tier. |
alluxio.user.file.writetype.default | ASYNC_THROUGH | Default write type when creating Alluxio files. Valid options are MUST_CACHE (write will only go to Alluxio and must be stored in Alluxio), CACHE_THROUGH (try to cache, write to UnderFS synchronously), THROUGH (no cache, write to UnderFS synchronously), ASYNC_THROUGH (write to cache, write to UnderFS asynchronously, replicated alluxio.user.file.replication.durable times in Alluxio before data is persisted). |
alluxio.user.hostname | | The hostname to use for an Alluxio client. |
alluxio.user.local.reader.chunk.size.bytes | 8MB | When a client reads from a local worker, the maximum data chunk size. |
alluxio.user.local.writer.chunk.size.bytes | 64KB | When a client writes to a local worker, the maximum data chunk size. |
alluxio.user.logging.threshold | 10s | Log a client RPC when it takes more time than this threshold. |
alluxio.user.logs.dir | ${alluxio.logs.dir}/user | The path to store logs of the Alluxio shell. To change its value, one can set the environment variable $ALLUXIO_USER_LOGS_DIR. Note: overwriting this property only works when it is passed as a JVM system property (e.g., appending -Dalluxio.user.logs.dir=<NEW_VALUE> to $ALLUXIO_JAVA_OPTS). Setting it in alluxio-site.properties will not work. |
alluxio.user.master.polling.timeout | 30sec | The maximum time for a rpc client to wait for master to respond. |
alluxio.user.metadata.cache.enabled | false | If this is enabled, metadata of paths will be cached. The cached metadata will be evicted when it expires after alluxio.user.metadata.cache.expiration.time or the cache size is over the limit of alluxio.user.metadata.cache.max.size. |
alluxio.user.metadata.cache.expiration.time | 10min | Metadata will expire and be evicted after being cached for this time period. Only valid if the filesystem is alluxio.client.file.MetadataCachingBaseFileSystem. |
alluxio.user.metadata.cache.max.size | 100000 | Maximum number of paths with cached metadata. Only valid if the filesystem is alluxio.client.file.MetadataCachingBaseFileSystem. |
alluxio.user.metrics.collection.enabled | false | Enable collecting the client-side metrics and heartbeat them to master |
alluxio.user.metrics.heartbeat.interval | 10sec | The time period of client master heartbeat to send the client-side metrics. |
alluxio.user.network.data.timeout | | The maximum time for an Alluxio client to wait for a data response (e.g. block reads and block writes) from Alluxio worker. |
alluxio.user.network.flowcontrol.window | | The HTTP2 flow control window used by user gRPC connections. Larger value will allow more data to be buffered but will use more memory. |
alluxio.user.network.keepalive.time | | The amount of time for a gRPC client (for block reads and block writes) to wait for a response before pinging the server to see if it is still alive. |
alluxio.user.network.keepalive.timeout | | The maximum time for a gRPC client (for block reads and block writes) to wait for a keepalive response before closing the connection. |
alluxio.user.network.max.inbound.message.size | | The max inbound message size used by user gRPC connections. |
alluxio.user.network.netty.channel | | Type of netty channels. If EPOLL is not available, this will automatically fall back to NIO. |
alluxio.user.network.netty.worker.threads | | How many threads to use for remote block worker client to read from remote block workers. |
alluxio.user.network.reader.buffer.size.messages | | When a client reads from a remote worker, the maximum number of messages to buffer by the client. A message can be either a command response, a data chunk, or a gRPC stream event such as complete or error. |
alluxio.user.network.reader.chunk.size.bytes | | When a client reads from a remote worker, the maximum chunk size. |
alluxio.user.network.rpc.flowcontrol.window | 2MB | The HTTP2 flow control window used by user rpc connections. Larger value will allow more data to be buffered but will use more memory. |
alluxio.user.network.rpc.keepalive.time | 9223372036854775807 | The amount of time for a rpc client to wait for a response before pinging the server to see if it is still alive. |
alluxio.user.network.rpc.keepalive.timeout | 30sec | The maximum time for a rpc client to wait for a keepalive response before closing the connection. |
alluxio.user.network.rpc.max.connections | 1 | The maximum number of physical connections to be used per target host. |
alluxio.user.network.rpc.max.inbound.message.size | 100MB | The max inbound message size used by user rpc connections. |
alluxio.user.network.rpc.netty.channel | EPOLL | Type of netty channels used by rpc connections. If EPOLL is not available, this will automatically fall back to NIO. |
alluxio.user.network.rpc.netty.worker.threads | 0 | How many threads to use for rpc client to read from remote workers. |
alluxio.user.network.streaming.flowcontrol.window | 2MB | The HTTP2 flow control window used by user streaming connections. Larger value will allow more data to be buffered but will use more memory. |
alluxio.user.network.streaming.keepalive.time | 9223372036854775807 | The amount of time for a streaming client to wait for a response before pinging the server to see if it is still alive. |
alluxio.user.network.streaming.keepalive.timeout | 30sec | The maximum time for a streaming client to wait for a keepalive response before closing the connection. |
alluxio.user.network.streaming.max.connections | 64 | The maximum number of physical connections to be used per target host. |
alluxio.user.network.streaming.max.inbound.message.size | 100MB | The max inbound message size used by user streaming connections. |
alluxio.user.network.streaming.netty.channel | EPOLL | Type of netty channels used by streaming connections. If EPOLL is not available, this will automatically fall back to NIO. |
alluxio.user.network.streaming.netty.worker.threads | 0 | How many threads to use for streaming client to read from remote workers. |
alluxio.user.network.writer.buffer.size.messages | | When a client writes to a remote worker, the maximum number of messages to buffer by the client. A message can be either a command response, a data chunk, or a gRPC stream event such as complete or error. |
alluxio.user.network.writer.chunk.size.bytes | | When a client writes to a remote worker, the maximum chunk size. |
alluxio.user.network.writer.close.timeout | | The timeout to close a writer client. |
alluxio.user.network.writer.flush.timeout | | The timeout to wait for flush to finish in a data writer. |
alluxio.user.network.zerocopy.enabled | | Whether zero copy is enabled on client when processing data streams. |
alluxio.user.rpc.retry.base.sleep | 50ms | Alluxio client RPCs automatically retry for transient errors with an exponential backoff. This property determines the base time in the exponential backoff. |
alluxio.user.rpc.retry.max.duration | 2min | Alluxio client RPCs automatically retry for transient errors with an exponential backoff. This property determines the maximum duration to retry for before giving up. Note that, this value is set to 5s for fs and fsadmin CLIs. |
alluxio.user.rpc.retry.max.sleep | 3sec | Alluxio client RPCs automatically retry for transient errors with an exponential backoff. This property determines the maximum wait time in the backoff. |
alluxio.user.short.circuit.enabled | true | If set to true, short circuit reads and writes are enabled, allowing clients to read and write data without going through Alluxio workers when the data is local. |
alluxio.user.short.circuit.preferred | false | When short circuit and domain socket are both enabled, prefer to use short circuit. |
alluxio.user.streaming.data.timeout | 1h | The maximum time for an Alluxio client to wait for a data response (e.g. block reads and block writes) from Alluxio worker. Keep in mind that some streaming operations may take an unexpectedly long time, such as UFS io. In order to handle occasional slow operations, it is recommended for this parameter to be set to a large value, to avoid spurious timeouts. |
alluxio.user.streaming.reader.buffer.size.messages | 16 | When a client reads from a remote worker, the maximum number of messages to buffer by the client. A message can be either a command response, a data chunk, or a gRPC stream event such as complete or error. |
alluxio.user.streaming.reader.chunk.size.bytes | 1MB | When a client reads from a remote worker, the maximum chunk size. |
alluxio.user.streaming.reader.close.timeout | 5s | The timeout to close a gRPC streaming reader client. If too long, it may add delays to closing clients. If too short, the client will complete the close() before the server confirms the close(). |
alluxio.user.streaming.writer.buffer.size.messages | 16 | When a client writes to a remote worker, the maximum number of messages to buffer by the client. A message can be either a command response, a data chunk, or a gRPC stream event such as complete or error. |
alluxio.user.streaming.writer.chunk.size.bytes | 1MB | When a client writes to a remote worker, the maximum chunk size. |
alluxio.user.streaming.writer.close.timeout | 30min | The timeout to close a writer client. |
alluxio.user.streaming.writer.flush.timeout | 30min | The timeout to wait for flush to finish in a data writer. |
alluxio.user.streaming.zerocopy.enabled | true | Whether zero copy is enabled on client when processing data streams. |
alluxio.user.ufs.block.location.all.fallback.enabled | true | Whether to return all workers as block locations if UFS block locations are not co-located with any Alluxio workers or are empty. |
alluxio.user.ufs.block.read.concurrency.max | 2147483647 | The maximum concurrent readers for one UFS block on one Block Worker. |
alluxio.user.ufs.block.read.location.policy | alluxio.client.block.policy.LocalFirstPolicy | When an Alluxio client reads a file from the UFS, it delegates the read to an Alluxio worker. The client uses this policy to choose which worker to read through. Built-in choices: [alluxio.client.block.policy.DeterministicHashPolicy, alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy, alluxio.client.block.policy.LocalFirstPolicy, alluxio.client.block.policy.MostAvailableFirstPolicy, alluxio.client.block.policy.RoundRobinPolicy, alluxio.client.block.policy.SpecificHostPolicy]. |
alluxio.user.ufs.block.read.location.policy.deterministic.hash.shards | 1 | When alluxio.user.ufs.block.read.location.policy is set to alluxio.client.block.policy.DeterministicHashPolicy, this specifies the number of hash shards. |
alluxio.user.worker.list.refresh.interval | 2min | The interval used to refresh the live worker list on the client. |
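The client-side cache controlled by alluxio.user.client.cache.enabled above can be switched on with a few of the properties from this table. A minimal sketch, assuming a local SSD path; the path and size are illustrative:

```properties
# Client-side cache sketch -- path and size are illustrative assumptions
alluxio.user.client.cache.enabled=true
alluxio.user.client.cache.store.type=LOCAL
alluxio.user.client.cache.dir=/mnt/ssd/alluxio_cache
alluxio.user.client.cache.size=10GB
```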
Resource Manager Configuration
When running Alluxio with resource managers like Mesos and YARN, Alluxio has additional configuration options.
Security Configuration
The security configuration specifies information regarding the security features, such as authentication and file permission. Settings for authentication take effect for master, worker, and user. Settings for file permission only take effect for master. See Security for more information about security features.
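As a sketch of what such settings look like, the following enables simple authentication and permission checking in alluxio-site.properties; both property keys are assumptions here, since the security property table is documented separately:

```properties
# Security sketch -- both keys are assumptions; see the security table
alluxio.security.authentication.type=SIMPLE
alluxio.security.authorization.permission.enabled=true
```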