Google Cloud Storage
This guide describes how to configure Alluxio with Google Cloud Storage (GCS) as the under storage system.
Initial Setup
The Alluxio binaries must be on your machine. You can either compile Alluxio, or download the binaries locally.
In preparation for using GCS with Alluxio, create a bucket (or use an existing bucket). Also note the directory you want to use in that bucket, either a new directory that you create or an existing one. For the purposes of this guide, the GCS bucket is called GCS_BUCKET, and the directory in that bucket is called GCS_DIRECTORY.
For more information on GCS, please read its documentation.
Configuring Alluxio
Configure Alluxio to use under storage systems by modifying conf/alluxio-site.properties. If it does not exist, create the configuration file from the template.
cp conf/alluxio-site.properties.template conf/alluxio-site.properties
Configure Alluxio to use GCS as its under storage system. The first modification is to specify an existing GCS bucket and directory as the under storage system by modifying conf/alluxio-site.properties to include:
alluxio.underfs.address=gs://GCS_BUCKET/GCS_DIRECTORY
Specify the Google credentials for GCS access. In conf/alluxio-site.properties, add:
fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID>
fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY>
Replace <GCS_ACCESS_KEY_ID> and <GCS_SECRET_ACCESS_KEY> with actual GCS interoperable storage access keys, or with environment variables that contain your credentials. Note: GCS interoperability is disabled by default. To enable it, open the Interoperability tab in the GCS settings and turn the feature on. Then click Create a new key to get the Access Key and Secret pair.
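The two configuration steps above can be scripted. The following is a minimal sketch, not part of Alluxio itself: the function name write_gcs_conf is hypothetical, and it assumes the keys are available in environment variables named GCS_ACCESS_KEY_ID and GCS_SECRET_ACCESS_KEY. It appends the three properties from this guide to a properties file.

```shell
# Hypothetical helper (not shipped with Alluxio): append the GCS settings
# described in this guide to a properties file.
# Assumes GCS_ACCESS_KEY_ID and GCS_SECRET_ACCESS_KEY are set in the environment.
write_gcs_conf() {
  conf_file="$1"   # e.g. conf/alluxio-site.properties
  bucket="$2"      # GCS_BUCKET
  dir="$3"         # GCS_DIRECTORY
  {
    printf 'alluxio.underfs.address=gs://%s/%s\n' "$bucket" "$dir"
    printf 'fs.gcs.accessKeyId=%s\n' "$GCS_ACCESS_KEY_ID"
    printf 'fs.gcs.secretAccessKey=%s\n' "$GCS_SECRET_ACCESS_KEY"
  } >> "$conf_file"
}
```

For example, write_gcs_conf conf/alluxio-site.properties GCS_BUCKET GCS_DIRECTORY appends the settings to the configuration file. Because the function appends, running it twice produces duplicate entries.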
After these changes, Alluxio should be configured to work with GCS as its under storage system, and you can run Alluxio locally with GCS.
Configuring Application Dependency
When building your application to use Alluxio, include a client module: either the alluxio-core-client-fs module, to use the Alluxio file system interface, or the alluxio-core-client-hdfs module, to use the Hadoop file system interface. For example, if you are using Maven, you can add the dependency to your application with:
<!-- Alluxio file system interface -->
<dependency>
  <groupId>org.alluxio</groupId>
  <artifactId>alluxio-core-client-fs</artifactId>
  <version>1.8.3-SNAPSHOT</version>
</dependency>
<!-- HDFS file system interface -->
<dependency>
  <groupId>org.alluxio</groupId>
  <artifactId>alluxio-core-client-hdfs</artifactId>
  <version>1.8.3-SNAPSHOT</version>
</dependency>
Running Alluxio Locally with GCS
Start up Alluxio locally to see that everything works.
./bin/alluxio format
./bin/alluxio-start.sh local
This should start an Alluxio master and an Alluxio worker. You can see the master UI at http://localhost:19999.
Run a simple example program:
./bin/alluxio runTests
Visit your GCS directory GCS_BUCKET/GCS_DIRECTORY to verify that the files and directories created by Alluxio exist. For this test, you should see files with names like:
GCS_BUCKET/GCS_DIRECTORY/alluxio/data/default_tests_files/Basic_CACHE_THROUGH
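The same check can be made from the command line. The sketch below assumes the gsutil tool is installed and authenticated for the bucket, which this guide does not cover. The helper name gcs_tests_url is hypothetical; it only builds the gs:// URL for the test directory shown above.

```shell
# Hypothetical helper: build the gs:// URL under which the files written by
# runTests are expected, following the path shown in this guide.
gcs_tests_url() {
  bucket="$1"   # GCS_BUCKET
  dir="$2"      # GCS_DIRECTORY
  printf 'gs://%s/%s/alluxio/data/default_tests_files/\n' "$bucket" "$dir"
}

# Assumes gsutil is installed and has access to the bucket:
# gsutil ls "$(gcs_tests_url GCS_BUCKET GCS_DIRECTORY)"
```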
To stop Alluxio, you can run:
./bin/alluxio-stop.sh local
GCS Access Control
If Alluxio security is enabled, Alluxio enforces the access control inherited from the underlying object storage.
The GCS credentials specified in the Alluxio configuration represent a GCS user. The GCS service backend checks that user's permissions on the bucket and the object for access control. If the given GCS user does not have the required access permission for the specified bucket, a permission denied error is thrown. When Alluxio security is enabled, Alluxio loads the bucket ACL into Alluxio permissions the first time the metadata is loaded into the Alluxio namespace.
Mapping from GCS user to Alluxio file owner
By default, Alluxio tries to extract the GCS user ID from the credentials. Optionally, alluxio.underfs.gcs.owner.id.to.username.mapping can be used to specify a static mapping from preset GCS owner IDs to Alluxio usernames, in the format “id1=user1;id2=user2”. The Google Cloud Storage IDs can be found at the console address. Please use the “Owners” one.
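To make the mapping format concrete, the sketch below looks up a username in a string of the form “id1=user1;id2=user2”. It is an illustration of the format only and does not reflect how Alluxio parses the property internally; the function name lookup_owner is hypothetical.

```shell
# Illustrative only: resolve an owner ID against a mapping string in the
# "id1=user1;id2=user2" format. In conf/alluxio-site.properties the same
# string would be the value of
# alluxio.underfs.gcs.owner.id.to.username.mapping.
lookup_owner() {
  mapping="$1"
  id="$2"
  old_ifs="$IFS"
  IFS=';'                      # split the mapping on semicolons
  for pair in $mapping; do
    if [ "${pair%%=*}" = "$id" ]; then
      IFS="$old_ifs"
      printf '%s\n' "${pair#*=}"   # text after the first '='
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1                     # no entry for this ID
}
```

For example, lookup_owner 'id1=user1;id2=user2' id2 prints user2, and an ID with no entry produces no output and a non-zero exit status.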
Mapping from GCS ACL to Alluxio permission
Alluxio checks the GCS bucket READ/WRITE ACL to determine the owner's permission mode for an Alluxio file. For example, if the GCS user has read-only access to the underlying bucket, the mounted directory and files have mode 0500. If the GCS user has full access to the underlying bucket, the mounted directory and files have mode 0700.
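The two cases above can be reproduced with a small calculation on the owner's permission bits. This is a sketch consistent with those two examples, not Alluxio's actual implementation: the function name mode_for_acl and its yes/no arguments are hypothetical, and the results for any other combination (such as write without read) are an assumption.

```shell
# Illustrative only: derive the owner mode from bucket READ/WRITE access.
# READ contributes r-x (5) and WRITE contributes -w- (2) to the owner digit;
# group and other bits stay 0.
mode_for_acl() {
  can_read="$1"    # "yes" or "no"
  can_write="$2"   # "yes" or "no"
  owner=0
  if [ "$can_read" = "yes" ]; then
    owner=$((owner | 5))
  fi
  if [ "$can_write" = "yes" ]; then
    owner=$((owner | 2))
  fi
  printf '0%d00\n' "$owner"
}
```

With this sketch, mode_for_acl yes no prints 0500 and mode_for_acl yes yes prints 0700, matching the read-only and full-access cases described above.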
Mount point sharing
If you want to share the GCS mount point with other users in the Alluxio namespace, you can enable alluxio.underfs.object.store.mount.shared.publicly.
Permission change
Commands such as chown, chgrp, and chmod applied to Alluxio directories and files do NOT propagate to the underlying GCS buckets or objects.