S3 API
Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.
The REST API documentation is generated as part of the Alluxio build and is accessible at ${ALLUXIO_HOME}/core/server/proxy/target/miredot/index.html.
The Alluxio S3 API is intended for applications that are designed to communicate with S3-like storage and that would also benefit from the additional features Alluxio provides, such as data caching, data sharing with file-system-based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytic tasks can use the S3 API instead of the more complex file system API.
There are performance implications of using the S3 API: requests go through the Alluxio proxy, which introduces an extra network hop. For optimal performance, it is recommended to run a proxy server and an Alluxio worker on each compute node, and to put all the proxy servers behind a load balancer.
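As a minimal sketch of this deployment, assuming the launcher script shipped in the Alluxio bin directory supports a proxy target (check your Alluxio version), a proxy can be started on a compute node alongside the local worker:
# ${ALLUXIO_HOME}/bin/alluxio-start.sh proxy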
Feature support
The following table describes the support status for current Amazon S3 functional features:
S3 Feature | Status
---|---
List Buckets | Supported |
Delete Buckets | Supported |
Create Bucket | Supported |
Bucket Lifecycle | Not Supported |
Policy (Buckets, Objects) | Not Supported |
Bucket ACLs (Get, Put) | Not Supported |
Bucket Location | Not Supported |
Bucket Notification | Not Supported |
Bucket Object Versions | Not Supported |
Get Bucket Info (HEAD) | Not Supported |
Put Object | Supported |
Delete Object | Supported |
Get Object | Supported |
Get Object Info (HEAD) | Supported |
Get Object (Range Query) | Not Supported [ALLUXIO-3321] |
Object ACLs (Get, Put) | Not Supported |
POST Object | Not Supported |
Copy Object | Not Supported |
Multipart Uploads | Supported |
Language support
The Alluxio S3 API can be used with S3 clients written in various programming languages, such as C++, Java, Python, Golang, and Ruby. In this documentation, we use curl REST calls and the Python S3 client (boto) as usage examples.
Example Usage
REST API
For example, you can issue the following RESTful API calls against an Alluxio cluster running on localhost. The Alluxio proxy listens on port 39999 by default.
Create a bucket
# curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:34:41 GMT
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)
Get the bucket (listing objects)
# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:35:00 GMT
Content-Type: application/xml
Content-Length: 200
Server: Jetty(9.2.z-SNAPSHOT)
<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>
Put an object
Assuming there is an existing file on the local file system called LICENSE:
# curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:36:03 GMT
ETag: "9347237b67b0be183499e5893128704e"
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)
Get the object:
# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:37:34 GMT
Last-Modified: Tue, 29 Aug 2017 22:36:03 GMT
Content-Type: application/xml
Content-Length: 26847
Server: Jetty(9.2.z-SNAPSHOT)
.................. Content of the test file ...................
Listing a bucket with one object
# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:38:48 GMT
Content-Type: application/xml
Content-Length: 363
Server: Jetty(9.2.z-SNAPSHOT)
<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
Listing a bucket with multiple objects
You can upload more files and use the max-keys and continuation-token query parameters in the GET bucket request. For example:
# curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
# curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
# curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:40:45 GMT
Content-Type: application/xml
Content-Length: 537
Server: Jetty(9.2.z-SNAPSHOT)
<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2017-08-29T15:40:42.213Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2017-08-29T15:40:43.269Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:41:18 GMT
Content-Type: application/xml
Content-Length: 540
Server: Jetty(9.2.z-SNAPSHOT)
<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2017-08-29T15:40:44.002Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
You can also verify that those objects are represented as Alluxio files under the /testbucket directory:
./bin/alluxio fs ls -R /testbucket
Delete objects
# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:43:22 GMT
Server: Jetty(9.2.z-SNAPSHOT)
# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject
Initiate a multipart upload
# curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploads
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 197
Server: Jetty(9.2.z-SNAPSHOT)
<?xml version="1.0" encoding="UTF-8"?>
<InitiateMultipartUploadResult xmlns="">
<Bucket>testbucket</Bucket>
<Key>testobject</Key>
<UploadId>2</UploadId>
</InitiateMultipartUploadResult>
Upload part
# curl -i -X PUT 'http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=2'
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
ETag: "b54357faf0632cce46e942fa68356b38"
Server: Jetty(9.2.z-SNAPSHOT)
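Note that the example above does not show the part data being sent. When uploading a real part, the data goes in the request body; one way to do this with curl is the -T flag (the file name part1.data below is only illustrative and not part of the original example):
# curl -i -X PUT -T "part1.data" 'http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=2'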
List parts
# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 985
Server: Jetty(9.2.z-SNAPSHOT)
<?xml version="1.0" encoding="UTF-8"?>
<ListPartsResult xmlns="">
<Bucket>testbucket</Bucket>
<Key>testobject</Key>
<UploadId>2</UploadId>
<StorageClass>STANDARD</StorageClass>
<IsTruncated>false</IsTruncated>
<Part>
<PartNumber>1</PartNumber>
<LastModified>2017-08-29T20:48:34.000Z</LastModified>
<ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
<Size>10485760</Size>
</Part>
</ListPartsResult>
Complete a multipart upload
# curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2 -d '<CompleteMultipartUpload>
<Part>
<PartNumber>1</PartNumber>
<ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
</Part>
</CompleteMultipartUpload>'
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Server: Jetty(9.2.z-SNAPSHOT)
<?xml version="1.0" encoding="UTF-8"?>
<CompleteMultipartUploadResult xmlns="">
<Location>/testbucket/testobject</Location>
<Bucket>testbucket</Bucket>
<Key>testobject</Key>
<ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
</CompleteMultipartUploadResult>
Abort a multipart upload
# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)
Delete an empty bucket
# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:45:19 GMT
Python S3 Client
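The following examples use the boto library, plus filechunkio for the multipart upload example. A minimal setup sketch, assuming pip is available on your machine:
# pip install boto filechunkio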
Create a connection:
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='',
    aws_secret_access_key='',
    host='localhost',
    port=39999,
    path='/api/v1/s3',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
Create a bucket
bucketName = 'bucket-for-testing'
bucket = conn.create_bucket(bucketName)
PUT a small object
smallObjectKey = 'small.txt'
smallObjectContent = 'Hello World!'
key = bucket.new_key(smallObjectKey)
key.set_contents_from_string(smallObjectContent)
Get the small object
assert smallObjectContent == key.get_contents_as_string()
Upload a large object
Create an 8MB file on the local file system:
# dd if=/dev/zero of=8mb.data bs=1048576 count=8
Then use the Python S3 client to upload it as an object:
largeObjectKey = 'large.txt'
largeObjectFile = '8mb.data'
key = bucket.new_key(largeObjectKey)
with open(largeObjectFile, 'rb') as f:
    key.set_contents_from_file(f)
with open(largeObjectFile, 'rb') as f:
    largeObject = f.read()
Get the large object
assert largeObject == key.get_contents_as_string()
Delete the objects
bucket.delete_key(smallObjectKey)
bucket.delete_key(largeObjectKey)
Initiate a multipart upload
mp = bucket.initiate_multipart_upload(largeObjectFile)
Upload parts
import math, os
from filechunkio import FileChunkIO

# Use a chunk size of 1MB (feel free to change this)
sourceSize = os.stat(largeObjectFile).st_size
chunkSize = 1048576
chunkCount = int(math.ceil(sourceSize / float(chunkSize)))

for i in range(chunkCount):
    offset = chunkSize * i
    bytes = min(chunkSize, sourceSize - offset)
    with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=bytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)
Complete the multipart upload
mp.complete_upload()
Abort the multipart upload
mp.cancel_upload()
Delete the bucket
conn.delete_bucket(bucketName)