S3 API


Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.

The Alluxio S3 API is intended for applications that are designed to communicate with S3-like storage and that would benefit from Alluxio's other features, such as data caching, data sharing with file-system-based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytics tasks can use the S3 API instead of the more complex file system API.

There are performance implications of using the S3 API: requests go through the Alluxio proxy, which introduces an extra network hop. For optimal performance, it is recommended to run a proxy server and an Alluxio worker on each compute node, and to put all the proxy servers behind a load balancer.

Feature support

The following table describes the support status for current Amazon S3 functional features:

S3 Feature                | Status
List Buckets              | Supported
Delete Buckets            | Supported
Create Bucket             | Supported
Bucket Lifecycle          | Not Supported
Policy (Buckets, Objects) | Not Supported
Bucket ACLs (Get, Put)    | Not Supported
Bucket Location           | Not Supported
Bucket Notification       | Not Supported
Bucket Object Versions    | Not Supported
Get Bucket Info (HEAD)    | Not Supported
Put Object                | Supported
Delete Object             | Supported
Get Object                | Supported
Get Object Info (HEAD)    | Supported
Get Object (Range Query)  | Not Supported [ALLUXIO-3321]
Object ACLs (Get, Put)    | Not Supported
POST Object               | Not Supported
Copy Object               | Not Supported
Multipart Uploads         | Supported

Language support

The Alluxio S3 API can be used from clients in various programming languages, such as C++, Java, Python, Go, and Ruby. This documentation uses curl REST calls and the Python S3 client as usage examples.

Example Usage

REST API

For example, you can issue the following RESTful API calls to an Alluxio cluster running on localhost. The Alluxio proxy listens on port 39999 by default.

Create a bucket

  $ curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:23:18 GMT
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)

Get the bucket (listing objects)

  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:23:56 GMT
  Content-Type: application/xml
  Content-Length: 191
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

Put an object

Assuming there is an existing file named LICENSE on the local file system:

  $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
  HTTP/1.1 100 Continue
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:24:32 GMT
  ETag: "911df44b7ff57801ca8d74568e4ebfbe"
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)

Get the object

  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:24:57 GMT
  Last-Modified: Tue, 18 Jun 2019 21:24:33 GMT
  Content-Type: application/xml
  Content-Length: 27040
  Server: Jetty(9.2.z-SNAPSHOT)
  .................. Content of the test file ...................

Listing a bucket with one object

  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:25:27 GMT
  Content-Type: application/xml
  Content-Length: 354
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2019-06-18T14:24:33.029Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

Listing a bucket with multiple objects

You can upload more files and use max-keys and continuation-token as GET bucket request parameters. For example:

  $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
  HTTP/1.1 100 Continue
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:26:05 GMT
  ETag: "911df44b7ff57801ca8d74568e4ebfbe"
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)
  $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
  HTTP/1.1 100 Continue
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:26:28 GMT
  ETag: "911df44b7ff57801ca8d74568e4ebfbe"
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)
  $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
  HTTP/1.1 100 Continue
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:26:43 GMT
  ETag: "911df44b7ff57801ca8d74568e4ebfbe"
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)
  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:26:57 GMT
  Content-Type: application/xml
  Content-Length: 528
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2019-06-18T14:26:05.694Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2019-06-18T14:26:28.153Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:28:14 GMT
  Content-Type: application/xml
  Content-Length: 531
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2019-06-18T14:26:43.081Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2019-06-18T14:24:33.029Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
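A client can paginate a truncated listing by feeding NextContinuationToken back as the continuation-token parameter. As a minimal sketch (not part of Alluxio itself), here is how the XML response above can be parsed with Python's standard library; the sample document is abridged from the first paginated response:

```python
import xml.etree.ElementTree as ET

# Abridged ListBucketResult from the first paginated response above.
listing = (
    '<ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/>'
    '<NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount>'
    '<MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated>'
    '<Contents><Key>key1</Key><Size>27040</Size></Contents>'
    '<Contents><Key>key2</Key><Size>27040</Size></Contents>'
    '</ListBucketResult>'
)

root = ET.fromstring(listing)
# One <Contents> element per object in this page of the listing.
keys = [c.findtext('Key') for c in root.findall('Contents')]
truncated = root.findtext('IsTruncated') == 'true'
# When truncated, pass this token back as the continuation-token parameter.
next_token = root.findtext('NextContinuationToken')

print(keys)        # ['key1', 'key2']
print(next_token)  # key3
```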

You can also verify that those objects are represented as Alluxio files under the /testbucket directory.

  $ ./bin/alluxio fs ls -R /testbucket
  -rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:26:05:694 100% /testbucket/key1
  -rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:26:28:153 100% /testbucket/key2
  -rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:26:43:081 100% /testbucket/key3
  -rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:24:33:029 100% /testbucket/testobject

Delete objects

  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
  HTTP/1.1 204 No Content
  Date: Tue, 18 Jun 2019 21:31:27 GMT
  Server: Jetty(9.2.z-SNAPSHOT)
  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
  HTTP/1.1 204 No Content
  Date: Tue, 18 Jun 2019 21:31:44 GMT
  Server: Jetty(9.2.z-SNAPSHOT)
  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
  HTTP/1.1 204 No Content
  Date: Tue, 18 Jun 2019 21:31:58 GMT
  Server: Jetty(9.2.z-SNAPSHOT)
  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject
  HTTP/1.1 204 No Content
  Date: Tue, 18 Jun 2019 21:32:08 GMT
  Server: Jetty(9.2.z-SNAPSHOT)

Initiate a multipart upload

  $ curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploads
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:32:36 GMT
  Content-Type: application/xml
  Content-Length: 133
  Server: Jetty(9.2.z-SNAPSHOT)
  <InitiateMultipartUploadResult><Bucket>testbucket</Bucket><Key>testobject</Key><UploadId>3</UploadId></InitiateMultipartUploadResult>

Note that the multipart upload commands below require the upload ID returned above; it is not necessarily 3.
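When scripting, the upload ID can be extracted from the InitiateMultipartUploadResult body rather than copied by hand. A minimal sketch using Python's standard library on the response shown above:

```python
import xml.etree.ElementTree as ET

# Response body returned by the initiate request above.
response = ('<InitiateMultipartUploadResult><Bucket>testbucket</Bucket>'
            '<Key>testobject</Key><UploadId>3</UploadId>'
            '</InitiateMultipartUploadResult>')

# The UploadId element carries the ID for all subsequent part requests.
upload_id = ET.fromstring(response).findtext('UploadId')
print(upload_id)  # 3
```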

Upload part

  $ curl -i -X PUT 'http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=3'
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:33:36 GMT
  ETag: "d41d8cd98f00b204e9800998ecf8427e"
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)
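The ETag in this response matches the MD5 digest of the uploaded bytes; since no body was sent with the request, it is the MD5 of the empty string, which you can verify:

```python
import hashlib

# MD5 of an empty body matches the ETag returned for the empty part above.
empty_etag = hashlib.md5(b'').hexdigest()
print(empty_etag)  # d41d8cd98f00b204e9800998ecf8427e
```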

List parts

  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=3
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:35:10 GMT
  Content-Type: application/xml
  Content-Length: 296
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListPartsResult><Bucket>/testbucket</Bucket><Key>testobject</Key><UploadId>3</UploadId><StorageClass>STANDARD</StorageClass><IsTruncated>false</IsTruncated><Part><PartNumber>1</PartNumber><LastModified>2019-06-18T14:33:36.373Z</LastModified><ETag>""</ETag><Size>0</Size></Part></ListPartsResult>

Complete a multipart upload

  $ curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=3
  HTTP/1.1 200 OK
  Date: Tue, 18 Jun 2019 21:35:47 GMT
  Content-Type: application/xml
  Content-Length: 201
  Server: Jetty(9.2.z-SNAPSHOT)
  <CompleteMultipartUploadResult><Location>/testbucket/testobject</Location><Bucket>testbucket</Bucket><Key>testobject</Key><ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag></CompleteMultipartUploadResult>

Abort a multipart upload

A non-completed upload can be aborted:

  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=3
  HTTP/1.1 204 No Content
  Date: Tue, 18 Jun 2019 21:37:27 GMT
  Server: Jetty(9.2.z-SNAPSHOT)

Delete an empty bucket

  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 204 No Content
  Date: Tue, 18 Jun 2019 21:38:38 GMT
  Server: Jetty(9.2.z-SNAPSHOT)

Python S3 Client

Tested for Python 2.7.

Create a connection:

  import boto
  import boto.s3.connection
  conn = boto.connect_s3(
      aws_access_key_id='',
      aws_secret_access_key='',
      host='localhost',
      port=39999,
      path='/api/v1/s3',
      is_secure=False,
      calling_format=boto.s3.connection.OrdinaryCallingFormat(),
  )

Create a bucket

  bucketName = 'bucket-for-testing'
  bucket = conn.create_bucket(bucketName)

PUT a small object

  smallObjectKey = 'small.txt'
  smallObjectContent = 'Hello World!'
  key = bucket.new_key(smallObjectKey)
  key.set_contents_from_string(smallObjectContent)

Get the small object

  assert smallObjectContent == key.get_contents_as_string()

Upload a large object

Create an 8 MB file on the local file system:

  $ dd if=/dev/zero of=8mb.data bs=1048576 count=8
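If dd is unavailable, the same file can be generated in Python; the dd arguments amount to 8 blocks of 1048576 bytes, i.e. 8 MiB of zeros:

```python
import os

# Write 8 MiB of zero bytes, equivalent to the dd command above.
size = 8 * 1048576
with open('8mb.data', 'wb') as f:
    f.write(b'\0' * size)

print(os.path.getsize('8mb.data'))  # 8388608
```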

Then use the Python S3 client to upload it as an object:

  largeObjectKey = 'large.txt'
  largeObjectFile = '8mb.data'
  key = bucket.new_key(largeObjectKey)
  with open(largeObjectFile, 'rb') as f:
      key.set_contents_from_file(f)
  with open(largeObjectFile, 'rb') as f:
      largeObject = f.read()

Get the large object

  assert largeObject == key.get_contents_as_string()

Delete the objects

  bucket.delete_key(smallObjectKey)
  bucket.delete_key(largeObjectKey)

Initiate a multipart upload

  mp = bucket.initiate_multipart_upload(largeObjectKey)

Upload parts

  import math, os
  from filechunkio import FileChunkIO
  # Use a chunk size of 1 MB (feel free to change this)
  sourceSize = os.stat(largeObjectFile).st_size
  chunkSize = 1048576
  chunkCount = int(math.ceil(sourceSize / float(chunkSize)))
  for i in range(chunkCount):
      offset = chunkSize * i
      bytes = min(chunkSize, sourceSize - offset)
      with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=bytes) as fp:
          mp.upload_part_from_file(fp, part_num=i + 1)
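The loop above splits the file into ceil(sourceSize / chunkSize) parts, where every part except possibly the last is exactly chunkSize bytes. A self-contained sketch of the same arithmetic, with a deliberately uneven size to show the short final part (the helper name part_sizes is introduced here for illustration):

```python
import math

def part_sizes(source_size, chunk_size):
    """Return the byte length of each part the upload loop would send."""
    count = int(math.ceil(source_size / float(chunk_size)))
    return [min(chunk_size, source_size - chunk_size * i) for i in range(count)]

# 8 MiB file with 1 MiB chunks: eight full parts of 1048576 bytes each.
print(part_sizes(8 * 1048576, 1048576))
# Uneven size: two full parts plus a shorter final part.
print(part_sizes(2500000, 1048576))  # [1048576, 1048576, 402848]
```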

Complete the multipart upload

  mp.complete_upload()

Abort the multipart upload

Non-completed uploads can be aborted.

  mp.cancel_upload()

Delete the bucket

  bucket.delete_key(largeObjectKey)
  conn.delete_bucket(bucketName)