Motivation

Cloud storage is an ideal place to back up warm data: it is scalable, and its cost is usually low compared to on-premise storage servers. Uploading to the cloud is usually free. However, cloud storage access is usually not free, and it is slow.

SeaweedFS is fast. However, its capacity is limited by the number of available volume servers.

A good approach is to combine SeaweedFS's fast local access with cloud storage's elastic capacity.

Assume hot data is 20% and warm data is 80%. We can move the warm data to cloud storage. Access to the warm data will be slower, but this frees up 80% of the servers, or lets them be repurposed for faster local access instead of merely storing warm data that is rarely read. This integration is completely transparent to SeaweedFS users.

With a fixed number of servers, this transparent cloud integration effectively gives SeaweedFS unlimited capacity in addition to its fast speed. Just add more local SeaweedFS volume servers to increase throughput.

Design

If a volume is tiered to the cloud:

  • The volume is marked as read-only.
  • The index file stays local.
  • The .dat file is moved to the cloud.
  • The same O(1) disk read applies to the remote file: when requesting a file entry, a single range request retrieves the entry's content (see the sketch after this list).
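
To make the read path concrete, here is a minimal sketch using the AWS CLI. It assumes volume 37 is tiered to the s3.default bucket configured below, that the remote object key is simply 37.dat, and that the local .idx file maps the requested file id to offset 10485760 and size 16384. All of these names and values are hypothetical; SeaweedFS performs this ranged read internally, and the CLI call only illustrates that a single range request is enough to serve one entry.

  # hypothetical lookup result from the local 37.idx file:
  #   offset = 10485760, size = 16384
  # a single ranged GET on the remote .dat object returns the entry's content
  aws s3api get-object \
    --bucket one_bucket \
    --key 37.dat \
    --range "bytes=10485760-10502143" \
    needle.bin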

Usage

  • Use weed scaffold -config=master to generate master.toml, tweak it, and start the master server with the master.toml (see the example after this list).
  • Use volume.tier.upload in weed shell to move volumes to the cloud.
  • Use volume.tier.download in weed shell to move volumes back to the local cluster (example at the end of this page).
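
As a rough sketch of that sequence (directory paths are placeholders, and exact flags may vary between versions):

  # generate the default master configuration, then edit its [storage.backend] section
  weed scaffold -config=master > master.toml
  # start the master; master.toml is picked up from the current directory
  weed master -mdir=/data/master
  # in another terminal, open weed shell to run the tiering commands
  weed shell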

Configuring Storage Backend

(Currently only S3 is supported. More backends are coming soon.)

Multiple S3 buckets are supported. Usually you only need to configure one backend.

  [storage.backend]
    [storage.backend.s3.default]
      enabled = true
      aws_access_key_id = ""     # if empty, loads from the shared credentials file (~/.aws/credentials).
      aws_secret_access_key = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
      region = "us-west-1"
      bucket = "one_bucket"      # an existing bucket
    [storage.backend.s3.name2]
      enabled = true
      aws_access_key_id = ""     # if empty, loads from the shared credentials file (~/.aws/credentials).
      aws_secret_access_key = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
      region = "us-west-2"
      bucket = "one_bucket_two"  # an existing bucket

After this is configured, you can use these commands to upload the .dat file content to the cloud.

  // move the volume 37.dat to the s3 cloud
  volume.tier.upload -dest=s3 -collection=benchmark -volumeId=37
  // or
  volume.tier.upload -dest=s3.default -collection=benchmark -volumeId=37
  // if for any reason you want to move the volume to a different bucket
  volume.tier.upload -dest=s3.name2 -collection=benchmark -volumeId=37
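
To move a tiered volume back to the local cluster, volume.tier.download is assumed here to accept flags mirroring volume.tier.upload (check weed shell's help output for your version):

  // move volume 37 from the cloud back to the local volume servers
  volume.tier.download -collection=benchmark -volumeId=37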