Motivation
Cloud storage is an ideal place to back up warm data. It is scalable, and its cost is usually low compared to on-premise storage servers. Uploading to the cloud is usually free; however, accessing data in cloud storage is usually slow and not free.

SeaweedFS is fast, but its capacity is limited by the number of available volume servers.

A good approach is to combine SeaweedFS' fast local access speed with cloud storage's elastic capacity.

Assume hot data is 20% and warm data is 80%. We can move the warm data to cloud storage. Access to the warm data will be slower, but this frees up 80% of the servers, or lets them be repurposed for faster local access instead of merely storing rarely accessed warm data. This integration is completely transparent to SeaweedFS users.

With a fixed number of servers, this transparent cloud integration effectively gives SeaweedFS unlimited capacity in addition to its fast speed. Just add more local SeaweedFS volume servers to increase throughput.
Design
If a volume is tiered to the cloud:

- The volume is marked as readonly.
- The index file is still local.
- The `.dat` file is moved to the cloud.
- The same O(1) disk read is applied to the remote file: when requesting a file entry, a single range request retrieves the entry's content (see the illustration below).
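To make this concrete: the local `.idx` file maps a file id to the needle's offset and size inside the `.dat` file, so a read against a tiered volume only needs one range request against the remote object. The `curl` call below is purely conceptual; the bucket URL, object key, offset, and size are made-up examples, and SeaweedFS issues this kind of request internally rather than asking users to do it.

```sh
# Conceptual illustration only. Suppose the local index says a needle starts at
# byte offset 1048576 and is 1024 bytes long. A single HTTP range request to the
# remote .dat object (hypothetical URL and key) returns exactly that content.
curl -s -H "Range: bytes=1048576-1049599" \
  "https://one_bucket.s3.us-west-1.amazonaws.com/37.dat" \
  -o needle.bin
```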
Usage
- Use `weed scaffold -conf=master` to generate `master.toml`, tweak it, and start the master server with the `master.toml` (see the sketch after this list).
- Use `volume.tier.upload` in `weed shell` to move volumes to the cloud.
- Use `volume.tier.download` in `weed shell` to move volumes back to the local cluster.
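Taken together, a rough sketch of these steps could look like the following. It assumes a default single-machine setup where the master reads `master.toml` from its working directory; the collection name and volume id are just examples, and each long-running server would normally run in its own terminal.

```sh
# generate the example master.toml (flag as given above), then edit its
# [storage.backend] section as described in the next section
weed scaffold -conf=master > master.toml

# start the master from the same directory so it can pick up master.toml;
# volume servers are started as usual
weed master

# in another terminal, open the interactive shell and tier a volume
weed shell
> volume.tier.upload -dest=s3.default -collection=benchmark -volumeId=37
```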
Configuring Storage Backend
(Currently only S3 is implemented as a storage backend. More backends are coming soon.)

Multiple S3 buckets are supported; usually you only need to configure one backend.
```toml
[storage.backend]
  [storage.backend.s3.default]
    enabled = true
    aws_access_key_id = ""     # if empty, loads from the shared credentials file (~/.aws/credentials).
    aws_secret_access_key = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
    region = "us-west-1"
    bucket = "one_bucket"      # an existing bucket

  [storage.backend.s3.name2]
    enabled = true
    aws_access_key_id = ""     # if empty, loads from the shared credentials file (~/.aws/credentials).
    aws_secret_access_key = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
    region = "us-west-2"
    bucket = "one_bucket_two"  # an existing bucket
```
After this is configured, you can use the following commands in `weed shell` to upload a volume's .dat file content to the cloud.
```
// move the volume 37.dat to the s3 cloud
volume.tier.upload -dest=s3 -collection=benchmark -volumeId=37

// or, explicitly naming the default backend
volume.tier.upload -dest=s3.default -collection=benchmark -volumeId=37

// if for any reason you want to move the volume to a different bucket
volume.tier.upload -dest=s3.name2 -collection=benchmark -volumeId=37
```
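To bring a tiered volume back to local volume servers, use `volume.tier.download` as noted in the Usage section. The flags below are an assumption made by analogy with `volume.tier.upload`; check `help volume.tier.download` inside `weed shell` for the exact options.

```
// move the volume 37.dat back from the cloud to the local cluster
// (flags assumed by analogy with volume.tier.upload)
volume.tier.download -collection=benchmark -volumeId=37
```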