v1.59

Internet Archive

The Internet Archive backend uses Items on archive.org.

Refer to the IAS3 API documentation for the API this backend uses.

Paths are specified as remote:bucket (or remote: for the lsd command), where the bucket is an Internet Archive item. You may also include subdirectories, e.g. remote:item/path/to/dir.

Unlike S3, listing all items you have uploaded isn’t supported.

Once you have made a remote, you can use it like this:

Make a new item

    rclone mkdir remote:item

List the contents of an item

    rclone ls remote:item

Sync /home/local/directory to the remote item, deleting any excess files in the item.

    rclone sync --interactive /home/local/directory remote:item

Notes

Because of Internet Archive’s architecture, write operations (and any extra post-processing) are enqueued in a per-item queue. You can check an item’s queue at https://catalogd.archive.org/history/item-name-here . As a result, uploads and deletes do not show up immediately and take some time to become available. The per-item queue is in turn fed into another queue, the Item Deriver Queue, whose status can also be checked. This queue has a limit, and it may block you from uploading, or even deleting. For better behavior, avoid uploading a lot of small files.
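
As a small convenience, the queue URL for an item can be built in the shell. This is only a sketch: the helper name item_history_url and the item name my-item are hypothetical, and the function just fills in the URL pattern given above without contacting archive.org.

```shell
# Hypothetical helper (not part of rclone): print the per-item queue URL
# for the Internet Archive item identifier passed as the first argument.
item_history_url() {
  printf 'https://catalogd.archive.org/history/%s\n' "$1"
}

# Example: print the queue URL for an item called "my-item".
item_history_url "my-item"
```

Opening the printed URL in a browser should show the pending and completed tasks for that item.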

You can optionally wait for the server’s processing to finish by setting the wait_archive key to a non-zero value. When rclone waits, it can do normal file comparison. Make sure to set a large enough value (e.g. 30m0s for smaller files), as processing can take a long time depending on the server’s queue.
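
A minimal sketch of enabling the wait through the RCLONE_INTERNETARCHIVE_WAIT_ARCHIVE environment variable, which corresponds to the wait_archive key. The wrapper name upload_and_wait and the 30m0s value are illustrative assumptions; the function is only defined here, not run.

```shell
# Ask rclone to wait up to 30 minutes for the server-side processing
# tasks after each write. 30m0s is an illustrative value; tune it to
# the size of your uploads and the state of the queue.
export RCLONE_INTERNETARCHIVE_WAIT_ARCHIVE=30m0s

# Hypothetical wrapper (not part of rclone): copy a local directory
# into an item. $1 = local directory, $2 = destination, e.g. remote:item
upload_and_wait() {
  rclone copy "$1" "$2"
}
```

Calling upload_and_wait /home/local/directory remote:item would then run the copy with waiting enabled, assuming a remote named remote has already been configured.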

About metadata

This backend supports setting, updating and reading metadata of each file. The metadata will appear as file metadata on Internet Archive. However, some fields are reserved by Internet Archive and some by rclone.

The following are reserved by Internet Archive:

  • name
  • source
  • size
  • md5
  • crc32
  • sha1
  • format
  • old_version
  • viruscheck
  • summation

Attempts to set values for these keys are ignored with a warning. The only exception is mtime: setting it behaves identically to setting ModTime.

rclone reserves all keys starting with rclone-. Setting values for these keys produces warnings, but the values are set as requested.
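
The rules above can be summarised as a small shell function. This is a sketch for illustration only: classify_metadata_key is a hypothetical helper, not something rclone provides, the category labels it prints are made up here, and title stands in for an arbitrary custom key.

```shell
# Hypothetical helper (not part of rclone): classify a metadata key
# according to the reservation rules described above.
classify_metadata_key() {
  case "$1" in
    mtime)
      echo "mtime" ;;               # accepted; behaves like setting ModTime
    name|source|size|md5|crc32|sha1|format|old_version|viruscheck|summation)
      echo "reserved-by-ia" ;;      # setting it is ignored with a warning
    rclone-*)
      echo "reserved-by-rclone" ;;  # set as requested, but with a warning
    *)
      echo "custom" ;;              # ordinary custom key
  esac
}

classify_metadata_key md5           # prints: reserved-by-ia
classify_metadata_key rclone-mtime  # prints: reserved-by-rclone
classify_metadata_key title         # prints: custom
```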

If there are multiple values for a key, only the first one is returned. This is a limitation of rclone, which supports only one value per key. It can happen after a server-side copy.

Reading metadata also returns custom keys (those that are neither standard nor reserved).

Filtering auto-generated files

The Internet Archive automatically creates metadata files after upload. These can cause problems when doing an rclone sync as rclone will try, and fail, to delete them. These metadata files are not changeable, as they are created by the Internet Archive automatically.

These auto-created files can be excluded from the sync using metadata filtering.

    rclone sync ... --metadata-exclude "source=metadata" --metadata-exclude "format=Metadata"

This excludes from the sync any files that have the source=metadata or format=Metadata flags, which are added to Internet Archive auto-created files.

Configuration

Here is an example of making an internetarchive configuration. Most of it applies to the other providers as well; any differences are described below.

First run

    rclone config

This will guide you through an interactive setup process.

    No remotes found, make a new one?
    n) New remote
    s) Set configuration password
    q) Quit config
    n/s/q> n
    name> remote
    Option Storage.
    Type of storage to configure.
    Choose a number from below, or type in your own value.
    XX / InternetArchive Items
       \ (internetarchive)
    Storage> internetarchive
    Option access_key_id.
    IAS3 Access Key.
    Leave blank for anonymous access.
    You can find one here: https://archive.org/account/s3.php
    Enter a value. Press Enter to leave empty.
    access_key_id> XXXX
    Option secret_access_key.
    IAS3 Secret Key (password).
    Leave blank for anonymous access.
    Enter a value. Press Enter to leave empty.
    secret_access_key> XXXX
    Edit advanced config?
    y) Yes
    n) No (default)
    y/n> y
    Option endpoint.
    IAS3 Endpoint.
    Leave blank for default value.
    Enter a string value. Press Enter for the default (https://s3.us.archive.org).
    endpoint>
    Option front_endpoint.
    Host of InternetArchive Frontend.
    Leave blank for default value.
    Enter a string value. Press Enter for the default (https://archive.org).
    front_endpoint>
    Option disable_checksum.
    Don't store MD5 checksum with object metadata.
    Normally rclone will calculate the MD5 checksum of the input before
    uploading it so it can ask the server to check the object against checksum.
    This is great for data integrity checking but can cause long delays for
    large files to start uploading.
    Enter a boolean value (true or false). Press Enter for the default (true).
    disable_checksum> true
    Option encoding.
    The encoding for the backend.
    See the [encoding section in the overview](/overview/#encoding) for more info.
    Enter a encoder.MultiEncoder value. Press Enter for the default (Slash,Question,Hash,Percent,Del,Ctl,InvalidUtf8,Dot).
    encoding>
    Edit advanced config?
    y) Yes
    n) No (default)
    y/n> n
    Configuration complete.
    Options:
    - type: internetarchive
    - access_key_id: XXXX
    - secret_access_key: XXXX
    Keep this "remote" remote?
    y) Yes this is OK (default)
    e) Edit this remote
    d) Delete this remote
    y/e/d> y
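
The same remote can also be described without the interactive prompts. The sketch below relies on rclone’s general convention of reading remote definitions from RCLONE_CONFIG_<NAME>_<OPTION> environment variables. The remote name remote and the XXXX placeholders mirror the session above and are not real credentials.

```shell
# Define a remote called "remote" purely through environment variables.
# Replace the XXXX placeholders with your own IAS3 keys, or leave the
# two key variables unset for anonymous access.
export RCLONE_CONFIG_REMOTE_TYPE=internetarchive
export RCLONE_CONFIG_REMOTE_ACCESS_KEY_ID=XXXX
export RCLONE_CONFIG_REMOTE_SECRET_ACCESS_KEY=XXXX
```

With these variables exported, a command such as rclone lsd remote: should be able to use the remote without an entry in the config file.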

Standard options

Here are the Standard options specific to internetarchive (Internet Archive).

--internetarchive-access-key-id

IAS3 Access Key.

Leave blank for anonymous access. You can find one here: https://archive.org/account/s3.php

Properties:

  • Config: access_key_id
  • Env Var: RCLONE_INTERNETARCHIVE_ACCESS_KEY_ID
  • Type: string
  • Required: false

--internetarchive-secret-access-key

IAS3 Secret Key (password).

Leave blank for anonymous access.

Properties:

  • Config: secret_access_key
  • Env Var: RCLONE_INTERNETARCHIVE_SECRET_ACCESS_KEY
  • Type: string
  • Required: false

Advanced options

Here are the Advanced options specific to internetarchive (Internet Archive).

--internetarchive-endpoint

IAS3 Endpoint.

Leave blank for default value.

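Properties:

  • Config: endpoint
  • Env Var: RCLONE_INTERNETARCHIVE_ENDPOINT
  • Type: string
  • Default: “https://s3.us.archive.org”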

--internetarchive-front-endpoint

Host of InternetArchive Frontend.

Leave blank for default value.

Properties:

  • Config: front_endpoint
  • Env Var: RCLONE_INTERNETARCHIVE_FRONT_ENDPOINT
  • Type: string
  • Default: “https://archive.org”

--internetarchive-disable-checksum

Don’t ask the server to test against the MD5 checksum calculated by rclone. Normally rclone will calculate the MD5 checksum of the input before uploading it so it can ask the server to check the object against the checksum. This is great for data integrity checking but can cause long delays for large files to start uploading.

Properties:

  • Config: disable_checksum
  • Env Var: RCLONE_INTERNETARCHIVE_DISABLE_CHECKSUM
  • Type: bool
  • Default: true

--internetarchive-wait-archive

Timeout for waiting for the server’s processing tasks (specifically archive and book_op) to finish. Only enable this if you need a guarantee that changes are reflected after write operations. Set to 0 to disable waiting. No error is thrown if the timeout is reached.

Properties:

  • Config: wait_archive
  • Env Var: RCLONE_INTERNETARCHIVE_WAIT_ARCHIVE
  • Type: Duration
  • Default: 0s

--internetarchive-encoding

The encoding for the backend.

See the encoding section in the overview for more info.

Properties:

  • Config: encoding
  • Env Var: RCLONE_INTERNETARCHIVE_ENCODING
  • Type: Encoding
  • Default: Slash,LtGt,CrLf,Del,Ctl,InvalidUtf8,Dot

--internetarchive-description

Description of the remote.

Properties:

  • Config: description
  • Env Var: RCLONE_INTERNETARCHIVE_DESCRIPTION
  • Type: string
  • Required: false

Metadata

Metadata fields provided by Internet Archive. If there are multiple values for a key, only the first one is returned. This is a limitation of Rclone, which supports only one value per key.

The owner of an item can add custom keys. The metadata feature returns all keys, including these custom ones.

Here are the possible system metadata items for the internetarchive backend.

Name | Help | Type | Example | Read Only
---- | ---- | ---- | ------- | ---------
crc32 | CRC32 calculated by Internet Archive | string | 01234567 | Y
format | Name of format identified by Internet Archive | string | Comma-Separated Values | Y
md5 | MD5 hash calculated by Internet Archive | string | 01234567012345670123456701234567 | Y
mtime | Time of last modification, managed by Rclone | RFC 3339 | 2006-01-02T15:04:05.999999999Z | Y
name | Full file path, without the bucket part | filename | backend/internetarchive/internetarchive.go | Y
old_version | Whether the file was replaced and moved by keep-old-version flag | boolean | true | Y
rclone-ia-mtime | Time of last modification, managed by Internet Archive | RFC 3339 | 2006-01-02T15:04:05.999999999Z | N
rclone-mtime | Time of last modification, managed by Rclone | RFC 3339 | 2006-01-02T15:04:05.999999999Z | N
rclone-update-track | Random value used by Rclone for tracking changes inside Internet Archive | string | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | N
sha1 | SHA1 hash calculated by Internet Archive | string | 0123456701234567012345670123456701234567 | Y
size | File size in bytes | decimal number | 123456 | Y
source | The source of the file | string | original | Y
summation | Check https://forum.rclone.org/t/31922 for how it is used | string | md5 | Y
viruscheck | The last time viruscheck process was run for the file (?) | unixtime | 1654191352 | Y

See the metadata docs for more info.