HDFS

HDFS is a distributed file-system, part of the Apache Hadoop framework.

Paths are specified as remote: or remote:path/to/dir.
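
For example, assuming a remote called remote has already been configured (as in the walk-through below), the root of the filesystem and a subdirectory are addressed like this:

  1. rclone lsd remote:
  2. rclone ls remote:path/to/dir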

Configuration

Here is an example of how to make a remote called remote. First run:

  1. rclone config

This will guide you through an interactive setup process:

  1. No remotes found, make a new one?
  2. n) New remote
  3. s) Set configuration password
  4. q) Quit config
  5. n/s/q> n
  6. name> remote
  7. Type of storage to configure.
  8. Enter a string value. Press Enter for the default ("").
  9. Choose a number from below, or type in your own value
  10. [skip]
  11. XX / Hadoop distributed file system
  12. \ "hdfs"
  13. [skip]
  14. Storage> hdfs
  15. ** See help for hdfs backend at: https://rclone.org/hdfs/ **
  16. hadoop name node and port
  17. Enter a string value. Press Enter for the default ("").
  18. Choose a number from below, or type in your own value
  19. 1 / Connect to host namenode at port 8020
  20. \ "namenode:8020"
  21. namenode> namenode.hadoop:8020
  22. hadoop user name
  23. Enter a string value. Press Enter for the default ("").
  24. Choose a number from below, or type in your own value
  25. 1 / Connect to hdfs as root
  26. \ "root"
  27. username> root
  28. Edit advanced config? (y/n)
  29. y) Yes
  30. n) No (default)
  31. y/n> n
  32. Remote config
  33. Configuration complete.
  34. Options:
  35. - type: hdfs
  36. - namenode: namenode.hadoop:8020
  37. - username: root
  38. Keep this "remote" remote?
  39. y) Yes this is OK (default)
  40. e) Edit this remote
  41. d) Delete this remote
  42. y/e/d> y
  43. Current remotes:
  44. Name Type
  45. ==== ====
  46. remote hdfs
  47. e) Edit existing remote
  48. n) New remote
  49. d) Delete remote
  50. r) Rename remote
  51. c) Copy remote
  52. s) Set configuration password
  53. q) Quit config
  54. e/n/d/r/c/s/q> q
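
The result of this walk-through is stored in rclone.conf and, for the values entered above, should look roughly like this:

  1. [remote]
  2. type = hdfs
  3. namenode = namenode.hadoop:8020
  4. username = root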

This remote is called remote and can now be used like this:

See all the top level directories

  1. rclone lsd remote:

List the contents of a directory

  1. rclone ls remote:directory

Sync the remote directory to /home/local/directory, deleting any excess files.

  1. rclone sync --interactive remote:directory /home/local/directory
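
To copy a local directory to an HDFS directory, for example:

  1. rclone copy /home/local/directory remote:directory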

Setting up your own HDFS instance for testing

You may start with a manual setup or use the docker image from the tests:

If you want to build the docker image:

  1. git clone https://github.com/rclone/rclone.git
  2. cd rclone/fstest/testserver/images/test-hdfs
  3. docker build --rm -t rclone/test-hdfs .

Or you can just use the latest one pushed:

  1. docker run --rm --name "rclone-hdfs" -p 127.0.0.1:9866:9866 -p 127.0.0.1:8020:8020 --hostname "rclone-hdfs" rclone/test-hdfs

NB: it needs a few seconds to start up.

For this docker image the remote needs to be configured like this:

  1. [remote]
  2. type = hdfs
  3. namenode = 127.0.0.1:8020
  4. username = root

You can stop this image with docker kill rclone-hdfs (NB: it does not use volumes, so any data uploaded will be lost).

Modification times

Modification times are stored with an accuracy of 1 second.

Checksum

No checksums are implemented.

Usage information

You can use the rclone about remote: command, which will display the filesystem size and current usage.
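
For example (the --json variant prints the same information in machine-readable form):

  1. rclone about remote:
  2. rclone about --json remote: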

Restricted filename characters

In addition to the default restricted characters set the following characters are also replaced:

  Character   Value   Replacement
  ---------   -----   -----------
  :           0x3A    ：

Invalid UTF-8 bytes will also be replaced.

Standard options

Here are the Standard options specific to hdfs (Hadoop distributed file system).

--hdfs-namenode

Hadoop name nodes and ports.

E.g. “namenode-1:8020,namenode-2:8020,…” to connect to host namenodes at port 8020.

Properties:

  • Config: namenode
  • Env Var: RCLONE_HDFS_NAMENODE
  • Type: CommaSepList
  • Default:
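
As a sketch, a remote pointing at a highly available pair of name nodes could be configured like this (the host names are illustrative):

  1. [remote]
  2. type = hdfs
  3. namenode = namenode-1:8020,namenode-2:8020
  4. username = root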

--hdfs-username

Hadoop user name.

Properties:

  • Config: username
  • Env Var: RCLONE_HDFS_USERNAME
  • Type: string
  • Required: false
  • Examples:
    • “root”
      • Connect to hdfs as root.
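
Both of these options can also be supplied on the command line or via their environment variables, for example with an on-the-fly remote (a sketch; the host name is illustrative):

  1. rclone lsd :hdfs: --hdfs-namenode=namenode.hadoop:8020 --hdfs-username=root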

Advanced options

Here are the Advanced options specific to hdfs (Hadoop distributed file system).

--hdfs-service-principal-name

Kerberos service principal name for the namenode.

Enables KERBEROS authentication. Specifies the Service Principal Name (SERVICE/FQDN) for the namenode. E.g. “hdfs/namenode.hadoop.docker” for namenode running as service ‘hdfs’ with FQDN ‘namenode.hadoop.docker’.

Properties:

  • Config: service_principal_name
  • Env Var: RCLONE_HDFS_SERVICE_PRINCIPAL_NAME
  • Type: string
  • Required: false
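
A minimal sketch, reusing the principal from the example above and assuming the name node also listens on port 8020:

  1. [remote]
  2. type = hdfs
  3. namenode = namenode.hadoop.docker:8020
  4. service_principal_name = hdfs/namenode.hadoop.docker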

--hdfs-data-transfer-protection

Kerberos data transfer protection: authentication|integrity|privacy.

Specifies whether or not authentication, data signature integrity checks, and wire encryption are required when communicating with the datanodes. Possible values are ‘authentication’, ‘integrity’ and ‘privacy’. Used only with KERBEROS enabled.

Properties:

  • Config: data_transfer_protection
  • Env Var: RCLONE_HDFS_DATA_TRANSFER_PROTECTION
  • Type: string
  • Required: false
  • Examples:
    • “privacy”
      • Ensure authentication, integrity and encryption are enabled.
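
Building on the Kerberos sketch above, wire encryption would then be requested by adding one more line to the remote's configuration:

  1. data_transfer_protection = privacy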

--hdfs-encoding

The encoding for the backend.

See the encoding section in the overview for more info.

Properties:

  • Config: encoding
  • Env Var: RCLONE_HDFS_ENCODING
  • Type: Encoding
  • Default: Slash,Colon,Del,Ctl,InvalidUtf8,Dot

--hdfs-description

Description of the remote.

Properties:

  • Config: description
  • Env Var: RCLONE_HDFS_DESCRIPTION
  • Type: string
  • Required: false

Limitations

  • No server-side Move or DirMove.
  • Checksums not implemented.