File Manager

Some functions in Doris require some user-defined files. For example, public keys, key files, certificate files and so on are used to access external data sources. The File Manager provides a function that allows users to upload these files in advance and save them in Doris system, which can then be referenced or accessed in other commands.

Noun Interpretation

  • BDBJE: Oracle Berkeley DB Java Edition. Distributed embedded database for persistent metadata in FE.
  • SmallFileMgr: File Manager. Responsible for creating and maintaining user files.

Basic concepts

Files are files created and saved by users in Doris.

A file is located by database, catalog, file_name. At the same time, each file also has a globally unique ID (file_id), which serves as the identification in the system.

File creation and deletion can only be performed by users with admin privileges. A file belongs to a database. Users who have access to a database (queries, imports, modifications, etc.) can use the files created under the database.

Specific operation

File management has three main commands: CREATE FILE, SHOW FILE and DROP FILE, creating, viewing and deleting files respectively. The specific syntax of these three commands can be viewed by connecting to Doris and executing HELP cmd;.

CREATE FILE

This statement is used to create and upload a file to the Doris cluster. For details, see CREATE FILE.

Examples:

  1. 1. Create file ca.pem , classified as kafka
  2. CREATE FILE "ca.pem"
  3. PROPERTIES
  4. (
  5. "url" = "https://test.bj.bcebos.com/kafka-key/ca.pem",
  6. "catalog" = "kafka"
  7. );
  8. 2. Create a file client.key, classified as my_catalog
  9. CREATE FILE "client.key"
  10. IN my_database
  11. PROPERTIES
  12. (
  13. "url" = "https://test.bj.bcebos.com/kafka-key/client.key",
  14. "catalog" = "my_catalog",
  15. "md5" = "b5bb901bf10f99205b39a46ac3557dd9"
  16. );

SHOW FILE

This statement can view the files that have been created successfully. For details, see SHOW FILE.

Examples:

  1. 1. View uploaded files in database my_database
  2. SHOW FILE FROM my_database;

DROP FILE

This statement can view and delete an already created file. For specific operations, see DROP FILE.

Examples:

  1. 1. delete file ca.pem
  2. DROP FILE "ca.pem" properties("catalog" = "kafka");

Implementation details

Create and delete files

When the user executes the CREATE FILE command, FE downloads the file from a given URL. The contents of the file are stored in FE memory directly in the form of Base64 encoding. At the same time, the file content and meta-information related to the file will be persisted in BDBJE. All created files, their meta-information and file content reside in FE memory. If the FE goes down and restarts, meta information and file content will also be loaded into memory from the BDBJE. When a file is deleted, the relevant information is deleted directly from FE memory and persistent information is deleted from BDBJE.

Use of documents

If the FE side needs to use the created file, SmallFileMgr will directly save the data in FE memory as a local file, store it in the specified directory, and return the local file path for use.

If the BE side needs to use the created file, BE will download the file content to the specified directory on BE through FE’s HTTP interface api/get_small_file for use. At the same time, BE also records the information of the files that have been downloaded in memory. When BE requests a file, it first checks whether the local file exists and verifies it. If the validation passes, the local file path is returned directly. If the validation fails, the local file is deleted and downloaded from FE again. When BE restarts, local files are preloaded into memory.

Use restrictions

Because the file meta-information and content are stored in FE memory. So by default, only files with size less than 1MB can be uploaded. And the total number of files is limited to 100. The configuration items described in the next section can be modified.

Relevant configuration

  1. FE configuration
  • Small_file_dir: The path used to store uploaded files, defaulting to the small_files/ directory of the FE runtime directory.

  • max_small_file_size_bytes: A single file size limit in bytes. The default is 1MB. File creation larger than this configuration will be rejected.

  • max_small_file_number: The total number of files supported by a Doris cluster. The default is 100. When the number of files created exceeds this value, subsequent creation will be rejected.

    If you need to upload more files or increase the size limit of a single file, you can modify the max_small_file_size_bytes and max_small_file_number parameters by using the ADMIN SET CONFIG command. However, the increase in the number and size of files will lead to an increase in FE memory usage.

  1. BE configuration
  • Small_file_dir: The path used to store files downloaded from FE by default is in the lib/small_files/ directory of the BE runtime directory.

More Help

For more detailed syntax and best practices used by the file manager, see CREATE FILE, DROP FILE and SHOW FILE command manual, you can also enter HELP CREATE FILE, HELP DROP FILE and HELP SHOW FILE in the MySql client command line to get more help information.