Local

Name

SinceVersion dev

local

Description

Local table-valued-function(tvf), allows users to read and access local file contents on be node, just like accessing relational table. Currently supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc file format.

It needs ADMIN privilege to use.

syntax

  1. local(
  2. "file_path" = "path/to/file.txt",
  3. "backend_id" = "be_id",
  4. "format" = "csv",
  5. "keyn" = "valuen"
  6. ...
  7. );

parameter description

Related parameters for accessing local file on be node:

  • file_path:

    (required) The path of the file to be read, which is a relative path to the user_files_secure_path directory, where user_files_secure_path parameter can be configured on be.

    Can not contains .. in path. Support using glob syntax to match multi files, such as log/*.log

  • backend_id:

    (required) The backend id where the file resides. The backend_id can be obtained by show backends command.

File format parameters:

  • format: (required) Currently support csv/csv_with_names/csv_with_names_and_types/json/parquet/orc

  • column_separator: (optional) default ,.

  • line_delimiter: (optional) default \n.

  • compress_type: (optional) Currently support UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE. Default value is UNKNOWN, it will automatically infer the type based on the suffix of uri.

    The following 6 parameters are used for loading in json format. For specific usage methods, please refer to: Json Load

  • read_json_by_line: (optional) default "true"

  • strip_outer_array: (optional) default "false"

  • json_root: (optional) default ""

  • json_paths: (optional) default ""

  • num_as_string: (optional) default false

  • fuzzy_parse: (optional) default false

    SinceVersion devThe following 2 parameters are used for loading in csv format

  • trim_double_quotes: Boolean type (optional), the default value is false. True means that the outermost double quotes of each field in the csv file are trimmed.

  • skip_lines: Integer type (optional), the default value is 0. It will skip some lines in the head of csv file. It will be disabled when the format is csv_with_names or csv_with_names_and_types.

Examples

Analyze the log file on specified BE:

  1. mysql> select * from local(
  2. "file_path" = "log/be.out",
  3. "backend_id" = "10006",
  4. "format" = "csv")
  5. where c1 like "%start_time%" limit 10;
  6. +--------------------------------------------------------+
  7. | c1 |
  8. +--------------------------------------------------------+
  9. | start time: 2023 08 07 星期一 23:20:32 CST |
  10. | start time: 2023 08 07 星期一 23:32:10 CST |
  11. | start time: 2023 08 08 星期二 00:20:50 CST |
  12. | start time: 2023 08 08 星期二 00:29:15 CST |
  13. +--------------------------------------------------------+

Read and access csv format files located at path ${DORIS_HOME}/student.csv:

  1. mysql> select * from local(
  2. "file_path" = "student.csv",
  3. "backend_id" = "10003",
  4. "format" = "csv");
  5. +------+---------+--------+
  6. | c1 | c2 | c3 |
  7. +------+---------+--------+
  8. | 1 | alice | 18 |
  9. | 2 | bob | 20 |
  10. | 3 | jack | 24 |
  11. | 4 | jackson | 19 |
  12. | 5 | liming | d18 |
  13. +------+---------+--------+

Can be used with desc function :

  1. mysql> desc function local(
  2. "file_path" = "student.csv",
  3. "backend_id" = "10003",
  4. "format" = "csv");
  5. +-------+------+------+-------+---------+-------+
  6. | Field | Type | Null | Key | Default | Extra |
  7. +-------+------+------+-------+---------+-------+
  8. | c1 | TEXT | Yes | false | NULL | NONE |
  9. | c2 | TEXT | Yes | false | NULL | NONE |
  10. | c3 | TEXT | Yes | false | NULL | NONE |
  11. +-------+------+------+-------+---------+-------+

Keywords

  1. local, table-valued-function, tvf

Best Practice

For more detailed usage of local tvf, please refer to S3 tvf, The only difference between them is the way of accessing the storage system.