File Metadata
Description
The File Metadata transform scans a file to determine its metadata structure or layout.
Use this transforms in situations where you need to read a structured text file (e.g. CSV, TSV) when you don’t know the exact layout in advance.
The information provided in this file can be used to actually read the file later, e.g. through metadata injection.
The layout detected depends on the number of rows scanned.
For example, if the first 100 rows of a file are scanned and the first field is detected as an integer, there is a possibility this field contains alphanumerical characters in later rows. Using 0 rows for ‘limit scanned rows’ is a way to make sure the entire file is scanned and the layout is detected correctly, even though this may be time consuming or even impossible for large files.
Options
Option | Description |
---|---|
transform name | the name for this transform |
filename | the filename to scan for metadata |
limit scanned rows | the number of rows to limit the scan to (default 10,000). Use 0 rows to scan the entire file. |
fallback charset | charset to use while scanning the file |
delimiter candidates | list of delimiters to try while detecting the file layout. Tab, semicolon, comma are provided by default |
enclosure candidates | list of delimiters to try while detecting the file layout. Double and single quote are provided by default. |
Output Fields
The fields returned by this transform are
charset
delimiter
field_count
skip_header_lines
skip_footer_lines
header_line_present
name
type
length
precision
mask
decimal_symbol
grouping_symbol