Impala Type Conversion Functions
Conversion functions are typically used in combination with other functions to explicitly pass the expected data types. Impala has strict rules regarding data types for function parameters. For example, Impala does not automatically convert a DOUBLE
value to FLOAT
, a BIGINT
value to INT
, or other conversion where precision could be lost or overflow could occur. Also, for reporting or dealing with loosely defined schemas in big data contexts, you might need to convert values to or from the STRING
type.
Function reference:
Impala supports the following type conversion functions:
CAST(expression AS type)
Purpose: Returns expression converted to the type data type.
If the expression value is of a type that cannot be converted to the target type:
- Of
DECIMAL
,DATE
, andBOOLEAN
, the function returns an error. - Of all other types, the function returns
NULL
.
Usage notes:
Use CAST
when passing a column value or literal to a function that expects a parameter with a different type. Frequently used in SQL operations such as CREATE TABLE AS SELECT
and INSERT ... VALUES
to ensure that values from various sources are of the appropriate type for the destination columns.
Where practical, do a one-time CAST()
operation during the ingestion process to make each column into the appropriate type, rather than using many CAST()
operations in each query; doing type conversions for each row during each query can be expensive for tables with millions or billions of rows.
The way this function deals with time zones when converting to or from TIMESTAMP
values is affected by the ‑‑use_local_tz_for_unix_timestamp_conversions
startup flag for the impalad daemon. See TIMESTAMP Data Type for details about how Impala handles time zone considerations for the TIMESTAMP
data type.
Related information:
For details of casts from each kind of data type, see the description of the appropriate type: TINYINT Data Type, SMALLINT Data Type, INT Data Type, BIGINT Data Type, FLOAT Data Type, DOUBLE Data Type, DECIMAL Data Type, STRING Data Type, CHAR Data Type, VARCHAR Data Type, TIMESTAMP Data Type, BOOLEAN Data Type
CAST(expression AS type FORMAT pattern)
Purpose: Returns expression converted to the type data type based on the pattern format string. This signature of CAST()
with the FORMAT
clause is supported only for casts between STRING
/ CHAR
/ VARCHAR
types and TIMESTAMP
/ DATE
types.
Note: The patterns allowed in the FORMAT
clause support ISO SQL:2016 standard patterns. Those patterns are not the same format patterns used with the other Impala conversion functions, e.g. TO_TIMESTAMP()
and FROM_TIMESTAMP()
.
The following rules apply to pattern. Any exceptions to these rules are noted in the Details column of the table below.
- pattern is a case-insensitive
STRING
unless noted otherwise in the table below. - If pattern is
NULL
, an empty string, or a number, an error returns. - A fewer digits in expression than specified in the pattern is accepted if a separator is correctly specified in the pattern. For example,
CAST('5-01-2017' AS DATE FORMAT 'MM-dd-yyyy')
returnsDATE'2017-05-01'
. - If fewer number of digits are in expression than specified in the pattern, the current date is used to complete the year pattern. For example,
CAST('19/05' AS DATE FORMAT 'YYYY/MM')
will returnDATE'2019-05-01'
when executed on August 8, 2019.
The following format patterns are supported in the FORMAT
clause.
Pattern | Description | Details |
---|---|---|
YYYY | 4-digit year. | |
YYY | Last 3 digits of a year. | |
YY | Last 2 digits of a year. | |
Y | Last digit of a year | |
RRRR | 4-digit round year | If 1,3 or 4-digit year values are provided in expression, treated as If 2-digit years are provided in expression, treated as For date/time to string conversions, treated as If |
RR | 2-digit round year. | For date/time to string conversions, same as For string to date/time conversions, the first 2 digits of the year in the return value depends on the specified two-digit year and the last two digits of the current year as follows:
If If 1-digit year values are provided in expression, it is treated as |
Q | Quarter of year (1-4) | Not supported in string to date/time conversions. |
MM | Month number | In date/time to string conversions, 1-digit months are prefixed with a zero. |
MONTH
| Full month name | In string to date/time conversions:
In date/time to string conversions:
This pattern token is case sensitive. |
MON
| Abbreviated 3-character month name | This pattern token is case sensitive. |
WW | Week of year (1-53) | Not supported in string to date/time conversions. The 1st week begins on January 1st and ends on January 7th. |
W | Week of month (1-5) | Not supported in string to date/time conversions. 1st week begins on the 1st of the month and ends on the 7th. |
D | Day of week (1-7) | Not supported in string to date/time conversions. In date/time to string conversions, converts to a day number, for example, 1 for Sunday, 2 for Monday. |
DD | Day of month (1-31) | In date/time to string conversions, 1-digit days are prefixed with a zero. |
DDD | Day of year (1-366) | In string to date/time conversions, providing |
DAY
| Day name | Not supported in string to date/time conversions. In date/time to string conversions:
This pattern token is case sensitive. |
DY
| Abbreviated 3-character day name | Not supported in string to date/time conversions. This pattern token is case sensitive. |
HH
| Hour of day (1-12) | In date/time to string conversions, 1-digit hours are prefixed with a zero. If provided hour in expression is not between 1 and 12, returns an error. If no AM/PM is provided in expression, the default is |
HH24 | Hour of day (0-23) | In string to date/time conversions, if |
MI | Minute of hour (0-59) | In date/time to string conversions, 1-digit minutes are prefixed with a zero. |
SS | Second of minute (0-59) | In date/time to string conversions, 1-digit seconds are prefixed with a zero. |
SSSSS | Second of Day (0-86399) | In string to timestamp conversions, if |
FF
| Fractional second | A number, 1 to 9, can be used to indicate the number of digits.
|
AM
| Meridiem indicator | For date/time to string conversions, For string to date/time conversion, |
TZH | Timezone offset hour | An optional sign, + or -, and 2 digits for the value of signed numbers are allowed for the source expression, e.g. The allowed values are from |
TZM | Timezone offset minute | Unsigned numbers are allowed for the source expression. The allowed values are from |
-
<space> | Separator | For string to date/time conversions, any separator character in the pattern string would match any separator character in the input expression. For example, |
T | Separates the date from the time. | This pattern is used for accepting ISO 8601 date/time formats. Example: |
Z | Indicates the zero hour offset from UTC. | This pattern is used for accepting ISO 8601 date/time formats. |
FX | Forces an exact match between the format pattern, pattern, and the input argument, expression. | Must be specified at the beginning of the format pattern and is valid for the whole pattern. In string to date/time conversions:
In date/time to string conversions, the outputs are padded as follows:
|
FM | Overrides FX . | In a date/time to string conversions, In string to date/time conversions, FM is used to override the effect of FX for certain tokens as follows:
|
Free text | Nested string | You can specify a free text with surrounding double quotes (“) in pattern where the same text is expected in the input argument without surrounding double quotes. In date/time to string conversions, the string is included in the output with the characters’ case preserved. In string to date/time conversions, the nested string is not included in the resulting date/time object. However, the nested string has to match the related part of the input string, including the case. The following rules apply:
|
Examples:
Input | Output |
---|---|
| 2014-11-02 |
| 2014-12-31 |
| Executed at 2019-01-01 11:11:11: 2015-01-26 |
| 2018-11-10 15:11:04 |
| Executed at 2019-01-01 11:11:11: 2095-01-28 |
| Round year when last 2 digits of current year is greater than 49. Executed at 2099-01-01 11:11:11: 2149-01-15 |
| 2019-10-10 13:30:40.123456000 |
| 2001-03-01 03:10:15.123456000 |
| NULL because the separator between the year and month do not match. |
| NULL because the length of the month token does not match the pattern. |
| 2001-03-11 03:15:00.123450000 The multiple |
| 1:01:1 |
| 2019-11-10 |
TYPEOF(expression)
Purpose: Returns the name of the data type corresponding to expression. For types with extra attributes, such as length for CHAR
and VARCHAR
, or precision and scale for DECIMAL
, includes the full specification of the type.
Return type: STRING
Usage notes: Typically used in interactive exploration of a schema, or in application code that programmatically generates schema definitions such as CREATE TABLE
statements, for example, to get the type of an expression such as col1 / col2
or CONCAT(col1, col2, col3)
. This function is especially useful for arithmetic expressions involving DECIMAL
types because the precision and scale of the result is can be different than that of the operands.
Examples:
TYPEOF(2)
returns TINYINT
.
TYPEOF(NOW())
returns TIMESTAMP
.
Parent topic: Impala Built-In Functions