STRUCT Complex Type (Impala 2.3 or higher only)
A complex data type, representing multiple fields of a single item. Frequently used as the element type of an ARRAY
or the VALUE
part of a MAP
.
Syntax:
column_name STRUCT < name : type [COMMENT 'comment_string'], ... >
type ::= primitive_type | complex_type
The names and number of fields within the STRUCT
are fixed. Each field can be a different type. A field within a STRUCT
can also be another STRUCT
, or an ARRAY
or a MAP
, allowing you to create nested data structures with a maximum nesting depth of 100.
A STRUCT
can be the top-level type for a column, or can itself be an item within an ARRAY
or the value part of the key-value pair in a MAP
.
When a STRUCT
is used as an ARRAY
element or a MAP
value, you use a join clause to bring the ARRAY
or MAP
elements into the result set, and then refer to array_name.ITEM.field
or map_name.VALUE.field
. In the case of a STRUCT
directly inside an ARRAY
or MAP
, you can omit the .ITEM
and .VALUE
pseudocolumns and refer directly to array_name.field
or map_name.field
.
Usage notes:
Because complex types are often used in combination, for example an ARRAY
of STRUCT
elements, if you are unfamiliar with the Impala complex types, start with Complex Types (Impala 2.3 or higher only) for background information and usage examples.
A STRUCT
is similar conceptually to a table row: it contains a fixed number of named fields, each with a predefined type. To combine two related tables, while using complex types to minimize repetition, the typical way to represent that data is as an ARRAY
of STRUCT
elements.
Because a STRUCT
has a fixed number of named fields, it typically does not make sense to have a STRUCT
as the type of a table column. In such a case, you could just make each field of the STRUCT
into a separate column of the table. The STRUCT
type is most useful as an item of an ARRAY
or the value part of the key-value pair in a MAP
. A nested type column with a STRUCT
at the lowest level lets you associate a variable number of row-like objects with each row of the table.
The STRUCT
type is straightforward to reference within a query. You do not need to include the STRUCT
column in a join clause or give it a table alias, as is required for the ARRAY
and MAP
types. You refer to the individual fields using dot notation, such as struct_column_name.field_name
, without any pseudocolumn such as ITEM
or VALUE
.
You can pass a multi-part qualified name to DESCRIBE
to specify an ARRAY
, STRUCT
, or MAP
column and visualize its structure as if it were a table. For example, if table T1
contains an ARRAY
column A1
, you could issue the statement DESCRIBE t1.a1
. If table T1
contained a STRUCT
column S1
, and a field F1
within the STRUCT
was a MAP
, you could issue the statement DESCRIBE t1.s1.f1
. An ARRAY
is shown as a two-column table, with ITEM
and POS
columns. A STRUCT
is shown as a table with each field representing a column in the table. A MAP
is shown as a two-column table, with KEY
and VALUE
columns.
Internal details:
Within the Parquet data file, the values for each STRUCT
field are stored adjacent to each other, so that they can be encoded and compressed using all the Parquet techniques for storing sets of similar or repeated values. The adjacency applies even when the STRUCT
values are part of an ARRAY
or MAP
. During a query, Impala avoids unnecessary I/O by reading only the portions of the Parquet data file containing the requested STRUCT
fields.
Added in: Impala 2.3.0
Restrictions:
Columns with this data type can only be used in tables or partitions with the Parquet or ORC file format.
Columns with this data type cannot be used as partition key columns in a partitioned table.
The
COMPUTE STATS
statement does not produce any statistics for columns of this data type.The maximum length of the column definition for any complex type, including declarations for any nested types, is 4000 characters.
See Limitations and Restrictions for Complex Types for a full list of limitations and associated guidelines about complex type columns.
Kudu considerations:
Currently, the data types CHAR
, ARRAY
, MAP
, and STRUCT
cannot be used with Kudu tables.
Examples:
Note: Many of the complex type examples refer to tables such as CUSTOMER
and REGION
adapted from the tables used in the TPC-H benchmark. See Sample Schema and Data for Experimenting with Impala Complex Types for the table definitions.
The following example shows a table with various kinds of STRUCT
columns, both at the top level and nested within other complex types. Practice the CREATE TABLE
and query notation for complex type columns using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably.
CREATE TABLE struct_demo
(
id BIGINT,
name STRING,
-- A STRUCT as a top-level column. Demonstrates how the table ID column
-- and the ID field within the STRUCT can coexist without a name conflict.
employee_info STRUCT < employer: STRING, id: BIGINT, address: STRING >,
-- A STRUCT as the element type of an ARRAY.
places_lived ARRAY < STRUCT <street: STRING, city: STRING, country: STRING >>,
-- A STRUCT as the value portion of the key-value pairs in a MAP.
memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: STRING >>,
-- A STRUCT where one of the fields is another STRUCT.
current_address STRUCT < street_address: STRUCT <street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING >
)
STORED AS PARQUET;
The following example shows how to examine the structure of a table containing one or more STRUCT
columns by using the DESCRIBE
statement. You can visualize each STRUCT
as its own table, with columns named the same as each field of the STRUCT
. If the STRUCT
is nested inside another complex type, such as ARRAY
, you can extend the qualified name passed to DESCRIBE
until the output shows just the STRUCT
fields.
DESCRIBE struct_demo;
+-------------------+--------------------------+
| name | type |
+-------------------+--------------------------+
| id | bigint |
| name | string |
| employee_info | struct< |
| | employer:string, |
| | id:bigint, |
| | address:string |
| | > |
| places_lived | array<struct< |
| | street:string, |
| | city:string, |
| | country:string |
| | >> |
| memorable_moments | map<string,struct< |
| | year:int, |
| | place:string, |
| | details:string |
| | >> |
| current_address | struct< |
| | street_address:struct< |
| | street_number:int, |
| | street_name:string, |
| | street_type:string |
| | >, |
| | country:string, |
| | postal_code:string |
| | > |
+-------------------+--------------------------+
The top-level column EMPLOYEE_INFO
is a STRUCT
. Describing table_name.struct_name
displays the fields of the STRUCT
as if they were columns of a table:
DESCRIBE struct_demo.employee_info;
+----------+--------+
| name | type |
+----------+--------+
| employer | string |
| id | bigint |
| address | string |
+----------+--------+
Because PLACES_LIVED
is a STRUCT
inside an ARRAY
, the initial DESCRIBE
shows the structure of the ARRAY
:
DESCRIBE struct_demo.places_lived;
+------+------------------+
| name | type |
+------+------------------+
| item | struct< |
| | street:string, |
| | city:string, |
| | country:string |
| | > |
| pos | bigint |
+------+------------------+
Ask for the details of the ITEM
field of the ARRAY
to see just the layout of the STRUCT
:
DESCRIBE struct_demo.places_lived.item;
+---------+--------+
| name | type |
+---------+--------+
| street | string |
| city | string |
| country | string |
+---------+--------+
Likewise, MEMORABLE_MOMENTS
has a STRUCT
inside a MAP
, which requires an extra level of qualified name to see just the STRUCT
part:
DESCRIBE struct_demo.memorable_moments;
+-------+------------------+
| name | type |
+-------+------------------+
| key | string |
| value | struct< |
| | year:int, |
| | place:string, |
| | details:string |
| | > |
+-------+------------------+
For a MAP
, ask to see the VALUE
field to see the corresponding STRUCT
fields in a table-like structure:
DESCRIBE struct_demo.memorable_moments.value;
+---------+--------+
| name | type |
+---------+--------+
| year | int |
| place | string |
| details | string |
+---------+--------+
For a STRUCT
inside a STRUCT
, we can see the fields of the outer STRUCT
:
DESCRIBE struct_demo.current_address;
+----------------+-----------------------+
| name | type |
+----------------+-----------------------+
| street_address | struct< |
| | street_number:int, |
| | street_name:string, |
| | street_type:string |
| | > |
| country | string |
| postal_code | string |
+----------------+-----------------------+
Then we can use a further qualified name to see just the fields of the inner STRUCT
:
DESCRIBE struct_demo.current_address.street_address;
+---------------+--------+
| name | type |
+---------------+--------+
| street_number | int |
| street_name | string |
| street_type | string |
+---------------+--------+
The following example shows how to examine the structure of a table containing one or more STRUCT
columns by using the DESCRIBE
statement. You can visualize each STRUCT
as its own table, with columns named the same as each field of the STRUCT
. If the STRUCT
is nested inside another complex type, such as ARRAY
, you can extend the qualified name passed to DESCRIBE
until the output shows just the STRUCT
fields.
DESCRIBE struct_demo;
+-------------------+--------------------------+---------+
| name | type | comment |
+-------------------+--------------------------+---------+
| id | bigint | |
| name | string | |
| employee_info | struct< | |
| | employer:string, | |
| | id:bigint, | |
| | address:string | |
| | > | |
| places_lived | array<struct< | |
| | street:string, | |
| | city:string, | |
| | country:string | |
| | >> | |
| memorable_moments | map<string,struct< | |
| | year:int, | |
| | place:string, | |
| | details:string | |
| | >> | |
| current_address | struct< | |
| | street_address:struct< | |
| | street_number:int, | |
| | street_name:string, | |
| | street_type:string | |
| | >, | |
| | country:string, | |
| | postal_code:string | |
| | > | |
+-------------------+--------------------------+---------+
SELECT id, employee_info.id FROM struct_demo;
SELECT id, employee_info.id AS employee_id FROM struct_demo;
SELECT id, employee_info.id AS employee_id, employee_info.employer
FROM struct_demo;
SELECT id, name, street, city, country
FROM struct_demo, struct_demo.places_lived;
SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country
FROM struct_demo, struct_demo.places_lived;
SELECT id, name, pl.pos, pl.street, pl.city, pl.country
FROM struct_demo, struct_demo.places_lived AS pl;
SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country
FROM struct_demo, struct_demo.places_lived;
SELECT id, name, pos, street, city, country
FROM struct_demo, struct_demo.places_lived;
SELECT id, name, memorable_moments.key,
memorable_moments.value.year,
memorable_moments.value.place,
memorable_moments.value.details
FROM struct_demo, struct_demo.memorable_moments
WHERE memorable_moments.key IN ('Birthday','Anniversary','Graduation');
SELECT id, name, mm.key, mm.value.year, mm.value.place, mm.value.details
FROM struct_demo, struct_demo.memorable_moments AS mm
WHERE mm.key IN ('Birthday','Anniversary','Graduation');
SELECT id, name, memorable_moments.key, memorable_moments.value.year,
memorable_moments.value.place, memorable_moments.value.details
FROM struct_demo, struct_demo.memorable_moments
WHERE key IN ('Birthday','Anniversary','Graduation');
SELECT id, name, key, value.year, value.place, value.details
FROM struct_demo, struct_demo.memorable_moments
WHERE key IN ('Birthday','Anniversary','Graduation');
SELECT id, name, key, year, place, details
FROM struct_demo, struct_demo.memorable_moments
WHERE key IN ('Birthday','Anniversary','Graduation');
SELECT id, name,
current_address.street_address.street_number,
current_address.street_address.street_name,
current_address.street_address.street_type,
current_address.country,
current_address.postal_code
FROM struct_demo;
For example, this table uses a struct that encodes several data values for each phone number associated with a person. Each person can have a variable-length array of associated phone numbers, and queries can refer to the category field to locate specific home, work, mobile, and so on kinds of phone numbers.
CREATE TABLE contact_info_many_structs
(
id BIGINT, name STRING,
phone_numbers ARRAY < STRUCT <category:STRING, country_code:STRING, area_code:SMALLINT, full_number:STRING, mobile:BOOLEAN, carrier:STRING > >
) STORED AS PARQUET;
Because structs are naturally suited to composite values where the fields have different data types, you might use them to decompose things such as addresses:
CREATE TABLE contact_info_detailed_address
(
id BIGINT, name STRING,
address STRUCT < house_number:INT, street:STRING, street_type:STRING, apartment:STRING, city:STRING, region:STRING, country:STRING >
);
In a big data context, splitting out data fields such as the number part of the address and the street name could let you do analysis on each field independently. For example, which streets have the largest number range of addresses, what are the statistical properties of the street names, which areas have a higher proportion of “Roads”, “Courts” or “Boulevards”, and so on.
Related information:
Complex Types (Impala 2.3 or higher only), ARRAY Complex Type (Impala 2.3 or higher only), MAP Complex Type (Impala 2.3 or higher only)
Parent topic: Data Types