12.167. Release 0.80
## New Hive ORC Reader

We have added a new ORC reader implementation. The new reader supports vectorized reads, lazy loading, and predicate push down, all of which make the reader more efficient and typically reduce wall clock time for a query. Although the new reader has been heavily tested, it is an extensive rewrite of the Apache Hive ORC reader and may have some latent issues. If you are seeing issues, you can disable the new reader on a per-query basis by setting the `<hive-catalog>.optimized_reader_enabled` session property to `false`, or you can disable the reader by default by setting the Hive catalog property `hive.optimized-reader.enabled=false`.
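Predicate push down and lazy loading are general columnar-reader techniques. The Python sketch below illustrates both under simplifying assumptions: each stripe carries min/max statistics for one integer filter column `x` (ORC files do record such column statistics), a range predicate skips stripes whose statistics rule out any match, and a projected column `y` is decoded only when at least one row of the stripe passes the filter. The names `Stripe`, `stripes_to_read`, `LazyColumn`, and `scan` are hypothetical and do not describe the internals of the Presto reader.

```python
# Illustrative sketch only: names and layout are hypothetical, not Presto's ORC reader.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stripe:
    """Hypothetical stripe with min/max statistics for filter column x."""
    x_min: int
    x_max: int
    x: List[int]  # filter column values
    y: List[str]  # projected column values, decoded lazily


def stripes_to_read(stripes: List[Stripe], lo: int, hi: int) -> List[int]:
    """Predicate push down for lo <= x <= hi: skip stripes whose
    [x_min, x_max] range cannot contain a matching value."""
    return [i for i, s in enumerate(stripes) if s.x_max >= lo and s.x_min <= hi]


class LazyColumn:
    """Runs the (possibly expensive) loader only on first access."""

    def __init__(self, loader: Callable[[], List[str]]) -> None:
        self._loader = loader
        self._values: Optional[List[str]] = None
        self.load_count = 0  # how many times the loader actually ran

    def values(self) -> List[str]:
        if self._values is None:
            self._values = self._loader()
            self.load_count += 1
        return self._values


def scan(stripes: List[Stripe], lo: int, hi: int) -> Tuple[List[str], int]:
    """Return y for rows with lo <= x <= hi, plus how many y columns were decoded."""
    out: List[str] = []
    decoded = 0
    for i in stripes_to_read(stripes, lo, hi):
        stripe = stripes[i]
        y = LazyColumn(lambda s=stripe: list(s.y))  # bind this stripe's data
        matches = [r for r, v in enumerate(stripe.x) if lo <= v <= hi]
        if matches:  # y is decoded only if some row survives the filter
            values = y.values()
            out.extend(values[r] for r in matches)
        decoded += y.load_count
    return out, decoded
```

With stripes whose `x` ranges are 0 to 9, 10 to 19, and 10 to 20, the predicate `14 <= x <= 16` skips the first stripe from its statistics alone, and the third stripe's `y` column is never decoded because none of its rows match.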
## Hive Changes

- The maximum retry time for the Hive S3 file system can be configured by setting `hive.s3.max-retry-time`.
- Fix Hive partition pruning for null keys (i.e., `__HIVE_DEFAULT_PARTITION__`).
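These notes do not spell out the S3 retry policy. As an illustration under stated assumptions, the Python sketch below treats a maximum retry time as a total time budget shared by all attempts, with capped exponential backoff between them. The function name `retry_with_deadline`, the backoff parameters, and the injected `clock`/`sleep` callables are hypothetical and are not the Hive connector's actual implementation or defaults.

```python
# Illustrative sketch only: retry policy and parameter values are assumptions.
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_deadline(operation: Callable[[], T],
                        max_retry_time: float,
                        clock: Callable[[], float],
                        sleep: Callable[[float], None],
                        initial_backoff: float = 0.1,
                        max_backoff: float = 10.0) -> T:
    """Retry `operation` with exponential backoff until it succeeds or the
    next wait would push total elapsed time past `max_retry_time` seconds."""
    start = clock()
    backoff = initial_backoff
    while True:
        try:
            return operation()
        except Exception:
            elapsed = clock() - start
            if elapsed + backoff > max_retry_time:
                raise  # budget exhausted: surface the last failure
            sleep(backoff)
            backoff = min(backoff * 2, max_backoff)
```

Injecting the clock and sleep functions keeps the sketch deterministic, so the time budget can be exercised without real waiting.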
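Hive stores rows whose partition key is null in a partition whose value is the placeholder `__HIVE_DEFAULT_PARTITION__` (Hive's default for `hive.exec.default.partition.name`). The Python sketch below shows why partition pruning has to translate that placeholder into a real null before evaluating a predicate such as `key IS NULL`; otherwise the predicate sees a non-null string and prunes the partition. The helper names and the `key=value` partition layout are hypothetical and do not mirror Presto's code.

```python
# Illustrative sketch only: helper names are hypothetical.
from typing import Callable, List, Optional

# Placeholder Hive writes in place of a null partition key value.
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def parse_partition_value(raw: str) -> Optional[str]:
    """Map the Hive placeholder to None; keep every other value as-is."""
    return None if raw == HIVE_DEFAULT_PARTITION else raw


def prune(partitions: List[str],
          predicate: Callable[[Optional[str]], bool]) -> List[str]:
    """Return partition names ('key=value') whose key satisfies the predicate."""
    kept = []
    for name in partitions:
        _, raw = name.split("=", 1)
        if predicate(parse_partition_value(raw)):
            kept.append(name)
    return kept
```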
## Cassandra Changes

- Update Cassandra driver to 2.1.0.
- Map Cassandra `TIMESTAMP` type to Presto `TIMESTAMP` type.
## “Big Query” Support

We’ve added experimental support for “big” queries. This provides a separate queue controlled by the following properties:

- `experimental.max-concurrent-big-queries`
- `experimental.max-queued-big-queries`

The following options apply to queries submitted with the `experimental_big_query` session property:

- `experimental.big-query-initial-hash-partitions`
- `experimental.big-query-max-task-memory`
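The release notes do not describe the queueing algorithm, so the Python sketch below is only an illustration of how a separate queue bounded by a concurrency limit and a queue-length limit could behave: a query runs if a slot is free, waits if the queue has room, and is rejected otherwise. The class `BigQueryQueue`, its methods, and the state names are hypothetical; the two constructor arguments merely stand in for `experimental.max-concurrent-big-queries` and `experimental.max-queued-big-queries`.

```python
# Illustrative sketch only: not Presto's queue implementation.
from collections import deque
from typing import Deque, Set


class BigQueryQueue:
    """Hypothetical admission queue with a concurrency limit and a queue limit."""

    def __init__(self, max_concurrent: int, max_queued: int) -> None:
        self.max_concurrent = max_concurrent  # stands in for experimental.max-concurrent-big-queries
        self.max_queued = max_queued          # stands in for experimental.max-queued-big-queries
        self.running: Set[str] = set()
        self.queued: Deque[str] = deque()

    def submit(self, query_id: str) -> str:
        """Start the query, queue it, or reject it when the queue is full."""
        if len(self.running) < self.max_concurrent:
            self.running.add(query_id)
            return "RUNNING"
        if len(self.queued) < self.max_queued:
            self.queued.append(query_id)
            return "QUEUED"
        return "REJECTED"

    def finish(self, query_id: str) -> None:
        """Release a running slot and promote the oldest queued query, if any."""
        self.running.discard(query_id)
        if self.queued and len(self.running) < self.max_concurrent:
            self.running.add(self.queued.popleft())
```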
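## Metadata-Only Query Optimization

We now support an optimization that rewrites aggregation queries that are insensitive to the cardinality of the input (e.g., `max()`, `min()`, `DISTINCT` aggregates) to execute against table metadata.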
For example, if `key`, `key1` and `key2` are partition keys, the following queries will benefit:

- `SELECT min(key), max(key) FROM t;`
- `SELECT DISTINCT key FROM t;`
- `SELECT count(DISTINCT key) FROM t;`
- `SELECT count(DISTINCT key + 5) FROM t;`
- `SELECT count(DISTINCT key) FROM (SELECT key FROM t ORDER BY 1 LIMIT 10);`
- `SELECT key1, count(DISTINCT key2) FROM t GROUP BY 1;`

This optimization is turned off by default. To turn it on, add `optimizer.optimize-metadata-queries=true` to the coordinator config properties.
Warning

This optimization will cause queries to produce incorrect results if the connector allows partitions to contain no data. For example, the Hive connector will produce incorrect results if your Hive warehouse contains partitions without data.
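To see why empty partitions break this optimization, the Python sketch below evaluates `SELECT min(key), max(key) FROM t` two ways: by scanning the rows, and by reading only the list of partition key values, as a metadata-only plan would. The in-memory table layout and the function names are hypothetical.

```python
# Illustrative sketch only: table layout and names are hypothetical.
from typing import Dict, List, Tuple

# Partition key value -> rows stored in that partition.
Table = Dict[int, List[dict]]


def min_max_from_data(table: Table) -> Tuple[int, int]:
    """Scan every row; partitions without rows contribute nothing."""
    keys = [key for key, rows in table.items() for _ in rows]
    return min(keys), max(keys)


def min_max_from_metadata(table: Table) -> Tuple[int, int]:
    """Metadata-only plan: look at partition keys without reading any rows."""
    keys = list(table.keys())
    return min(keys), max(keys)
```

For a table with partitions `1`, `2`, and an empty partition `3`, the scan returns `(1, 2)` while the metadata-only plan returns `(1, 3)`; when every partition has data, the two agree.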
## General Changes
- Add support for implicit joins. The following syntax is now allowed: `SELECT * FROM a, b WHERE a.id = b.id;`
- Add property `task.verbose-stats` to enable verbose statistics collection for tasks. The default is `false`.
- Format binary data in the CLI as a hex dump.
- Add approximate numeric histogram function `numeric_histogram()`.
- Add `array_sort()` function.
- Add `map_keys()` and `map_values()` functions.
- Make `row_number()` completely streaming.
- Add property `task.max-partial-aggregation-memory` to configure the memory limit for the partial step of aggregations.
- Fix exception when processing queries with an `UNNEST` operation where the output was not used.
- Only show query progress in UI after the query has been fully scheduled.
- Add query execution visualization to the coordinator UI. It can be accessed via the query details page.
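An implicit join such as `SELECT * FROM a, b WHERE a.id = b.id` means the cross product of `a` and `b` filtered by the predicate, which yields the same rows as an inner join on `a.id = b.id`. The Python sketch below checks that equivalence on small in-memory tables by comparing a filtered cross product with a hash join; the tables and function names are hypothetical and say nothing about how Presto actually plans the query.

```python
# Illustrative sketch only: compares two evaluations of the same join.
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Row = Dict[str, object]


def implicit_join(a: List[Row], b: List[Row]) -> List[Tuple[Row, Row]]:
    """FROM a, b WHERE a.id = b.id: cross product, then filter."""
    return [(x, y) for x, y in product(a, b) if x["id"] == y["id"]]


def hash_join(a: List[Row], b: List[Row]) -> List[Tuple[Row, Row]]:
    """Inner join on id: build a hash table on b, probe it with each row of a."""
    index: Dict[object, List[Row]] = defaultdict(list)
    for y in b:
        index[y["id"]].append(y)
    return [(x, y) for x in a for y in index.get(x["id"], [])]
```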
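`numeric_histogram()` is approximate because it keeps a bounded number of buckets. One common technique for such bounded histograms, sketched below in Python, inserts each value as its own bucket and, whenever there are too many buckets, merges the two adjacent buckets with the closest centroids into one bucket at their weighted mean. This illustrates the general technique and is not necessarily Presto's exact implementation; the function name `numeric_histogram_sketch` is hypothetical.

```python
# Illustrative sketch only: a generic bounded streaming histogram.
import bisect
from typing import Dict, Iterable, List


def numeric_histogram_sketch(max_buckets: int, values: Iterable[float]) -> Dict[float, float]:
    """Return {bucket centroid: weight} with at most max_buckets entries."""
    if max_buckets < 1:
        raise ValueError("max_buckets must be at least 1")
    centroids: List[float] = []  # kept sorted
    weights: List[float] = []    # weights[i] belongs to centroids[i]
    for v in values:
        v = float(v)
        i = bisect.bisect_left(centroids, v)
        if i < len(centroids) and centroids[i] == v:
            weights[i] += 1.0  # identical value: bump the existing bucket
            continue
        centroids.insert(i, v)
        weights.insert(i, 1.0)
        if len(centroids) > max_buckets:
            # Merge the adjacent pair whose centroids are closest together.
            j = min(range(len(centroids) - 1),
                    key=lambda k: centroids[k + 1] - centroids[k])
            w = weights[j] + weights[j + 1]
            c = (centroids[j] * weights[j] + centroids[j + 1] * weights[j + 1]) / w
            centroids[j:j + 2] = [c]
            weights[j:j + 2] = [w]
    return dict(zip(centroids, weights))
```

The merged centroid always lies strictly between the two centroids it replaces, so the bucket list stays sorted and the total weight always equals the number of input values.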
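When a `row_number()` window has no `ORDER BY`, rows can be numbered as they arrive by keeping one counter per partition key, so no partition ever needs to be buffered. The Python generator below sketches that streaming approach; numbering within a partition simply follows arrival order, and the function name is hypothetical rather than Presto's operator.

```python
# Illustrative sketch only: streaming row numbering with per-partition counters.
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def streaming_row_number(rows: Iterable[T],
                         partition_key: Callable[[T], Hashable]) -> Iterator[Tuple[T, int]]:
    """Yield (row, row_number) as soon as each row arrives.

    Only one counter per distinct partition key is kept; rows themselves
    are never buffered.
    """
    counters: Dict[Hashable, int] = defaultdict(int)
    for row in rows:
        key = partition_key(row)
        counters[key] += 1
        yield row, counters[key]
```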
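`task.max-partial-aggregation-memory` bounds the memory used by the partial step of an aggregation. One way such a bound can work, sketched below in Python, is for the partial step to flush its accumulated groups downstream whenever the limit is reached, leaving the final step to combine partial results for keys that were flushed more than once. Here the number of groups stands in for memory, and the names `partial_sum`, `final_sum`, and `max_groups` are hypothetical; the sketch does not describe Presto's actual flushing policy.

```python
# Illustrative sketch only: group count stands in for memory usage.
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, Tuple


def partial_sum(rows: Iterable[Tuple[Hashable, int]],
                max_groups: int) -> Iterator[Dict[Hashable, int]]:
    """Partial step: pre-aggregate sums, flushing when max_groups is reached."""
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    groups: Dict[Hashable, int] = {}
    for key, value in rows:
        if key not in groups and len(groups) >= max_groups:
            yield groups  # limit reached: emit partial results downstream
            groups = {}
        groups[key] = groups.get(key, 0) + value
    if groups:
        yield groups


def final_sum(partials: Iterable[Dict[Hashable, int]]) -> Dict[Hashable, int]:
    """Final step: combine partial results; a key may appear in several flushes."""
    totals: Dict[Hashable, int] = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            totals[key] += value
    return dict(totals)
```

A lower limit produces more, smaller flushes but the same final result, which is why the partial step can trade memory for extra work in the final step.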