11.147. Release 0.94
ORC Memory Usage
This release contains additional changes to the Presto ORC reader to favor small buffers when reading varchar and varbinary data. Some ORC files contain columns of data that are hundreds of megabytes compressed. When reading these columns, Presto would allocate a single buffer for the compressed column data, which caused heap fragmentation in CMS and G1 and eventually led to OOMs. In this release, the hive.orc.max-buffer-size config property sets the maximum size for a single ORC buffer, and for larger columns we instead stream the data. This reduces heap fragmentation and excessive buffers in ORC at the expense of HDFS IOPS. The default value is 8MB.
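As a minimal sketch, the limit could be set explicitly in the properties file of a Hive catalog. The file location in the comment is an assumption about the deployment layout, and the value shown is simply the default stated above, not a tuning recommendation.

```properties
# Assumed location: etc/catalog/hive.properties (depends on how the catalog is named and deployed)
# Maximum size of a single ORC buffer; larger columns are streamed instead.
hive.orc.max-buffer-size=8MB
```

A larger value trades more heap pressure for fewer reads against HDFS, while a smaller value does the opposite.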
General Changes
- Update Hive CDH 4 connector to CDH 4.7.1
- Fix ORDER BY with LIMIT 0
- Fix compilation of try_cast
- Group threads into Java thread groups to ease debugging
- Add task.min-drivers config to help limit the number of concurrent readers