Using PXF with Unmanaged Data
HAWQ Extension Framework (PXF) is an extensible framework that enables HAWQ to query data stored in external systems.
PXF includes built-in connectors for accessing data inside HDFS files, Hive tables, and HBase tables. PXF also integrates with HCatalog to query Hive tables directly.
PXF also allows users to create custom connectors to access other parallel data stores or processing engines. These connectors are implemented as Java plug-ins; see PXF External Tables and API.
Installing PXF Plug-ins
This topic describes how to install the built-in PXF service plug-ins that are required to connect PXF to HDFS, Hive, and HBase. You should install the appropriate RPMs on each node in your cluster.
Configuring PXF
This topic describes how to configure the PXF service.
Accessing HDFS File Data
This topic describes how to access HDFS file data using PXF.
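As a sketch of the general pattern, an HDFS file can be exposed as a readable external table through a built-in profile. The host name, port, file path, and column definitions below are placeholders, not values from this document:

```sql
-- Read a comma-delimited text file in HDFS through the HdfsTextSimple profile.
-- "namenode", port 51200, and the file path are hypothetical examples.
CREATE EXTERNAL TABLE orders_hdfs (location text, month text,
                                   num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/orders.csv?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

-- The table can then be queried like any other HAWQ table.
SELECT * FROM orders_hdfs;
```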
Accessing Hive Data
This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive: you can create PXF external tables and query them, or you can query Hive tables directly through PXF's integration with HCatalog, in which case HAWQ reads the table metadata from HCatalog.
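Both options can be sketched as follows; the database name, table name, and columns are hypothetical:

```sql
-- Option 1: query a Hive table directly through the HCatalog integration,
-- with no CREATE EXTERNAL TABLE step (database "default", table "sales_info"
-- are placeholders):
SELECT * FROM hcatalog.default.sales_info;

-- Option 2: define a PXF external table over the same Hive table:
CREATE EXTERNAL TABLE sales_hive (location text, month text,
                                  num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=Hive')
FORMAT 'custom' (formatter='pxfwritable_import');
```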
Accessing HBase Data
This topic describes how to access HBase data using PXF.
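A minimal sketch of an HBase-backed external table: the quoted column "recordkey" exposes the HBase row key, and the remaining columns use "column-family:qualifier" names. The table name, column families, and qualifiers below are hypothetical:

```sql
-- Map an HBase table via the HBase profile. Host, port, table name,
-- and column mappings are placeholder examples.
CREATE EXTERNAL TABLE order_info_hbase ("recordkey" bytea,
                                        "product:name" varchar,
                                        "shipping:zipcode" int)
LOCATION ('pxf://namenode:51200/order_info?PROFILE=HBase')
FORMAT 'custom' (formatter='pxfwritable_import');
```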
Accessing JSON Data
This topic describes how to access JSON data using PXF.
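A hedged sketch of a JSON-backed external table, assuming the built-in Json profile; nested fields are addressed with dot notation in quoted column names. The file path and field names are hypothetical:

```sql
-- Read JSON records with the Json profile. "user.id" maps a nested field;
-- the HDFS path and all field names are placeholder examples.
CREATE EXTERNAL TABLE tweets_json (created_at text, id_str text, "user.id" int)
LOCATION ('pxf://namenode:51200/data/tweets.json?PROFILE=Json')
FORMAT 'custom' (formatter='pxfwritable_import');
```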
Accessing External SQL Databases
This topic describes how to access data in external SQL databases using PXF.
Writing Data to HDFS
This topic describes how to write to HDFS using PXF.
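Writing follows the same pattern with a writable external table; INSERT statements append data files under the specified HDFS directory. The host, port, path, and sample row below are placeholders:

```sql
-- A writable external table; rows inserted here are written to HDFS.
-- Host, port, and directory path are hypothetical examples.
CREATE WRITABLE EXTERNAL TABLE orders_out (location text, month text,
                                           num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/orders_out?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

INSERT INTO orders_out VALUES ('Frankfurt', 'Mar', 777, 3956.98);
```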
Using Profiles to Read and Write Data
PXF profiles are collections of common metadata attributes that can be used to simplify the reading and writing of data. You can use any of the built-in profiles that come with PXF or you can create your own.
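A custom profile is declared as a configuration fragment in PXF's profiles file (commonly pxf-profiles.xml). A hedged sketch, in which the profile name and all plug-in class names are hypothetical placeholders:

```xml
<profile>
    <name>MyCustomProfile</name>
    <description>Example custom profile; class names are placeholders.</description>
    <plugins>
        <fragmenter>com.example.pxf.MyFragmenter</fragmenter>
        <accessor>com.example.pxf.MyAccessor</accessor>
        <resolver>com.example.pxf.MyResolver</resolver>
    </plugins>
</profile>
```

A table definition can then reference the profile by name in its LOCATION URI, e.g. `?PROFILE=MyCustomProfile`.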
PXF External Tables and API
You can use the PXF API to create your own connectors to access any other type of parallel data store or processing engine.