HBase Connector
Overview
The HBase Connector allows querying and creating tables on an external Apache HBase instance. Users can create a table in HBase connector, mapping it to an existing table in HBase Cluster, and support insert, select, or delete.
The HBase Connector maintains a Metastore to persist HBase metadata, currently only following storage format are supported as Metastore: Local File System
, Hadoop Distributed File System (HDFS)
, and openLooKeng Metastore
.
Note: Apache HBase 1.3.1 version is supported by HBase connector
Connector Configuration
Create etc/catalog/hbase.properties
to mount the HBase connector as the HBase catalog, replacing the hbase.xxx
properties as required:
connector.name=hbase-connector
hbase.zookeeper.quorum=xxx.xxx.xxx.xxx,xxx.xxx.xxx.xxx
hbase.zookeeper.property.clientPort=xxxx
hbase.metastore.type=local
hbase.metastore.uri=/xxx/hbasemetastore.ini
For the value of hbase.zookeeper.quorum
, please use comma (,
) as the delimiter if it has multiple ip addresses.
Note
If you are using Local File System
, HDFS
as Metastore, you need to create an empty file , such as hbasemetastore.ini
.
Use HDFS to store HBase metadata
You need to add below properties:
hbase.core.site.path=/xxx/core-site.xml
hbase.hdfs.site.path=/xxx/hdfs-site.xml
hbase.metastore.type=hdfs
hbase.metastore.uri=hdfs://xxx.xxx.xxx.xxx:21088/xxx/hbasemetastore.ini
Use openLooKeng Metastore to store HBase metadata
You have to create etc/hetu-metastore.properties
to connect database. For the details of configuration, please refer to VDM Connector. Adding below property to the configuration file:
hbase.metastore.type=hetuMetastore
Kerberos Configuration:
hbase.jaas.conf.path=/xxx/jaas.conf
hbase.hbase.site.path=/xxx/hbase-site.xml
hbase.krb5.conf.path=/xxx/krb5.conf
hbase.kerberos.keytab=/xxx/user.keytab
hbase.kerberos.principal=xxx
hbase.authentication.type=KERBEROS
Configuration Properties
Property Name | Default Value | Required | Description |
---|---|---|---|
hbase.zookeeper.quorum | (none) | Yes | Zookeeper cluster address |
hbase.zookeeper.property.clientPort | (none) | Yes | Zookeeper client port |
hbase.client.retries.number | 3 | No | Retry times to connect to hbase client |
hbase.client.pause.time | 100 | No | HBase client disconnect time |
hbase.rpc.protection.enable | false | No | Communication privacy protection. You can get this from hbase-site.xml . |
hbase.default.value | NULL | No | The default value of data in table |
hbase.metastore.type | local | No | The storage of hbase metadata, you can choose one of local/hdfs/hetuMetastore |
hbase.metastore.uri | (none) | Yes | File path for storing hbase metadata |
hbase.core.site.path | (none) | No | Configuration file for connecting hdfs |
hbase.hdfs.site.path | (none) | No | Configuration file for connecting hdfs |
hbase.authentication.type | (none) | No | Access security authentication mode of hdfs/hbase component |
hbase.kerberos.principal | (none) | No | User name for security authentication |
hbase.kerberos.keytab | (none) | No | Key for security authentication |
hbase.hbase.site.path | (none) | No | Configuration used to connect to a secure hbase cluster |
hbase.jaas.conf.path | (none) | No | Jaas for security authentication |
hbase.krb5.conf.path | (none) | No | Krb5 for security authentication |
Table Properties
Property | Type | Default Value | Required | Description |
---|---|---|---|---|
column_mapping | String | All columns in the same family | No | Specify the mapping relationship between column families and column qualifiers in the table. If you need to link a table in the hbase data source, the column_mapping information must be consistent with the hbase data source; if you create a new table that does not exist on the hbase data source, the column_mapping is specified by the user. |
row_id | String | The first column name | No | row_id is the column name corresponding to rowkey in the hbase table |
hbase_table_name | String | null | No | hbase_table_name specifies the tablespace and table name on the hbase data source to be linked, use “:” connect tablespace and table name, the default tablespace is “default”. |
external | Boolean | true | No | If external is true, it means that the table is a mapping table of the table in the hbase data source. It does not support deleting the original table on the hbase data source; if external is false, the table on hbase data source will be deleted at the same time as the local hbase table is deleted. |
Data Types
Currently, following 10 data type are supported: VARCHAR, TINYINT, SMALLINT, INTEGER, BIGINT, DOUBLE, BOOLEAN, TIME, DATE and TIMESTAMP.
Push Down
The HBase Connector supports push down most of operators, such as rowkey-based point query, rowkey-based range query. Besides, those predicate conditions are supported to push down: =
, >=
, >
, <
, <=
, !=
, in
, not in
, is null
, is not null
, between and
.
Usage Examples
The HBase Connector basically supports all SQL statements, includes creating, querying, and deleting schemas; Adding, dropping, and modifying tables; Inserting data, deleting rows and so on.
Here are some example:
Create Schema
CREATE SCHEMA schemaName;
Drop Schema
It can only drop empty schema.
DROP SCHEMA schemaName;
Create Table
HBase Connector supports two forms of table creation:
Create a table and directly link to a existing table in the HBase data source.
Create a new table that does not exist in the HBase data source.
Below is an example of how to create a table schemaName.tableName
and link it to an existing table named hbaseNamespace:hbaseTable
:
CREATE TABLE schemaName.tableName (
rowId VARCHAR,
qualifier1 TINYINT,
qualifier2 SMALLINT,
qualifier3 INTEGER,
qualifier4 BIGINT,
qualifier5 DOUBLE,
qualifier6 BOOLEAN,
qualifier7 TIME,
qualifier8 DATE,
qualifier9 TIMESTAMP
)
WITH (
column_mapping = 'rowId:f:rowId, qualifier1:f1:q1, qualifier2:f1:q2,
qualifier3:f2:q3, qualifier4:f2:q4, qualifier5:f2:q5, qualifier6:f3:q1,
qualifier7:f3:q2, qualifier8:f3:q3, qualifier9:f3:q4',
row_id = 'rowId',
hbase_table_name = 'hbaseNamespace:hbaseTable',
external = false
);
Note
If user specifies hbase_table_name
, then the schemaName
must be same as the namespace in hbase_table_name
when creating a table. Otherwise, the creation will be failed.
Example (failure to create table):
CREATE TABLE default.typeMapping (
rowId VARCHAR,
qualifier1 INTEGER
)
WITH (
column_mapping = 'rowId:f:rowId, qualifier2:f1:q2',
row_id = 'rowId',
hbase_table_name = 'hello:type4'
);
When creating a table to link with HBase cluster, you must ensure that the schema name in HBase connector and HBase data source are the same and achieve the same isolation level.
Drop Table
DROP TABLE schemaName.tableName;
Rename Table
ALTER TABLE schemaName.tableName RENAME TO schemaName.tableNameNew;
Note
Currently, it does not support renaming the tables which are created through HBase.
Add Column
ALTER TABLE schemaName.tableName
ADD COLUMN column_name INTEGER
WITH (family = 'f1', qualifier = 'q1');
Note
Currently, deleting columns is not supported
Delete row/rows
DELETE FROM schemeName.tableName where rowId = xxx;
DELETE FROM schemeName.tableName where qualifier1 = xxx;
DELETE FROM schemeName.tableName;
Limitations
Statement show tables
can only display the tables that the user has established the associations with HBase data source. Because HBase does not provide an interface to retrieve metadata of tables.