Apache HBase External APIs

This chapter covers access to Apache HBase through non-Java languages and through custom protocols.

76. REST

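Representational State Transfer (REST) was introduced in 2000 in the doctoral dissertation of Roy Fielding, one of the principal authors of the HTTP specification.

REST itself is outside the scope of this documentation. In general, REST allows client-server interaction through an API that is tied to the URL itself. This section discusses how to configure and run the REST server included with HBase, which exposes HBase tables, rows, cells, and metadata as URL-specified resources.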

76.1. Starting and Stopping the REST Server

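The included REST server can run as a daemon that starts an embedded Jetty servlet container and deploys the servlet into it. Use one of the following commands to start the REST server in the foreground or in the background. The port is optional and defaults to 8080.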

# Foreground
$ bin/hbase rest start -p <port>
# Background, logging to a file in $HBASE_LOGS_DIR
$ bin/hbase-daemon.sh start rest -p <port>

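To stop the REST server, press Ctrl-C if it is running in the foreground, or use the following command if it is running in the background.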

$ bin/hbase-daemon.sh stop rest

76.2. Configuring the REST Server and Client

For information about configuring the REST server and client for SSL, as well as doAs impersonation for the REST server, see Configure the Thrift Gateway to Authenticate on Behalf of the Client and other portions of the Securing Apache HBase chapter.

76.3. Using REST Endpoints

Table 11. Cluster-Wide Endpoints
Endpoint	HTTP Verb	Description	Example
`/version/cluster`	`GET`	Version of HBase running on this cluster	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/version/cluster"
`/status/cluster`	`GET`	Cluster status	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/status/cluster"
`/`	`GET`	List of all non-system tables	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/"
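
The cluster-wide endpoints in the table above can also be called from a script. The following is a minimal sketch in Python, assuming the placeholder server `http://example.com:8000` used throughout this section. The helper name `cluster_request` is hypothetical, and the request is only built, not sent, so the example runs without a live cluster.

```python
from urllib.request import Request

# Placeholder server taken from the curl examples; replace with your REST host.
BASE_URL = "http://example.com:8000"

def cluster_request(path: str, accept: str = "text/xml") -> Request:
    """Build (but do not send) a GET request for a cluster-wide endpoint."""
    return Request(BASE_URL + path, headers={"Accept": accept}, method="GET")

req = cluster_request("/version/cluster")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# To send it against a live server: urllib.request.urlopen(req).read()
```

Separating request construction from sending keeps the URL and header logic easy to check before pointing it at a real REST server.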

Table 12. Namespace Endpoints
Endpoint	HTTP Verb	Description	Example
`/namespaces`	`GET`	List all namespaces	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/namespaces/"
`/namespaces/namespace`	`GET`	Describe a specific namespace	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/namespaces/special_ns"
`/namespaces/namespace`	`POST`	Create a new namespace	curl -vi -X POST -H "Accept: text/xml" "http://example.com:8000/namespaces/special_ns"
`/namespaces/namespace/tables`	`GET`	List all tables in a specific namespace	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/namespaces/special_ns/tables"
`/namespaces/namespace`	`PUT`	Alter an existing namespace. Currently not used.	curl -vi -X PUT -H "Accept: text/xml" "http://example.com:8000/namespaces/special_ns"
`/namespaces/namespace`	`DELETE`	Delete a namespace. The namespace must be empty.	curl -vi -X DELETE -H "Accept: text/xml" "http://example.com:8000/namespaces/special_ns"

Table 13. Table Endpoints
Endpoint	HTTP Verb	Description	Example
`/table/schema`	`GET`	Describe the schema of the specified table	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/users/schema"
`/table/schema`	`POST`	Create a new table, or replace an existing table's schema	curl -vi -X POST -H "Accept: text/xml" -H "Content-Type: text/xml" -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>' "http://example.com:8000/users/schema"
`/table/schema`	`PUT`	Update an existing table with the provided schema fragment	curl -vi -X PUT -H "Accept: text/xml" -H "Content-Type: text/xml" -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" KEEP_DELETED_CELLS="true" /></TableSchema>' "http://example.com:8000/users/schema"
`/table/schema`	`DELETE`	Delete the table. You must use the `/table/schema` endpoint, not just `/table/`.	curl -vi -X DELETE -H "Accept: text/xml" "http://example.com:8000/users/schema"
`/table/regions`	`GET`	List the table's regions	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/users/regions"
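
The schema bodies passed with `-d` in the table above are small XML documents, so they can be generated rather than written by hand. The sketch below is one way to do that in Python. The function name `table_schema_xml` and its argument layout are assumptions made for illustration, and only the `TableSchema` and `ColumnSchema` element names and the `KEEP_DELETED_CELLS` attribute come from the examples above.

```python
import xml.etree.ElementTree as ET

def table_schema_xml(table: str, families: dict) -> str:
    """Return a <TableSchema> document with one <ColumnSchema> per family.

    `families` maps a column family name to a dict of extra attributes,
    for example {"cf": {"KEEP_DELETED_CELLS": "true"}}.
    """
    root = ET.Element("TableSchema", {"name": table})
    for family, attrs in families.items():
        ET.SubElement(root, "ColumnSchema", {"name": family, **attrs})
    return ET.tostring(root, encoding="unicode")

# Body equivalent to the schema-update example above, ready to send as text/xml.
print(table_schema_xml("users", {"cf": {"KEEP_DELETED_CELLS": "true"}}))
```

Building the body with an XML library avoids quoting mistakes when family names or attribute values contain special characters.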

Table 14. Endpoints for `Get` Operations
Endpoint	HTTP Verb	Description	Example
`/table/row/column:qualifier/timestamp`	`GET`	Get the value of a single row, optionally narrowed to one column and timestamp. Values are Base-64 encoded.	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/users/row1"; curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/users/row1/cf:a/1458586888395"
`/table/row/column:qualifier`	`GET`	Get the value of a single column. Values are Base-64 encoded.	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/users/row1/cf:a"; curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/users/row1/cf:a/"
`/table/row/column:qualifier/?v=number_of_versions`	`GET`	Get the specified number of versions of a given cell. Values are Base-64 encoded.	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/users/row1/cf:a?v=2"
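
Because the row keys, column names, and values returned by these `Get` endpoints are Base-64 encoded, a client has to decode them before use. The following sketch shows one way to do that in Python. The `SAMPLE` body is illustrative only: it follows the `<CellSet>` layout used elsewhere in this chapter and is not captured from a real server. The sketch also assumes the stored bytes are UTF-8 text, which HBase itself does not guarantee.

```python
import base64
import xml.etree.ElementTree as ET

# Illustrative response body shaped like the <CellSet> documents in this
# chapter; it is not output captured from a real server.
SAMPLE = (
    '<CellSet><Row key="cm93MQ==">'
    '<Cell column="Y2Y6YQ==" timestamp="1458586888395">dmFsdWUx</Cell>'
    '</Row></CellSet>'
)

def b64text(encoded: str) -> str:
    """Decode a Base-64 string, assuming the bytes are UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")

def decode_cellset(xml_text: str) -> list:
    """Return a list of (row, column, timestamp, value) tuples."""
    cells = []
    for row in ET.fromstring(xml_text).iter("Row"):
        key = b64text(row.get("key"))
        for cell in row.iter("Cell"):
            cells.append((
                key,
                b64text(cell.get("column")),
                int(cell.get("timestamp")),
                b64text(cell.text or ""),
            ))
    return cells

print(decode_cellset(SAMPLE))
```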

Table 15. Endpoints for `Scan` Operations
Endpoint	HTTP Verb	Description	Example
`/table/scanner/`	`PUT`	Get a Scanner object. Required by all other Scan operations. Adjust the batch parameter to the number of rows the scan should return in a batch. See the next example for adding filters to your scanner. The scanner endpoint URL is returned as the `Location` in the HTTP response. The other examples in this table assume that the scanner endpoint is `http://example.com:8000/users/scanner/145869072824375522207`.	curl -vi -X PUT -H "Accept: text/xml" -H "Content-Type: text/xml" -d '<Scanner batch="1"/>' "http://example.com:8000/users/scanner/"
`/table/scanner/`	`PUT`	To supply filters to the Scanner object or configure the Scanner in any other way, create a text file and add your filter to it. For example, to return only rows whose keys start with `u123`, using a batch size of 100, the filter file would look like this: <Scanner batch="100"> <filter> { "type": "PrefixFilter", "value": "u123" } </filter> </Scanner> Pass the file to the `-d` argument of the `curl` request.	curl -vi -X PUT -H "Accept: text/xml" -H "Content-Type: text/xml" -d @filter.txt "http://example.com:8000/users/scanner/"
`/table/scanner/scanner-id`	`GET`	Get the next batch from the scanner. Cell values are byte-encoded. If the scanner has been exhausted, HTTP status `204` is returned.	curl -vi -X GET -H "Accept: text/xml" "http://example.com:8000/users/scanner/145869072824375522207"
`/table/scanner/scanner-id`	`DELETE`	Delete the scanner and free the resources it used.	curl -vi -X DELETE -H "Accept: text/xml" "http://example.com:8000/users/scanner/145869072824375522207"
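
The scan endpoints in the table above form a small protocol: create the scanner with `PUT`, read the scanner URL from `Location`, issue `GET` until status `204`, then `DELETE` the scanner. The sketch below outlines that control flow in Python. The `send` callable is a hypothetical transport hook, not part of HBase or any library, so the loop can be exercised without a live server; with a real HTTP client, header lookup may need to be case-insensitive.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

# send(method, url, body) -> (status, headers, body). Hypothetical transport
# hook, so the control flow is not tied to one HTTP library.
Send = Callable[[str, str, Optional[bytes]], Tuple[int, Dict[str, str], bytes]]

def scan_batches(send: Send, table_url: str, batch: int = 100) -> Iterator[bytes]:
    """Yield raw response bodies, one per batch, until the scanner is exhausted."""
    spec = f'<Scanner batch="{batch}"/>'.encode("utf-8")
    _, headers, _ = send("PUT", table_url + "/scanner/", spec)
    if "Location" not in headers:
        raise RuntimeError("scanner creation did not return a Location header")
    scanner_url = headers["Location"]  # endpoint for all follow-up calls
    try:
        while True:
            status, _, body = send("GET", scanner_url, None)
            if status == 204:  # scanner exhausted
                return
            if status != 200:
                raise RuntimeError(f"unexpected HTTP status {status}")
            yield body
    finally:
        # Always release the scanner, even if the caller stops early.
        send("DELETE", scanner_url, None)
```

A caller would iterate with something like `for body in scan_batches(my_send, "http://example.com:8000/users"):` where `my_send` wraps the HTTP client of choice. Putting the `DELETE` in a `finally` block means the scanner is released whether the scan finishes, fails, or is abandoned.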

Table 16. Endpoints for `Put` Operations
Endpoint	HTTP Verb	Description	Example
`/table/row_key`	`PUT`	Write a row to a table. The row key, column, and value must each be Base-64 encoded. To encode a string, use the `base64` command-line utility. To decode a string, use `base64 -d`. The payload is in the `--data` argument, and the `/users/fakerow` value is a placeholder. Insert multiple rows by adding them to the `<CellSet>` element. You can also save the data to be inserted to a file and pass it to the `-d` parameter with syntax like `-d @filename.txt`.	curl -vi -X PUT -H "Accept: text/xml" -H "Content-Type: text/xml" -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="cm93NQo="><Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell></Row></CellSet>' "http://example.com:8000/users/fakerow"; curl -vi -X PUT -H "Accept: text/json" -H "Content-Type: text/json" -d '{"Row":[{"key":"cm93NQo=", "Cell": [{"column":"Y2Y6ZQo=", "$":"dmFsdWU1Cg=="}]}]}' "http://example.com:8000/users/fakerow"
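
The Base-64 encoding step described above is easy to get subtly wrong. The encoded strings in the example (`cm93NQo=`, `Y2Y6ZQo=`, `dmFsdWU1Cg==`) decode to `row5`, `cf:e`, and `value5` each followed by a newline, which is what `echo row5 | base64` produces. The sketch below builds the same `<CellSet>` body in Python without the trailing newline. The helper names `b64` and `cellset_xml` are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

def b64(text: str) -> str:
    """Base-64 encode the UTF-8 bytes of `text` (no trailing newline added)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def cellset_xml(rows: dict) -> str:
    """Build a <CellSet> body from {row_key: {"family:qualifier": value}}."""
    root = ET.Element("CellSet")
    for key, cells in rows.items():
        row = ET.SubElement(root, "Row", {"key": b64(key)})
        for column, value in cells.items():
            cell = ET.SubElement(row, "Cell", {"column": b64(column)})
            cell.text = b64(value)
    return ET.tostring(root, encoding="unicode")

# One row with one cell; add more dictionary entries to insert several rows.
print(cellset_xml({"row5": {"cf:e": "value5"}}))
```

On the command line, `printf 'row5' | base64` gives the newline-free encoding that matches this sketch.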

76.4. REST XML Schema

<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="RESTSchema">
      <element name="Version" type="tns:Version"></element>
      <complexType name="Version">
        <attribute name="REST" type="string"></attribute>
        <attribute name="JVM" type="string"></attribute>
        <attribute name="OS" type="string"></attribute>
        <attribute name="Server" type="string"></attribute>
        <attribute name="Jersey" type="string"></attribute>
      </complexType>
      <element name="TableList" type="tns:TableList"></element>
      <complexType name="TableList">
        <sequence>
              <element name="table" type="tns:Table" maxOccurs="unbounded" minOccurs="1"></element>
        </sequence>
      </complexType>
    <complexType name="Table">
        <sequence>
              <element name="name" type="string"></element>
        </sequence>
      </complexType>
      <element name="TableInfo" type="tns:TableInfo"></element>
      <complexType name="TableInfo">
        <sequence>
              <element name="region" type="tns:TableRegion" maxOccurs="unbounded" minOccurs="1"></element>
        </sequence>
        <attribute name="name" type="string"></attribute>
      </complexType>
      <complexType name="TableRegion">
        <attribute name="name" type="string"></attribute>
        <attribute name="id" type="int"></attribute>
        <attribute name="startKey" type="base64Binary"></attribute>
        <attribute name="endKey" type="base64Binary"></attribute>
        <attribute name="location" type="string"></attribute>
      </complexType>
      <element name="TableSchema" type="tns:TableSchema"></element>
      <complexType name="TableSchema">
        <sequence>
              <element name="column" type="tns:ColumnSchema" maxOccurs="unbounded" minOccurs="1"></element>
        </sequence>
        <attribute name="name" type="string"></attribute>
        <anyAttribute></anyAttribute>
      </complexType>
      <complexType name="ColumnSchema">
        <attribute name="name" type="string"></attribute>
        <anyAttribute></anyAttribute>
      </complexType>
      <element name="CellSet" type="tns:CellSet"></element>
      <complexType name="CellSet">
        <sequence>
              <element name="row" type="tns:Row" maxOccurs="unbounded" minOccurs="1"></element>
        </sequence>
     </complexType>
      <element name="Row" type="tns:Row"></element>
      <complexType name="Row">
        <sequence>
              <element name="key" type="base64Binary"></element>
              <element name="cell" type="tns:Cell" maxOccurs="unbounded" minOccurs="1"></element>
        </sequence>
      </complexType>
      <element name="Cell" type="tns:Cell"></element>
      <complexType name="Cell">
        <sequence>
              <element name="value" maxOccurs="1" minOccurs="1">
                <simpleType><restriction base="base64Binary"/>
                </simpleType>
              </element>
        </sequence>
        <attribute name="column" type="base64Binary" />
        <attribute name="timestamp" type="int" />
      </complexType>
      <element name="Scanner" type="tns:Scanner"></element>
      <complexType name="Scanner">
        <sequence>
              <element name="column" type="base64Binary" minOccurs="0" maxOccurs="unbounded"></element>
        </sequence>
        <sequence>
              <element name="filter" type="string" minOccurs="0" maxOccurs="1"></element>
        </sequence>
        <attribute name="startRow" type="base64Binary"></attribute>
        <attribute name="endRow" type="base64Binary"></attribute>
        <attribute name="batch" type="int"></attribute>
        <attribute name="startTime" type="int"></attribute>
        <attribute name="endTime" type="int"></attribute>
      </complexType>
      <element name="StorageClusterVersion" type="tns:StorageClusterVersion" />
      <complexType name="StorageClusterVersion">
        <attribute name="version" type="string"></attribute>
      </complexType>
      <element name="StorageClusterStatus"
        type="tns:StorageClusterStatus">
      </element>
      <complexType name="StorageClusterStatus">
           <sequence>
              <element name="liveNode" type="tns:Node"
                maxOccurs="unbounded" minOccurs="0">
              </element>
              <element name="deadNode" type="string" maxOccurs="unbounded"
                minOccurs="0">
              </element>
        </sequence>
        <attribute name="regions" type="int"></attribute>
        <attribute name="requests" type="int"></attribute>
        <attribute name="averageLoad" type="float"></attribute>
      </complexType>
      <complexType name="Node">
        <sequence>
              <element name="region" type="tns:Region"
                   maxOccurs="unbounded" minOccurs="0">
              </element>
        </sequence>
        <attribute name="name" type="string"></attribute>
        <attribute name="startCode" type="int"></attribute>
        <attribute name="requests" type="int"></attribute>
        <attribute name="heapSizeMB" type="int"></attribute>
        <attribute name="maxHeapSizeMB" type="int"></attribute>
      </complexType>
      <complexType name="Region">
        <attribute name="name" type="base64Binary"></attribute>
        <attribute name="stores" type="int"></attribute>
        <attribute name="storefiles" type="int"></attribute>
        <attribute name="storefileSizeMB" type="int"></attribute>
        <attribute name="memstoreSizeMB" type="int"></attribute>
        <attribute name="storefileIndexSizeMB" type="int"></attribute>
      </complexType>
</schema>

76.5. REST Protobufs Schema

message Version {
      optional string restVersion = 1;
      optional string jvmVersion = 2;
      optional string osVersion = 3;
      optional string serverVersion = 4;
      optional string jerseyVersion = 5;
}
message StorageClusterStatus {
      message Region {
        required bytes name = 1;
        optional int32 stores = 2;
        optional int32 storefiles = 3;
        optional int32 storefileSizeMB = 4;
        optional int32 memstoreSizeMB = 5;
        optional int32 storefileIndexSizeMB = 6;
      }
      message Node {
        required string name = 1;    // name:port
        optional int64 startCode = 2;
        optional int32 requests = 3;
        optional int32 heapSizeMB = 4;
        optional int32 maxHeapSizeMB = 5;
        repeated Region regions = 6;
    }
      // node status
      repeated Node liveNodes = 1;
      repeated string deadNodes = 2;
      // summary statistics
      optional int32 regions = 3;
      optional int32 requests = 4;
      optional double averageLoad = 5;
}
message TableList {
      repeated string name = 1;
}    
message TableInfo {
      required string name = 1;
      message Region {
        required string name = 1;
        optional bytes startKey = 2;
        optional bytes endKey = 3;
        optional int64 id = 4;
        optional string location = 5;
      }
      repeated Region regions = 2;
}
message TableSchema {
      optional string name = 1;
      message Attribute {
        required string name = 1;
        required string value = 2;
      }
      repeated Attribute attrs = 2;
      repeated ColumnSchema columns = 3;
      // optional helpful encodings of commonly used attributes
      optional bool inMemory = 4;
      optional bool readOnly = 5;
}
message ColumnSchema {
      optional string name = 1;
      message Attribute {
        required string name = 1;
        required string value = 2;
      }
      repeated Attribute attrs = 2;
      // optional helpful encodings of commonly used attributes
      optional int32 ttl = 3;
      optional int32 maxVersions = 4;
      optional string compression = 5;
}
message Cell {
      optional bytes row = 1;       // unused if Cell is in a CellSet
      optional bytes column = 2;
      optional int64 timestamp = 3;
      optional bytes data = 4;
}
message CellSet {
      message Row {
        required bytes key = 1;
        repeated Cell values = 2;
      }
      repeated Row rows = 1;
}
message Scanner {
      optional bytes startRow = 1;
      optional bytes endRow = 2;
      repeated bytes columns = 3;
      optional int32 batch = 4;
      optional int64 startTime = 5;
      optional int64 endTime = 6;
}

77. Thrift

Documentation about Thrift has moved to Thrift API and Filter Language.

78. C/C++ Apache HBase Client

Facebook's Chip Turner wrote a pure C/C++ client. Check it out.

79. Using Java Data Objects (JDO) with HBase

Example 41. JDO Example

This example uses JDO to create a table and an index, insert a row into a table, get a row, get a column value, perform a query, and do some additional HBase operations.

package com.apache.hadoop.hbase.client.jdo.examples;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Hashtable;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTable;
import com.apache.hadoop.hbase.client.jdo.AbstractHBaseDBO;
import com.apache.hadoop.hbase.client.jdo.HBaseBigFile;
import com.apache.hadoop.hbase.client.jdo.HBaseDBOImpl;
import com.apache.hadoop.hbase.client.jdo.query.DeleteQuery;
import com.apache.hadoop.hbase.client.jdo.query.HBaseOrder;
import com.apache.hadoop.hbase.client.jdo.query.HBaseParam;
import com.apache.hadoop.hbase.client.jdo.query.InsertQuery;
import com.apache.hadoop.hbase.client.jdo.query.QSearch;
import com.apache.hadoop.hbase.client.jdo.query.SelectQuery;
import com.apache.hadoop.hbase.client.jdo.query.UpdateQuery;
/*
  HBase JDO Example.

  Dependency libraries:
  - commons-beanutils.jar
  - commons-pool-1.5.5.jar
  - hbase0.90.0-transactionl.jar

  You can extend the Delete, Select, Update, and Insert Query classes.
*/
public class HBaseExample {
  public static void main(String[] args) throws Exception {
    AbstractHBaseDBO dbo = new HBaseDBOImpl();
    //drop if table is already exist.
    if(dbo.isTableExist("user")){
     dbo.deleteTable("user");
    }
    //create table
    dbo.createTableIfNotExist("user",HBaseOrder.DESC,"account");
    //dbo.createTableIfNotExist("user",HBaseOrder.ASC,"account");
    //create index.
    String[] cols={"id","name"};
    dbo.addIndexExistingTable("user","account",cols);
    //insert
    InsertQuery insert = dbo.createInsertQuery("user");
    UserBean bean = new UserBean();
    bean.setFamily("account");
    bean.setAge(20);
    bean.setEmail("ncanis@gmail.com");
    bean.setId("ncanis");
    bean.setName("ncanis");
    bean.setPassword("1111");
    insert.insert(bean);
    //select 1 row
    SelectQuery select = dbo.createSelectQuery("user");
    UserBean resultBean = (UserBean)select.select(bean.getRow(),UserBean.class);
    // select column value.
    String value = (String)select.selectColumn(bean.getRow(),"account","id",String.class);
    // search with option (QSearch has EQUAL, NOT_EQUAL, LIKE)
    // select id,password,name,email from account where id='ncanis' limit startRow,20
    HBaseParam param = new HBaseParam();
    param.setPage(bean.getRow(),20);
    param.addColumn("id","password","name","email");
    param.addSearchOption("id","ncanis",QSearch.EQUAL);
    select.search("account", param, UserBean.class);
    // search column value is existing.
    boolean isExist = select.existColumnValue("account","id","ncanis".getBytes());
    // update password.
    UpdateQuery update = dbo.createUpdateQuery("user");
    Hashtable<String, byte[]> colsTable = new Hashtable<String, byte[]>();
    colsTable.put("password","2222".getBytes());
    update.update(bean.getRow(),"account",colsTable);
    //delete
    DeleteQuery delete = dbo.createDeleteQuery("user");
    delete.deleteRow(resultBean.getRow());
    ////////////////////////////////////
    // etc
    // HTable pool with apache commons pool
    // borrow and release. HBasePoolManager(maxActive, minIdle etc..)
    IndexedTable table = dbo.getPool().borrow("user");
    dbo.getPool().release(table);
    // upload bigFile by hadoop directly.
    HBaseBigFile bigFile = new HBaseBigFile();
    File file = new File("doc/movie.avi");
    FileInputStream fis = new FileInputStream(file);
    Path rootPath = new Path("/files/");
    String filename = "movie.avi";
    bigFile.uploadFile(rootPath,filename,fis,true);
    // receive file stream from hadoop.
    Path p = new Path(rootPath,filename);
    InputStream is = bigFile.path2Stream(p,4096);
  }
}
