The Apache HBase Shell

The Apache HBase Shell

The Apache HBase Shell是在(J)Ruby’s IRB的基础上加了些特定地HBase命令，能在IRB中执行的，在HBase shell中也开以执行。

要运行HBase Shell，执行下面语句：

$ ./bin/hbase shell

输入help就会返回命令和选项表，至少看下文档末尾如何输入变量和命令参数到HBase Shell中，特别注意表名、行、列等要用单引号

See shell exercises for example basic shell operation.

Here is a nicely formatted listing of all shell commands by Rajeshbabu Chintaguntla.

13. Ruby脚本

到HBase bin目录下看下HBase脚本的例子，找到以*.rb结尾的文件。运行其中一个一下面的格式：

./bin/hbase org.jruby.Main PATH_TO_SCRIPT

14. 非交互式运行Shell

一种新的非交互式的模式被加入到HBase Shell中。非交互式模式捕捉HBase Shell命令的退出状态（成功或失败）并将状态传递给命令解释器。如果使用普通的交互模式，HBase Shell将只会返回它自己的退出状态，几乎总是0代表成功。

要使用非交互式模式输入-n 或 —non-interactive选项在HBase Shell中

15. 操作系统脚本中的HBase Shell

也可以在操作系统脚本语言解释器中使用HBase Shell，如UNIX/LINUX中默认的Bash中，下面的guidelines使用Bash的句法（syntax），也可以调整成C风格的Shell如csh或tcsh，可能也可以修改成微软Windows风格脚本。Submissions are welcome.

这种方式产生（spawning？）HBase Shell比较慢，所以当你要好好想想什么时候合适将二者结合（Hbase操作和操作系统命令行）

Example 8. Passing Commands to the HBase Shell

使用echo或者管道‘ | ‘可以将命令传递给非交互式的HBase Shell，注意Hbase命令中的转移字符（escape characters）不然它们可能被shell解释。下面的例子中一些调试级输出被截获
$ echo "describe 'test1'" | ./hbase shell -n
Version 0.98.3-hadoop2, rd5e65a9144e315bb0a964e7730871af32f5018d5, Sat May 31 19:56:09 PDT 2014

describe ‘test1’

DESCRIPTION ENABLED ‘test1’, {NAME => ‘cf’, DATA_BLOCK_ENCODING => ‘NON true E’, BLOOMFILTER => ‘ROW’, REPLICATION_SCOPE => ‘0’, VERSIONS => ‘1’, COMPRESSION => ‘NONE’, MIN_VERSIO NS => ‘0’, TTL => ‘FOREVER’, KEEP_DELETED_CELLS => ‘false’, BLOCKSIZE => ‘65536’, IN_MEMORY => ‘false’ , BLOCKCACHE => ‘true’} 1 row(s) in 3.2410 seconds

To suppress all output, echo it to /dev/null:
$ echo "describe 'test'" | ./hbase shell -n > /dev/null 2>&1

Example 9. Checking the Result of a Scripted Command

由于脚本被设计以交互式运行，你需要一种新的方式去检查你的命令是否成功。HBase hell使用标准的返回0代表运行成功，而非零代表失败。Bash在一个特殊的环境变量（$？）中存放命令的返回值.因每次运行命令，这个变量都会被覆盖，所以我们应该把结果存放到一个不同的、脚本定义的变量中。

下面简单的脚本是一种存储返回值并利用它做操作的方法
 #!/bin/bash
  echo "describe 'test'" | ./hbase shell -n > /dev/null 2>&1
  status=$?
  echo "The status was " $status
  if ($status == 0); then
      echo "The command succeeded."
  else
      echo "The command may have failed."
  fi
  return $status

15.1. 脚本中检查结果

得到退出码0一定代表脚本命令执行成功。而得到非零值不一定意味着执行失败。命令可能成功了，但客户端失去联系或者其他事情隐藏来成功的结果。这是因为RPC命令是stateless（Of a system or protocol, such that it does not keep a persistent state between transactions.对系统或协议，这表示业务不会保持持久的状态）。唯一确定操作的状态是去检查。例如，你的脚本创建来一个表，返回一个非零值，你应该在再次创建之前检查下这个表是否真创建了。

16. 从命令文件读取HBase Shell命令

可以将HBase Shell写入一个文本文件。一个命令一行，并将文件输入到HBase Shell

Example 10. Example Command File

creat 'test', 'cf'
list 'test'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
put 'test', 'row4', 'cf:d', 'value4'
scan 'test'
get 'test', 'row1'
disable 'test'
enable 'test'

Example 11. Directing HBase Shell to Execute the Commands

将路径作为Hbase shell命令的唯一参数传递给它。每个命令会被执行，结果如下。如果脚本中没有exit命令，将返回到Hbase shell命令行中。不能以编程的方式（programmatically）检查每个命令的结果。同样，即使你看到每个命令的输出，但每个命令本身没有在屏幕中对应(echoed),也很难将命令和输出对等起来。

$ ./hbase shell ./sample_commands.txt

0 row(s) in 3.4170 seconds

TABLE

test

1 row(s) in 0.0590 seconds

0 row(s) in 0.1540 seconds

0 row(s) in 0.0080 seconds

0 row(s) in 0.0060 seconds

0 row(s) in 0.0060 seconds

ROW COLUMN+CELL

row1 column=cf:a, timestamp=1407130286968, value=value1

row2 column=cf:b, timestamp=1407130286997, value=value2

row3 column=cf:c, timestamp=1407130287007, value=value3

row4 column=cf:d, timestamp=1407130287015, value=value4

4 row(s) in 0.0420 seconds

COLUMN CELL

cf:a timestamp=1407130286968, value=value1

1 row(s) in 0.0110 seconds

0 row(s) in 1.5630 seconds

0 row(s) in 0.4360 seconds

17. Passing VM Options to the Shell

使用环境变量HBASE_SHELL_OPTS可以将VM选项传递给Hbase shell，也可以同感编辑～/.bashrc来设置这个，或将这个作为登录Hbase shell的一个参数。下面例子设置了几个garbage-collection-related变量，它们只在这次VM运行HBase shell有用。命令应该写在一行之内，如果要断行利用 \

$ HBASE_SHELL_OPTS=”-verbose:gc -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps \ -XX:+PrintGCDetails -Xloggc:$HBASE_HOME/logs/gc-hbase.log” ./bin/hbase shell

18. Shell技巧

18.1. Table variables

HBase 0.95加入shell命令，它为表提供了jruby风格的面向对象的引用，之前所有针对表的shell命令都是一种程序型风格，即使用表名作为一个变量。HBase0.95引开始可以将一个表存入一个jruby变量中，引用表可以用来展示数据的读写操作，像puts、scans以及gets、管理功能如disabling、删除、describing表。

例如，之前你总要指定表名：

hbase(main):000:0> create ‘t’, ‘f’
0 row(s) in 1.0970 seconds
hbase(main):001:0> put 't', 'rold', 'f', 'v'
0 row(s) in 0.0080 seconds
hbase(main):002:0> scan 't'
ROW                                COLUMN+CELL
 rold                              column=f:, timestamp=1378473207660, value=v
1 row(s) in 0.0130 seconds
hbase(main):003:0> describe 't'
DESCRIPTION                                                                           ENABLED
't', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_ true
SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2
147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false
', BLOCKCACHE => 'true'}
1 row(s) in 1.4430 seconds
hbase(main):004:0> disable 't'
0 row(s) in 14.8700 seconds
hbase(main):005:0> drop 't'
0 row(s) in 23.1670 seconds
hbase(main):006:0>

如今你可以将表赋值给一个变量并在jruby shell中使用结果。

hbase(main):007 > t = create 't', 'f'
0 row(s) in 1.0970 seconds
=> Hbase::Table - t
hbase(main):008 > t.put 'r', 'f', 'v'
0 row(s) in 0.0640 seconds
hbase(main):009 > t.scan
ROW                           COLUMN+CELL
r                            column=f:, timestamp=1331865816290, value=v
1 row(s) in 0.0110 seconds
hbase(main):010:0> t.describe
DESCRIPTION                                                                           ENABLED
't', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_ true
SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2
147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false
', BLOCKCACHE => 'true'}
1 row(s) in 0.0210 seconds
hbase(main):038:0> t.disable
0 row(s) in 6.2350 seconds
hbase(main):039:0> t.drop
0 row(s) in 0.2340 seconds

如果这个表已经存在，可以用get_table方法将表赋值给变量：

hbase(main):011 > create 't','f'
0 row(s) in 1.2500 seconds
=> Hbase::Table - t
hbase(main):012:0> tab = get_table 't'
0 row(s) in 0.0010 seconds
=> Hbase::Table - t
hbase(main):013:0> tab.put ‘r1’ ,’f’, ‘v’
0 row(s) in 0.0100 seconds
hbase(main):014:0> tab.scan
ROW                                COLUMN+CELL
r1                                column=f:, timestamp=1378473876949, value=v
1 row(s) in 0.0240 seconds
hbase(main):015:0>

list functionality也被扩展这样可以返回一列字符串列表名。之后你可以用jruby利用这些名字执行脚本操作。list_snapshots命令也同样有用。

hbase(main):016 > tables = list(‘t.*’)
TABLE
t
1 row(s) in 0.1040 seconds
=> #<#<Class:0x7677ce29>:0x21d377a4>
hbase(main):017:0> tables.map { |t| disable t ; drop  t}
0 row(s) in 2.2510 seconds
=> [nil]
hbase(main):018:0>

18.2. irbrc

可以在你的home目录下创建一个.irbrc文件。加入一些自定义命令。一个有用的是记录历史命令这样可以将命令保存下来。

$ more .irbrc
require 'irb/ext/save-history'
IRB.conf[:SAVE_HISTORY] = 100
IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irv-save-history"

See the ruby documentation of .irbrc to learn about other possible configurations.

18.3. LOG data to timestamp

将hbase log中的日期 ‘08/08/16 20:56:29’，转换成时间戳，do

hbase(main):021:0> import java.text.SimpleDateFormat
hbase(main):022:0> import java.text.ParsePosition
hbase(main):023:0> SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("08/08/16 20:56:29", ParsePosition.new(0)).getTime() => 1218920189000

反过来：

hbase(main):021:0> import java.util.Date
hbase(main):022:0> Date.new(1218920189000).toString() => "Sat Aug 16 20:56:29 UTC 2008"

18.4.1 Pre-splitting tables with the HBase Shell

当在Hbase shell中创建表时，可以用许多种选项来pre-split表

最简单的方法是创建表时指定分割点的数组，注意当指定字符串作为分割点时，可能在代表字符串的底层字节的基础上创建分割点。因此当指定分割点‘10’时，实际指定了字节分割点‘\x31\30’。

分割点将定义n+1个区域，其中n是分割点的数量。最低的区域将包含所有的从最低的可能的key到第一个分割点（不包含第一个分割点），下一个区域将包含第一个分割点到下个分割点（不包含下个分割点），直到最后一个分割点。最后的区域被定为从最后的分割点到最大可能的key。

hbase>create 't1', 'f', SPLITS => ['10','20','30']

上面的例子中，表‘t1’创建时将包含column family ‘f’，且pre-split成四个区域。注意第一个区域将包含从‘\x00’到‘\x30’的所有keys（as ‘\x31’ is the ASCII code for ‘1’）

也可以用下面的方法将分割点存入文件中。例子中，从本地文件系统的本地路径读入一个文件。每行指定一个分割点

hbase>create 't14', 'f', SPLITS_FILE=>'splits.txt'

其他选项是按照一个期望的区域数量和分割算法来自动完成分割。Hbase提供了均匀分割key区域或以十六进制分割key区域的分割算法，但你也可以用自己的分割方法来分割key区域

# create table with four regions based on random bytes keys
hbase>create 't2','f1', { NUMREGIONS => 4 , SPLITALGO => 'UniformSplit' }
# create table with five regions based on hex keys
hbase>create 't3','f1', { NUMREGIONS => 5, SPLITALGO => 'HexStringSplit' }

Hbase shell是有效的Ruby环境，所以可以用简单的Ruby脚本完成分割算法

# generate splits for long (Ruby fixnum) key range from start to end key
hbase(main):070:0> def gen_splits(start_key,end_key,num_regions)
hbase(main):071:1>   results=[]
hbase(main):072:1>   range=end_key-start_key
hbase(main):073:1>   incr=(range/num_regions).floor
hbase(main):074:1>   for i in 1 .. num_regions-1
hbase(main):075:2>     results.push([i*incr+start_key].pack("N"))
hbase(main):076:2>   end
hbase(main):077:1>   return results
hbase(main):078:1> end
hbase(main):079:0>
hbase(main):080:0> splits=gen_splits(1,2000000,10)
=> ["\000\003\r@", "\000\006\032\177", "\000\t'\276", "\000\f4\375", "\000\017B<", "\000\022O{", "\000\025\\\272", "\000\030i\371", "\000\ew8"]
hbase(main):081:0> create 'test_splits','f',SPLITS=>splits
0 row(s) in 0.2670 seconds
=> Hbase::Table - test_splits

注意Hbase shell命令 truncate 删除并使用默认选项重新创建表，这有可能破坏pre-splitting，所以如果你要truncate一个pre-split表，你应该先删除并重新创建那个表，创建时要明确指出自定义的分割选项。

18.5. 调试

18.5.1. Shell调试switch

你可以将shell设置成debug模式来获取更多输出，如更多命令异常的stack trace

hbase> debug <RETURN>

18.5.2 DEBUG log level

使用-d选项来启用DEBUG level logging

$ ./bin/hbase shell -d

18.6 命令

18.6.1. 计数

计数命令返回一个表的行数，如果配置来正确的CACHE很快就可以完成。

hbase> count '<tablename>', CACHE => 1000

上面的count命令一次获取1000行，如果rows很大就将CACHE设置的小点，默认一次获取一行。

5. HBase Shell