Running Apache HBase on Alluxio
This guide describes how to run Apache HBase, so that you can easily store HBase tables into Alluxio at various storage levels.
Prerequisites
- Alluxio has been set up and is running.
- Make sure that the Alluxio client jar is available. This Alluxio client jar file can be found at
/<PATH_TO_ALLUXIO>/client/alluxio-2.1.2-client.jar
in the tarball downloaded from Alluxio download page. Alternatively, advanced users can compile this client jar from the source code by following the instructions. - Deploy HBase Please follow this guides for setting up HBase.
Basic Setup
Apache HBase allows you to use Alluxio through a generic file system wrapper for the Hadoop file system. Therefore, the configuration of Alluxio is done mostly in HBase configuration files.
Set property in hbase-site.xml
Set the following properties in conf/hbase-site.xml
and make sure all HBase cluster nodes have the configuration.
Set the hbase.rootdir
property as follows:
<property>
<name>hbase.rootdir</name>
<value>alluxio://master_hostname:port/hbase</value>
</property>
You do not need to create the
/hbase
directory in Alluxio, HBase will do this for you.
You also need to add the FS implementation classes to HBase configuration. These classes are provided in Alluxio Client jar.
<property>
<name>fs.alluxio.impl</name>
<value>alluxio.hadoop.FileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.alluxio.impl</name>
<value>alluxio.hadoop.AlluxioFileSystem</value>
</property>
Also add the following property to the same file hbase-site.xml
:
<property>
<name>hbase.regionserver.hlog.syncer.count</name>
<value>1</value>
</property>
This property is required to prevent HBase from flushing Alluxio file stream in a thread unsafe way.
If you are running HBase version greater than 2.0, add the following property:
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
This will disable HBase new stream capabilities (hflush/hsync) used for WAL.
Distribute the Alluxio Client jar
We need to make the Alluxio client jar file available to HBase, because it contains the configured alluxio.hadoop.FileSystem
class.
Specify the location of the jar file in the $HBASE_CLASSPATH
environment variable (make sure it’s available on all cluster nodes). For example:
$ export HBASE_CLASSPATH=/<PATH_TO_ALLUXIO>/client/alluxio-2.1.2-client.jar:${HBASE_CLASSPATH}
Alternative ways are described in the Advanced Setup
Example
Ensure alluxio scheme is recognized before starting HBase:
$ ${HBASE_HOME}/bin/start-hbase.sh
If not, follow the Usage FAQs as needed.
Visit HBase Web UI at http://<HBASE_MASTER_HOSTNAME>:16010
to confirm that HBase is running on Alluxio (check the HBase Root Directory
attribute):
And visit Alluxio Web UI at http://<ALLUXIO_MASTER_HOSTNAME>:19999
, click Browse
and you can see the files HBase stores on Alluxio, including data and WALs:
Create a text file simple_test.txt
and write these commands into it:
create 'test', 'cf'
for i in Array(0..9999)
put 'test', 'row'+i.to_s , 'cf:a', 'value'+i.to_s
end
list 'test'
scan 'test', {LIMIT => 10, STARTROW => 'row1'}
get 'test', 'row1'
Run the following command from the top level HBase project directory:
$ bin/hbase shell simple_test.txt
You should see some output like this:
If you have Hadoop installed, you can run a Hadoop-utility program in HBase shell to count the rows of the newly created table:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter test
After this mapreduce job finishes, you can see a result like this:
Advanced Setup
Alluxio in HA mode
When Alluxio is running in HA mode, change the hbase.rootdir
property in conf/hbase-site.xml
to use a HA-style Alluxio authority like host1:19998,host2:19998,host3:19998
or zk@host1:2181,host2:2181,host3:2181
.
<property>
<name>hbase.rootdir</name>
<value>alluxio://master_hostname_1:19998,master_hostname_2:19998,master_hostname_3:19998/hbase</value>
</property>
See HA authority for more details.
Add additional Alluxio site properties to HBase
If there are any Alluxio site properties you want to specify for HBase, add those to hbase-site.xml
. For example, change alluxio.user.file.writetype.default
from default ASYNC_THROUGH
to CACHE_THROUGH
:
<property>
<name>alluxio.user.file.writetype.default</name>
<value>CACHE_THROUGH</value>
</property>
Alternative way to distribute the Alluxio Client jar
Instead of specifying the location of the jar file in the $HBASE_CLASSPATH
environment variable, users could copy the /<PATH_TO_ALLUXIO>/client/alluxio-2.1.2-client.jar
file into the lib
directory of HBase (make sure it’s available on all cluster nodes).
$ cp `/<PATH_TO_ALLUXIO>/client/alluxio-2.1.2-client.jar` /path/to/hbase-master/lib/
$ cp `/<PATH_TO_ALLUXIO>/client/alluxio-2.1.2-client.jar` /path/to/current/hbase-client/lib/
$ cp `/<PATH_TO_ALLUXIO>/client/alluxio-2.1.2-client.jar` /path/to/hbase-regionserver/lib/
Troubleshooting
Logging Configuration
In order to change the logging configuration for HBase, you can modify your installation’s log4j.properties
file.