Running Apache HBase on Alluxio

This guide describes how to run Apache HBase on Alluxio, so that you can store HBase tables in Alluxio at various storage levels.

Prerequisites

  • Alluxio has been set up and is running.
  • Make sure that the Alluxio client jar is available. This Alluxio client jar file can be found at /<PATH_TO_ALLUXIO>/client/alluxio-1.8.2-client.jar in the tarball downloaded from the Alluxio download page. Alternatively, advanced users can compile this client jar from the source code by following the instructions.
  • Deploy HBase. Please follow this guide to set up HBase.

Basic Setup

Apache HBase allows you to use Alluxio through a generic file system wrapper for the Hadoop file system. Therefore, the configuration of Alluxio is done mostly in HBase configuration files.

Set property in hbase-site.xml

Change the hbase.rootdir property in conf/hbase-site.xml:

  <property>
    <name>hbase.rootdir</name>
    <value>alluxio://master_hostname:port/hbase</value>
  </property>

You do not need to create the /hbase directory in Alluxio; HBase will do this for you.

Add the following property to the same hbase-site.xml file (make sure it is configured on all HBase cluster nodes):

  <property>
    <name>hbase.regionserver.hlog.syncer.count</name>
    <value>1</value>
  </property>

This property is required to prevent HBase from flushing the Alluxio file stream in a thread-unsafe way.
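Because the property has to be present on every node, a quick check of each node's configuration file can help. The following is a minimal sketch, not part of HBase or Alluxio: has_hbase_property is a hypothetical helper name, and it only confirms that a <name> element for the property appears in the file, not that its value is 1.

```shell
# Hypothetical helper: succeed if the given hbase-site.xml declares the named property.
# It matches the literal <name>...</name> element and does not inspect the value.
has_hbase_property() {
  site_file="$1"
  prop_name="$2"
  grep -qF "<name>${prop_name}</name>" "${site_file}"
}

# Example (run on each HBase node):
# has_hbase_property "${HBASE_HOME}/conf/hbase-site.xml" hbase.regionserver.hlog.syncer.count \
#   || echo "syncer count property is missing on $(hostname)"
```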

Distribute the Alluxio Client jar

We need to make the Alluxio client jar file available to HBase, because it contains the configured alluxio.hadoop.FileSystem class.

Specify the location of the jar file in the $HBASE_CLASSPATH environment variable (make sure it’s available on all cluster nodes). For example:

  export HBASE_CLASSPATH=/<PATH_TO_ALLUXIO>/client/alluxio-1.8.2-client.jar:${HBASE_CLASSPATH}
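If the export lives in a file that may be sourced more than once, such as conf/hbase-env.sh, the plain form above adds the jar again each time and leaves a trailing colon when the variable starts out empty. One possible way to avoid both is sketched below; prepend_to_classpath is a hypothetical helper name, not something HBase or Alluxio provides.

```shell
# Hypothetical helper: prepend a jar to a colon-separated classpath
# unless that exact path is already listed.
prepend_to_classpath() {
  jar="$1"
  current="$2"
  case ":${current}:" in
    *":${jar}:"*) printf '%s\n' "${current}" ;;                 # already present, keep as is
    *) printf '%s\n' "${jar}${current:+:${current}}" ;;         # add separator only if needed
  esac
}

# Example, using the jar path from this guide:
# export HBASE_CLASSPATH="$(prepend_to_classpath /<PATH_TO_ALLUXIO>/client/alluxio-1.8.2-client.jar "${HBASE_CLASSPATH}")"
```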

Alternative ways are described in the Advanced Setup section.

Example

Start HBase:

  ${HBASE_HOME}/bin/start-hbase.sh

Visit the HBase Web UI at http://<HBASE_MASTER_HOSTNAME>:16010 to confirm that HBase is running on Alluxio (check the HBase Root Directory attribute):

[Screenshot: HBase Root Directory attribute in the HBase Web UI]

Then visit the Alluxio Web UI at http://<ALLUXIO_MASTER_HOSTNAME>:19999 and click Browse to see the files HBase stores on Alluxio, including data and WALs:

[Screenshot: HBase root directory on Alluxio in the Alluxio Web UI]

Create a text file simple_test.txt and write these commands into it:

  create 'test', 'cf'
  for i in Array(0..9999)
    put 'test', 'row'+i.to_s , 'cf:a', 'value'+i.to_s
  end
  list 'test'
  scan 'test', {LIMIT => 10, STARTROW => 'row1'}
  get 'test', 'row1'

Run the following command from the top-level HBase project directory:

  bin/hbase shell simple_test.txt

You should see some output like this:

[Screenshot: HBase shell output]

If you have Hadoop installed, you can run a Hadoop utility program from the same directory to count the rows of the newly created table:

  bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter test

After this MapReduce job finishes, you can see a result like this:

[Screenshot: RowCounter MapReduce job output]

Advanced Setup

Alluxio in HA mode

When Alluxio is running in fault-tolerant mode, change the hbase.rootdir property in conf/hbase-site.xml to include the ZooKeeper information:

  <property>
    <name>hbase.rootdir</name>
    <value>alluxio://zk@zookeeper_hostname1:2181,zookeeper_hostname2:2181,zookeeper_hostname3:2181/hbase</value>
  </property>
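The HA value is the zk@ prefix followed by a comma-separated list of ZooKeeper host:port pairs and the /hbase path. As a small illustration of that layout, the sketch below assembles the string from its parts; ha_rootdir is a hypothetical helper name and the host names in the example are placeholders.

```shell
# Hypothetical helper: build an HA-mode hbase.rootdir value from
# ZooKeeper host:port pairs passed as arguments.
ha_rootdir() {
  zk_addresses=""
  for addr in "$@"; do
    # Join the addresses with commas, without a leading comma.
    zk_addresses="${zk_addresses:+${zk_addresses},}${addr}"
  done
  printf 'alluxio://zk@%s/hbase\n' "${zk_addresses}"
}

# Example:
# ha_rootdir zookeeper_hostname1:2181 zookeeper_hostname2:2181 zookeeper_hostname3:2181
```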

Add additional Alluxio site properties to HBase

If there are any Alluxio site properties you want to specify for HBase, add them to hbase-site.xml. For example, to change alluxio.user.file.writetype.default from its default, MUST_CACHE, to CACHE_THROUGH:

  <property>
    <name>alluxio.user.file.writetype.default</name>
    <value>CACHE_THROUGH</value>
  </property>

Alternative way to distribute the Alluxio Client jar

Instead of specifying the location of the jar file in the $HBASE_CLASSPATH environment variable, you can copy the /<PATH_TO_ALLUXIO>/client/alluxio-1.8.2-client.jar file into the lib directory of HBase (make sure it’s available on all cluster nodes).
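The copy step might look like the sketch below. install_client_jar is a hypothetical helper name; it checks that both the jar and the HBase lib directory exist before copying, and it has to be run on every node (or the jar distributed by whatever mechanism the cluster already uses).

```shell
# Hypothetical helper: copy the Alluxio client jar into the lib directory
# of an HBase installation, failing early if either path is missing.
install_client_jar() {
  jar="$1"
  hbase_home="$2"
  [ -f "${jar}" ] || { echo "client jar not found: ${jar}" >&2; return 1; }
  [ -d "${hbase_home}/lib" ] || { echo "no lib directory under: ${hbase_home}" >&2; return 1; }
  cp "${jar}" "${hbase_home}/lib/"
}

# Example, using the jar path from this guide:
# install_client_jar /<PATH_TO_ALLUXIO>/client/alluxio-1.8.2-client.jar "${HBASE_HOME}"
```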

Troubleshooting

Logging Configuration

To change the logging configuration for HBase, modify your installation’s log4j.properties file.
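As an illustration, the sketch below appends a logger level line to a log4j.properties file if that exact line is not already there. It rests on several assumptions: that the file uses the log4j 1.x log4j.logger.<name>=<LEVEL> syntax, that it lives under conf/ in the HBase installation, and that Alluxio client classes log under the alluxio package name. set_logger_level is a hypothetical helper name.

```shell
# Hypothetical helper: add "log4j.logger.<logger>=<level>" to a properties file
# unless that exact line already exists (log4j 1.x syntax is assumed).
set_logger_level() {
  props_file="$1"
  logger="$2"
  level="$3"
  line="log4j.logger.${logger}=${level}"
  grep -qxF "${line}" "${props_file}" || printf '%s\n' "${line}" >> "${props_file}"
}

# Example (file location and logger name are assumptions):
# set_logger_level "${HBASE_HOME}/conf/log4j.properties" alluxio DEBUG
```

HBase typically needs a restart to pick up changes to this file.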