Configuring the Hive metastore

Metering is a deprecated feature. Deprecated functionality is still included in OKD and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.

For the most recent list of major functionality that has been deprecated or removed within OKD, refer to the Deprecated and removed features section of the OKD release notes.

Hive metastore is responsible for storing all the metadata about the database tables created in Presto and Hive. By default, the metastore stores this information in a local embedded Derby database in a persistent volume attached to the pod.

The default configuration of the Hive metastore generally works for small clusters. To improve performance or move storage out of the cluster, you can store the Hive metastore data in a dedicated SQL database instead.

Configuring persistent volumes

By default, Hive requires one persistent volume to operate.

hive-metastore-db-data is the main persistent volume claim (PVC) required by default. The Hive metastore uses this PVC to store metadata about tables, such as table name, columns, and location. Presto and the Hive server use the Hive metastore to look up table metadata when processing queries. You can remove this requirement by using MySQL or PostgreSQL for the Hive metastore database.

To install, Hive metastore requires one of the following: dynamic volume provisioning enabled through a storage class, a manually pre-created persistent volume of the correct size, or a pre-existing MySQL or PostgreSQL database.

Configuring the storage class for the Hive metastore

To set the storage class for the hive-metastore-db-data persistent volume claim, specify it in your MeteringConfig custom resource. The metastore-storage.yaml file below shows an example storage section with the class field.

  apiVersion: metering.openshift.io/v1
  kind: MeteringConfig
  metadata:
    name: "operator-metering"
  spec:
    hive:
      spec:
        metastore:
          storage:
            # Default is null, which means using the default storage class if it exists.
            # If you wish to use a different storage class, specify it here
            # class: "null" (1)
            size: "5Gi"

(1) Uncomment this line and replace null with the name of the storage class to use. Leaving the value null will cause metering to use the default storage class for the cluster.

Configuring the volume size for the Hive metastore

Use the metastore-storage.yaml file below as a template to configure the volume size for the Hive metastore.

  apiVersion: metering.openshift.io/v1
  kind: MeteringConfig
  metadata:
    name: "operator-metering"
  spec:
    hive:
      spec:
        metastore:
          storage:
            # Default is null, which means using the default storage class if it exists.
            # If you wish to use a different storage class, specify it here
            # class: "null"
            size: "5Gi" # (1)

(1) Replace the value for size with your desired capacity. The example file shows "5Gi".
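The two storage settings above can be combined in a single file. The following is a minimal sketch that writes metastore-storage.yaml from shell variables; the storage class name fast-ssd and the size 10Gi are hypothetical values chosen for illustration, not recommendations.

```shell
# Hypothetical values for this sketch; replace them with a storage class
# that exists in your cluster and the capacity you need.
STORAGE_CLASS='fast-ssd'
VOLUME_SIZE='10Gi'

# Write the MeteringConfig with both the class and size fields set.
# The unquoted EOF delimiter lets the shell expand the variables.
cat > metastore-storage.yaml <<EOF
apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  hive:
    spec:
      metastore:
        storage:
          class: "${STORAGE_CLASS}"
          size: "${VOLUME_SIZE}"
EOF

# Print the generated file for review before using it.
cat metastore-storage.yaml
```

The generated file could then be applied with your usual workflow, for example `oc -n openshift-metering apply -f metastore-storage.yaml`, assuming metering is installed in the openshift-metering namespace.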

Using MySQL or PostgreSQL for the Hive metastore

The default installation of metering configures Hive to use an embedded Java database called Derby. This is unsuited for larger environments and can be replaced with either a MySQL or PostgreSQL database. Use the following example configuration files if your deployment requires a MySQL or PostgreSQL database for Hive.

There are three configuration options you can use to control the database used by Hive metastore: url, driver, and secretName.

Create your MySQL or PostgreSQL instance with a username and password. Then create a secret by using the OpenShift CLI or a YAML file. The name of this secret must match the value of the spec.hive.spec.config.db.secretName field in the MeteringConfig resource.

To create a secret by using the OpenShift CLI, run the following command:

  $ oc --namespace openshift-metering create secret generic <YOUR_SECRETNAME> --from-literal=username=<YOUR_DATABASE_USERNAME> --from-literal=password=<YOUR_DATABASE_PASSWORD>

To create a secret by using a YAML file, use the following example file:

  apiVersion: v1
  kind: Secret
  metadata:
    name: <YOUR_SECRETNAME> # (1)
  data:
    username: <BASE64_ENCODED_DATABASE_USERNAME> # (2)
    password: <BASE64_ENCODED_DATABASE_PASSWORD> # (3)

(1) The name of the secret.
(2) Base64 encoded database username.
(3) Base64 encoded database password.
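The data fields of the secret expect base64-encoded values. As a rough sketch, the values could be produced in a shell as follows; the username hive and the password example-password are hypothetical placeholders.

```shell
# Hypothetical placeholder credentials; substitute your own values.
DB_USERNAME='hive'
DB_PASSWORD='example-password'

# printf '%s' avoids the trailing newline that echo would add, which would
# otherwise become part of the encoded credential.
# tr -d '\n' strips the line wrapping some base64 tools apply to long input.
ENCODED_USERNAME=$(printf '%s' "$DB_USERNAME" | base64 | tr -d '\n')
ENCODED_PASSWORD=$(printf '%s' "$DB_PASSWORD" | base64 | tr -d '\n')

# Print the values in the form used by the data section of the secret.
echo "username: $ENCODED_USERNAME"
echo "password: $ENCODED_PASSWORD"
```

Base64 is an encoding, not encryption: anyone who can read the secret can decode the values, so access to the secret should be restricted accordingly.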

The following example configuration uses a MySQL database for Hive:

  spec:
    hive:
      spec:
        metastore:
          storage:
            create: false
        config:
          db:
            url: "jdbc:mysql://mysql.example.com:3306/hive_metastore"
            driver: "com.mysql.jdbc.Driver"
            secretName: "REPLACEME" # (1)

(1) The name of the secret containing the base64-encoded username and password database credentials.

You can pass additional JDBC parameters by appending them to the URL in the spec.hive.spec.config.db.url field. For more details, see the MySQL Connector/J documentation.

The following example configuration uses a PostgreSQL database for Hive:

  spec:
    hive:
      spec:
        metastore:
          storage:
            create: false
        config:
          db:
            url: "jdbc:postgresql://postgresql.example.com:5432/hive_metastore"
            driver: "org.postgresql.Driver"
            username: "REPLACEME"
            password: "REPLACEME"

You can pass additional JDBC parameters by appending them to the URL. For more details, see the PostgreSQL JDBC driver documentation.
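To illustrate how additional JDBC parameters are attached, the sketch below appends a query string to each of the example URLs. The parameters shown (useSSL for MySQL Connector/J and sslmode for the PostgreSQL JDBC driver) are examples only; check the driver documentation for the parameters your driver version supports.

```shell
# Base JDBC URLs taken from the example configurations above.
MYSQL_BASE='jdbc:mysql://mysql.example.com:3306/hive_metastore'
POSTGRES_BASE='jdbc:postgresql://postgresql.example.com:5432/hive_metastore'

# Parameters follow the database name as a query string: ?key=value&key=value
# useSSL and sslmode are illustrative choices, not required settings.
MYSQL_URL="${MYSQL_BASE}?useSSL=true"
POSTGRES_URL="${POSTGRES_BASE}?sslmode=require"

# Print the values in the form used by the url field of the db section.
echo "url: \"${MYSQL_URL}\""
echo "url: \"${POSTGRES_URL}\""
```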