SQL settings

The SQL plugin adds a few settings to the standard OpenSearch cluster settings. Most are dynamic, so you can change the default behavior of the plugin without restarting your cluster.

You can disable processing of SQL and PPL queries independently of each other.

You can update these settings like any other cluster setting:

```json
PUT _cluster/settings
{
  "transient": {
    "plugins.sql.enabled": false
  }
}
```

Alternatively, you can use the following request format:

```json
PUT _cluster/settings
{
  "transient": {
    "plugins": {
      "ppl": {
        "enabled": "false"
      }
    }
  }
}
```
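The flat and nested bodies above are two spellings of the same setting: a dotted key expands into nested objects. The following sketch illustrates the expansion (the `expand_dotted` helper is ours for illustration, not part of any OpenSearch client):

```python
def expand_dotted(flat):
    """Expand dotted setting names into nested objects.

    For example, {"plugins.ppl.enabled": "false"} becomes
    {"plugins": {"ppl": {"enabled": "false"}}} -- the two request
    bodies shown above denote the same setting.
    """
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

flat_body = {"transient": {"plugins.ppl.enabled": "false"}}
nested_body = {"transient": expand_dotted(flat_body["transient"])}
assert nested_body == {"transient": {"plugins": {"ppl": {"enabled": "false"}}}}
```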

Similarly, you can update the settings by sending a request to the `_plugins/_query/settings` endpoint:

```json
PUT _plugins/_query/settings
{
  "transient": {
    "plugins.sql.enabled": false
  }
}
```

Alternatively, you can use the following request format:

```json
PUT _plugins/_query/settings
{
  "transient": {
    "plugins": {
      "ppl": {
        "enabled": "false"
      }
    }
  }
}
```

Requests to the `_plugins/_ppl` and `_plugins/_sql` endpoints include index names in the request body, so they have the same access policy considerations as the `bulk`, `mget`, and `msearch` operations. Setting the `rest.action.multi.allow_explicit_index` parameter to `false` disables both the SQL and PPL endpoints.
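As a sketch, that parameter is a static setting, so it goes in the node configuration file rather than the cluster settings API and requires a node restart to take effect:

```yml
# opensearch.yml
# Disables endpoints that accept explicit index names in the request body,
# including the SQL and PPL endpoints as well as bulk, mget, and msearch.
rest.action.multi.allow_explicit_index: false
```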

Available settings

| Setting | Default | Description |
| :--- | :--- | :--- |
| `plugins.sql.enabled` | True | Change to `false` to disable SQL support in the plugin. |
| `plugins.ppl.enabled` | True | Change to `false` to disable PPL support in the plugin. |
| `plugins.sql.slowlog` | 2 seconds | Configures the time limit (in seconds) for slow queries. The plugin logs slow queries as `Slow query: elapsed=xxx (ms)` in `opensearch.log`. |
| `plugins.sql.cursor.keep_alive` | 1 minute | Configures how long the cursor context is kept open. Cursor contexts are resource intensive, so we recommend a low value. |
| `plugins.query.memory_limit` | 85% | Configures the heap memory usage limit for the circuit breaker of the query engine. |
| `plugins.query.size_limit` | 200 | Sets the default size of the index that the query engine fetches from OpenSearch. |
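For example, to persist a larger default fetch size across cluster restarts, you can use a `persistent` (rather than `transient`) block; the value `500` here is illustrative:

```json
PUT _plugins/_query/settings
{
  "persistent": {
    "plugins.query.size_limit": 500
  }
}
```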

Spark connector settings

The SQL plugin supports Apache Spark as an augmented compute source. When data sources are defined as tables in Apache Spark, OpenSearch can consume those tables. This allows you to run SQL queries against external sources from OpenSearch Dashboards' Discover and observability logs.

To get started, configure the following settings to add Spark as a data source and enable the correct permissions.

| Setting | Description |
| :--- | :--- |
| `spark.uri` | The identifier for your Spark data source. |
| `spark.auth.type` | The authorization type used to authenticate into Spark. |
| `spark.auth.username` | The username for your Spark data source. |
| `spark.auth.password` | The password for your Spark data source. |
| `spark.datasource.flint.host` | The host of the Spark data source. Default is `localhost`. |
| `spark.datasource.flint.port` | The port number for Spark. Default is `9200`. |
| `spark.datasource.flint.scheme` | The data scheme used in your Spark queries. Valid values are `http` and `https`. |
| `spark.datasource.flint.auth` | The authorization required to access the Spark data source. Valid values are `false` and `sigv4`. |
| `spark.datasource.flint.region` | The AWS Region in which your OpenSearch cluster is located. Use only when `auth` is set to `sigv4`. Default is `us-west-2`. |
| `spark.datasource.flint.write.id_name` | The name of the index to which the Spark connector writes. |
| `spark.datasource.flint.ignore.id_column` | Excludes the `id` column when exporting data in a query. Default is `true`. |
| `spark.datasource.flint.write.batch_size` | Sets the batch size when writing to a Spark-connected index. Default is `1000`. |
| `spark.datasource.flint.write.refresh_policy` | Sets the refresh policy applied when the connector writes data to OpenSearch: no refresh (`false`), an immediate refresh (`true`), or a set time to wait (`wait_for: X`). Default is `false`. |
| `spark.datasource.flint.read.scroll_size` | Sets the number of results returned by queries run using Spark. Default is `100`. |
| `spark.flint.optimizer.enabled` | Enables OpenSearch optimization for the Spark connection. Default is `true`. |
| `spark.flint.index.hybridscan.enabled` | Enables OpenSearch to scan for write data on non-partitioned devices from the data source. Default is `false`. |
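As a rough sketch, a minimal connector configuration using the defaults from the table above might look as follows; every value here is a placeholder, and where you supply these properties depends on your Spark deployment:

```yml
spark.datasource.flint.host: localhost
spark.datasource.flint.port: 9200
spark.datasource.flint.scheme: http
spark.datasource.flint.auth: false
```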

Once configured, you can test your Spark connection using the following API call:

```json
POST /_plugins/_ppl
content-type: application/json

{
  "query": "source = my_spark.sql('select * from alb_logs')"
}
```