Troubleshooting shards capacity health issues

Elasticsearch limits the maximum number of shards that can be held on each node using the cluster.max_shards_per_node and cluster.max_shards_per_node.frozen settings. The current shards capacity of the cluster is available in the shards capacity section of the health API.
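
For a quick programmatic look at where the cluster stands, the same indicator can be read with one of the language clients. The following minimal Python sketch assumes a cluster reachable at http://localhost:9200 (adjust the endpoint and authentication for your environment); the field names follow the health API responses shown later on this page.

  from elasticsearch import Elasticsearch

  client = Elasticsearch("http://localhost:9200")  # placeholder endpoint

  # Fetch only the shards_capacity indicator from the health API
  report = client.health_report(feature="shards_capacity")
  indicator = report["indicators"]["shards_capacity"]

  print("status:", indicator["status"])
  for tier, detail in indicator["details"].items():  # "data" and "frozen"
      used = detail.get("current_used_shards", "n/a")
      print(f"{tier}: {used} of {detail['max_shards_in_cluster']} allowed shards in use")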

Cluster is close to reaching the configured maximum number of shards for data nodes.

The cluster.max_shards_per_node cluster setting limits the maximum number of open shards for a cluster, only counting data nodes that do not belong to the frozen tier.
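
Elasticsearch calculates this cluster-wide limit as the per-node setting multiplied by the number of non-frozen data nodes, so the cap grows as you add data nodes. A small illustrative calculation in Python (the node count is made up):

  # cluster.max_shards_per_node defaults to 1000 open shards per non-frozen data node
  max_shards_per_node = 1000
  non_frozen_data_nodes = 3  # hypothetical: three hot/warm data nodes
  print(max_shards_per_node * non_frozen_data_nodes)  # 3000 open shards allowed cluster-wide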

This symptom indicates that action should be taken; otherwise, either the creation of new indices or upgrading the cluster could be blocked.

If you’re confident your changes won’t destabilize the cluster, you can temporarily increase the limit using the cluster update settings API:

Elasticsearch Service (use Kibana):

  1. Log in to the Elastic Cloud console.
  2. On the Elasticsearch Service panel, click the name of your deployment.

    If the name of your deployment is disabled, your Kibana instances might be unhealthy; in that case, please contact Elastic Support. If your deployment doesn't include Kibana, all you need to do is enable it first.

  3. Open your deployment’s side navigation menu (placed under the Elastic logo in the upper left corner) and go to Dev Tools > Console.

    (Kibana Console screenshot)

  4. Check the current status of the cluster according to the shards capacity indicator:

    Python:
      resp = client.health_report(
          feature="shards_capacity",
      )
      print(resp)

    Ruby:
      response = client.health_report(
        feature: 'shards_capacity'
      )
      puts response

    JavaScript:
      const response = await client.healthReport({
        feature: "shards_capacity",
      });
      console.log(response);

    Console:
      GET _health_report/shards_capacity

    The response will look like this:

    {
      "cluster_name": "...",
      "indicators": {
        "shards_capacity": {
          "status": "yellow",
          "symptom": "Cluster is close to reaching the configured maximum number of shards for data nodes.",
          "details": {
            "data": {
              "max_shards_in_cluster": 1000,
              "current_used_shards": 988
            },
            "frozen": {
              "max_shards_in_cluster": 3000,
              "current_used_shards": 0
            }
          },
          "impacts": [
            ...
          ],
          "diagnosis": [
            ...
          ]
        }
      }
    }

    The max_shards_in_cluster value under "data" reflects the current cluster.max_shards_per_node limit.

    The current_used_shards value is the current number of open shards across the cluster.

  5. Update the cluster.max_shards_per_node setting to an appropriate value:

    Python:
      resp = client.cluster.put_settings(
          persistent={
              "cluster.max_shards_per_node": 1200
          },
      )
      print(resp)

    Ruby:
      response = client.cluster.put_settings(
        body: {
          persistent: {
            'cluster.max_shards_per_node' => 1200
          }
        }
      )
      puts response

    JavaScript:
      const response = await client.cluster.putSettings({
        persistent: {
          "cluster.max_shards_per_node": 1200,
        },
      });
      console.log(response);

    Console:
      PUT _cluster/settings
      {
        "persistent" : {
          "cluster.max_shards_per_node": 1200
        }
      }

    This increase should only be temporary. As a long-term solution, we recommend you add nodes to the oversharded data tier or reduce your cluster's shard count on nodes that do not belong to the frozen tier (see the shard-reduction sketch after these steps).

  6. To verify that the change has fixed the issue, you can get the current status of the shards_capacity indicator by checking the data section of the health API:

    Python:
      resp = client.health_report(
          feature="shards_capacity",
      )
      print(resp)

    Ruby:
      response = client.health_report(
        feature: 'shards_capacity'
      )
      puts response

    JavaScript:
      const response = await client.healthReport({
        feature: "shards_capacity",
      });
      console.log(response);

    Console:
      GET _health_report/shards_capacity

    The response will look like this:

    {
      "cluster_name": "...",
      "indicators": {
        "shards_capacity": {
          "status": "green",
          "symptom": "The cluster has enough room to add new shards.",
          "details": {
            "data": {
              "max_shards_in_cluster": 1200
            },
            "frozen": {
              "max_shards_in_cluster": 3000
            }
          }
        }
      }
    }
  7. When a long-term solution is in place, we recommend you reset the cluster.max_shards_per_node limit.

    Python:
      resp = client.cluster.put_settings(
          persistent={
              "cluster.max_shards_per_node": None
          },
      )
      print(resp)

    Ruby:
      response = client.cluster.put_settings(
        body: {
          persistent: {
            'cluster.max_shards_per_node' => nil
          }
        }
      )
      puts response

    JavaScript:
      const response = await client.cluster.putSettings({
        persistent: {
          "cluster.max_shards_per_node": null,
        },
      });
      console.log(response);

    Console:
      PUT _cluster/settings
      {
        "persistent" : {
          "cluster.max_shards_per_node": null
        }
      }
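
The shard-reduction sketch referenced in step 5: two common ways to lower the open shard count on non-frozen data nodes are deleting indices you no longer need and shrinking oversharded indices. This is an illustrative Python sketch only; the index names are placeholders, and the shrink API requires the source index to be made read-only and fully relocated to one node first.

  # Option 1: delete indices that are no longer needed (placeholder index name)
  client.indices.delete(index="my-old-logs-2023.01")

  # Option 2: shrink an oversharded index down to a single primary shard
  # (placeholder names; complete the usual shrink preparation steps first)
  client.indices.shrink(
      index="my-wide-index",
      target="my-wide-index-shrunk",
      settings={"index.number_of_shards": 1},
  )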

Self-managed:

Check the current status of the cluster according to the shards capacity indicator:

  Python:
    resp = client.health_report(
        feature="shards_capacity",
    )
    print(resp)

  Ruby:
    response = client.health_report(
      feature: 'shards_capacity'
    )
    puts response

  JavaScript:
    const response = await client.healthReport({
      feature: "shards_capacity",
    });
    console.log(response);

  Console:
    GET _health_report/shards_capacity

The response will look like this:

  {
    "cluster_name": "...",
    "indicators": {
      "shards_capacity": {
        "status": "yellow",
        "symptom": "Cluster is close to reaching the configured maximum number of shards for data nodes.",
        "details": {
          "data": {
            "max_shards_in_cluster": 1000,
            "current_used_shards": 988
          },
          "frozen": {
            "max_shards_in_cluster": 3000
          }
        },
        "impacts": [
          ...
        ],
        "diagnosis": [
          ...
        ]
      }
    }
  }

The max_shards_in_cluster value under "data" reflects the current cluster.max_shards_per_node limit.

The current_used_shards value is the current number of open shards across the cluster.

Using the cluster settings API, update the cluster.max_shards_per_node setting:

  Python:
    resp = client.cluster.put_settings(
        persistent={
            "cluster.max_shards_per_node": 1200
        },
    )
    print(resp)

  Ruby:
    response = client.cluster.put_settings(
      body: {
        persistent: {
          'cluster.max_shards_per_node' => 1200
        }
      }
    )
    puts response

  JavaScript:
    const response = await client.cluster.putSettings({
      persistent: {
        "cluster.max_shards_per_node": 1200,
      },
    });
    console.log(response);

  Console:
    PUT _cluster/settings
    {
      "persistent" : {
        "cluster.max_shards_per_node": 1200
      }
    }

This increase should only be temporary. As a long-term solution, we recommend you add nodes to the oversharded data tier or reduce your cluster’s shard count on nodes that do not belong to the frozen tier. To verify that the change has fixed the issue, you can get the current status of the shards_capacity indicator by checking the data section of the health API:

  Python:
    resp = client.health_report(
        feature="shards_capacity",
    )
    print(resp)

  Ruby:
    response = client.health_report(
      feature: 'shards_capacity'
    )
    puts response

  JavaScript:
    const response = await client.healthReport({
      feature: "shards_capacity",
    });
    console.log(response);

  Console:
    GET _health_report/shards_capacity

The response will look like this:

  {
    "cluster_name": "...",
    "indicators": {
      "shards_capacity": {
        "status": "green",
        "symptom": "The cluster has enough room to add new shards.",
        "details": {
          "data": {
            "max_shards_in_cluster": 1200
          },
          "frozen": {
            "max_shards_in_cluster": 3000
          }
        }
      }
    }
  }

When a long-term solution is in place, we recommend you reset the cluster.max_shards_per_node limit.

  Python:
    resp = client.cluster.put_settings(
        persistent={
            "cluster.max_shards_per_node": None
        },
    )
    print(resp)

  Ruby:
    response = client.cluster.put_settings(
      body: {
        persistent: {
          'cluster.max_shards_per_node' => nil
        }
      }
    )
    puts response

  JavaScript:
    const response = await client.cluster.putSettings({
      persistent: {
        "cluster.max_shards_per_node": null,
      },
    });
    console.log(response);

  Console:
    PUT _cluster/settings
    {
      "persistent" : {
        "cluster.max_shards_per_node": null
      }
    }

Cluster is close to reaching the configured maximum number of shards for frozen nodes.

The cluster.max_shards_per_node.frozen cluster setting limits the maximum number of open shards for a cluster, only counting data nodes that belong to the frozen tier.
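
The frozen cap scales the same way: this setting multiplied by the number of nodes in the frozen tier. A rough Python sketch for counting those nodes, assuming the same client object as the examples below and that the cat nodes role string marks the frozen data tier with "f":

  # Count nodes whose roles include the frozen data tier
  nodes = client.cat.nodes(h="name,node.role", format="json")
  frozen_nodes = [n for n in nodes if "f" in n["node.role"]]
  print("frozen-tier nodes:", len(frozen_nodes))
  print("approximate frozen shard cap:", 3000 * len(frozen_nodes))  # 3000 is the default setting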

This symptom indicates that action should be taken; otherwise, either the creation of new indices or upgrading the cluster could be blocked.

If you’re confident your changes won’t destabilize the cluster, you can temporarily increase the limit using the cluster update settings API:

Elasticsearch Service (use Kibana):

  1. Log in to the Elastic Cloud console.
  2. On the Elasticsearch Service panel, click the name of your deployment.

    If the name of your deployment is disabled, your Kibana instances might be unhealthy; in that case, please contact Elastic Support. If your deployment doesn't include Kibana, all you need to do is enable it first.

  3. Open your deployment’s side navigation menu (placed under the Elastic logo in the upper left corner) and go to Dev Tools > Console.

    (Kibana Console screenshot)

  4. Check the current status of the cluster according to the shards capacity indicator:

    Python:
      resp = client.health_report(
          feature="shards_capacity",
      )
      print(resp)

    Ruby:
      response = client.health_report(
        feature: 'shards_capacity'
      )
      puts response

    JavaScript:
      const response = await client.healthReport({
        feature: "shards_capacity",
      });
      console.log(response);

    Console:
      GET _health_report/shards_capacity

    The response will look like this:

    {
      "cluster_name": "...",
      "indicators": {
        "shards_capacity": {
          "status": "yellow",
          "symptom": "Cluster is close to reaching the configured maximum number of shards for frozen nodes.",
          "details": {
            "data": {
              "max_shards_in_cluster": 1000
            },
            "frozen": {
              "max_shards_in_cluster": 3000,
              "current_used_shards": 2998
            }
          },
          "impacts": [
            ...
          ],
          "diagnosis": [
            ...
          ]
        }
      }
    }

    The max_shards_in_cluster value under "frozen" reflects the current cluster.max_shards_per_node.frozen limit.

    The current_used_shards value under "frozen" is the current number of open shards used by frozen nodes across the cluster.

  5. Update the cluster.max_shards_per_node.frozen setting:

    Python:
      resp = client.cluster.put_settings(
          persistent={
              "cluster.max_shards_per_node.frozen": 3200
          },
      )
      print(resp)

    Ruby:
      response = client.cluster.put_settings(
        body: {
          persistent: {
            'cluster.max_shards_per_node.frozen' => 3200
          }
        }
      )
      puts response

    JavaScript:
      const response = await client.cluster.putSettings({
        persistent: {
          "cluster.max_shards_per_node.frozen": 3200,
        },
      });
      console.log(response);

    Console:
      PUT _cluster/settings
      {
        "persistent" : {
          "cluster.max_shards_per_node.frozen": 3200
        }
      }

    This increase should only be temporary. As a long-term solution, we recommend you add nodes to the oversharded data tier or reduce your cluster's shard count on nodes that belong to the frozen tier (see the sketch after these steps).

  6. To verify that the change has fixed the issue, you can get the current status of the shards_capacity indicator by checking the frozen section of the health API:

    Python:
      resp = client.health_report(
          feature="shards_capacity",
      )
      print(resp)

    Ruby:
      response = client.health_report(
        feature: 'shards_capacity'
      )
      puts response

    JavaScript:
      const response = await client.healthReport({
        feature: "shards_capacity",
      });
      console.log(response);

    Console:
      GET _health_report/shards_capacity

    The response will look like this:

    {
      "cluster_name": "...",
      "indicators": {
        "shards_capacity": {
          "status": "green",
          "symptom": "The cluster has enough room to add new shards.",
          "details": {
            "data": {
              "max_shards_in_cluster": 1000
            },
            "frozen": {
              "max_shards_in_cluster": 3200
            }
          }
        }
      }
    }
  7. When a long-term solution is in place, we recommend you reset the cluster.max_shards_per_node.frozen limit.

    Python:
      resp = client.cluster.put_settings(
          persistent={
              "cluster.max_shards_per_node.frozen": None
          },
      )
      print(resp)

    Ruby:
      response = client.cluster.put_settings(
        body: {
          persistent: {
            'cluster.max_shards_per_node.frozen' => nil
          }
        }
      )
      puts response

    JavaScript:
      const response = await client.cluster.putSettings({
        persistent: {
          "cluster.max_shards_per_node.frozen": null,
        },
      });
      console.log(response);

    Console:
      PUT _cluster/settings
      {
        "persistent" : {
          "cluster.max_shards_per_node.frozen": null
        }
      }
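
The sketch referenced in step 5: on the frozen tier, reducing the shard count usually means removing partially mounted indices you no longer need to search, or tightening the delete phase of the ILM policy that manages them. A minimal Python illustration with a placeholder index name:

  # Deleting a partially mounted (frozen) index removes its shards from the frozen tier;
  # adjust ILM retention so they do not accumulate again.
  client.indices.delete(index="my-frozen-logs-2022.01")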

Self-managed:

Check the current status of the cluster according to the shards capacity indicator:

  Python:
    resp = client.health_report(
        feature="shards_capacity",
    )
    print(resp)

  Ruby:
    response = client.health_report(
      feature: 'shards_capacity'
    )
    puts response

  JavaScript:
    const response = await client.healthReport({
      feature: "shards_capacity",
    });
    console.log(response);

  Console:
    GET _health_report/shards_capacity

The response will look like this:

  {
    "cluster_name": "...",
    "indicators": {
      "shards_capacity": {
        "status": "yellow",
        "symptom": "Cluster is close to reaching the configured maximum number of shards for frozen nodes.",
        "details": {
          "data": {
            "max_shards_in_cluster": 1000
          },
          "frozen": {
            "max_shards_in_cluster": 3000,
            "current_used_shards": 2998
          }
        },
        "impacts": [
          ...
        ],
        "diagnosis": [
          ...
        ]
      }
    }
  }

The max_shards_in_cluster value under "frozen" reflects the current cluster.max_shards_per_node.frozen limit.

The current_used_shards value under "frozen" is the current number of open shards used by frozen nodes across the cluster.

Using the cluster settings API, update the cluster.max_shards_per_node.frozen setting:

  Python:
    resp = client.cluster.put_settings(
        persistent={
            "cluster.max_shards_per_node.frozen": 3200
        },
    )
    print(resp)

  Ruby:
    response = client.cluster.put_settings(
      body: {
        persistent: {
          'cluster.max_shards_per_node.frozen' => 3200
        }
      }
    )
    puts response

  JavaScript:
    const response = await client.cluster.putSettings({
      persistent: {
        "cluster.max_shards_per_node.frozen": 3200,
      },
    });
    console.log(response);

  Console:
    PUT _cluster/settings
    {
      "persistent" : {
        "cluster.max_shards_per_node.frozen": 3200
      }
    }

This increase should only be temporary. As a long-term solution, we recommend you add nodes to the oversharded data tier or reduce your cluster's shard count on nodes that belong to the frozen tier. To verify that the change has fixed the issue, you can get the current status of the shards_capacity indicator by checking the frozen section of the health API:

  Python:
    resp = client.health_report(
        feature="shards_capacity",
    )
    print(resp)

  Ruby:
    response = client.health_report(
      feature: 'shards_capacity'
    )
    puts response

  JavaScript:
    const response = await client.healthReport({
      feature: "shards_capacity",
    });
    console.log(response);

  Console:
    GET _health_report/shards_capacity

The response will look like this:

  {
    "cluster_name": "...",
    "indicators": {
      "shards_capacity": {
        "status": "green",
        "symptom": "The cluster has enough room to add new shards.",
        "details": {
          "data": {
            "max_shards_in_cluster": 1000
          },
          "frozen": {
            "max_shards_in_cluster": 3200
          }
        }
      }
    }
  }

When a long-term solution is in place, we recommend you reset the cluster.max_shards_per_node.frozen limit.

  Python:
    resp = client.cluster.put_settings(
        persistent={
            "cluster.max_shards_per_node.frozen": None
        },
    )
    print(resp)

  Ruby:
    response = client.cluster.put_settings(
      body: {
        persistent: {
          'cluster.max_shards_per_node.frozen' => nil
        }
      }
    )
    puts response

  JavaScript:
    const response = await client.cluster.putSettings({
      persistent: {
        "cluster.max_shards_per_node.frozen": null,
      },
    });
    console.log(response);

  Console:
    PUT _cluster/settings
    {
      "persistent" : {
        "cluster.max_shards_per_node.frozen": null
      }
    }
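
Finally, because shard usage tends to creep back up over time, it can help to keep an eye on the indicator between incidents. A small sketch, assuming the same Python client as above, that exits non-zero whenever shards_capacity is not green (suitable for a cron job or other periodic check):

  import sys

  from elasticsearch import Elasticsearch

  client = Elasticsearch("http://localhost:9200")  # placeholder endpoint

  # Read just the shards_capacity indicator and fail the script if it is not green
  status = client.health_report(feature="shards_capacity")["indicators"]["shards_capacity"]["status"]
  print("shards_capacity status:", status)
  sys.exit(0 if status == "green" else 1)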