elasticsearch-shard

In some cases the Lucene index or translog of a shard copy can become corrupted. The elasticsearch-shard command enables you to remove corrupted parts of the shard if a good copy of the shard cannot be recovered automatically or restored from backup.

You will lose the corrupted data when you run elasticsearch-shard. This tool should only be used as a last resort if there is no way to recover from another copy of the shard or restore a snapshot.

Synopsis

  bin/elasticsearch-shard remove-corrupted-data
    ([--index <Index>] [--shard-id <ShardId>] | [--dir <IndexPath>])
    [--truncate-clean-translog]
    [-E <KeyValuePair>]
    [-h, --help] ([-s, --silent] | [-v, --verbose])

Description

When Elasticsearch detects that a shard’s data is corrupted, it fails that shard copy and refuses to use it. Under normal conditions, the shard is automatically recovered from another copy. If no good copy of the shard is available and you cannot restore one from a snapshot, you can use elasticsearch-shard to remove the corrupted data and restore access to any remaining data in unaffected segments.

Stop Elasticsearch before running elasticsearch-shard.

To remove corrupted shard data, use the remove-corrupted-data subcommand.

There are two ways to specify the path to the shard copy:

  • Specify the index name and shard ID with the --index and --shard-id options.
  • Use the --dir option to specify the full path to the corrupted index or translog files.
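
The two forms can be sketched as follows. This is a minimal illustration, not part of the tool: the helper names by_index and by_dir are hypothetical, and each helper only prints the command so that it can be reviewed before anything is run. The index name and directory are the ones used in the sample session on this page.

```shell
#!/usr/bin/env bash
# Dry-run helpers (hypothetical names): each prints the command
# instead of running it, so the invocation can be checked first.

# Address the shard copy by index name and shard ID.
by_index() {
  printf 'bin/elasticsearch-shard remove-corrupted-data --index %s --shard-id %s\n' "$1" "$2"
}

# Address the shard copy by the full path to its index or translog files.
by_dir() {
  printf 'bin/elasticsearch-shard remove-corrupted-data --dir %s\n' "$1"
}

by_index my-index-000001 0
by_dir /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/
```

The printed commands are meant to be run from the Elasticsearch home directory. A path that contains spaces would need quoting before the printed command is pasted into a shell.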

Removing corrupted data

elasticsearch-shard analyses the shard copy and provides an overview of the corruption found. To proceed, you must then confirm that you want to remove the corrupted data.

Back up your data before running elasticsearch-shard. This is a destructive operation that removes corrupted data from the shard.

  $ bin/elasticsearch-shard remove-corrupted-data --index my-index-000001 --shard-id 0
  WARNING: Elasticsearch MUST be stopped before running this tool.
  Please make a complete backup of your index before using this tool.
  Opening Lucene index at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/
  >> Lucene index is corrupted at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/
  Opening translog at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/
  >> Translog is clean at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/
  Corrupted Lucene index segments found - 32 documents will be lost.
  WARNING: YOU WILL LOSE DATA.
  Continue and remove docs from the index ? Y
  WARNING: 1 broken segments (containing 32 documents) detected
  Took 0.056 sec total.
  Writing...
  OK
  Wrote new segments file "segments_c"
  Marking index with the new history uuid : 0pIBd9VTSOeMfzYT6p0AsA
  Changing allocation id V8QXk-QXSZinZMT-NvEq4w to tjm9Ve6uTBewVFAlfUMWjA
  You should run the following command to allocate this shard:
  POST /_cluster/reroute
  {
    "commands" : [
      {
        "allocate_stale_primary" : {
          "index" : "my-index-000001",
          "shard" : 0,
          "node" : "II47uXW2QvqzHBnMcl2o_Q",
          "accept_data_loss" : false
        }
      }
    ]
  }
  You must accept the possibility of data loss by changing the `accept_data_loss` parameter to `true`.
  Deleted corrupt marker corrupted_FzTSBSuxT7i3Tls_TgwEag from /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/

When you use elasticsearch-shard to drop the corrupted data, the shard’s allocation ID changes. After restarting the node, you must use the cluster reroute API to tell Elasticsearch to use the new ID. The elasticsearch-shard command shows the request that you need to submit.
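
As a rough sketch, the printed request could be submitted with curl once the node has restarted. The helper names reroute_body and submit_reroute and the ES_URL variable are hypothetical, http://localhost:9200 is an assumed address, and a secured cluster would need credentials and TLS options that are not shown here. The body sets accept_data_loss to true, which the tool's output says is required. The node ID is the one from the sample session; use the values that the tool prints for your shard.

```shell
#!/usr/bin/env bash
# Print the reroute request body for a stale primary.
# Arguments: index name, shard number, node ID (copy them from the
# request that elasticsearch-shard prints).
reroute_body() {
  printf '{"commands":[{"allocate_stale_primary":{"index":"%s","shard":%s,"node":"%s","accept_data_loss":true}}]}\n' "$1" "$2" "$3"
}

# Submit the request with curl. ES_URL is an assumption: set it to the
# address of a node in your cluster and add authentication as needed.
submit_reroute() {
  curl -sS -X POST "${ES_URL:-http://localhost:9200}/_cluster/reroute" \
    -H 'Content-Type: application/json' \
    -d "$(reroute_body "$@")"
}

# Review the body first. Nothing is sent until submit_reroute is called.
reroute_body my-index-000001 0 II47uXW2QvqzHBnMcl2o_Q
```

Calling submit_reroute with the same three arguments sends the request. Keeping the body in its own function makes it easy to inspect exactly what will be sent before accepting data loss.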

You can also use the -h option to get a list of all options and parameters that the elasticsearch-shard tool supports.

Finally, you can use the --truncate-clean-translog option to truncate the shard’s translog even if it does not appear to be corrupt.
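
For illustration, an invocation with this option might look like the sketch below. The helper name truncate_translog_cmd is hypothetical, and it only prints the command rather than running it. The index name and shard ID are the ones from the sample session.

```shell
#!/usr/bin/env bash
# Print (do not run) a remove-corrupted-data command that also
# truncates the translog. Arguments: index name, shard ID.
truncate_translog_cmd() {
  printf 'bin/elasticsearch-shard remove-corrupted-data --index %s --shard-id %s --truncate-clean-translog\n' "$1" "$2"
}

truncate_translog_cmd my-index-000001 0
```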