Flushing RAM chunk to a new disk chunk
FLUSH RAMCHUNK
FLUSH RAMCHUNK rtindex
FLUSH RAMCHUNK forcibly creates a new disk chunk in an RT table.
Normally, an RT table flushes the contents of its RAM chunk into a new disk chunk automatically, once the RAM chunk reaches the maximum size allowed by rt_mem_limit. However, for debugging and testing it can be useful to force the creation of a new disk chunk, and FLUSH RAMCHUNK does exactly that.
Note that using FLUSH RAMCHUNK increases RT table fragmentation. Most likely, you want to use FLUSH TABLE instead. We suggest that you avoid using this statement on its own unless you are absolutely sure what you are doing. The right way is to issue FLUSH RAMCHUNK followed by an OPTIMIZE command; this combination keeps RT table fragmentation to a minimum.
SQL
FLUSH RAMCHUNK rt;
Response
Query OK, 0 rows affected (0.05 sec)
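Issuing FLUSH RAMCHUNK together with the OPTIMIZE that should follow it can be wrapped in one helper on the client side. This is a minimal sketch, assuming a DB-API style cursor connected to Manticore over its MySQL protocol; flush_ramchunk_and_optimize and check_identifier are hypothetical names, and the identifier pattern is a deliberately conservative assumption, not Manticore's exact naming rule.

```python
import re

# Conservative table-name pattern (an assumption; Manticore may accept more).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_identifier(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise ValueError.

    Table names cannot be sent as bound query parameters, so they are
    validated before being interpolated into the statement text.
    """
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def flush_ramchunk_and_optimize(cursor, table: str, wait: bool = False) -> list:
    """Force a new disk chunk, then optimize to undo the extra fragmentation.

    `cursor` is any object with an execute(str) method (hypothetical wiring).
    With wait=True, OPTION sync=1 makes OPTIMIZE return only after merging.
    Returns the executed statements, in order.
    """
    table = check_identifier(table)
    statements = [f"FLUSH RAMCHUNK {table}", f"OPTIMIZE TABLE {table}"]
    if wait:
        statements[1] += " OPTION sync=1"
    for sql in statements:
        cursor.execute(sql)
    return statements
```

For example, flush_ramchunk_and_optimize(cur, "rt", wait=True) sends FLUSH RAMCHUNK rt and then OPTIMIZE TABLE rt OPTION sync=1.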
Flushing RAM chunk to disk
FLUSH TABLE
FLUSH TABLE rtindex
FLUSH TABLE forcibly flushes the contents of an RT table's RAM chunk to disk.
Backing up an RT table is as simple as copying its data files, followed by the binary log. However, recovering from that backup means that all the transactions in the log since the last successful RAM chunk write would need to be replayed. Those writes normally happen either on a clean shutdown, or periodically, with a (big enough!) interval between writes specified in the rt_flush_period directive. So a backup made at an arbitrary point in time may end up with far too much binary log data to replay.
FLUSH TABLE forcibly writes the RAM chunk contents to disk, and also triggers the subsequent cleanup of the (now redundant) binary log files. Thus, recovering from a backup made just after FLUSH TABLE should be almost instant.
SQL
FLUSH TABLE rt;
Response
Query OK, 0 rows affected (0.05 sec)
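The backup procedure described above (flush first, then copy the data files, then the binary log) can be sketched as follows. This is an illustration under explicit assumptions, not an official backup tool: the cursor is any DB-API style object with execute(str), data_dir and binlog_dir are hypothetical paths that the caller must know from the server configuration, and nothing (such as a concurrent merge) is assumed to modify those files during the copy.

```python
import shutil
from pathlib import Path


def backup_after_flush(cursor, table: str, data_dir, binlog_dir, dest) -> Path:
    """Flush an RT table's RAM chunk, then copy its data files and binary log.

    Hypothetical layout: all of the table's files live under `data_dir`,
    the binary log lives under `binlog_dir`. Returns the backup directory.
    """
    # Basic sanity check: the name is interpolated into SQL text.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    dest = Path(dest)
    # 1. Persist the RAM chunk so recovery does not replay a long binary log.
    cursor.execute(f"FLUSH TABLE {table}")
    # 2. Copy the table's data files.
    shutil.copytree(data_dir, dest / "data", dirs_exist_ok=True)
    # 3. Copy whatever binary log files exist after the flush.
    shutil.copytree(binlog_dir, dest / "binlog", dirs_exist_ok=True)
    return dest
```

Copying the binary log after the flush means the backup carries only the short tail of transactions that arrived after FLUSH TABLE, which is what makes recovery fast.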
Compacting a table
Over time, RT tables can become fragmented into many disk chunks and/or accumulate deleted but not yet purged data, which hurts search performance. When that happens, they can be optimized. Basically, the optimization pass merges disk chunks together in pairs, purging documents that were previously deleted with DELETE.
Starting with Manticore 4, this happens automatically by default, but you can also use the commands below to force table compaction.
OPTIMIZE TABLE
OPTIMIZE TABLE index_name [OPTION opt_name = opt_value [,...]]
The OPTIMIZE statement enqueues an RT table for optimization in a background thread.
SQL
OPTIMIZE TABLE rt;
Number of optimized disk chunks
By default, OPTIMIZE merges the RT table’s disk chunks down to a count equal to the number of CPU cores * 2. You can control the target number of disk chunks with the cutoff option.
The threshold can also be set with:
- the server setting optimize_cutoff, which overrides the default above
- the per-table setting optimize_cutoff
SQL
OPTIMIZE TABLE rt OPTION cutoff=4;
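The default target, and how much merge work a given cutoff implies, can be worked out with simple arithmetic. This is a simplified model, assuming every merge combines exactly one pair of disk chunks into one (so each merge lowers the chunk count by one) and ignoring the purging of deleted documents; default_cutoff and pair_merges_needed are hypothetical helper names, and os.cpu_count() only approximates the core count the server sees.

```python
import os


def default_cutoff(cpu_cores=None) -> int:
    """Default OPTIMIZE target: twice the number of CPU cores."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return cores * 2


def pair_merges_needed(disk_chunks: int, cutoff: int) -> int:
    """Pairwise merges needed to go from disk_chunks down to at most cutoff.

    Merging a pair replaces two chunks with one, so each merge reduces the
    chunk count by exactly one.
    """
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    return max(0, disk_chunks - cutoff)
```

On an 8-core server, default_cutoff(8) is 16, so a table with 30 disk chunks needs pair_merges_needed(30, 16) == 14 merges by default, and 26 merges with OPTION cutoff=4.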
Running in foreground
If OPTION sync=1 is used (the default is 0), the command waits until the optimization process is done. If the connection is interrupted, the optimization continues to run on the server.
SQL
OPTIMIZE TABLE rt OPTION sync=1;
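When both cutoff and sync are needed, they go into one comma-separated OPTION list, as the syntax OPTIMIZE TABLE index_name [OPTION opt_name = opt_value [,...]] allows. A minimal sketch of a statement builder, assuming the result is then sent through any MySQL-protocol client; optimize_statement is a hypothetical helper, not part of Manticore, and the positive-cutoff check is an assumption.

```python
def optimize_statement(table: str, cutoff=None, sync: bool = False) -> str:
    """Build an OPTIMIZE TABLE statement with an optional OPTION list."""
    # Basic sanity check: the name is interpolated into SQL text.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    options = []
    if cutoff is not None:
        if int(cutoff) < 1:
            raise ValueError("cutoff must be a positive integer")
        options.append(f"cutoff={int(cutoff)}")
    if sync:
        options.append("sync=1")  # wait until the merge has finished
    sql = f"OPTIMIZE TABLE {table}"
    if options:
        sql += " OPTION " + ", ".join(options)
    return sql
```

With sync=1 a large merge can take a long time, so the client's read timeout may need raising; if the connection drops anyway, the merge keeps running on the server.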
Throttling the IO impact
Optimization can be a lengthy and IO-intensive process. To limit its impact, all the actual merge work is executed serially in a special background thread, and the OPTIMIZE statement simply adds a job to that thread's queue. Currently, there is no way to check the table or queue status (this might be added to the SHOW TABLE STATUS and SHOW STATUS statements, respectively, in the future). The optimization thread can be IO-throttled: you can control the maximum number of IOs per second and the maximum IO size with the rt_merge_iops and rt_merge_maxiosize directives, respectively.
The RT table being optimized stays online and available for both searching and updates at (almost) all times during the optimization. It is locked only briefly after a pair of disk chunks has been merged successfully, to rename the old and new files and update the table header.
Optimizing clustered tables
As long as auto_optimize is not disabled, tables are optimized automatically.
If you are experiencing unexpected SSTs (State Snapshot Transfers), or want tables to be binary identical across all nodes of the cluster, you need to:
- Disable auto_optimize.
- Optimize tables manually:
On one of the nodes, drop the table from the cluster:
SQL
```
ALTER CLUSTER mycluster DROP myindex;
```
Optimize the table:
SQL
```
OPTIMIZE TABLE myindex;
```
Add the table back to the cluster:
SQL
```
ALTER CLUSTER mycluster ADD myindex;
```
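The drop, optimize and re-add steps can be scripted for one node. This is a minimal sketch, assuming a DB-API style cursor with execute(str) and that auto_optimize has already been disabled separately; optimize_outside_cluster is a hypothetical helper. It adds OPTION sync=1 because a plain OPTIMIZE only enqueues the job and returns, so re-adding right after it could replicate the files before the merge has finished.

```python
def optimize_outside_cluster(cursor, cluster: str, table: str) -> list:
    """Detach a table, optimize it synchronously, then re-add it to the cluster.

    Returns the executed statements, in order.
    """
    for name in (cluster, table):
        # Basic sanity check: names are interpolated into SQL text.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    statements = [
        f"ALTER CLUSTER {cluster} DROP {table}",
        # sync=1: return only after merging, so ADD ships the merged files.
        f"OPTIMIZE TABLE {table} OPTION sync=1",
        f"ALTER CLUSTER {cluster} ADD {table}",
    ]
    for sql in statements:
        # An exception stops the sequence here and leaves the table detached.
        cursor.execute(sql)
    return statements
```

The table is deliberately not re-added when OPTIMIZE raises: if the failure was a dropped connection, the merge may still be running on the server, so re-adding should wait until it is known to be finished.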
When the table is added back, the new files created by the optimize process will be replicated to the other nodes in the cluster. Any changes made locally to the table on other nodes will be lost.
While the table is out of the cluster, table data modifications (inserts, replaces, deletes, updates) should either:
- be postponed, or
- be directed to the node where the optimization is running.
Note that while the table is out of the cluster, insert/replace/delete/update commands must refer to it without the cluster name prefix (in SQL statements) or the cluster property (in HTTP JSON requests); otherwise they will fail. As soon as the table is added back to the cluster, writes can be resumed. From that point on, write operations on the table must include the cluster name prefix again, or they will fail. Search operations remain available as usual on any of the nodes throughout the process.
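Because the required table reference flips twice during this procedure, a client may want to centralize the rule. A minimal sketch, assuming the cluster:table prefix form in SQL and a top-level "cluster" key in HTTP JSON write bodies; write_target and with_cluster are hypothetical helpers, and the rest of the JSON body is whatever the write endpoint expects.

```python
def write_target(table: str, cluster: str, in_cluster: bool) -> str:
    """Table reference for INSERT/REPLACE/DELETE/UPDATE statements.

    In a cluster, writes need the cluster:table prefix; while the table is
    detached for optimization, the bare table name must be used instead.
    """
    return f"{cluster}:{table}" if in_cluster else table


def with_cluster(body: dict, cluster: str, in_cluster: bool) -> dict:
    """Copy of a JSON write body with the cluster property set or removed."""
    out = dict(body)
    if in_cluster:
        out["cluster"] = cluster
    else:
        out.pop("cluster", None)
    return out
```

While the table is detached, write_target("myindex", "mycluster", False) yields myindex; after ALTER CLUSTER mycluster ADD myindex, passing True yields mycluster:myindex again.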