Attaching a plain table to a real-time table
A plain table can be converted into a real-time table or added to an existing real-time table.
The first case is useful when you need to regenerate a real-time table completely, which may be needed, for example, if tokenization settings need an update. In that case, preparing a plain table and converting it into a real-time table may be easier than preparing a batch job that performs INSERTs to add all the data to a real-time table.
In the second case, you normally want to add a large batch of new data to a real-time table, and again, creating a plain table with that data is easier than populating the existing real-time table.
Attaching table - general syntax
The ATTACH statement lets you attach a plain table to an existing real-time table.
ATTACH TABLE plain_table TO TABLE rt_table [WITH TRUNCATE]
The ATTACH TABLE statement lets you move data from a plain table to an RT table.
After a successful ATTACH, the data originally stored in the source plain table becomes a part of the target RT table, and the source plain table becomes unavailable (until the next rebuild). ATTACH does not change any table data. It just renames the files (making the source table a new disk chunk of the target RT table) and updates the metadata. It is therefore generally a quick operation that often completes in under a second.
Note that when a table is attached to an empty RT table, the fields, attributes, and text processing settings (tokenizer, wordforms, etc.) from the source table are copied over and take effect. The respective parts of the RT table definition from the configuration file are ignored.
When the TRUNCATE option is used, the RT table is truncated before the source plain table is attached. This lets you make the operation atomic, or make sure that the attached source plain table is the only data in the target RT table.
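The two forms of the statement can be generated from a script. The following is a minimal sketch: attach_sql is a hypothetical helper (not part of Manticore) that only prints the statement, and the table names in the usage comment are placeholders.

```shell
# Hypothetical helper: print an ATTACH statement for the given tables.
# Pass "truncate" as the third argument to append WITH TRUNCATE.
attach_sql() {
  local src="$1" dst="$2" mode="${3:-}"
  if [ "$mode" = "truncate" ]; then
    printf 'ATTACH TABLE %s TO TABLE %s WITH TRUNCATE;\n' "$src" "$dst"
  else
    printf 'ATTACH TABLE %s TO TABLE %s;\n' "$src" "$dst"
  fi
}

# Usage sketch (needs a running server; port 9306 is an assumption):
#   attach_sql plain rt truncate | mysql -P 9306 -h0
```

Printing the statement instead of executing it keeps the helper free of side effects, so the output can be reviewed before it is piped to the mysql client.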
ATTACH TABLE comes with a number of restrictions. Most notably, the target RT table is currently required to be either empty or have the same settings as the source plain table. If the source plain table is attached to a non-empty RT table, the RT table data collected so far is stored as a regular disk chunk, the table being attached becomes the newest disk chunk, and documents with the same IDs are killed. The complete list is as follows:
- Target RT table needs to be either empty or have same settings
- Source plain table needs to have phrase_boundary_step=0, stopword_step=1.
Example
Before ATTACH the RT table is empty and has 3 fields:
mysql> DESC rt;
+-----------+---------+
| Field     | Type    |
+-----------+---------+
| id        | integer |
| testfield | field   |
| testattr  | uint    |
+-----------+---------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM rt;
Empty set (0.00 sec)
The plain table is not empty:
mysql> SELECT * FROM plain WHERE MATCH('test');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   1304 |        1 | 1313643256 |
|    2 |   1304 |        1 | 1313643256 |
|    3 |   1304 |        1 | 1313643256 |
|    4 |   1304 |        1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)
Attaching:
mysql> ATTACH TABLE plain TO TABLE rt;
Query OK, 0 rows affected (0.00 sec)
The RT table now has 5 fields:
mysql> DESC rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | integer   |
| title      | field     |
| content    | field     |
| group_id   | uint      |
| date_added | timestamp |
+------------+-----------+
5 rows in set (0.00 sec)
And it’s not empty:
mysql> SELECT * FROM rt WHERE MATCH('test');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   1304 |        1 | 1313643256 |
|    2 |   1304 |        1 | 1313643256 |
|    3 |   1304 |        1 | 1313643256 |
|    4 |   1304 |        1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)
The plain table was removed:
mysql> SELECT * FROM plain WHERE MATCH('test');
ERROR 1064 (42000): no enabled local indexes to search
Importing table
If you decide to migrate from Plain mode to RT mode, and in some other cases, real-time and percolate tables built in Plain mode can be imported into Manticore running in RT mode using the IMPORT TABLE statement. The general syntax is as follows:
IMPORT TABLE table_name FROM 'path'
Executing this command copies all the table files of the specified table to data_dir. All the external table files, such as wordforms, exceptions, and stopwords, are also copied to the same data_dir.
IMPORT TABLE has the following limitations:
- paths to the external files that were originally specified in the config file must be absolute
- only real-time and percolate tables are supported
- plain tables need to be converted to real-time tables beforehand (in the plain mode) via ATTACH TABLE
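The first limitation can be checked before running IMPORT TABLE. The sketch below lists every wordforms, exceptions, or stopwords path in a configuration file that is not absolute. relative_external_paths is a hypothetical helper, and it assumes each directive sits on a single line in `key = value` form with paths separated by whitespace. It does not handle line continuations or other directives that reference external files.

```shell
# Hypothetical helper: print "directive: path" for every wordforms,
# exceptions, or stopwords path in CONF that does not start with "/".
relative_external_paths() {
  local conf="$1"
  awk -F'=' '
    /^[ \t]*(wordforms|exceptions|stopwords)[ \t]*=/ {
      key = $1
      gsub(/[ \t]/, "", key)                 # strip blanks around the name
      n = split($2, paths, /[ \t]+/)         # one directive may list several files
      for (i = 1; i <= n; i++)
        if (paths[i] != "" && paths[i] !~ /^\//)
          print key ": " paths[i]
    }
  ' "$conf"
}

# Usage sketch (the config path is a placeholder):
#   relative_external_paths /etc/manticoresearch/manticore.conf
```

Empty output means no relative paths were found among the three directives the helper knows about.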
indexer --print-rt
If the above method for migrating a plain table to an RT table is not possible, you may use indexer --print-rt to dump data from the plain table directly, without converting it to an RT table, and then import the dump into an RT table right from the command line.
This method has a few limitations, though:
- Only sql-based sources are supported
- MVAs are not supported
/usr/bin/indexer --rotate --config /etc/manticoresearch/manticore.conf --print-rt my_rt_index my_plain_index > /tmp/dump_regular.sql
mysql -P 9306 -h0 -e "truncate table my_rt_index"
mysql -P 9306 -h0 < /tmp/dump_regular.sql
rm /tmp/dump_regular.sql
Rotating a table
Table rotation is a procedure in which the searchd server looks for new versions of the tables defined in the configuration. Rotation applies only to the Plain mode of operation.
There can be two cases:
- for plain tables that are already loaded
- tables added in configuration, but not loaded yet
In the first case, indexer cannot put the new version of the table online because the running copy is locked and loaded by searchd. In this case, indexer needs to be called with the --rotate parameter. If --rotate is used, indexer creates new table files with .new. in their names and sends a HUP signal to searchd, informing it about the new version. searchd then performs a lookup, puts the new version of the table in place, and discards the old one. In some cases, you may want to create the new version of the table without rotating it right away, for example to check the health of the new table version first. In this case, indexer accepts the --nohup parameter, which prevents it from sending the HUP signal to the server.
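The .new. naming makes it possible to see whether a not-yet-rotated version is waiting on disk after a --nohup run. The sketch below assumes you know the directory that holds the table files. has_new_version is a hypothetical helper, and finding the files only shows that a new version exists, not that it is healthy.

```shell
# Hypothetical helper: succeed if DIR contains files named TBL.new.sp?
# (for example plain_table.new.spa), i.e. a version awaiting rotation.
has_new_version() {
  local dir="$1" tbl="$2" f
  for f in "$dir/$tbl".new.sp?; do
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}

# Usage sketch (directory, table name, and port are placeholders):
#   if has_new_version /var/lib/manticore plain_table; then
#     mysql -P 9306 -h0 -e "RELOAD TABLES"
#   fi
```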
New tables can be loaded by rotation; however, the regular handling of the HUP signal is to check for new tables only if the configuration has changed since server startup. If the table was already defined in the configuration, it should first be created by running indexer without rotation, and then loaded with the RELOAD TABLES statement instead.
There are also two specialized statements that can be used to perform rotations on tables:
RELOAD TABLE
RELOAD TABLE tbl [ FROM '/path/to/table_files' ];
RELOAD TABLE allows you to rotate tables using SQL.
It has two modes of operation. The first one (without specifying a path) makes the Manticore server check for new table files in the directory specified by the table's path setting. New table files must have names tbl.new.sp?.
If you additionally specify a path, the server looks for the table files in the specified directory, moves them to the table path, renames them from tbl.sp? to tbl.new.sp?, and rotates them.
mysql> RELOAD TABLE plain_table;
mysql> RELOAD TABLE plain_table FROM '/home/mighty/new_table_files';
RELOAD TABLES
RELOAD TABLES;
Works the same as the system HUP signal: it initiates table rotation. Unlike regular HUP signalling (which can come from kill or indexer), the statement forces a lookup of possible tables to rotate even if the configuration has not changed since the startup of the server.
Depending on the value of the seamless_rotate setting, new queries might be briefly stalled, and clients will receive temporary errors. The command is non-blocking (i.e., it returns immediately).
mysql> RELOAD TABLES;
Query OK, 0 rows affected (0.01 sec)
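Because clients may see temporary errors while a rotation is in progress, a caller can wrap its query in a retry loop. The sketch below is one way to do that in shell. retry is a hypothetical helper, the attempt count and delay are arbitrary choices rather than values recommended by Manticore, and it retries on any failure, not only on rotation-related errors.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping
# DELAY seconds between failed attempts. Succeeds on the first success.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage sketch (port and query are placeholders):
#   retry 5 1 mysql -P 9306 -h0 -e "SELECT * FROM plain WHERE MATCH('test')"
```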
Seamless rotate
Rotation assumes that the old table version is discarded and the new table version is loaded to replace it. During this swap, the server also needs to serve incoming queries made on the table that is being updated. To avoid stalling those queries, the server by default implements a seamless rotate of the table, as described below.
Tables may contain some data that needs to be precached in RAM. At the moment, .spa, .spb, .spi and .spm files are fully precached (they contain attribute data, blob attribute data, keyword table and killed row map, respectively). Without seamless rotate, rotating a table tries to use as little RAM as possible and works as follows:
- new queries are temporarily rejected (with “retry” error code);
- searchd waits for all currently running queries to finish;
- old table is deallocated and its files are renamed;
- new table files are renamed and required RAM is allocated;
- new table attribute and dictionary data is preloaded to RAM;
- searchd resumes serving queries from new table.
However, if there is a lot of attribute or dictionary data, the preloading step can take a noticeable amount of time, up to several minutes when preloading 1-5+ GB files.
With seamless rotate enabled, rotation works as follows:
- new table RAM storage is allocated
- new table attribute and dictionary data is asynchronously preloaded to RAM
- on success, old table is deallocated and both tables’ files are renamed
- on failure, new table is deallocated
- at any given moment, queries are served either from old or new table copy
Seamless rotate comes at the cost of higher peak memory usage during the rotation (because both old and new copies of .spa/.spb/.spi/.spm data need to be in RAM while preloading new copy). Average usage stays the same.
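Since both copies of the precached data have to fit in RAM during a seamless rotate, it can help to estimate the size of the incoming copy beforehand. The sketch below is a rough one: preload_bytes is a hypothetical helper, it assumes the new version sits in a known directory as tbl.new.spa/.spb/.spi/.spm files, and it treats on-disk size as an approximation of the RAM needed for preloading.

```shell
# Hypothetical helper: sum the sizes (in bytes) of the new-version files
# that are fully precached: TBL.new.spa, .spb, .spi and .spm in DIR.
preload_bytes() {
  local dir="$1" tbl="$2" total=0 ext f size
  for ext in spa spb spi spm; do
    f="$dir/$tbl.new.$ext"
    if [ -f "$f" ]; then
      size=$(wc -c < "$f")
      total=$((total + size))
    fi
  done
  printf '%s\n' "$total"
}

# Usage sketch (directory and table name are placeholders):
#   preload_bytes /var/lib/manticore plain_table
```

Files with other extensions are ignored, and missing files simply contribute nothing to the total.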
Example:
seamless_rotate = 1