
Real-time table

The real-time table is the main table type in Manticore. It lets you add, update, and delete documents, and the changes become available immediately. Real-time table settings can be defined in a configuration file or online with the CREATE/UPDATE/DELETE/ALTER commands.

Internally, a real-time table consists of one or more plain tables called chunks. There can be:

multiple disk chunks, stored on disk with the same structure as any plain table
a single RAM chunk, stored in memory and used as an accumulator of changes

The RAM chunk size is controlled by rt_mem_limit. Once the limit is exceeded, the RAM chunk is flushed to disk as a disk chunk. When there are too many disk chunks, they can be merged into one for better performance with the OPTIMIZE command.
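The flush-and-merge cycle described above can be pictured with a small Python model. This is a conceptual sketch only, not Manticore's implementation: the class name ToyRealTimeTable is hypothetical, documents are plain strings, and the RAM chunk size is approximated by the total string length.

```python
class ToyRealTimeTable:
    """Conceptual model: one RAM chunk plus a list of disk chunks."""

    def __init__(self, rt_mem_limit):
        self.rt_mem_limit = rt_mem_limit  # toy limit, measured in characters
        self.ram_chunk = []               # documents not yet flushed
        self.ram_size = 0
        self.disk_chunks = []             # each disk chunk is a list of documents

    def insert(self, doc):
        """Accumulate the document in RAM; flush once the limit is exceeded."""
        self.ram_chunk.append(doc)
        self.ram_size += len(doc)
        if self.ram_size > self.rt_mem_limit:
            self.flush()

    def flush(self):
        """Turn the RAM chunk into a new disk chunk and start a fresh RAM chunk."""
        if self.ram_chunk:
            self.disk_chunks.append(self.ram_chunk)
        self.ram_chunk = []
        self.ram_size = 0

    def optimize(self):
        """Merge all disk chunks into one (the role the text gives to OPTIMIZE)."""
        if len(self.disk_chunks) > 1:
            merged = [doc for chunk in self.disk_chunks for doc in chunk]
            self.disk_chunks = [merged]


table = ToyRealTimeTable(rt_mem_limit=10)
for text in ["aaaa", "bbbb", "cccc", "dddd", "eeee", "ffff", "gg"]:
    table.insert(text)

chunks_before = len(table.disk_chunks)  # two flushes have happened by now
table.optimize()
chunks_after = len(table.disk_chunks)   # merged into a single disk chunk
```

The last document stays in the RAM chunk because the limit has not been exceeded again after the second flush.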

SQL:

CREATE TABLE products(title text, price float) morphology='stem_en';

JSON:

POST /cli -d "CREATE TABLE products(title text, price float) morphology='stem_en'"

PHP:

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'float'],
]);

Python:

utilsApi.sql('CREATE TABLE products(title text, price float)')

Javascript:

res = await utilsApi.sql('CREATE TABLE products(title text, price float)');

Java:

utilsApi.sql("CREATE TABLE products(title text, price float)");

CONFIG:

table products {
  type = rt
  path = tbl
  rt_field = title
  rt_attr_float = price
  stored_fields = title
}

Response

SQL:

Query OK, 0 rows affected (0.00 sec)

JSON:

{
  "total": 0,
  "error": "",
  "warning": ""
}


👍 What you can do with a real-time table:

Add documents
Update attributes and full-text fields
Delete documents
Truncate the table
Change the schema online with the ALTER command
Define the table in a configuration file
Use UUID for automatic ID provisioning

⛔ What you cannot do with a real-time table:

Index data using indexer
Link it with sources for easy indexing from external storage
Update its killlist_target; this is not needed, because the real-time table manages it automatically

Real-time table files structure

Extension	Description
`.lock`	lock file
`.ram`	RAM chunk
`.meta`	RT table headers
`.*.sp*`	disk chunks (see plain table format)

Plain table

A plain table is the basic element for non-percolate searching. It can be defined only in a configuration file, in plain mode; it is not supported in RT mode. It is normally used together with a source to process data from external storage, and can afterwards be attached to a real-time table.

👍 What you can do with a plain table:

build it from external storage using a source and indexer
do an in-place update of integer, float, string and MVA attributes
update its killlist_target

⛔ What you cannot do with a plain table:

insert more data into a table after it’s built
delete data from it
create/delete/alter a plain table online (you need to define it in a configuration file)
use UUID for automatic ID generation. When you fetch data from an external storage it must include a unique identifier for each document

Except for numeric attributes (including MVA), the rest of the data in a plain table is immutable. If you need to update or add records, you need to rebuild the table. While the table is being rebuilt, the existing table remains available for serving requests. When the new version of the table is ready, a process called rotation puts the new version online and discards the old one.

Plain table example


A plain table can only be defined in a configuration file. It is not supported by the CREATE TABLE command.

source source {
  type             = mysql
  sql_host         = localhost
  sql_user         = myuser
  sql_pass         = mypass
  sql_db           = mydb
  sql_query        = SELECT id, title, description, category_id FROM mytable
  sql_attr_uint    = category_id
  sql_field_string = title
}

table tbl {
  type   = plain
  source = source
  path   = /path/to/table
}

Plain table building performance

Speed of plain indexing depends on several factors:

how fast the source can provide the data
tokenization settings
your hardware (CPU, amount of RAM, disk performance)

Plain table building scenarios

Rebuild fully when needed

In the simplest scenario, you use a single plain table and fully rebuild it from time to time. This works fine for smaller data sets, provided you can accept that:

the table will not be as fresh as the data in the source
indexing duration grows with the data: the more data you have in the source, the longer it takes to build the table

Main+delta

If you have a bigger data set and still want to use a plain table rather than a real-time one, you can:

make another, smaller table for incremental indexing
combine the two using a distributed table

This lets you rebuild the bigger table rarely (say, once per week), save the position of the freshest indexed document, and then use the smaller table to process anything new or updated in your source. Since you only need to fetch the updates from your storage, you can do it much more frequently (say, once per minute or even every few seconds).

Over time, however, building the smaller table will take longer and longer. At that point you need to rebuild the bigger table and empty the smaller one.
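The idea of saving the position of the freshest indexed document can be modeled in a few lines of Python. This is a toy sketch, not how indexer works: the source is a plain dict keyed by document ID, IDs are assumed to only grow, and the names rebuild_main and rebuild_delta are hypothetical. It covers new rows only; picking up updated rows would need something extra in the source, such as a modification timestamp.

```python
def rebuild_main(source_rows):
    """Full rebuild: copy every row and remember the highest document ID."""
    main = dict(source_rows)
    last_indexed_id = max(main, default=0)
    return main, last_indexed_id


def rebuild_delta(source_rows, last_indexed_id):
    """Incremental rebuild: fetch only rows added after the saved position."""
    return {
        doc_id: text
        for doc_id, text in source_rows.items()
        if doc_id > last_indexed_id
    }


source = {1: "first", 2: "second"}
main, position = rebuild_main(source)    # rare, expensive step

source[3] = "third"                      # new data arrives in the source
delta = rebuild_delta(source, position)  # frequent, cheap step
```

Rebuilding the main table again resets the saved position, after which the delta starts out empty.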

This is called the main+delta schema, and you can learn more about it in this interactive course.

When you build the smaller “delta” table, it can receive documents that are already in the “main” table. To let Manticore know that documents from the current table should take precedence, there is a mechanism called the kill list and a corresponding directive, killlist_target.

More information on this topic can be found here.
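The precedence rule can be illustrated with a small Python model. It is a conceptual sketch under simplifying assumptions: both tables are plain dicts keyed by document ID, the kill list is taken to be the set of IDs present in the delta table, and the function name combine_main_and_delta is hypothetical. How Manticore builds and applies kill lists internally is not shown here.

```python
def combine_main_and_delta(main_docs, delta_docs):
    """Return the visible documents when delta takes precedence over main."""
    kill_list = set(delta_docs)  # IDs whose main-table copies are suppressed
    visible = {
        doc_id: text
        for doc_id, text in main_docs.items()
        if doc_id not in kill_list
    }
    visible.update(delta_docs)   # the delta versions are the ones served
    return visible


main_docs = {1: "old title A", 2: "old title B"}
delta_docs = {2: "new title B", 3: "new title C"}
visible = combine_main_and_delta(main_docs, delta_docs)
```

Document 2 exists in both tables, and only its delta version remains visible.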

Plain table files structure

Extension	Description
`.spa`	stores document attributes in row-wise mode
`.spb`	stores blob attributes in row-wise mode: strings, MVA, json
`.spc`	stores document attributes in columnar mode
`.spd`	stores matching document ID lists for each word ID
`.sph`	stores table header information
`.sphi`	stores histograms of attribute values
`.spi`	stores word lists (word IDs and pointers to `.spd` file)
`.spidx`	stores secondary indexes data
`.spk`	stores kill-lists
`.spl`	lock file
`.spm`	stores a bitmap of killed documents
`.spp`	stores hit (aka posting, aka word occurrence) lists for each word ID
`.spt`	stores additional data structures to speed up lookups by document ids
`.spe`	stores skip-lists to speed up doc-list filtering
`.spds`	stores document texts
`.tmp`	temporary files during index_settings_and_status
`.new.sp*`	new version of a plain table before rotation
`.old.sp*`	old version of a plain table after rotation
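When inspecting a table directory, the extension table above can be turned into a simple lookup. The sketch below is an illustration only: the helper name describe_table_file is hypothetical, the mapping copies a subset of the rows above, and the file paths are made up.

```python
import os

# Subset of the extension table above.
PLAIN_TABLE_FILES = {
    ".spa": "document attributes in row-wise mode",
    ".spb": "blob attributes in row-wise mode: strings, MVA, json",
    ".spc": "document attributes in columnar mode",
    ".spd": "matching document ID lists for each word ID",
    ".sph": "table header information",
    ".spi": "word lists (word IDs and pointers to .spd file)",
    ".spk": "kill-lists",
    ".spl": "lock file",
    ".spm": "bitmap of killed documents",
    ".spp": "hit lists for each word ID",
    ".spds": "document texts",
}


def describe_table_file(filename):
    """Return the description for a file's extension, or None if unknown."""
    _, extension = os.path.splitext(filename)
    return PLAIN_TABLE_FILES.get(extension)


description = describe_table_file("/path/to/table.spd")
unknown = describe_table_file("/path/to/notes.txt")
```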

Plain and real-time table settings

Defining table schema in a configuration file

table <table name>[:<parent table name>] {
...
}


table <table name> {
  type = plain
  path = /path/to/table
  source = <source_name>
  source = <another source_name>
  [stored_fields = <comma separated list of full-text fields that should be stored, all are stored by default, can be empty>]
}

table <table name> {
  type = rt
  path = /path/to/table
  rt_field = <full-text field name>
  rt_field = <another full-text field name>
  [rt_attr_uint = <integer field name>]
  [rt_attr_uint = <another integer field name, limit by N bits>:N]
  [rt_attr_bigint = <bigint field name>]
  [rt_attr_bigint = <another bigint field name>]
  [rt_attr_multi = <multi-integer (MVA) field name>]
  [rt_attr_multi = <another multi-integer (MVA) field name>]
  [rt_attr_multi_64 = <multi-bigint (MVA) field name>]
  [rt_attr_multi_64 = <another multi-bigint (MVA) field name>]
  [rt_attr_float = <float field name>]
  [rt_attr_float = <another float field name>]
  [rt_attr_bool = <boolean field name>]
  [rt_attr_bool = <another boolean field name>]
  [rt_attr_string = <string field name>]
  [rt_attr_string = <another string field name>]
  [rt_attr_json = <json field name>]
  [rt_attr_json = <another json field name>]
  [rt_attr_timestamp = <timestamp field name>]
  [rt_attr_timestamp = <another timestamp field name>]
  [stored_fields = <comma separated list of full-text fields that should be stored, all are stored by default, can be empty>]
  [rt_mem_limit = <RAM chunk max size, default 128M>]
  [optimize_cutoff = <max number of RT table disk chunks>]
}
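If table definitions are generated rather than written by hand, the real-time template above maps naturally onto a small rendering function. The Python sketch below is one possible approach, not an official tool: the function name render_rt_table and its parameters are hypothetical, and it only emits directives that appear in the template.

```python
# Attribute types taken from the rt_attr_* directives in the template above.
ATTR_TYPES = {
    "uint", "bigint", "multi", "multi_64", "float",
    "bool", "string", "json", "timestamp",
}


def render_rt_table(name, path, fields, attrs, rt_mem_limit=None):
    """Render a real-time table block for a configuration file.

    fields: list of full-text field names, in the desired order.
    attrs:  dict mapping attribute name to one of ATTR_TYPES.
    """
    lines = [f"table {name} {{", "  type = rt", f"  path = {path}"]
    for field in fields:
        lines.append(f"  rt_field = {field}")
    for attr, attr_type in attrs.items():
        if attr_type not in ATTR_TYPES:
            raise ValueError(f"unknown attribute type: {attr_type}")
        lines.append(f"  rt_attr_{attr_type} = {attr}")
    if rt_mem_limit is not None:
        lines.append(f"  rt_mem_limit = {rt_mem_limit}")
    lines.append("}")
    return "\n".join(lines)


config = render_rt_table("products", "tbl", ["title"], {"price": "float"})
print(config)
```

Rejecting unknown attribute types early keeps a typo from ending up in the generated configuration file.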

Common plain and real-time table settings

type

type = plain
type = rt

Table type: “plain” or “rt” (real-time)

Value: plain (default), rt

path

path = path/to/table

Absolute or relative path, without an extension, where the table is stored or looked up.

Value: path to the table, mandatory

stored_fields

stored_fields = title, content

By default, when a table is defined in a configuration file, the original content of full-text fields is both indexed and stored. This setting lets you specify which fields should have their original values stored.

Value: comma-separated list of full-text fields that should be stored. An empty value (i.e. stored_fields =) disables storing original values for all fields.

Note that for a real-time table, the fields listed in stored_fields must also be declared as rt_field.

Note also that you don’t need to list attributes in stored_fields, since their original values are stored anyway. stored_fields can only be used for full-text fields.

See also docstore_block_size, docstore_compression for document storage compression options.
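The rule that every name in stored_fields must also be declared as rt_field is easy to check before a configuration is deployed. The following Python sketch shows one way to do it; the helper names parse_field_list and undeclared_stored_fields are hypothetical and are not part of Manticore.

```python
def parse_field_list(value):
    """Split a comma-separated setting value such as "title, content"."""
    return [name.strip() for name in value.split(",") if name.strip()]


def undeclared_stored_fields(rt_fields, stored_fields_value):
    """Return the stored_fields entries that are not declared as rt_field."""
    declared = set(rt_fields)
    return [
        name
        for name in parse_field_list(stored_fields_value)
        if name not in declared
    ]


rt_fields = ["title", "content", "name"]
ok = undeclared_stored_fields(rt_fields, "title, content")
bad = undeclared_stored_fields(rt_fields, "title, price")
```

An empty stored_fields value parses to an empty list, which matches the case where no original values are stored.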

SQL:

CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)

JSON:

POST /cli -d "
CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)"

PHP:

$params = [
    'body' => [
        'columns' => [
            'title'=>['type'=>'text'],
            'content'=>['type'=>'text', 'options' => ['indexed', 'stored']],
            'name'=>['type'=>'text', 'options' => ['indexed']],
            'price'=>['type'=>'float']
        ]
    ],
    'index' => 'products'
];
$index = new \Manticoresearch\Index($client);
$index->create($params);

Python:

utilsApi.sql('CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)')

Javascript:

res = await utilsApi.sql('CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)');

Java:

utilsApi.sql("CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)");

CONFIG:

table products {
  stored_fields = title, content # we want to store only "title" and "content", "name" shouldn't be stored
  type = rt
  path = tbl
  rt_field = title
  rt_field = content
  rt_field = name
  rt_attr_float = price
}

stored_only_fields

stored_only_fields = title,content

A list of fields that are stored in the table but not indexed. This is similar to stored_fields, except that a field listed in stored_only_fields is only stored: it is not indexed and can’t be searched with full-text queries. It can only be returned with search results.

Value: comma-separated list of fields that should be stored only, not indexed. Default is empty. Note that for a real-time table, the fields listed in stored_only_fields must also be declared as rt_field.

Note also that you don’t need to list attributes in stored_only_fields, since their original values are stored anyway. Compared to a string attribute, a stored-only field:

is stored on disk and doesn’t require memory
is stored compressed
can only be fetched; you can’t sort, filter, or group by its value

A string attribute, in contrast, is:

stored on disk and in memory
stored uncompressed
available for sorting, grouping, filtering and anything else you can do with attributes

Real-time table settings:

optimize_cutoff

Max number of RT table disk chunks. Read more here.

rt_field

rt_field = subject

Full-text fields to be indexed. The names must be unique. The order is preserved, so field values in INSERT statements without an explicit list of inserted columns must be in the same order as configured.

Full-text field declaration. Multi-value, optional.
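Because the configured order matters when no column list is given, an application can derive the value order from the same list of rt_field names instead of hard-coding it. The Python sketch below shows the idea; the function name values_in_field_order is hypothetical, and it deals with full-text field values only, since the position of the document ID and attributes in such a statement is not covered here.

```python
def values_in_field_order(rt_fields, document):
    """Return the document's field values in the configured rt_field order."""
    missing = [name for name in rt_fields if name not in document]
    if missing:
        raise KeyError(f"missing full-text fields: {missing}")
    return [document[name] for name in rt_fields]


rt_fields = ["title", "content"]  # same order as in the configuration
document = {"content": "Some body text", "title": "Hello"}
ordered = values_in_field_order(rt_fields, document)
```

Listing the columns explicitly in the INSERT statement avoids the dependency on the configured order altogether.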

rt_attr_uint

rt_attr_uint = gid

Unsigned integer attribute declaration

Value: field_name or field_name:N, can be multiple records. N is the max number of bits to keep.
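The effect of the bit limit is simple arithmetic: N bits can hold values from 0 to 2^N - 1. The Python sketch below parses a declaration value and checks whether a number fits. The helper names are hypothetical, and two assumptions are made: a declaration without N uses the full 32 bits, and N is limited to the range 1 to 32. What the server does with a value that does not fit is outside the scope of this sketch.

```python
def parse_uint_declaration(value):
    """Parse "field_name" or "field_name:N" into (name, bits)."""
    name, _, bits_text = value.partition(":")
    bits = int(bits_text) if bits_text.strip() else 32  # assumed default
    if not 1 <= bits <= 32:
        raise ValueError(f"bit count out of range: {bits}")
    return name.strip(), bits


def max_uint_value(bits):
    """Largest unsigned value representable in the given number of bits."""
    return (1 << bits) - 1


def fits_in_bits(value, bits):
    """True if value is a non-negative integer that fits into bits."""
    return 0 <= value <= max_uint_value(bits)


name, bits = parse_uint_declaration("flags:8")
limit = max_uint_value(bits)
```

With an 8-bit declaration the largest storable value is 255, so a value such as 256 would not fit.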

rt_attr_bigint

rt_attr_bigint = gid

BIGINT attribute declaration

Value: field name, multiple records allowed

rt_attr_multi

rt_attr_multi = tags

Multi-valued attribute (MVA) declaration. Declares an UNSIGNED INTEGER (unsigned 32-bit) MVA attribute. Multi-value (i.e. more than one such attribute may be declared), optional.