- Creating a table
- Data types
- Data types
Creating a table
Data types
Full-text fields and attributes
Manticore’s data types can be split into full-text fields and attributes.
Full-text fields
Full-text fields:
- can be indexed with natural language processing algorithms, therefore can be searched for keywords
- cannot be used for sorting or grouping
- original document’s content can be retrieved
- original document’s content can be used for highlighting
Full-text fields are represented data type text
. All the other data types are called “attributes”.
Attributes
Attributes are non-full-text values associated with each document that can be used to perform non-full-text filtering, sorting and grouping during search.
It is often desired to process full-text search results based not only on matching document ID and its rank, but on a number of other per-document values as well. For instance, one might need to sort news search results by date and then relevance, or search through products within specified price range, or limit blog search to posts made by selected users, or group results by month. To do that efficiently, Manticore enables not only full-text fields, but additional attributes to each document. It’s then possible to use them to filter, sort, or group full-text matches or search only by attributes.
The attributes, unlike full-text fields, are not full-text indexed. They are stored in the table, but it is not possible to search them as full-text.
A good example for attributes would be a forum posts table. Assume that only title and content fields need to be full-text searchable - but that sometimes it is also required to limit search to a certain author or a sub-forum (ie. search only those rows that have some specific values of author_id or forum_id); or to sort matches by post_date column; or to group matching posts by month of the post_date and calculate per-group match counts.
- SQL
- JSON
- PHP
- Python
- Javascript
- Java
- config
SQL JSON PHP Python Javascript Java config
CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp);
POST /cli -d "CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)"
$index = new \Manticoresearch\Index($client);
$index->setName('forum');
$index->create([
'title'=>['type'=>'text'],
'content'=>['type'=>'text'],
'author_id'=>['type'=>'int'],
'forum_id'=>['type'=>'int'],
'post_date'=>['type'=>'timestamp']
]);
utilsApi.sql('CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)')
res = await utilsApi.sql('CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)');
utilsApi.sql("CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)");
table forum
{
type = rt
path = forum
# when configuring fields via config, they are indexed (and not stored) by default
rt_field = title
rt_field = content
# this option needs to be specified for the field to be stored
stored_fields = title, content
rt_attr_uint = author_id
rt_attr_uint = forum_id
rt_attr_timestamp = post_date
}
This example shows running a full-text query filtered by author_id
, forum_id
and sorted by post_date
.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select * from forum where author_id=123 and forum_id in (1,3,7) order by post_date desc
POST /search
{
"index": "forum",
"query":
{
"match_all": {},
"bool":
{
"must":
[
{ "equals": { "author_id": 123 } },
{ "in": { "forum_id": [1,3,7] } }
]
}
},
"sort": [ { "post_date": "desc" } ]
}
$client->search([
'index' => 'forum',
'query' =>
[
'match_all' => [],
'bool' => [
'must' => [
'equals' => ['author_id' => 123],
'in' => [
'forum_id' => [
1,3,7
]
]
]
]
],
'sort' => [
['post_date' => 'desc']
]
]);
searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]})
res = await searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]});
HashMap<String,Object> filters = new HashMap<String,Object>(){{
put("must", new HashMap<String,Object>(){{
put("equals",new HashMap<String,Integer>(){{
put("author_id",123);
}});
put("in",
new HashMap<String,Object>(){{
put("forum_id",new int[] {1,3,7});
}});
}});
}};
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("bool",filters);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
add(new HashMap<String,String>(){{ put("post_date","desc");}});
}});
SearchResponse searchResponse = searchApi.search(searchRequest);
Row-wise and columnar attribute storages
Manticore supports two types of attribute storages:
- row-wise - traditional storage available in Manticore Search out of the box
- columnar - provided by Manticore Columnar Library
As can be understood from their names, they store data differently. The traditional row-wise storage:
- stores attributes uncompressed
- all attributes of the same document are stored in one row close to each other
- rows are stored one by one
- accessing attributes is basically done by just multiplying rowid by stride (length of a single vector) and getting the requested attribute from the calculated memory location. It gives very low random access latency
- attributes have to be in memory to get acceptable performance, otherwise due to the row-wise nature of the storage Manticore may have to read from disk too much unneded data which is in many cases suboptimal.
With the columnar storage:
- each attribute is stored independently from all other attributes in its separate “column”
- storage is split into blocks of 65536 entries
- the blocks are stored compressed. This often allows to store just a few distinct values instead of storing all of them like in the row-wise storage. High compression ratio allows to read from disk faster and makes the memory requirement much lower
- when data is indexed, storage scheme is selected for each block independently. For example, if all values in a block are the same, it gets “const” storage and only one value is stored for the whole block. If there are less than 256 unique values per block, it gets “table” storage and stores indexes to a table of values instead of the values themselves
- search in a block can be early rejected if it’s clear the requested value is not present in the block.
The columnar storage was designed to handle large data volume that does not fit into RAM, so the recommendations are:
- if you have enough memory for all your attributes you will benefit from the row-wise storage
- otherwise the columnar storage can still give you decent performance with much lower memory footprint which will allow you to store much more documents in your table
How to switch between the storages
The traditional row-wise storage is default, so if you want everything to be stored in a row-wise fashion you don’t need to do anything when you create a table.
To enable the columnar storage you need to:
specify
engine='columnar'
in CREATE TABLE to make all attributes of the table columnar. Then if you want to keep a specific attribute row-wise you need to addengine='rowwise'
when you declare it. For example:create table tbl(title text, type int, price float engine='rowwise') engine='columnar'
specify
engine='columnar'
for a specific attribute inCREATE TABLE
to make it columnar. For example:create table tbl(title text, type int, price float engine='columnar');
or
create table tbl(title text, type int, price float engine='columnar') engine='rowwise';
in the plain mode you need to list attributes you want to be columnar in columnar_attrs.
Below is the list of data types supported by Manticore Search:
Document ID
The document identifier is a mandatory attribute. It must be a unique, signed 64-bit integer. While the document ID can be specified explicitly, it is still enabled even if not specified. Document IDs cannot be UPDATE‘ed.
- Explicit ID
- Implicit ID
Explicit ID Implicit ID
When you create a table you can specify ID explicitly, but no matter what datatype you use it will be always as said previously - signed 64-bit integer.
CREATE TABLE tbl(id bigint, content text);
DESC tbl;
You can also omit specifying ID at all, it will be enabled automatically.
CREATE TABLE tbl(content text);
DESC tbl;
Response
+---------+--------+----------------+
| Field | Type | Properties |
+---------+--------+----------------+
| id | bigint | |
| content | text | indexed stored |
+---------+--------+----------------+
2 rows in set (0.00 sec)
+---------+--------+----------------+
| Field | Type | Properties |
+---------+--------+----------------+
| id | bigint | |
| content | text | indexed stored |
+---------+--------+----------------+
2 rows in set (0.00 sec)
Character data types
General syntax:
string|text [stored|attribute] [indexed]
Properties:
indexed
- full-text indexed (can be used in full-text queries)stored
- stored in a docstore (stored on disk, not in RAM, lazy read)attribute
- makes it string attribute (can sort/group by it)
Specifying at least one property overrides all the default ones (see below), i.e. if you decide to use a custom combination of properties you need to list all the properties you want.
No properties specified:
string
and text
are aliases, but if you don’t specify any properties they by default means different things:
- just
string
by default meansattribute
(see details below). - just
text
by default meansstored
+indexed
(see details below).
Text
Text (just text
or text/string indexed
) data type forms the full-text part of the table. Text fields are indexed and can be searched for keywords.
Text is passed through an analyzer pipeline that converts the text to words, applies morphology transformations etc. Eventually a full-text table (a special data structure that enables quick searches for a keyword) gets built from that text.
Full-text fields can only be used in MATCH()
clause and cannot be used for sorting or aggregation. Words are stored in an inverted index along with references to the fields they belong and positions in the field. This allows to search a word inside each field and to use advanced operators like proximity. By default the original text of the fields is both indexed and stored in document storage. It means that the original text can be returned with the query results and it can be used in search result highlighting.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text);
POST /cli -d "CREATE TABLE products(title text)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text']
]);
utilsApi.sql('CREATE TABLE products(title text)')
res = await utilsApi.sql('CREATE TABLE products(title text)');
utilsApi.sql("CREATE TABLE products(title text)");
table products
{
type = rt
path = products
# when configuring fields via config, they are indexed (and not stored) by default
rt_field = title
# this option needs to be specified for the field to be stored
stored_fields = title
}
This behavior can be overridden by explicitly specifying that the text is only indexed.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text indexed);
POST /cli -d "CREATE TABLE products(title text indexed)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text','options'=>['indexed']]
]);
utilsApi.sql('CREATE TABLE products(title text indexed)')
res = await utilsApi.sql('CREATE TABLE products(title text indexed)');
utilsApi.sql("CREATE TABLE products(title text indexed)");
table products
{
type = rt
path = products
# when configuring fields via config, they are indexed (and not stored) by default
rt_field = title
}
Fields are named, and you can limit your searches to a single field (e.g. search through “title” only) or a subset of fields (eg. to “title” and “abstract” only). Manticore table format generally supports up to 256 full-text fields.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select * from products where match('@title first');
POST /search
{
"index": "products",
"query":
{
"match": { "title": "first" }
}
}
$index->setName('products')->search('@title')->get();
searchApi.search({"index":"products","query":{"match":{"title":"first"}}})
res = await searchApi.search({"index":"products","query":{"match":{"title":"first"}}});
utilsApi.sql("CREATE TABLE products(title text indexed)");
String
Unlike full-text fields, string attributes (just string
or string/text attribute
) are stored as they are received and cannot be used in full-text searches. Instead they are returned in results, they can be used in WHERE
clause for comparison filtering or REGEX
and they can be used for sorting and aggregation. In general it’s not recommended to store large texts in string attributes, but use string attributes for metadata like names, titles, tags, keys.
If want to also index the string attribute, can specify both as string attribute indexed
. Will allow full-text searching, and works as an attribute.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, keys string);
POST /cli -d "CREATE TABLE products(title text, keys string)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'keys'=>['type'=>'string']
]);
utilsApi.sql('CREATE TABLE products(title text, keys string)')
res = await utilsApi.sql('CREATE TABLE products(title text, keys string)');
utilsApi.sql("CREATE TABLE products(title text, keys string)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_string = keys
}
MORE
Integer
Integer type allows storing 32 bit unsigned integer values.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, price int);
POST /cli -d "CREATE TABLE products(title text, price int)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'price'=>['type'=>'int']
]);
utilsApi.sql('CREATE TABLE products(title text, price int)')
res = await utilsApi.sql('CREATE TABLE products(title text, price int)');
utilsApi.sql("CREATE TABLE products(title text, price int)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_uint = type
}
Integers can be stored in shorter sizes than 32 bit by specifying a bit count. For example if we want to store a numeric value which we know is not going to be bigger than 8, the type can be defined as bit(3)
. Bitcount integers perform slower than the full size ones, but they require less RAM. They are saved in 32-bit chunks, so in order to save space they should be grouped at the end of attributes definitions (otherwise a bitcount integer between 2 full-size integers will occupy 32 bits as well).
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, flags bit(3), tags bit(2) );
POST /cli -d "CREATE TABLE products(title text, flags bit(3), tags bit(2))"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'flags'=>['type'=>'bit(3)'],
'tags'=>['type'=>'bit(2)']
]);
utilsApi.sql('CREATE TABLE products(title text, flags bit(3), tags bit(2) ')
res = await utilsApi.sql('CREATE TABLE products(title text, flags bit(3), tags bit(2) ');
utilsApi.sql("CREATE TABLE products(title text, flags bit(3), tags bit(2)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_uint = flags:3
rt_attr_uint = tags:2
}
Big Integer
Big integers are 64-bit wide signed integers.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, price bigint );
POST /cli -d "CREATE TABLE products(title text, price bigint)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'price'=>['type'=>'bigint']
]);
utilsApi.sql('CREATE TABLE products(title text, price bigint )')
res = await utilsApi.sql('CREATE TABLE products(title text, price bigint )');
utilsApi.sql("CREATE TABLE products(title text, price bigint )");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_bigint = type
}
Boolean
Declares a boolean attribute. It’s equivalent to an integer attribute with bit count of 1.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, sold bool );
POST /cli -d "CREATE TABLE products(title text, sold bool)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'sold'=>['type'=>'bool']
]);
utilsApi.sql('CREATE TABLE products(title text, sold bool )')
res = await utilsApi.sql('CREATE TABLE products(title text, sold bool )');
utilsApi.sql("CREATE TABLE products(title text, sold bool )");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_bool = sold
}
Timestamps
Timestamp type represents unix timestamps which is stored as a 32-bit integer. The difference is that time and date functions are available for the timestamp type.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, date timestamp);
POST /cli -d "CREATE TABLE products(title text, date timestamp)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'date'=>['type'=>'timestamp']
]);
utilsApi.sql('CREATE TABLE products(title text, date timestamp)')
res = await utilsApi.sql('CREATE TABLE products(title text, date timestamp)');
utilsApi.sql("CREATE TABLE products(title text, date timestamp)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_timestamp = date
}
Float
Real numbers are stored as 32-bit IEEE 754 single precision floats.
- SQL
- JSON
- PHP
- Python
- java
- javascript
- config
SQL JSON PHP Python java javascript config
CREATE TABLE products(title text, coeff float);
POST /cli -d "CREATE TABLE products(title text, coeff float)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'coeff'=>['type'=>'float']
]);
utilsApi.sql('CREATE TABLE products(title text, coeff float)')
utilsApi.sql("CREATE TABLE products(title text, coeff float)");
res = await utilsApi.sql('CREATE TABLE products(title text, coeff float)');
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_float = coeff
}
Unlike integer types, equal comparison of floats is forbidden due to rounding errors. A near equal can be used instead, by checking the absolute error margin.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select abs(a-b)<=0.00001 from products
POST /search
{
"index": "products",
"query": { "match_all": {} } },
"expressions": { "eps": "abs(a-b)" }
}
$index->setName('products')->search('')->expression('eps','abs(a-b)')->get();
searchApi.search({"index":"products","query":{"match_all":{}},"expressions":{"eps":"abs(a-b)"}})
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"eps":"abs(a-b)"}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
put("ebs","abs(a-b)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
Another alternative, which can also be used to perform IN(attr,val1,val2,val3)
is to compare floats as integers by choosing a multiplier factor and convert the floats to integers in operations. Example illustrates modifying IN(attr,2.0,2.5,3.5)
to work with integer values.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select in(ceil(attr*100),200,250,350) from products
POST /search
{
"index": "products",
"query": { "match_all": {} } },
"expressions": { "inc": "in(ceil(attr*100),200,250,350)" }
}
$index->setName('products')->search('')->expression('inc','in(ceil(attr*100),200,250,350)')->get();
searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}})
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
put("inc","in(ceil(attr*100),200,250,350)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
JSON
This data type allows storing JSON objects for schema-less data. It is not supported by the columnar storage, but since you can combine the both storages in the same table you can have it stored in the traditional storage instead.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, data json);
POST /cli -d "CREATE TABLE products(title text, data json)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'data'=>['type'=>'json']
]);
utilsApi.sql('CREATE TABLE products(title text, data json)')
res = await utilsApi.sql('CREATE TABLE products(title text, data json)');
utilsApi.sql'CREATE TABLE products(title text, data json)');
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_json = data
}
JSON properties can be used in most operations. There are also special functions such as ALL()), ANY()), GREATEST()), LEAST()) and INDEXOF()) that allow traversal of property arrays.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select indexof(x>2 for x in data.intarray) from products
POST /search
{
"index": "products",
"query": { "match_all": {} } },
"expressions": { "idx": "indexof(x>2 for x in data.intarray)" }
}
$index->setName('products')->search('')->expression('idx','indexof(x>2 for x in data.intarray)')->get();
searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}})
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
Text properties are treated same as strings so it’s not possible to use them in full-text matches expressions, but string functions like REGEX()) can be used.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select regex(data.name, 'est') as c from products where c>0
POST /search
{
"index": "products",
"query":
{
"match_all": {},
"range": { "c": { "gt": 0 } } }
},
"expressions": { "c": "regex(data.name, 'est')" }
}
$index->setName('products')->search('')->expression('idx',"regex(data.name, 'est')")->filter('c','gt',0)->get();
searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}})
res = await searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("range", new HashMap<String,Object>(){{
put("c", new HashMap<String,Object>(){{
put("gt",0);
}});
}});
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
In case of JSON properties, enforcing data type is required to be casted in some situations for proper functionality. For example in case of float values DOUBLE()) must be used for proper sorting.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select * from products order by double(data.myfloat) desc
POST /search
{
"index": "products",
"query": { "match_all": {} } },
"sort": [ { "double(data.myfloat)": { "order": "desc"} } ]
}
$index->setName('products')->search('')->sort('double(data.myfloat)','desc')->get();
searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]})
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
add(new HashMap<String,String>(){{ put("double(data.myfloat)",new HashMap<String,String>(){{ put("order","desc");}});}});
}});
searchResponse = searchApi.search(searchRequest);
Multi-value integer (MVA)
Multi-value attributes allow storing variable-length lists of 32-bit unsigned integers. It can be used to store one-to-many numeric values like tags, product categories, properties.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, product_codes multi);
POST /cli -d "CREATE TABLE products(title text, product_codes multi)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'product_codes'=>['type'=>'multi']
]);
utilsApi.sql('CREATE TABLE products(title text, product_codes multi)')
res = await utilsApi.sql('CREATE TABLE products(title text, product_codes multi)');
utilsApi.sql("CREATE TABLE products(title text, product_codes multi)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_multi = product_codes
}
It supports filtering and aggregation, but not sorting. Filtering can be made of a condition that requires at least one element to pass (using ANY())) or all (ALL())).
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select * from products where any(product_codes)=3
POST /search
{
"index": "products",
"query":
{
"match_all": {},
"equals" : { "any(product_codes)": 3 }
}
}
$index->setName('products')->search('')->filter('any(product_codes)','equals',3)->get();
searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})
res = await searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})'
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("equals",new HashMap<String,Integer>(){{
put("any(product_codes)",3);
}});
searchRequest.setQuery(query);
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
Information like least) or greatest) element and length of the list can be extracted. An example shows ordering by the least element of a multi-value attribute.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select least(product_codes) l from products order by l asc
POST /search
{
"index": "products",
"query":
{
"match_all": {},
"sort": [ { "product_codes":{ "order":"asc", "mode":"min" } } ]
}
}
$index->setName('products')->search('')->sort('product_codes','asc','min')->get();
searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}})
res = await searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
add(new HashMap<String,String>(){{ put("product_codes",new HashMap<String,String>(){{ put("order","asc");put("mode","min");}});}});
}});
searchResponse = searchApi.search(searchRequest);
When grouping by multi-value attribute, a document will contribute to as many groups as there are different values associated with that document. For instance, if the collection contains exactly 1 document having a ‘product_codes’ multi-value attribute with values 5, 7, and 11, grouping on ‘product_codes’ will produce 3 groups with COUNT(*)
equal to 1 and GROUPBY()
key values of 5, 7, and 11 respectively. Also note that grouping by multi-value attributes might lead to duplicate documents in the result set because each document can participate in many groups.
- SQL
SQL
insert into products values ( 1, 'doc one', (5,7,11) );
select id, count(*), groupby() from products group by product_codes;
Response
Query OK, 1 row affected (0.00 sec)
+------+----------+-----------+
| id | count(*) | groupby() |
+------+----------+-----------+
| 1 | 1 | 11 |
| 1 | 1 | 7 |
| 1 | 1 | 5 |
+------+----------+-----------+
3 rows in set (0.00 sec)
The order of the numbers inserted as values of multi-valued attributes is not preserved. Values are stored internally as a sorted set.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
insert into product values (1,'first',(4,2,1,3));
select * from products;
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title":"first",
"product_codes":[4,2,1,3]
}
}
POST /search
{
"index": "products",
"query": { "match_all": {} }
}
$index->addDocument([
"title"=>"first",
"product_codes"=>[4,2,1,3]
]);
$index->search('')-get();
indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}})
searchApi.search({"index":"products","query":{"match_all":{}}})
await indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}});
res = await searchApi.search({"index":"products","query":{"match_all":{}}});
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","first");
put("product_codes",new int[] {4,2,1,3});
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("products");
searchRequest.setQuery(query);
SearchResponse searchResponse = searchApi.search(searchRequest);
System.out.println(searchResponse.toString() );
Response
Query OK, 1 row affected (0.00 sec)
+------+---------------+-------+
| id | product_codes | title |
+------+---------------+-------+
| 1 | 1,2,3,4 | first |
+------+---------------+-------+
1 row in set (0.01 sec)
{
"_index":"products",
"_id":1,
"created":true,
"result":"created",
"status":201
}
{
"took":0,
"timed_out":false,
"hits":{
"total":1,
"hits":[
{
"_id":"1",
"_score":1,
"_source":{
"product_codes":[
1,
2,
3,
4
],
"title":"first"
}
}
]
}
}
Array
(
[_index] => products
[_id] => 1
[created] => 1
[result] => created
[status] => 201
)
Array
(
[took] => 0
[timed_out] =>
[hits] => Array
(
[total] => 1
[hits] => Array
(
[0] => Array
(
[_id] => 1
[_score] => 1
[_source] => Array
(
[product_codes] => Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
)
[title] => first
)
)
)
)
)
{'created': True,
'found': None,
'id': 1,
'index': 'products',
'result': 'created'}
{'hits': {'hits': [{u'_id': u'1',
u'_score': 1,
u'_source': {u'product_codes': [1, 2, 3, 4],
u'title': u'first'}}],
'total': 1},
'profile': None,
'timed_out': False,
'took': 29}
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":1,"_source":{"product_codes":[1,2,3,4],"title":"first"}}]}}
class SearchResponse {
took: 0
timedOut: false
hits: class SearchResponseHits {
total: 1
hits: [{_id=1, _score=1, _source={product_codes=[1, 2, 3, 4], title=first}}]
aggregations: null
}
profile: null
}
Multi-value big integer
A data type type that allows storing variable-length lists of 64-bit signed integers. It has the same functionality as multi-value integer.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, values multi64);
POST /cli -d "CREATE TABLE products(title text, values multi64)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'values'=>['type'=>'multi64']
]);
utilsApi.sql('CREATE TABLE products(title text, values multi64))')
res = await utilsApi.sql('CREATE TABLE products(title text, values multi64))');
utilsApi.sql("CREATE TABLE products(title text, values multi64))");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_multi_64 = values
}
Columnar attribute properties
When you use the columnar storage you can specify the following properties for the attributes.
fast_fetch
By default Manticore Columnar storage stores all attributes not only in columnar fashion, but in a special docstore row by row which enables fast execution of queries like SELECT * FROM ...
especially when you are fetching lots of records at once. But if you are sure you don’t need it or want to save disk space you can disable it by specifying fast_fetch='0'
when you create a table or (if you are defining a table in a config) use columnar_no_fast_fetch
as shown in the following example.
- RT mode
- Plain mode
RT mode Plain mode
create table t(a int, b int fast_fetch='0') engine='columnar'; desc t;
source min {
type = mysql
sql_host = localhost
sql_user = test
sql_pass =
sql_db = test
sql_query = select 1, 1 a, 1 b
sql_attr_uint = a
sql_attr_uint = b
}
table tbl {
path = tbl/col
source = min
columnar_attrs = *
columnar_no_fast_fetch = b
}
Response
+-------+--------+---------------------+
| Field | Type | Properties |
+-------+--------+---------------------+
| id | bigint | columnar fast_fetch |
| a | uint | columnar fast_fetch |
| b | uint | columnar |
+-------+--------+---------------------+
3 rows in set (0.00 sec)
+-------+--------+---------------------+
| Field | Type | Properties |
+-------+--------+---------------------+
| id | bigint | columnar fast_fetch |
| a | uint | columnar fast_fetch |
| b | uint | columnar |
+-------+--------+---------------------+
Data types
Full-text fields and attributes
Manticore’s data types can be split into full-text fields and attributes.
Full-text fields
Full-text fields:
- can be indexed with natural language processing algorithms, therefore can be searched for keywords
- cannot be used for sorting or grouping
- original document’s content can be retrieved
- original document’s content can be used for highlighting
Full-text fields are represented data type text
. All the other data types are called “attributes”.
Attributes
Attributes are non-full-text values associated with each document that can be used to perform non-full-text filtering, sorting and grouping during search.
It is often desired to process full-text search results based not only on matching document ID and its rank, but on a number of other per-document values as well. For instance, one might need to sort news search results by date and then relevance, or search through products within specified price range, or limit blog search to posts made by selected users, or group results by month. To do that efficiently, Manticore enables not only full-text fields, but additional attributes to each document. It’s then possible to use them to filter, sort, or group full-text matches or search only by attributes.
The attributes, unlike full-text fields, are not full-text indexed. They are stored in the table, but it is not possible to search them as full-text.
A good example for attributes would be a forum posts table. Assume that only title and content fields need to be full-text searchable - but that sometimes it is also required to limit search to a certain author or a sub-forum (ie. search only those rows that have some specific values of author_id or forum_id); or to sort matches by post_date column; or to group matching posts by month of the post_date and calculate per-group match counts.
- SQL
- JSON
- PHP
- Python
- Javascript
- Java
- config
SQL JSON PHP Python Javascript Java config
CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp);
POST /cli -d "CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)"
$index = new \Manticoresearch\Index($client);
$index->setName('forum');
$index->create([
'title'=>['type'=>'text'],
'content'=>['type'=>'text'],
'author_id'=>['type'=>'int'],
'forum_id'=>['type'=>'int'],
'post_date'=>['type'=>'timestamp']
]);
utilsApi.sql('CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)')
res = await utilsApi.sql('CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)');
utilsApi.sql("CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)");
table forum
{
type = rt
path = forum
# when configuring fields via config, they are indexed (and not stored) by default
rt_field = title
rt_field = content
# this option needs to be specified for the field to be stored
stored_fields = title, content
rt_attr_uint = author_id
rt_attr_uint = forum_id
rt_attr_timestamp = post_date
}
This example shows running a full-text query filtered by author_id
, forum_id
and sorted by post_date
.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select * from forum where author_id=123 and forum_id in (1,3,7) order by post_date desc
POST /search
{
"index": "forum",
"query":
{
"match_all": {},
"bool":
{
"must":
[
{ "equals": { "author_id": 123 } },
{ "in": { "forum_id": [1,3,7] } }
]
}
},
"sort": [ { "post_date": "desc" } ]
}
$client->search([
'index' => 'forum',
'query' =>
[
'match_all' => [],
'bool' => [
'must' => [
'equals' => ['author_id' => 123],
'in' => [
'forum_id' => [
1,3,7
]
]
]
]
],
'sort' => [
['post_date' => 'desc']
]
]);
searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]})
res = await searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]});
HashMap<String,Object> filters = new HashMap<String,Object>(){{
put("must", new HashMap<String,Object>(){{
put("equals",new HashMap<String,Integer>(){{
put("author_id",123);
}});
put("in",
new HashMap<String,Object>(){{
put("forum_id",new int[] {1,3,7});
}});
}});
}};
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("bool",filters);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
add(new HashMap<String,String>(){{ put("post_date","desc");}});
}});
SearchResponse searchResponse = searchApi.search(searchRequest);
Row-wise and columnar attribute storages
Manticore supports two types of attribute storages:
- row-wise - traditional storage available in Manticore Search out of the box
- columnar - provided by Manticore Columnar Library
As can be understood from their names, they store data differently. The traditional row-wise storage:
- stores attributes uncompressed
- all attributes of the same document are stored in one row close to each other
- rows are stored one by one
- accessing attributes is basically done by just multiplying rowid by stride (length of a single vector) and getting the requested attribute from the calculated memory location. It gives very low random access latency
- attributes have to be in memory to get acceptable performance, otherwise due to the row-wise nature of the storage Manticore may have to read from disk too much unneded data which is in many cases suboptimal.
With the columnar storage:
- each attribute is stored independently from all other attributes in its separate “column”
- storage is split into blocks of 65536 entries
- the blocks are stored compressed. This often allows to store just a few distinct values instead of storing all of them like in the row-wise storage. High compression ratio allows to read from disk faster and makes the memory requirement much lower
- when data is indexed, storage scheme is selected for each block independently. For example, if all values in a block are the same, it gets “const” storage and only one value is stored for the whole block. If there are less than 256 unique values per block, it gets “table” storage and stores indexes to a table of values instead of the values themselves
- search in a block can be early rejected if it’s clear the requested value is not present in the block.
The columnar storage was designed to handle large data volume that does not fit into RAM, so the recommendations are:
- if you have enough memory for all your attributes you will benefit from the row-wise storage
- otherwise the columnar storage can still give you decent performance with much lower memory footprint which will allow you to store much more documents in your table
How to switch between the storages
The traditional row-wise storage is default, so if you want everything to be stored in a row-wise fashion you don’t need to do anything when you create a table.
To enable the columnar storage you need to:
specify
engine='columnar'
in CREATE TABLE to make all attributes of the table columnar. Then if you want to keep a specific attribute row-wise you need to addengine='rowwise'
when you declare it. For example:create table tbl(title text, type int, price float engine='rowwise') engine='columnar'
specify
engine='columnar'
for a specific attribute inCREATE TABLE
to make it columnar. For example:create table tbl(title text, type int, price float engine='columnar');
or
create table tbl(title text, type int, price float engine='columnar') engine='rowwise';
in the plain mode you need to list attributes you want to be columnar in columnar_attrs.
Below is the list of data types supported by Manticore Search:
Document ID
The document identifier is a mandatory attribute. It must be a unique, signed 64-bit integer. While the document ID can be specified explicitly, it is still enabled even if not specified. Document IDs cannot be UPDATE‘ed.
- Explicit ID
- Implicit ID
Explicit ID Implicit ID
When you create a table you can specify ID explicitly, but no matter what datatype you use it will be always as said previously - signed 64-bit integer.
CREATE TABLE tbl(id bigint, content text);
DESC tbl;
You can also omit specifying ID at all, it will be enabled automatically.
CREATE TABLE tbl(content text);
DESC tbl;
Response
+---------+--------+----------------+
| Field | Type | Properties |
+---------+--------+----------------+
| id | bigint | |
| content | text | indexed stored |
+---------+--------+----------------+
2 rows in set (0.00 sec)
+---------+--------+----------------+
| Field | Type | Properties |
+---------+--------+----------------+
| id | bigint | |
| content | text | indexed stored |
+---------+--------+----------------+
2 rows in set (0.00 sec)
Character data types
General syntax:
string|text [stored|attribute] [indexed]
Properties:
indexed
- full-text indexed (can be used in full-text queries)stored
- stored in a docstore (stored on disk, not in RAM, lazy read)attribute
- makes it string attribute (can sort/group by it)
Specifying at least one property overrides all the default ones (see below), i.e. if you decide to use a custom combination of properties you need to list all the properties you want.
No properties specified:
string
and text
are aliases, but if you don’t specify any properties they by default means different things:
- just
string
by default meansattribute
(see details below). - just
text
by default meansstored
+indexed
(see details below).
Text
Text (just text
or text/string indexed
) data type forms the full-text part of the table. Text fields are indexed and can be searched for keywords.
Text is passed through an analyzer pipeline that converts the text to words, applies morphology transformations etc. Eventually a full-text table (a special data structure that enables quick searches for a keyword) gets built from that text.
Full-text fields can only be used in MATCH()
clause and cannot be used for sorting or aggregation. Words are stored in an inverted index along with references to the fields they belong and positions in the field. This allows to search a word inside each field and to use advanced operators like proximity. By default the original text of the fields is both indexed and stored in document storage. It means that the original text can be returned with the query results and it can be used in search result highlighting.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text);
POST /cli -d "CREATE TABLE products(title text)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text']
]);
utilsApi.sql('CREATE TABLE products(title text)')
res = await utilsApi.sql('CREATE TABLE products(title text)');
utilsApi.sql("CREATE TABLE products(title text)");
table products
{
type = rt
path = products
# when configuring fields via config, they are indexed (and not stored) by default
rt_field = title
# this option needs to be specified for the field to be stored
stored_fields = title
}
This behavior can be overridden by explicitly specifying that the text is only indexed.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text indexed);
POST /cli -d "CREATE TABLE products(title text indexed)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text','options'=>['indexed']]
]);
utilsApi.sql('CREATE TABLE products(title text indexed)')
res = await utilsApi.sql('CREATE TABLE products(title text indexed)');
utilsApi.sql("CREATE TABLE products(title text indexed)");
table products
{
type = rt
path = products
# when configuring fields via config, they are indexed (and not stored) by default
rt_field = title
}
Fields are named, and you can limit your searches to a single field (e.g. search through “title” only) or a subset of fields (eg. to “title” and “abstract” only). Manticore table format generally supports up to 256 full-text fields.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select * from products where match('@title first');
POST /search
{
"index": "products",
"query":
{
"match": { "title": "first" }
}
}
$index->setName('products')->search('@title')->get();
searchApi.search({"index":"products","query":{"match":{"title":"first"}}})
res = await searchApi.search({"index":"products","query":{"match":{"title":"first"}}});
utilsApi.sql("CREATE TABLE products(title text indexed)");
String
Unlike full-text fields, string attributes (just string
or string/text attribute
) are stored as they are received and cannot be used in full-text searches. Instead they are returned in results, they can be used in WHERE
clause for comparison filtering or REGEX
and they can be used for sorting and aggregation. In general it’s not recommended to store large texts in string attributes, but use string attributes for metadata like names, titles, tags, keys.
If want to also index the string attribute, can specify both as string attribute indexed
. Will allow full-text searching, and works as an attribute.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, keys string);
POST /cli -d "CREATE TABLE products(title text, keys string)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'keys'=>['type'=>'string']
]);
utilsApi.sql('CREATE TABLE products(title text, keys string)')
res = await utilsApi.sql('CREATE TABLE products(title text, keys string)');
utilsApi.sql("CREATE TABLE products(title text, keys string)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_string = keys
}
MORE
Integer
Integer type allows storing 32 bit unsigned integer values.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, price int);
POST /cli -d "CREATE TABLE products(title text, price int)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'price'=>['type'=>'int']
]);
utilsApi.sql('CREATE TABLE products(title text, price int)')
res = await utilsApi.sql('CREATE TABLE products(title text, price int)');
utilsApi.sql("CREATE TABLE products(title text, price int)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_uint = type
}
Integers can be stored in shorter sizes than 32 bit by specifying a bit count. For example if we want to store a numeric value which we know is not going to be bigger than 8, the type can be defined as bit(3)
. Bitcount integers perform slower than the full size ones, but they require less RAM. They are saved in 32-bit chunks, so in order to save space they should be grouped at the end of attributes definitions (otherwise a bitcount integer between 2 full-size integers will occupy 32 bits as well).
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, flags bit(3), tags bit(2) );
POST /cli -d "CREATE TABLE products(title text, flags bit(3), tags bit(2))"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'flags'=>['type'=>'bit(3)'],
'tags'=>['type'=>'bit(2)']
]);
utilsApi.sql('CREATE TABLE products(title text, flags bit(3), tags bit(2) ')
res = await utilsApi.sql('CREATE TABLE products(title text, flags bit(3), tags bit(2) ');
utilsApi.sql("CREATE TABLE products(title text, flags bit(3), tags bit(2)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_uint = flags:3
rt_attr_uint = tags:2
}
Big Integer
Big integers are 64-bit wide signed integers.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, price bigint );
POST /cli -d "CREATE TABLE products(title text, price bigint)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'price'=>['type'=>'bigint']
]);
utilsApi.sql('CREATE TABLE products(title text, price bigint )')
res = await utilsApi.sql('CREATE TABLE products(title text, price bigint )');
utilsApi.sql("CREATE TABLE products(title text, price bigint )");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_bigint = type
}
Boolean
Declares a boolean attribute. It’s equivalent to an integer attribute with bit count of 1.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, sold bool );
POST /cli -d "CREATE TABLE products(title text, sold bool)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'sold'=>['type'=>'bool']
]);
utilsApi.sql('CREATE TABLE products(title text, sold bool )')
res = await utilsApi.sql('CREATE TABLE products(title text, sold bool )');
utilsApi.sql("CREATE TABLE products(title text, sold bool )");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_bool = sold
}
Timestamps
Timestamp type represents unix timestamps which is stored as a 32-bit integer. The difference is that time and date functions are available for the timestamp type.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, date timestamp);
POST /cli -d "CREATE TABLE products(title text, date timestamp)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'date'=>['type'=>'timestamp']
]);
utilsApi.sql('CREATE TABLE products(title text, date timestamp)')
res = await utilsApi.sql('CREATE TABLE products(title text, date timestamp)');
utilsApi.sql("CREATE TABLE products(title text, date timestamp)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_timestamp = date
}
Float
Real numbers are stored as 32-bit IEEE 754 single precision floats.
- SQL
- JSON
- PHP
- Python
- java
- javascript
- config
SQL JSON PHP Python java javascript config
CREATE TABLE products(title text, coeff float);
POST /cli -d "CREATE TABLE products(title text, coeff float)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'coeff'=>['type'=>'float']
]);
utilsApi.sql('CREATE TABLE products(title text, coeff float)')
utilsApi.sql("CREATE TABLE products(title text, coeff float)");
res = await utilsApi.sql('CREATE TABLE products(title text, coeff float)');
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_float = coeff
}
Unlike integer types, equal comparison of floats is forbidden due to rounding errors. A near equal can be used instead, by checking the absolute error margin.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select abs(a-b)<=0.00001 from products
POST /search
{
"index": "products",
"query": { "match_all": {} } },
"expressions": { "eps": "abs(a-b)" }
}
$index->setName('products')->search('')->expression('eps','abs(a-b)')->get();
searchApi.search({"index":"products","query":{"match_all":{}},"expressions":{"eps":"abs(a-b)"}})
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"eps":"abs(a-b)"}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
put("ebs","abs(a-b)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
Another alternative, which can also be used to perform IN(attr,val1,val2,val3)
is to compare floats as integers by choosing a multiplier factor and convert the floats to integers in operations. Example illustrates modifying IN(attr,2.0,2.5,3.5)
to work with integer values.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select in(ceil(attr*100),200,250,350) from products
POST /search
{
"index": "products",
"query": { "match_all": {} } },
"expressions": { "inc": "in(ceil(attr*100),200,250,350)" }
}
$index->setName('products')->search('')->expression('inc','in(ceil(attr*100),200,250,350)')->get();
searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}})
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
put("inc","in(ceil(attr*100),200,250,350)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
JSON
This data type allows storing JSON objects for schema-less data. It is not supported by the columnar storage, but since you can combine the both storages in the same table you can have it stored in the traditional storage instead.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, data json);
POST /cli -d "CREATE TABLE products(title text, data json)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'data'=>['type'=>'json']
]);
utilsApi.sql('CREATE TABLE products(title text, data json)')
res = await utilsApi.sql('CREATE TABLE products(title text, data json)');
utilsApi.sql'CREATE TABLE products(title text, data json)');
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_json = data
}
JSON properties can be used in most operations. There are also special functions such as ALL()), ANY()), GREATEST()), LEAST()) and INDEXOF()) that allow traversal of property arrays.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select indexof(x>2 for x in data.intarray) from products
POST /search
{
"index": "products",
"query": { "match_all": {} } },
"expressions": { "idx": "indexof(x>2 for x in data.intarray)" }
}
$index->setName('products')->search('')->expression('idx','indexof(x>2 for x in data.intarray)')->get();
searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}})
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
Text properties are treated same as strings so it’s not possible to use them in full-text matches expressions, but string functions like REGEX()) can be used.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select regex(data.name, 'est') as c from products where c>0
POST /search
{
"index": "products",
"query":
{
"match_all": {},
"range": { "c": { "gt": 0 } } }
},
"expressions": { "c": "regex(data.name, 'est')" }
}
$index->setName('products')->search('')->expression('idx',"regex(data.name, 'est')")->filter('c','gt',0)->get();
searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}})
res = await searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("range", new HashMap<String,Object>(){{
put("c", new HashMap<String,Object>(){{
put("gt",0);
}});
}});
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
In case of JSON properties, enforcing data type is required to be casted in some situations for proper functionality. For example in case of float values DOUBLE()) must be used for proper sorting.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select * from products order by double(data.myfloat) desc
POST /search
{
"index": "products",
"query": { "match_all": {} } },
"sort": [ { "double(data.myfloat)": { "order": "desc"} } ]
}
$index->setName('products')->search('')->sort('double(data.myfloat)','desc')->get();
searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]})
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
add(new HashMap<String,String>(){{ put("double(data.myfloat)",new HashMap<String,String>(){{ put("order","desc");}});}});
}});
searchResponse = searchApi.search(searchRequest);
Multi-value integer (MVA)
Multi-value attributes allow storing variable-length lists of 32-bit unsigned integers. It can be used to store one-to-many numeric values like tags, product categories, properties.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, product_codes multi);
POST /cli -d "CREATE TABLE products(title text, product_codes multi)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'product_codes'=>['type'=>'multi']
]);
utilsApi.sql('CREATE TABLE products(title text, product_codes multi)')
res = await utilsApi.sql('CREATE TABLE products(title text, product_codes multi)');
utilsApi.sql("CREATE TABLE products(title text, product_codes multi)");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_multi = product_codes
}
It supports filtering and aggregation, but not sorting. Filtering can be made of a condition that requires at least one element to pass (using ANY())) or all (ALL())).
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select * from products where any(product_codes)=3
POST /search
{
"index": "products",
"query":
{
"match_all": {},
"equals" : { "any(product_codes)": 3 }
}
}
$index->setName('products')->search('')->filter('any(product_codes)','equals',3)->get();
searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})
res = await searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})'
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("equals",new HashMap<String,Integer>(){{
put("any(product_codes)",3);
}});
searchRequest.setQuery(query);
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
Information like least) or greatest) element and length of the list can be extracted. An example shows ordering by the least element of a multi-value attribute.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
select least(product_codes) l from products order by l asc
POST /search
{
"index": "products",
"query":
{
"match_all": {},
"sort": [ { "product_codes":{ "order":"asc", "mode":"min" } } ]
}
}
$index->setName('products')->search('')->sort('product_codes','asc','min')->get();
searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}})
res = await searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}});
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
add(new HashMap<String,String>(){{ put("product_codes",new HashMap<String,String>(){{ put("order","asc");put("mode","min");}});}});
}});
searchResponse = searchApi.search(searchRequest);
When grouping by multi-value attribute, a document will contribute to as many groups as there are different values associated with that document. For instance, if the collection contains exactly 1 document having a ‘product_codes’ multi-value attribute with values 5, 7, and 11, grouping on ‘product_codes’ will produce 3 groups with COUNT(*)
equal to 1 and GROUPBY()
key values of 5, 7, and 11 respectively. Also note that grouping by multi-value attributes might lead to duplicate documents in the result set because each document can participate in many groups.
- SQL
SQL
insert into products values ( 1, 'doc one', (5,7,11) );
select id, count(*), groupby() from products group by product_codes;
Response
Query OK, 1 row affected (0.00 sec)
+------+----------+-----------+
| id | count(*) | groupby() |
+------+----------+-----------+
| 1 | 1 | 11 |
| 1 | 1 | 7 |
| 1 | 1 | 5 |
+------+----------+-----------+
3 rows in set (0.00 sec)
The order of the numbers inserted as values of multi-valued attributes is not preserved. Values are stored internally as a sorted set.
- SQL
- JSON
- PHP
- Python
- javascript
- java
SQL JSON PHP Python javascript java
insert into product values (1,'first',(4,2,1,3));
select * from products;
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title":"first",
"product_codes":[4,2,1,3]
}
}
POST /search
{
"index": "products",
"query": { "match_all": {} }
}
$index->addDocument([
"title"=>"first",
"product_codes"=>[4,2,1,3]
]);
$index->search('')-get();
indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}})
searchApi.search({"index":"products","query":{"match_all":{}}})
await indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}});
res = await searchApi.search({"index":"products","query":{"match_all":{}}});
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","first");
put("product_codes",new int[] {4,2,1,3});
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("products");
searchRequest.setQuery(query);
SearchResponse searchResponse = searchApi.search(searchRequest);
System.out.println(searchResponse.toString() );
Response
Query OK, 1 row affected (0.00 sec)
+------+---------------+-------+
| id | product_codes | title |
+------+---------------+-------+
| 1 | 1,2,3,4 | first |
+------+---------------+-------+
1 row in set (0.01 sec)
{
"_index":"products",
"_id":1,
"created":true,
"result":"created",
"status":201
}
{
"took":0,
"timed_out":false,
"hits":{
"total":1,
"hits":[
{
"_id":"1",
"_score":1,
"_source":{
"product_codes":[
1,
2,
3,
4
],
"title":"first"
}
}
]
}
}
Array
(
[_index] => products
[_id] => 1
[created] => 1
[result] => created
[status] => 201
)
Array
(
[took] => 0
[timed_out] =>
[hits] => Array
(
[total] => 1
[hits] => Array
(
[0] => Array
(
[_id] => 1
[_score] => 1
[_source] => Array
(
[product_codes] => Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
)
[title] => first
)
)
)
)
)
{'created': True,
'found': None,
'id': 1,
'index': 'products',
'result': 'created'}
{'hits': {'hits': [{u'_id': u'1',
u'_score': 1,
u'_source': {u'product_codes': [1, 2, 3, 4],
u'title': u'first'}}],
'total': 1},
'profile': None,
'timed_out': False,
'took': 29}
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":1,"_source":{"product_codes":[1,2,3,4],"title":"first"}}]}}
class SearchResponse {
took: 0
timedOut: false
hits: class SearchResponseHits {
total: 1
hits: [{_id=1, _score=1, _source={product_codes=[1, 2, 3, 4], title=first}}]
aggregations: null
}
profile: null
}
Multi-value big integer
A data type type that allows storing variable-length lists of 64-bit signed integers. It has the same functionality as multi-value integer.
- SQL
- JSON
- PHP
- Python
- javascript
- java
- config
SQL JSON PHP Python javascript java config
CREATE TABLE products(title text, values multi64);
POST /cli -d "CREATE TABLE products(title text, values multi64)"
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
'title'=>['type'=>'text'],
'values'=>['type'=>'multi64']
]);
utilsApi.sql('CREATE TABLE products(title text, values multi64))')
res = await utilsApi.sql('CREATE TABLE products(title text, values multi64))');
utilsApi.sql("CREATE TABLE products(title text, values multi64))");
table products
{
type = rt
path = products
rt_field = title
stored_fields = title
rt_attr_multi_64 = values
}
Columnar attribute properties
When you use the columnar storage you can specify the following properties for the attributes.
fast_fetch
By default Manticore Columnar storage stores all attributes not only in columnar fashion, but in a special docstore row by row which enables fast execution of queries like SELECT * FROM ...
especially when you are fetching lots of records at once. But if you are sure you don’t need it or want to save disk space you can disable it by specifying fast_fetch='0'
when you create a table or (if you are defining a table in a config) use columnar_no_fast_fetch
as shown in the following example.
- RT mode
- Plain mode
RT mode Plain mode
create table t(a int, b int fast_fetch='0') engine='columnar'; desc t;
source min {
type = mysql
sql_host = localhost
sql_user = test
sql_pass =
sql_db = test
sql_query = select 1, 1 a, 1 b
sql_attr_uint = a
sql_attr_uint = b
}
table tbl {
path = tbl/col
source = min
columnar_attrs = *
columnar_no_fast_fetch = b
}
Response
+-------+--------+---------------------+
| Field | Type | Properties |
+-------+--------+---------------------+
| id | bigint | columnar fast_fetch |
| a | uint | columnar fast_fetch |
| b | uint | columnar |
+-------+--------+---------------------+
3 rows in set (0.00 sec)
+-------+--------+---------------------+
| Field | Type | Properties |
+-------+--------+---------------------+
| id | bigint | columnar fast_fetch |
| a | uint | columnar fast_fetch |
| b | uint | columnar |
+-------+--------+---------------------+