Grouping search results

It’s often useful to group search results to get per-group match counts or other aggregations. For instance, to draw a nice graph of how much matching blog posts were there per month or to group web search results by site or forum posts by author etc.

Manticore supports grouping of search results by one or multiple columns and computed expressions. The results can:

  • be sorted inside a group
  • have more than one row returned per group
  • have groups filtered
  • have groups sorted
  • be aggregated with help of the aggregation functions
  • SQL
  • JSON

SQL JSON

General syntax

  1. SELECT {* | SELECT_expr [, SELECT_expr ...]}
  2. ...
  3. GROUP BY {field_name | alias } [, ...]
  4. [HAVING where_condition]
  5. [WITHIN GROUP ORDER BY field_name {ASC | DESC} [, ...]]
  6. ...
  7. SELECT_expr: { field_name | function_name(...) }
  8. where_condition: {aggregation expression alias | COUNT(*)}

JSON query format supports currently a simple grouping that can retrieve the aggregate values and their count(*).

  1. {
  2. "index": "<index_name>",
  3. "limit": 0,
  4. "aggs": {
  5. "<aggr_name>": {
  6. "terms": {
  7. "field": "<attribute>",
  8. "size": <int value>
  9. }
  10. }
  11. }
  12. }

The normal query output returns the result set without grouping and can be hidden with limit (or size). The aggregation requires to set a size for the size of the result set group.

Just Grouping

Grouping is very simple - just add “GROUP BY smth” to the end of your SELECT query. The something can be:

  • any non-full-text field from the table: integer, float, string, MVA (multi-value attribute)
  • or if you used an alias in the SELECT list - you can GROUP BY it too

You can omit any aggregation functions in the SELECT list and it will work too:

  • SQL

SQL

  1. SELECT release_year FROM films GROUP BY release_year LIMIT 5;

Response

  1. +--------------+
  2. | release_year |
  3. +--------------+
  4. | 2004 |
  5. | 2002 |
  6. | 2001 |
  7. | 2005 |
  8. | 2000 |
  9. +--------------+

But in most cases you want to get something aggregated for each group, for example:

  • COUNT(*) to just get number of elements in each groups
  • or AVG(field) to get an average value of the field in the group
  • SQL1
  • SQL2
  • JSON
  • PHP
  • Python
  • Javascript
  • Java

SQL1 SQL2 JSON PHP Python Javascript Java

  1. SELECT release_year, count(*) FROM films GROUP BY release_year LIMIT 5;
  1. SELECT release_year, AVG(rental_rate) FROM films GROUP BY release_year LIMIT 5;
  1. POST /search -d '
  2. {
  3. "index" : "films",
  4. "limit": 0,
  5. "aggs" :
  6. {
  7. "release_year" :
  8. {
  9. "terms" :
  10. {
  11. "field":"release_year",
  12. "size":100
  13. }
  14. }
  15. }
  16. }
  17. '
  1. $index->setName('films');
  2. $search = $index->search('');
  3. $search->limit(0);
  4. $search->facet('release_year','release_year',100);
  5. $results = $search->get();
  6. print_r($results->getFacets());
  1. res =searchApi.search({"index":"films","limit":0,"aggs":{"release_year":{"terms":{"field":"release_year","size":100}}}})
  1. res = await searchApi.search({"index":"films","limit":0,"aggs":{"release_year":{"terms":{"field":"release_year","size":100}}}});
  1. HashMap<String,Object> aggs = new HashMap<String,Object>(){{
  2. put("release_year", new HashMap<String,Object>(){{
  3. put("terms", new HashMap<String,Object>(){{
  4. put("field","release_year");
  5. put("size",100);
  6. }});
  7. }});
  8. }};
  9. searchRequest = new SearchRequest();
  10. searchRequest.setIndex("films");
  11. searchRequest.setLimit(0);
  12. query = new HashMap<String,Object>();
  13. query.put("match_all",null);
  14. searchRequest.setQuery(query);
  15. searchRequest.setAggs(aggs);
  16. searchResponse = searchApi.search(searchRequest);

Response

  1. +--------------+----------+
  2. | release_year | count(*) |
  3. +--------------+----------+
  4. | 2004 | 108 |
  5. | 2002 | 108 |
  6. | 2001 | 91 |
  7. | 2005 | 93 |
  8. | 2000 | 97 |
  9. +--------------+----------+
  1. +--------------+------------------+
  2. | release_year | avg(rental_rate) |
  3. +--------------+------------------+
  4. | 2004 | 2.78629661 |
  5. | 2002 | 3.08259249 |
  6. | 2001 | 3.09989142 |
  7. | 2005 | 2.90397978 |
  8. | 2000 | 3.17556739 |
  9. +--------------+------------------+
  1. {
  2. "took": 2,
  3. "timed_out": false,
  4. "hits": {
  5. "total": 10000,
  6. "hits": [
  7. ]
  8. },
  9. "release_year": {
  10. "group_brand_id": {
  11. "buckets": [
  12. {
  13. "key": 2004,
  14. "doc_count": 108
  15. },
  16. {
  17. "key": 2002,
  18. "doc_count": 108
  19. },
  20. {
  21. "key": 2000,
  22. "doc_count": 97
  23. },
  24. {
  25. "key": 2005,
  26. "doc_count": 93
  27. },
  28. {
  29. "key": 2001,
  30. "doc_count": 91
  31. }
  32. ]
  33. }
  34. }
  35. }
  1. Array
  2. (
  3. [release_year] => Array
  4. (
  5. [buckets] => Array
  6. (
  7. [0] => Array
  8. (
  9. [key] => 2009
  10. [doc_count] => 99
  11. )
  12. [1] => Array
  13. (
  14. [key] => 2008
  15. [doc_count] => 102
  16. )
  17. [2] => Array
  18. (
  19. [key] => 2007
  20. [doc_count] => 93
  21. )
  22. [3] => Array
  23. (
  24. [key] => 2006
  25. [doc_count] => 103
  26. )
  27. [4] => Array
  28. (
  29. [key] => 2005
  30. [doc_count] => 93
  31. )
  32. [5] => Array
  33. (
  34. [key] => 2004
  35. [doc_count] => 108
  36. )
  37. [6] => Array
  38. (
  39. [key] => 2003
  40. [doc_count] => 106
  41. )
  42. [7] => Array
  43. (
  44. [key] => 2002
  45. [doc_count] => 108
  46. )
  47. [8] => Array
  48. (
  49. [key] => 2001
  50. [doc_count] => 91
  51. )
  52. [9] => Array
  53. (
  54. [key] => 2000
  55. [doc_count] => 97
  56. )
  57. )
  58. )
  59. )
  1. {'aggregations': {u'release_year': {u'buckets': [{u'doc_count': 99,
  2. u'key': 2009},
  3. {u'doc_count': 102,
  4. u'key': 2008},
  5. {u'doc_count': 93,
  6. u'key': 2007},
  7. {u'doc_count': 103,
  8. u'key': 2006},
  9. {u'doc_count': 93,
  10. u'key': 2005},
  11. {u'doc_count': 108,
  12. u'key': 2004},
  13. {u'doc_count': 106,
  14. u'key': 2003},
  15. {u'doc_count': 108,
  16. u'key': 2002},
  17. {u'doc_count': 91,
  18. u'key': 2001},
  19. {u'doc_count': 97,
  20. u'key': 2000}]}},
  21. 'hits': {'hits': [], 'max_score': None, 'total': 1000},
  22. 'profile': None,
  23. 'timed_out': False,
  24. 'took': 0}
  1. {"took":0,"timed_out":false,"aggregations":{"release_year":{"buckets":[{"key":2009,"doc_count":99},{"key":2008,"doc_count":102},{"key":2007,"doc_count":93},{"key":2006,"doc_count":103},{"key":2005,"doc_count":93},{"key":2004,"doc_count":108},{"key":2003,"doc_count":106},{"key":2002,"doc_count":108},{"key":2001,"doc_count":91},{"key":2000,"doc_count":97}]}},"hits":{"total":1000,"hits":[]}}
  1. class SearchResponse {
  2. took: 0
  3. timedOut: false
  4. aggregations: {release_year={buckets=[{key=2009, doc_count=99}, {key=2008, doc_count=102}, {key=2007, doc_count=93}, {key=2006, doc_count=103}, {key=2005, doc_count=93}, {key=2004, doc_count=108}, {key=2003, doc_count=106}, {key=2002, doc_count=108}, {key=2001, doc_count=91}, {key=2000, doc_count=97}]}}
  5. hits: class SearchResponseHits {
  6. maxScore: null
  7. total: 1000
  8. hits: []
  9. }
  10. profile: null
  11. }
Sorting groups

By default the groups are not sorted and the next thing you normally want to do is to order them by something. For example the field you are grouping by:

  • SQL

SQL

  1. SELECT release_year, count(*) from films GROUP BY release_year ORDER BY release_year asc limit 5;

Response

  1. +--------------+----------+
  2. | release_year | count(*) |
  3. +--------------+----------+
  4. | 2000 | 97 |
  5. | 2001 | 91 |
  6. | 2002 | 108 |
  7. | 2003 | 106 |
  8. | 2004 | 108 |
  9. +--------------+----------+

Or vice-versa - by the aggregation:

  • by count(*) to see those groups that have most elements first
  • by avg(rental_rate) to see most rated movies first. Note that in the example it’s done via an alias: avg(rental_rate) is first mapped to avg in the SELECT list and then we just do ORDER BY avg
  • SQL1
  • SQL2

SQL1 SQL2

  1. SELECT release_year, count(*) FROM films GROUP BY release_year ORDER BY count(*) desc LIMIT 5;
  1. SELECT release_year, AVG(rental_rate) avg FROM films GROUP BY release_year ORDER BY avg desc LIMIT 5;

Response

  1. +--------------+----------+
  2. | release_year | count(*) |
  3. +--------------+----------+
  4. | 2004 | 108 |
  5. | 2002 | 108 |
  6. | 2003 | 106 |
  7. | 2006 | 103 |
  8. | 2008 | 102 |
  9. +--------------+----------+
  1. +--------------+------------+
  2. | release_year | avg |
  3. +--------------+------------+
  4. | 2006 | 3.26184368 |
  5. | 2000 | 3.17556739 |
  6. | 2001 | 3.09989142 |
  7. | 2002 | 3.08259249 |
  8. | 2008 | 2.99000049 |
  9. +--------------+------------+
GROUP BY multiple fields at once

In some cases you might want to group not by a single, but by multiple fields at once, for example movie’s category and year:

  • SQL

SQL

  1. SELECT category_id, release_year, count(*) FROM films GROUP BY category_id, release_year ORDER BY category_id ASC, release_year ASC;

Response

  1. +-------------+--------------+----------+
  2. | category_id | release_year | count(*) |
  3. +-------------+--------------+----------+
  4. | 1 | 2000 | 5 |
  5. | 1 | 2001 | 2 |
  6. | 1 | 2002 | 6 |
  7. | 1 | 2003 | 6 |
  8. | 1 | 2004 | 5 |
  9. | 1 | 2005 | 10 |
  10. | 1 | 2006 | 4 |
  11. | 1 | 2007 | 5 |
  12. | 1 | 2008 | 7 |
  13. | 1 | 2009 | 14 |
  14. | 2 | 2000 | 10 |
  15. | 2 | 2001 | 5 |
  16. | 2 | 2002 | 6 |
  17. | 2 | 2003 | 6 |
  18. | 2 | 2004 | 10 |
  19. | 2 | 2005 | 4 |
  20. | 2 | 2006 | 5 |
  21. | 2 | 2007 | 8 |
  22. | 2 | 2008 | 8 |
  23. | 2 | 2009 | 4 |
  24. +-------------+--------------+----------+
Give me N rows

Sometimes it’s useful to see not a single element per group, but multiple. This can be easily done with help of GROUP N BY. For example here you can see that we get 2 movies for each year rather than one which would simple GROUP BY release_year returned.

  • SQL

SQL

  1. SELECT release_year, title FROM films GROUP 2 BY release_year ORDER BY release_year DESC LIMIT 6;

Response

  1. +--------------+-----------------------------+
  2. | release_year | title |
  3. +--------------+-----------------------------+
  4. | 2009 | ALICE FANTASIA |
  5. | 2009 | ALIEN CENTER |
  6. | 2008 | AMADEUS HOLY |
  7. | 2008 | ANACONDA CONFESSIONS |
  8. | 2007 | ANGELS LIFE |
  9. | 2007 | ARACHNOPHOBIA ROLLERCOASTER |
  10. +--------------+-----------------------------+
Sorting inside a group

Another essential analytics demand is to sort elements within a group. For that there’s WITHIN GROUP ORDER BY ... {ASC|DESC} clause. For example let’s get the most rated film for each year. Note it works in parallel with just ORDER BY:

  • WITHIN GROUP ORDER BY sorts results inside a group
  • while just GROUP BY sorts the groups themselves

The work absolutely independently.

  • SQL

SQL

  1. SELECT release_year, title, rental_rate FROM films GROUP BY release_year WITHIN GROUP ORDER BY rental_rate DESC ORDER BY release_year DESC LIMIT 5;

Response

  1. +--------------+------------------+-------------+
  2. | release_year | title | rental_rate |
  3. +--------------+------------------+-------------+
  4. | 2009 | AMERICAN CIRCUS | 4.990000 |
  5. | 2008 | ANTHEM LUKE | 4.990000 |
  6. | 2007 | ATTACKS HATE | 4.990000 |
  7. | 2006 | ALADDIN CALENDAR | 4.990000 |
  8. | 2005 | AIRPLANE SIERRA | 4.990000 |
  9. +--------------+------------------+-------------+
Filter groups

HAVING expression is a useful clause to filter groups. If WHERE is applied before grouping HAVING works with the groups. For example let’s leave only those years when an average rental rate of the films of that year was higher than 3. We get only 4 years:

  • SQL

SQL

  1. SELECT release_year, avg(rental_rate) avg FROM films GROUP BY release_year HAVING avg > 3;

Response

  1. +--------------+------------+
  2. | release_year | avg |
  3. +--------------+------------+
  4. | 2002 | 3.08259249 |
  5. | 2001 | 3.09989142 |
  6. | 2000 | 3.17556739 |
  7. | 2006 | 3.26184368 |
  8. +--------------+------------+
GROUPBY()

There is a function GROUPBY() which returns the key of the current group. It’s useful in many cases especially when you GROUP BY an MVA) or a JSON value.

It can be also used in HAVING to for example leave only years 2000 and 2002.

Note that GROUPBY()is not recommended for use when you GROUP BY multiple fields at once. It will still work, but since the group key in this case is compound of field values it may look not the way you expect.

  • SQL

SQL

  1. SELECT release_year, count(*) FROM films GROUP BY release_year HAVING GROUPBY() IN (2000, 2002);

Response

  1. +--------------+----------+
  2. | release_year | count(*) |
  3. +--------------+----------+
  4. | 2002 | 108 |
  5. | 2000 | 97 |
  6. +--------------+----------+
Grouping by MVA (multi-value attributes)

Manticore supports grouping by MVA). To show how it works let’s create a table “shoes” with MVA “sizes” and insert few documents into it:

  1. create table shoes(title text, sizes multi);
  2. insert into shoes values(0,'nike',(40,41,42)),(0,'adidas',(41,43)),(0,'reebook',(42,43));

so we have:

  1. SELECT * FROM shoes;
  2. +---------------------+----------+---------+
  3. | id | sizes | title |
  4. +---------------------+----------+---------+
  5. | 1657851069130080265 | 40,41,42 | nike |
  6. | 1657851069130080266 | 41,43 | adidas |
  7. | 1657851069130080267 | 42,43 | reebook |
  8. +---------------------+----------+---------+

If we now GROUP BY “sizes” it will process all our multi-value attributes and will return aggregation for each, in this case just count:

  • SQL
  • JSON
  • PHP
  • Python
  • Javascript
  • Java

SQL JSON PHP Python Javascript Java

  1. SELECT groupby() gb, count(*) FROM shoes GROUP BY sizes ORDER BY gb asc;
  1. POST /search -d '
  2. {
  3. "index" : "shoes",
  4. "limit": 0,
  5. "aggs" :
  6. {
  7. "sizes" :
  8. {
  9. "terms" :
  10. {
  11. "field":"sizes",
  12. "size":100
  13. }
  14. }
  15. }
  16. }
  17. '
  1. $index->setName('shoes');
  2. $search = $index->search('');
  3. $search->limit(0);
  4. $search->facet('sizes','sizes',100);
  5. $results = $search->get();
  6. print_r($results->getFacets());
  1. res =searchApi.search({"index":"shoes","limit":0,"aggs":{"sizes":{"terms":{"field":"sizes","size":100}}}})
  1. res = await searchApi.search({"index":"shoes","limit":0,"aggs":{"sizes":{"terms":{"field":"sizes","size":100}}}});
  1. HashMap<String,Object> aggs = new HashMap<String,Object>(){{
  2. put("release_year", new HashMap<String,Object>(){{
  3. put("terms", new HashMap<String,Object>(){{
  4. put("field","release_year");
  5. put("size",100);
  6. }});
  7. }});
  8. }};
  9. searchRequest = new SearchRequest();
  10. searchRequest.setIndex("films");
  11. searchRequest.setLimit(0);
  12. query = new HashMap<String,Object>();
  13. query.put("match_all",null);
  14. searchRequest.setQuery(query);
  15. searchRequest.setAggs(aggs);
  16. searchResponse = searchApi.search(searchRequest);

Response

  1. +------+----------+
  2. | gb | count(*) |
  3. +------+----------+
  4. | 40 | 1 |
  5. | 41 | 2 |
  6. | 42 | 2 |
  7. | 43 | 2 |
  8. +------+----------+
  1. {
  2. "took": 0,
  3. "timed_out": false,
  4. "hits": {
  5. "total": 3,
  6. "hits": [
  7. ]
  8. },
  9. "aggregations": {
  10. "sizes": {
  11. "buckets": [
  12. {
  13. "key": 43,
  14. "doc_count": 2
  15. },
  16. {
  17. "key": 42,
  18. "doc_count": 2
  19. },
  20. {
  21. "key": 41,
  22. "doc_count": 2
  23. },
  24. {
  25. "key": 40,
  26. "doc_count": 1
  27. }
  28. ]
  29. }
  30. }
  31. }
  1. Array
  2. (
  3. [sizes] => Array
  4. (
  5. [buckets] => Array
  6. (
  7. [0] => Array
  8. (
  9. [key] => 43
  10. [doc_count] => 2
  11. )
  12. [1] => Array
  13. (
  14. [key] => 42
  15. [doc_count] => 2
  16. )
  17. [2] => Array
  18. (
  19. [key] => 41
  20. [doc_count] => 2
  21. )
  22. [3] => Array
  23. (
  24. [key] => 40
  25. [doc_count] => 1
  26. )
  27. )
  28. )
  29. )
  1. {'aggregations': {u'sizes': {u'buckets': [{u'doc_count': 2, u'key': 43},
  2. {u'doc_count': 2, u'key': 42},
  3. {u'doc_count': 2, u'key': 41},
  4. {u'doc_count': 1, u'key': 40}]}},
  5. 'hits': {'hits': [], 'max_score': None, 'total': 3},
  6. 'profile': None,
  7. 'timed_out': False,
  8. 'took': 0}
  1. {"took":0,"timed_out":false,"aggregations":{"sizes":{"buckets":[{"key":43,"doc_count":2},{"key":42,"doc_count":2},{"key":41,"doc_count":2},{"key":40,"doc_count":1}]}},"hits":{"total":3,"hits":[]}}
  1. class SearchResponse {
  2. took: 0
  3. timedOut: false
  4. aggregations: {release_year={buckets=[{key=43, doc_count=2}, {key=42, doc_count=2}, {key=41, doc_count=2}, {key=40, doc_count=1}]}}
  5. hits: class SearchResponseHits {
  6. maxScore: null
  7. total: 3
  8. hits: []
  9. }
  10. profile: null
  11. }
Grouping by a JSON node

If you have a field of type JSON you can GROUP BY any node from it. To demonstrate it let’s create a table “products” with few documents each having color in the “meta” JSON field:

  1. create table products(title text, meta json);
  2. insert into products values(0,'nike','{"color":"red"}'),(0,'adidas','{"color":"red"}'),(0,'puma','{"color":"green"}');

This gives us:

  1. SELECT * FROM products;
  2. +---------------------+-------------------+--------+
  3. | id | meta | title |
  4. +---------------------+-------------------+--------+
  5. | 1657851069130080268 | {"color":"red"} | nike |
  6. | 1657851069130080269 | {"color":"red"} | adidas |
  7. | 1657851069130080270 | {"color":"green"} | puma |
  8. +---------------------+-------------------+--------+

To group the products by color we can just GROUP BY meta.color and to show the corresponding group key in the SELECT list we can use GROUPBY():

  • SQL
  • JSON
  • PHP
  • Python
  • Javascript
  • Java

SQL JSON PHP Python Javascript Java

  1. SELECT groupby() color, count(*) from products GROUP BY meta.color;
  1. POST /search -d '
  2. {
  3. "index" : "products",
  4. "limit": 0,
  5. "aggs" :
  6. {
  7. "color" :
  8. {
  9. "terms" :
  10. {
  11. "field":"meta.color",
  12. "size":100
  13. }
  14. }
  15. }
  16. }
  17. '
  1. $index->setName('products');
  2. $search = $index->search('');
  3. $search->limit(0);
  4. $search->facet('meta.color','color',100);
  5. $results = $search->get();
  6. print_r($results->getFacets());
  1. res =searchApi.search({"index":"products","limit":0,"aggs":{"color":{"terms":{"field":"meta.color","size":100}}}})
  1. res = await searchApi.search({"index":"products","limit":0,"aggs":{"color":{"terms":{"field":"meta.color","size":100}}}});
  1. HashMap<String,Object> aggs = new HashMap<String,Object>(){{
  2. put("color", new HashMap<String,Object>(){{
  3. put("terms", new HashMap<String,Object>(){{
  4. put("field","meta.color");
  5. put("size",100);
  6. }});
  7. }});
  8. }};
  9. searchRequest = new SearchRequest();
  10. searchRequest.setIndex("products");
  11. searchRequest.setLimit(0);
  12. query = new HashMap<String,Object>();
  13. query.put("match_all",null);
  14. searchRequest.setQuery(query);
  15. searchRequest.setAggs(aggs);
  16. searchResponse = searchApi.search(searchRequest);

Response

  1. +-------+----------+
  2. | color | count(*) |
  3. +-------+----------+
  4. | red | 2 |
  5. | green | 1 |
  6. +-------+----------+
  1. {
  2. "took": 0,
  3. "timed_out": false,
  4. "hits": {
  5. "total": 3,
  6. "hits": [
  7. ]
  8. },
  9. "aggregations": {
  10. "color": {
  11. "buckets": [
  12. {
  13. "key": "green",
  14. "doc_count": 1
  15. },
  16. {
  17. "key": "red",
  18. "doc_count": 2
  19. }
  20. ]
  21. }
  22. }
  23. }
  1. Array
  2. (
  3. [color] => Array
  4. (
  5. [buckets] => Array
  6. (
  7. [0] => Array
  8. (
  9. [key] => green
  10. [doc_count] => 1
  11. )
  12. [1] => Array
  13. (
  14. [key] => red
  15. [doc_count] => 2
  16. )
  17. )
  18. )
  19. )
  1. {'aggregations': {u'color': {u'buckets': [{u'doc_count': 1,
  2. u'key': u'green'},
  3. {u'doc_count': 2, u'key': u'red'}]}},
  4. 'hits': {'hits': [], 'max_score': None, 'total': 3},
  5. 'profile': None,
  6. 'timed_out': False,
  7. 'took': 0}
  1. {"took":0,"timed_out":false,"aggregations":{"color":{"buckets":[{"key":"green","doc_count":1},{"key":"red","doc_count":2}]}},"hits":{"total":3,"hits":[]}}
  1. class SearchResponse {
  2. took: 0
  3. timedOut: false
  4. aggregations: {color={buckets=[{key=green, doc_count=1}, {key=red, doc_count=2}]}}
  5. hits: class SearchResponseHits {
  6. maxScore: null
  7. total: 3
  8. hits: []
  9. }
  10. profile: null
  11. }

Aggregation functions

Besides COUNT(*) which returns number of elements in each group you can use different other aggregation functions:

COUNT(DISTINCT field)

While COUNT(*) returns number of all elements in the group COUNT( DISTINCT field) returns number of different values of the field in the group which may be absolutely different from the total count: you can have 100 elements in the group, but all with the same value of some field. COUNT(DISTINCT field) helps to figure that out. To demonstrate it let’s create table “students” with student’s name, age and major:

  1. CREATE TABLE students(name text, age int, major string);
  2. INSERT INTO students values(0,'John',21,'arts'),(0,'William',22,'business'),(0,'Richard',21,'cs'),(0,'Rebecca',22,'cs'),(0,'Monica',21,'arts');

so we have:

  1. MySQL [(none)]> SELECT * from students;
  2. +---------------------+------+----------+---------+
  3. | id | age | major | name |
  4. +---------------------+------+----------+---------+
  5. | 1657851069130080271 | 21 | arts | John |
  6. | 1657851069130080272 | 22 | business | William |
  7. | 1657851069130080273 | 21 | cs | Richard |
  8. | 1657851069130080274 | 22 | cs | Rebecca |
  9. | 1657851069130080275 | 21 | arts | Monica |
  10. +---------------------+------+----------+---------+

In the example you can see that if we GROUP BY major and show both COUNT(*) and COUNT(DISTINCT age) it gets clear that there are 2 students that chose major “cs” and 2 unique ages, but for the major “arts” there are also 2 students, but only one unique age.

There can be at most one COUNT(DISTINCT) per query.

COUNT(DISTINCT) against a distributed table or a real-time table consisting of multiple disk chunks may return inaccurate results, but the result should be accurate for a distributed table consisting of local plain or real-time tables with the same schema (identical set/order of fields, but may be different tokenization settings).

  • SQL

SQL

  1. SELECT major, count(*), count(distinct age) FROM students GROUP BY major;

Response

  1. +----------+----------+---------------------+
  2. | major | count(*) | count(distinct age) |
  3. +----------+----------+---------------------+
  4. | arts | 2 | 1 |
  5. | business | 1 | 1 |
  6. | cs | 2 | 2 |
  7. +----------+----------+---------------------+
GROUP_CONCAT(field)

Often you want to understand better the contents of each group. You can use GROUP N BY for that, but it would return additional rows you might not want in the output. GROUP_CONCAT() enriches your grouping by concatenating values of some field in the group. Let’s take the previous example and improve it by getting all the ages in each group.

GROUP_CONCAT(field) returns the list comma-separated.

  • SQL

SQL

  1. SELECT major, count(*), count(distinct age), group_concat(age) FROM students GROUP BY major

Response

  1. +----------+----------+---------------------+-------------------+
  2. | major | count(*) | count(distinct age) | group_concat(age) |
  3. +----------+----------+---------------------+-------------------+
  4. | arts | 2 | 1 | 21,21 |
  5. | business | 1 | 1 | 22 |
  6. | cs | 2 | 2 | 21,22 |
  7. +----------+----------+---------------------+-------------------+
SUM(), MIN(), MAX(), AVG()

And of course you can get sum, average, minimal and maximum values in the group.

  • SQL

SQL

  1. SELECT release_year year, sum(rental_rate) sum, min(rental_rate) min, max(rental_rate) max, avg(rental_rate) avg FROM films GROUP BY release_year ORDER BY year asc LIMIT 5;

Response

  1. +------+------------+----------+----------+------------+
  2. | year | sum | min | max | avg |
  3. +------+------------+----------+----------+------------+
  4. | 2000 | 308.030029 | 0.990000 | 4.990000 | 3.17556739 |
  5. | 2001 | 282.090118 | 0.990000 | 4.990000 | 3.09989142 |
  6. | 2002 | 332.919983 | 0.990000 | 4.990000 | 3.08259249 |
  7. | 2003 | 310.940063 | 0.990000 | 4.990000 | 2.93339682 |
  8. | 2004 | 300.920044 | 0.990000 | 4.990000 | 2.78629661 |
  9. +------+------------+----------+----------+------------+

Grouping accuracy

Grouping is done in fixed memory, which depends on the max_matches setting. If max_matches allows for storage of all found groups, the results will be 100% accurate. However, if the value of max_matches is lower, the results will be less accurate.

When parallel processing is involved, it can become more complicated. When pseudo_sharding is enabled and/or when using an RT index with several disk chunks, each chunk or pseudo shard gets a result set that is no larger than max_matches. This can lead to inaccuracies in aggregates and group counts when the results sets from different threads are merged. To fix this, either a larger max_matches value or disabling parallel processing can be used.

Manticore will try to increase max_matches up to max_matches_increase_threshold if it detects that groupby may return inaccurate results. Detection is based on the number of unique values of the groupby attribute, which is retrieved from secondary indexes (if present).

To ensure accurate aggregates and/or group counts when using RT indexes or pseudo_sharding, accurate_aggregation can be enabled. This will try to increase max_matches up to the threshold, and if the threshold is not high enough, Manticore will disable parallel processing for the query.

  • SQL

SQL

  1. MySQL [(none)]> SELECT release_year year, count(*) FROM films GROUP BY year limit 5;
  2. +------+----------+
  3. | year | count(*) |
  4. +------+----------+
  5. | 2004 | 108 |
  6. | 2002 | 108 |
  7. | 2001 | 91 |
  8. | 2005 | 93 |
  9. | 2000 | 97 |
  10. +------+----------+
  11. MySQL [(none)]> SELECT release_year year, count(*) FROM films GROUP BY year limit 5 option max_matches=1;
  12. +------+----------+
  13. | year | count(*) |
  14. +------+----------+
  15. | 2004 | 76 |
  16. +------+----------+
  17. MySQL [(none)]> SELECT release_year year, count(*) FROM films GROUP BY year limit 5 option max_matches=2;
  18. +------+----------+
  19. | year | count(*) |
  20. +------+----------+
  21. | 2004 | 76 |
  22. | 2002 | 74 |
  23. +------+----------+
  24. MySQL [(none)]> SELECT release_year year, count(*) FROM films GROUP BY year limit 5 option max_matches=3;
  25. +------+----------+
  26. | year | count(*) |
  27. +------+----------+
  28. | 2004 | 108 |
  29. | 2002 | 108 |
  30. | 2001 | 91 |
  31. +------+----------+

Faceted search

Faceted search is as essential function of a modern search application as autocomplete, spell correction and search keywords highlighting. Especially in E-commerce products.

Faceted search

It comes to the rescue when we deal with large amounts of data and various properties related to each other, whether it is size, color, manufacturer or something else. When querying large amounts of data search results often include large swaths of entries which does not fit user’s expectations. Faceted search allows an end-user to explicitly specify the dimensions that they want their search results to meet.

In Manticore Search there is an optimization that retains the result set of the original query and reuses it for each facet calculation. As the aggregations are applied to already calculated subset of documents they are fast and the total execution time can be in many cases only marginally bigger than the initial query. Facets can be added to any query and the facet can be any attribute or expression. A facet result contains the facet values along with the facet counts. Facets are available using SQL SELECT statement by declaring them at the very end of the query.

Aggregations

SQL

The facet values can come from an attribute, JSON property from a JSON attribute or expression. Facet values can be also aliased, however the alias must be unique across all result sets (main query results set and other facets results sets). The facet value is taken from the aggregated attribute/expression, but it can also come from another attribute/expression.

  1. FACET {expr_list} [BY {expr_list} ] [DISTINCT {field_name}] [ORDER BY {expr | FACET()} {ASC | DESC}] [LIMIT [offset,] count]

Multiple facet declarations need to be separated with a whitespace.

HTTP JSON

Facets can be defined in the aggs node:

  1. "aggs" :
  2. {
  3. "group name" :
  4. {
  5. "terms" :
  6. {
  7. "field":"attribute name",
  8. "size": 1000
  9. }
  10. }
  11. }

where:

  • group name is an alias given to the aggregation
  • field value must contain the name of the attribute or expression we are faceting
  • optional size specifies the maximum number of buckets to include into the result. When not specified it inherits the main query’s limit. More details can be found in section Size of facet result.

The result set will contain an aggregations node with the returned facets, where key is the aggregated value and doc_count is the aggregation count.

  1. "aggregations": {
  2. "group name": {
  3. "buckets": [
  4. {
  5. "key": 10,
  6. "doc_count": 1019
  7. },
  8. {
  9. "key": 9,
  10. "doc_count": 954
  11. },
  12. {
  13. "key": 8,
  14. "doc_count": 1021
  15. },
  16. {
  17. "key": 7,
  18. "doc_count": 1011
  19. },
  20. {
  21. "key": 6,
  22. "doc_count": 997
  23. }
  24. ]
  25. }
  26. }
  • SQL
  • JSON
  • PHP
  • Python
  • Javascript
  • Java

SQL JSON PHP Python Javascript Java

  1. SELECT *, price AS aprice FROM facetdemo LIMIT 10 FACET price LIMIT 10 FACET brand_id LIMIT 5;
  1. POST /search -d '
  2. {
  3. "index" : "facetdemo",
  4. "query" : {"match_all" : {} },
  5. "limit": 5,
  6. "aggs" :
  7. {
  8. "group_property" :
  9. {
  10. "terms" :
  11. {
  12. "field":"price",
  13. }
  14. },
  15. "group_brand_id" :
  16. {
  17. "terms" :
  18. {
  19. "field":"brand_id",
  20. }
  21. }
  22. }
  23. }
  24. '
  1. $index->setName('facetdemo');
  2. $search = $index->search('');
  3. $search->limit(5);
  4. $search->facet('price','price');
  5. $search->facet('brand_id','group_brand_id');
  6. $results = $search->get();
  1. res =searchApi.search({"index":"facetdemo","query":{"match_all":{}},"limit":5,"aggs":{"group_property":{"terms":{"field":"price",}},"group_brand_id":{"terms":{"field":"brand_id"}}}})
  1. res = await searchApi.search({"index":"facetdemo","query":{"match_all":{}},"limit":5,"aggs":{"group_property":{"terms":{"field":"price",}},"group_brand_id":{"terms":{"field":"brand_id"}}}});
  1. aggs = new HashMap<String,Object>(){{
  2. put("group_property", new HashMap<String,Object>(){{
  3. put("sizes", new HashMap<String,Object>(){{
  4. put("field","price");
  5. }});
  6. }});
  7. put("group_brand_id", new HashMap<String,Object>(){{
  8. put("sizes", new HashMap<String,Object>(){{
  9. put("field","brand_id");
  10. }});
  11. }});
  12. }};
  13. searchRequest = new SearchRequest();
  14. searchRequest.setIndex("facetdemo");
  15. searchRequest.setLimit(5);
  16. query = new HashMap<String,Object>();
  17. query.put("match_all",null);
  18. searchRequest.setQuery(query);
  19. searchRequest.setAggs(aggs);
  20. searchResponse = searchApi.search(searchRequest);

Response

  1. +------+-------+----------+---------------------+------------+-------------+---------------------------------------+------------+--------+
  2. | id | price | brand_id | title | brand_name | property | j | categories | aprice |
  3. +------+-------+----------+---------------------+------------+-------------+---------------------------------------+------------+--------+
  4. | 1 | 306 | 1 | Product Ten Three | Brand One | Six_Ten | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11 | 306 |
  5. | 2 | 400 | 10 | Product Three One | Brand Ten | Four_Three | {"prop1":69,"prop2":19,"prop3":"One"} | 13,14 | 400 |
  6. ...
  7. | 9 | 560 | 6 | Product Two Five | Brand Six | Eight_Two | {"prop1":90,"prop2":84,"prop3":"One"} | 13,14 | 560 |
  8. | 10 | 229 | 9 | Product Three Eight | Brand Nine | Seven_Three | {"prop1":84,"prop2":39,"prop3":"One"} | 12,13 | 229 |
  9. +------+-------+----------+---------------------+------------+-------------+---------------------------------------+------------+--------+
  10. 10 rows in set (0.00 sec)
  11. +-------+----------+
  12. | price | count(*) |
  13. +-------+----------+
  14. | 306 | 7 |
  15. | 400 | 13 |
  16. ...
  17. | 229 | 9 |
  18. | 595 | 10 |
  19. +-------+----------+
  20. 10 rows in set (0.00 sec)
  21. +----------+----------+
  22. | brand_id | count(*) |
  23. +----------+----------+
  24. | 1 | 1013 |
  25. | 10 | 998 |
  26. | 5 | 1007 |
  27. | 8 | 1033 |
  28. | 7 | 965 |
  29. +----------+----------+
  30. 5 rows in set (0.00 sec)
  1. {
  2. "took": 3,
  3. "timed_out": false,
  4. "hits": {
  5. "total": 10000,
  6. "hits": [
  7. {
  8. "_id": "1",
  9. "_score": 1,
  10. "_source": {
  11. "price": 197,
  12. "brand_id": 10,
  13. "brand_name": "Brand Ten",
  14. "categories": [
  15. 10
  16. ]
  17. }
  18. },
  19. ...
  20. {
  21. "_id": "5",
  22. "_score": 1,
  23. "_source": {
  24. "price": 805,
  25. "brand_id": 7,
  26. "brand_name": "Brand Seven",
  27. "categories": [
  28. 11,
  29. 12,
  30. 13
  31. ]
  32. }
  33. }
  34. ]
  35. },
  36. "aggregations": {
  37. "group_property": {
  38. "buckets": [
  39. {
  40. "key": 1000,
  41. "doc_count": 11
  42. },
  43. {
  44. "key": 999,
  45. "doc_count": 12
  46. },
  47. ...
  48. {
  49. "key": 991,
  50. "doc_count": 7
  51. }
  52. ]
  53. },
  54. "group_brand_id": {
  55. "buckets": [
  56. {
  57. "key": 10,
  58. "doc_count": 1019
  59. },
  60. {
  61. "key": 9,
  62. "doc_count": 954
  63. },
  64. {
  65. "key": 8,
  66. "doc_count": 1021
  67. },
  68. {
  69. "key": 7,
  70. "doc_count": 1011
  71. },
  72. {
  73. "key": 6,
  74. "doc_count": 997
  75. }
  76. ]
  77. }
  78. }
  79. }
  1. Array
  2. (
  3. [price] => Array
  4. (
  5. [buckets] => Array
  6. (
  7. [0] => Array
  8. (
  9. [key] => 1000
  10. [doc_count] => 11
  11. )
  12. [1] => Array
  13. (
  14. [key] => 999
  15. [doc_count] => 12
  16. )
  17. [2] => Array
  18. (
  19. [key] => 998
  20. [doc_count] => 7
  21. )
  22. [3] => Array
  23. (
  24. [key] => 997
  25. [doc_count] => 14
  26. )
  27. [4] => Array
  28. (
  29. [key] => 996
  30. [doc_count] => 8
  31. )
  32. )
  33. )
  34. [group_brand_id] => Array
  35. (
  36. [buckets] => Array
  37. (
  38. [0] => Array
  39. (
  40. [key] => 10
  41. [doc_count] => 1019
  42. )
  43. [1] => Array
  44. (
  45. [key] => 9
  46. [doc_count] => 954
  47. )
  48. [2] => Array
  49. (
  50. [key] => 8
  51. [doc_count] => 1021
  52. )
  53. [3] => Array
  54. (
  55. [key] => 7
  56. [doc_count] => 1011
  57. )
  58. [4] => Array
  59. (
  60. [key] => 6
  61. [doc_count] => 997
  62. )
  63. )
  64. )
  65. )
  1. {'aggregations': {u'group_brand_id': {u'buckets': [{u'doc_count': 1019,
  2. u'key': 10},
  3. {u'doc_count': 954,
  4. u'key': 9},
  5. {u'doc_count': 1021,
  6. u'key': 8},
  7. {u'doc_count': 1011,
  8. u'key': 7},
  9. {u'doc_count': 997,
  10. u'key': 6}]},
  11. u'group_property': {u'buckets': [{u'doc_count': 11,
  12. u'key': 1000},
  13. {u'doc_count': 12,
  14. u'key': 999},
  15. {u'doc_count': 7,
  16. u'key': 998},
  17. {u'doc_count': 14,
  18. u'key': 997},
  19. {u'doc_count': 8,
  20. u'key': 996}]}},
  21. 'hits': {'hits': [{u'_id': u'1',
  22. u'_score': 1,
  23. u'_source': {u'brand_id': 10,
  24. u'brand_name': u'Brand Ten',
  25. u'categories': [10],
  26. u'price': 197,
  27. u'property': u'Six',
  28. u'title': u'Product Eight One'}},
  29. {u'_id': u'2',
  30. u'_score': 1,
  31. u'_source': {u'brand_id': 6,
  32. u'brand_name': u'Brand Six',
  33. u'categories': [12, 13, 14],
  34. u'price': 671,
  35. u'property': u'Four',
  36. u'title': u'Product Nine Seven'}},
  37. {u'_id': u'3',
  38. u'_score': 1,
  39. u'_source': {u'brand_id': 3,
  40. u'brand_name': u'Brand Three',
  41. u'categories': [13, 14, 15],
  42. u'price': 92,
  43. u'property': u'Six',
  44. u'title': u'Product Five Four'}},
  45. {u'_id': u'4',
  46. u'_score': 1,
  47. u'_source': {u'brand_id': 10,
  48. u'brand_name': u'Brand Ten',
  49. u'categories': [11],
  50. u'price': 713,
  51. u'property': u'Five',
  52. u'title': u'Product Eight Nine'}},
  53. {u'_id': u'5',
  54. u'_score': 1,
  55. u'_source': {u'brand_id': 7,
  56. u'brand_name': u'Brand Seven',
  57. u'categories': [11, 12, 13],
  58. u'price': 805,
  59. u'property': u'Two',
  60. u'title': u'Product Ten Three'}}],
  61. 'max_score': None,
  62. 'total': 10000},
  63. 'profile': None,
  64. 'timed_out': False,
  65. 'took': 4}
  1. {"took":0,"timed_out":false,"hits":{"total":10000,"hits":[{"_id":"1","_score":1,"_source":{"price":197,"brand_id":10,"brand_name":"Brand Ten","categories":[10],"title":"Product Eight One","property":"Six"}},{"_id":"2","_score":1,"_source":{"price":671,"brand_id":6,"brand_name":"Brand Six","categories":[12,13,14],"title":"Product Nine Seven","property":"Four"}},{"_id":"3","_score":1,"_source":{"price":92,"brand_id":3,"brand_name":"Brand Three","categories":[13,14,15],"title":"Product Five Four","property":"Six"}},{"_id":"4","_score":1,"_source":{"price":713,"brand_id":10,"brand_name":"Brand Ten","categories":[11],"title":"Product Eight Nine","property":"Five"}},{"_id":"5","_score":1,"_source":{"price":805,"brand_id":7,"brand_name":"Brand Seven","categories":[11,12,13],"title":"Product Ten Three","property":"Two"}}]}}
  1. class SearchResponse {
  2. took: 0
  3. timedOut: false
  4. aggregations: {group_property={buckets=[{key=1000, doc_count=11}, {key=999, doc_count=12}, {key=998, doc_count=7}, {key=997, doc_count=14}, {key=996, doc_count=8}]}, group_brand_id={buckets=[{key=10, doc_count=1019}, {key=9, doc_count=954}, {key=8, doc_count=1021}, {key=7, doc_count=1011}, {key=6, doc_count=997}]}}
  5. hits: class SearchResponseHits {
  6. maxScore: null
  7. total: 10000
  8. hits: [{_id=1, _score=1, _source={price=197, brand_id=10, brand_name=Brand Ten, categories=[10], title=Product Eight One, property=Six}}, {_id=2, _score=1, _source={price=671, brand_id=6, brand_name=Brand Six, categories=[12, 13, 14], title=Product Nine Seven, property=Four}}, {_id=3, _score=1, _source={price=92, brand_id=3, brand_name=Brand Three, categories=[13, 14, 15], title=Product Five Four, property=Six}}, {_id=4, _score=1, _source={price=713, brand_id=10, brand_name=Brand Ten, categories=[11], title=Product Eight Nine, property=Five}}, {_id=5, _score=1, _source={price=805, brand_id=7, brand_name=Brand Seven, categories=[11, 12, 13], title=Product Ten Three, property=Two}}]
  9. }
  10. profile: null
  11. }

Faceting by aggregation over another attribute

Data can be faceted by aggregating another attribute or expression. For example if the documents contain both the brand id and name, we can return in facet the brand names, but aggregate the brand ids. This can be done by using FACET {expr1} BY {expr2}

  • SQL

SQL

  1. SELECT * FROM facetdemo FACET brand_name by brand_id;

Response

  1. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  2. | id | price | brand_id | title | brand_name | property | j | categories |
  3. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  4. | 1 | 306 | 1 | Product Ten Three | Brand One | Six_Ten | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11 |
  5. | 2 | 400 | 10 | Product Three One | Brand Ten | Four_Three | {"prop1":69,"prop2":19,"prop3":"One"} | 13,14 |
  6. ....
  7. | 19 | 855 | 1 | Product Seven Two | Brand One | Eight_Seven | {"prop1":63,"prop2":78,"prop3":"One"} | 10,11,12 |
  8. | 20 | 31 | 9 | Product Four One | Brand Nine | Ten_Four | {"prop1":79,"prop2":42,"prop3":"One"} | 12,13,14 |
  9. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  10. 20 rows in set (0.00 sec)
  11. +-------------+----------+
  12. | brand_name | count(*) |
  13. +-------------+----------+
  14. | Brand One | 1013 |
  15. | Brand Ten | 998 |
  16. | Brand Five | 1007 |
  17. | Brand Nine | 944 |
  18. | Brand Two | 990 |
  19. | Brand Six | 1039 |
  20. | Brand Three | 1016 |
  21. | Brand Four | 994 |
  22. | Brand Eight | 1033 |
  23. | Brand Seven | 965 |
  24. +-------------+----------+
  25. 10 rows in set (0.00 sec)

Faceting without duplicates

If you need to remove duplicates from the buckets FACET returns you can use DISTINCT field_name where field_name is the field by which you want to do the deduplication. It can be also id (and it is by default) if you make a FACET query against a distributed table and are not sure you have unique ids in the tables (the tables should be local and have the same schema).

If you have multiple FACET declarations in your query field_name should be the same in all of them.

DISTINCT returns additional column count(distinct ...) before column count(*), so you can get the both results without the need to make another query.

  • SQL

SQL

  1. SELECT brand_name, property FROM facetdemo FACET brand_name distinct property;

Response

  1. +-------------+----------+
  2. | brand_name | property |
  3. +-------------+----------+
  4. | Brand Nine | Four |
  5. | Brand Ten | Four |
  6. | Brand One | Five |
  7. | Brand Seven | Nine |
  8. | Brand Seven | Seven |
  9. | Brand Three | Seven |
  10. | Brand Nine | Five |
  11. | Brand Three | Eight |
  12. | Brand Two | Eight |
  13. | Brand Six | Eight |
  14. | Brand Ten | Four |
  15. | Brand Ten | Two |
  16. | Brand Four | Ten |
  17. | Brand One | Nine |
  18. | Brand Four | Eight |
  19. | Brand Nine | Seven |
  20. | Brand Four | Five |
  21. | Brand Three | Four |
  22. | Brand Four | Two |
  23. | Brand Four | Eight |
  24. +-------------+----------+
  25. 20 rows in set (0.00 sec)
  26. +-------------+--------------------------+----------+
  27. | brand_name | count(distinct property) | count(*) |
  28. +-------------+--------------------------+----------+
  29. | Brand Nine | 3 | 3 |
  30. | Brand Ten | 2 | 3 |
  31. | Brand One | 2 | 2 |
  32. | Brand Seven | 2 | 2 |
  33. | Brand Three | 3 | 3 |
  34. | Brand Two | 1 | 1 |
  35. | Brand Six | 1 | 1 |
  36. | Brand Four | 4 | 5 |
  37. +-------------+--------------------------+----------+
  38. 8 rows in set (0.00 sec)

Facet over expressions

Facets can aggregate over expressions. A classic example is segmentation of price by certain ranges:

  • SQL
  • JSON
  • PHP
  • Python
  • Javascript
  • Java

SQL JSON PHP Python Javascript Java

  1. SELECT * FROM facetdemo FACET INTERVAL(price,200,400,600,800) AS price_range ;
  1. POST /search -d '
  2. {
  3. "index": "facetdemo",
  4. "query":
  5. {
  6. "match_all": {}
  7. },
  8. "expressions":
  9. {
  10. "price_range": "INTERVAL(price,200,400,600,800)"
  11. },
  12. "aggs":
  13. {
  14. "group_property":
  15. {
  16. "terms":
  17. {
  18. "field": "price_range"
  19. }
  20. }
  21. }
  22. }
  1. $index->setName('facetdemo');
  2. $search = $index->search('');
  3. $search->limit(5);
  4. $search->expression('price_range','INTERVAL(price,200,400,600,800)');
  5. $search->facet('price_range','group_property');
  6. $results = $search->get();
  7. print_r($results->getFacets());
  1. res =searchApi.search({"index":"facetdemo","query":{"match_all":{}},"expressions":{"price_range":"INTERVAL(price,200,400,600,800)"},"aggs":{"group_property":{"terms":{"field":"price_range"}}}})
  1. res = await searchApi.search({"index":"facetdemo","query":{"match_all":{}},"expressions":{"price_range":"INTERVAL(price,200,400,600,800)"},"aggs":{"group_property":{"terms":{"field":"price_range"}}}});
  1. searchRequest = new SearchRequest();
  2. expressions = new HashMap<String,Object>(){{
  3. put("price_range","INTERVAL(price,200,400,600,800)");
  4. }};
  5. searchRequest.setExpressions(expressions);
  6. aggs = new HashMap<String,Object>(){{
  7. put("group_property", new HashMap<String,Object>(){{
  8. put("sizes", new HashMap<String,Object>(){{
  9. put("field","price_range");
  10. }});
  11. }});
  12. }};
  13. searchRequest.setIndex("facetdemo");
  14. searchRequest.setLimit(5);
  15. query = new HashMap<String,Object>();
  16. query.put("match_all",null);
  17. searchRequest.setQuery(query);
  18. searchRequest.setAggs(aggs);
  19. searchResponse = searchApi.search(searchRequest);

Response

  1. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
  2. | id | price | brand_id | title | brand_name | property | j | categories | price_range |
  3. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
  4. | 1 | 306 | 1 | Product Ten Three | Brand One | Six_Ten | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11 | 1 |
  5. ...
  6. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
  7. 20 rows in set (0.00 sec)
  8. +-------------+----------+
  9. | price_range | count(*) |
  10. +-------------+----------+
  11. | 0 | 1885 |
  12. | 3 | 1973 |
  13. | 4 | 2100 |
  14. | 2 | 1999 |
  15. | 1 | 2043 |
  16. +-------------+----------+
  17. 5 rows in set (0.01 sec)
  1. {
  2. "took": 3,
  3. "timed_out": false,
  4. "hits": {
  5. "total": 10000,
  6. "hits": [
  7. {
  8. "_id": "1",
  9. "_score": 1,
  10. "_source": {
  11. "price": 197,
  12. "brand_id": 10,
  13. "brand_name": "Brand Ten",
  14. "categories": [
  15. 10
  16. ],
  17. "price_range": 0
  18. }
  19. },
  20. ...
  21. {
  22. "_id": "20",
  23. "_score": 1,
  24. "_source": {
  25. "price": 227,
  26. "brand_id": 3,
  27. "brand_name": "Brand Three",
  28. "categories": [
  29. 12,
  30. 13
  31. ],
  32. "price_range": 1
  33. }
  34. }
  35. ]
  36. },
  37. "aggregations": {
  38. "group_property": {
  39. "buckets": [
  40. {
  41. "key": 4,
  42. "doc_count": 2100
  43. },
  44. {
  45. "key": 3,
  46. "doc_count": 1973
  47. },
  48. {
  49. "key": 2,
  50. "doc_count": 1999
  51. },
  52. {
  53. "key": 1,
  54. "doc_count": 2043
  55. },
  56. {
  57. "key": 0,
  58. "doc_count": 1885
  59. }
  60. ]
  61. }
  62. }
  63. }
  1. Array
  2. (
  3. [group_property] => Array
  4. (
  5. [buckets] => Array
  6. (
  7. [0] => Array
  8. (
  9. [key] => 4
  10. [doc_count] => 2100
  11. )
  12. [1] => Array
  13. (
  14. [key] => 3
  15. [doc_count] => 1973
  16. )
  17. [2] => Array
  18. (
  19. [key] => 2
  20. [doc_count] => 1999
  21. )
  22. [3] => Array
  23. (
  24. [key] => 1
  25. [doc_count] => 2043
  26. )
  27. [4] => Array
  28. (
  29. [key] => 0
  30. [doc_count] => 1885
  31. )
  32. )
  33. )
  34. )
  1. {'aggregations': {u'group_brand_id': {u'buckets': [{u'doc_count': 1019,
  2. u'key': 10},
  3. {u'doc_count': 954,
  4. u'key': 9},
  5. {u'doc_count': 1021,
  6. u'key': 8},
  7. {u'doc_count': 1011,
  8. u'key': 7},
  9. {u'doc_count': 997,
  10. u'key': 6}]},
  11. u'group_property': {u'buckets': [{u'doc_count': 11,
  12. u'key': 1000},
  13. {u'doc_count': 12,
  14. u'key': 999},
  15. {u'doc_count': 7,
  16. u'key': 998},
  17. {u'doc_count': 14,
  18. u'key': 997},
  19. {u'doc_count': 8,
  20. u'key': 996}]}},
  21. 'hits': {'hits': [{u'_id': u'1',
  22. u'_score': 1,
  23. u'_source': {u'brand_id': 10,
  24. u'brand_name': u'Brand Ten',
  25. u'categories': [10],
  26. u'price': 197,
  27. u'property': u'Six',
  28. u'title': u'Product Eight One'}},
  29. {u'_id': u'2',
  30. u'_score': 1,
  31. u'_source': {u'brand_id': 6,
  32. u'brand_name': u'Brand Six',
  33. u'categories': [12, 13, 14],
  34. u'price': 671,
  35. u'property': u'Four',
  36. u'title': u'Product Nine Seven'}},
  37. {u'_id': u'3',
  38. u'_score': 1,
  39. u'_source': {u'brand_id': 3,
  40. u'brand_name': u'Brand Three',
  41. u'categories': [13, 14, 15],
  42. u'price': 92,
  43. u'property': u'Six',
  44. u'title': u'Product Five Four'}},
  45. {u'_id': u'4',
  46. u'_score': 1,
  47. u'_source': {u'brand_id': 10,
  48. u'brand_name': u'Brand Ten',
  49. u'categories': [11],
  50. u'price': 713,
  51. u'property': u'Five',
  52. u'title': u'Product Eight Nine'}},
  53. {u'_id': u'5',
  54. u'_score': 1,
  55. u'_source': {u'brand_id': 7,
  56. u'brand_name': u'Brand Seven',
  57. u'categories': [11, 12, 13],
  58. u'price': 805,
  59. u'property': u'Two',
  60. u'title': u'Product Ten Three'}}],
  61. 'max_score': None,
  62. 'total': 10000},
  63. 'profile': None,
  64. 'timed_out': False,
  65. 'took': 0}
  1. {"took":0,"timed_out":false,"hits":{"total":10000,"hits":[{"_id":"1","_score":1,"_source":{"price":197,"brand_id":10,"brand_name":"Brand Ten","categories":[10],"title":"Product Eight One","property":"Six","price_range":0}},{"_id":"2","_score":1,"_source":{"price":671,"brand_id":6,"brand_name":"Brand Six","categories":[12,13,14],"title":"Product Nine Seven","property":"Four","price_range":3}},{"_id":"3","_score":1,"_source":{"price":92,"brand_id":3,"brand_name":"Brand Three","categories":[13,14,15],"title":"Product Five Four","property":"Six","price_range":0}},{"_id":"4","_score":1,"_source":{"price":713,"brand_id":10,"brand_name":"Brand Ten","categories":[11],"title":"Product Eight Nine","property":"Five","price_range":3}},{"_id":"5","_score":1,"_source":{"price":805,"brand_id":7,"brand_name":"Brand Seven","categories":[11,12,13],"title":"Product Ten Three","property":"Two","price_range":4}},{"_id":"6","_score":1,"_source":{"price":420,"brand_id":2,"brand_name":"Brand Two","categories":[10,11],"title":"Product Two One","property":"Six","price_range":2}},{"_id":"7","_score":1,"_source":{"price":412,"brand_id":9,"brand_name":"Brand Nine","categories":[10],"title":"Product Four Nine","property":"Eight","price_range":2}},{"_id":"8","_score":1,"_source":{"price":300,"brand_id":9,"brand_name":"Brand Nine","categories":[13,14,15],"title":"Product Eight Four","property":"Five","price_range":1}},{"_id":"9","_score":1,"_source":{"price":728,"brand_id":1,"brand_name":"Brand One","categories":[11],"title":"Product Nine Six","property":"Four","price_range":3}},{"_id":"10","_score":1,"_source":{"price":622,"brand_id":3,"brand_name":"Brand Three","categories":[10,11],"title":"Product Six Seven","property":"Two","price_range":3}},{"_id":"11","_score":1,"_source":{"price":462,"brand_id":5,"brand_name":"Brand Five","categories":[10,11],"title":"Product Ten Two","property":"Eight","price_range":2}},{"_id":"12","_score":1,"_source":{"price":939,"brand_id":7,"brand_name":"Brand Seven","categories":[12,13],"title":"Product Nine Seven","property":"Six","price_range":4}},{"_id":"13","_score":1,"_source":{"price":948,"brand_id":8,"brand_name":"Brand Eight","categories":[12],"title":"Product Ten One","property":"Six","price_range":4}},{"_id":"14","_score":1,"_source":{"price":900,"brand_id":9,"brand_name":"Brand Nine","categories":[12,13,14],"title":"Product Ten Nine","property":"Three","price_range":4}},{"_id":"15","_score":1,"_source":{"price":224,"brand_id":3,"brand_name":"Brand Three","categories":[13],"title":"Product Two Six","property":"Four","price_range":1}},{"_id":"16","_score":1,"_source":{"price":713,"brand_id":10,"brand_name":"Brand Ten","categories":[12],"title":"Product Two Four","property":"Six","price_range":3}},{"_id":"17","_score":1,"_source":{"price":510,"brand_id":2,"brand_name":"Brand Two","categories":[10],"title":"Product Ten Two","property":"Seven","price_range":2}},{"_id":"18","_score":1,"_source":{"price":702,"brand_id":10,"brand_name":"Brand Ten","categories":[12,13],"title":"Product Nine One","property":"Three","price_range":3}},{"_id":"19","_score":1,"_source":{"price":836,"brand_id":4,"brand_name":"Brand Four","categories":[10,11,12],"title":"Product Four Five","property":"Two","price_range":4}},{"_id":"20","_score":1,"_source":{"price":227,"brand_id":3,"brand_name":"Brand Three","categories":[12,13],"title":"Product Three Four","property":"Ten","price_range":1}}]}}
  1. class SearchResponse {
  2. took: 0
  3. timedOut: false
  4. aggregations: {group_property={buckets=[{key=4, doc_count=2100}, {key=3, doc_count=1973}, {key=2, doc_count=1999}, {key=1, doc_count=2043}, {key=0, doc_count=1885}]}}
  5. hits: class SearchResponseHits {
  6. maxScore: null
  7. total: 10000
  8. hits: [{_id=1, _score=1, _source={price=197, brand_id=10, brand_name=Brand Ten, categories=[10], title=Product Eight One, property=Six, price_range=0}}, {_id=2, _score=1, _source={price=671, brand_id=6, brand_name=Brand Six, categories=[12, 13, 14], title=Product Nine Seven, property=Four, price_range=3}}, {_id=3, _score=1, _source={price=92, brand_id=3, brand_name=Brand Three, categories=[13, 14, 15], title=Product Five Four, property=Six, price_range=0}}, {_id=4, _score=1, _source={price=713, brand_id=10, brand_name=Brand Ten, categories=[11], title=Product Eight Nine, property=Five, price_range=3}}, {_id=5, _score=1, _source={price=805, brand_id=7, brand_name=Brand Seven, categories=[11, 12, 13], title=Product Ten Three, property=Two, price_range=4}}]
  9. }
  10. profile: null
  11. }

Facet over multi-level grouping

Facets can aggregate over multi-level grouping, the result set being the same as the query would perform a multi-level grouping:

  • SQL

SQL

  1. SELECT *,INTERVAL(price,200,400,600,800) AS price_range FROM facetdemo
  2. FACET price_range AS price_range,brand_name ORDER BY brand_name asc;

Response

  1. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
  2. | id | price | brand_id | title | brand_name | property | j | categories | price_range |
  3. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
  4. | 1 | 306 | 1 | Product Ten Three | Brand One | Six_Ten | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11 | 1 |
  5. ...
  6. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
  7. 20 rows in set (0.00 sec)
  8. +--------------+-------------+----------+
  9. | fprice_range | brand_name | count(*) |
  10. +--------------+-------------+----------+
  11. | 1 | Brand Eight | 197 |
  12. | 4 | Brand Eight | 235 |
  13. | 3 | Brand Eight | 203 |
  14. | 2 | Brand Eight | 201 |
  15. | 0 | Brand Eight | 197 |
  16. | 4 | Brand Five | 230 |
  17. | 2 | Brand Five | 197 |
  18. | 1 | Brand Five | 204 |
  19. | 3 | Brand Five | 193 |
  20. | 0 | Brand Five | 183 |
  21. | 1 | Brand Four | 195 |
  22. ...

Ordering in facet result

Facets support ORDER BY clause as same as a standard query. Each facet can have it’s or own ordering and the facet ordering doesn’t affect in any way the ordering of the main result set, which is ordered by the main query’s ORDER BY. Sorting can be made on attribute name, count (using COUNT(*)) or special FACET() function can be used, which provides the aggregated data values.

  • SQL

SQL

  1. SELECT * FROM facetdemo
  2. FACET brand_name BY brand_id ORDER BY FACET() ASC
  3. FACET brand_name BY brand_id ORDER BY brand_name ASC
  4. FACET brand_name BY brand_id order BY COUNT(*) DESC;

Response

  1. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  2. | id | price | brand_id | title | brand_name | property | j | categories |
  3. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  4. | 1 | 306 | 1 | Product Ten Three | Brand One | Six_Ten | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11 |
  5. ...
  6. | 20 | 31 | 9 | Product Four One | Brand Nine | Ten_Four | {"prop1":79,"prop2":42,"prop3":"One"} | 12,13,14 |
  7. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  8. 20 rows in set (0.01 sec)
  9. +-------------+----------+
  10. | brand_name | count(*) |
  11. +-------------+----------+
  12. | Brand One | 1013 |
  13. | Brand Two | 990 |
  14. | Brand Three | 1016 |
  15. | Brand Four | 994 |
  16. | Brand Five | 1007 |
  17. | Brand Six | 1039 |
  18. | Brand Seven | 965 |
  19. | Brand Eight | 1033 |
  20. | Brand Nine | 944 |
  21. | Brand Ten | 998 |
  22. +-------------+----------+
  23. 10 rows in set (0.01 sec)
  24. +-------------+----------+
  25. | brand_name | count(*) |
  26. +-------------+----------+
  27. | Brand Eight | 1033 |
  28. | Brand Five | 1007 |
  29. | Brand Four | 994 |
  30. | Brand Nine | 944 |
  31. | Brand One | 1013 |
  32. | Brand Seven | 965 |
  33. | Brand Six | 1039 |
  34. | Brand Ten | 998 |
  35. | Brand Three | 1016 |
  36. | Brand Two | 990 |
  37. +-------------+----------+
  38. 10 rows in set (0.01 sec)
  39. +-------------+----------+
  40. | brand_name | count(*) |
  41. +-------------+----------+
  42. | Brand Six | 1039 |
  43. | Brand Eight | 1033 |
  44. | Brand Three | 1016 |
  45. | Brand One | 1013 |
  46. | Brand Five | 1007 |
  47. | Brand Ten | 998 |
  48. | Brand Four | 994 |
  49. | Brand Two | 990 |
  50. | Brand Seven | 965 |
  51. | Brand Nine | 944 |
  52. +-------------+----------+
  53. 10 rows in set (0.01 sec)

Size of facet result

By default each facet result set is limited to 20 values. The number of facet values can be controlled with LIMIT clause individually for each facet by providing either a number of values to return in format LIMIT count or with an offset as LIMIT offset, count.

The maximum facet values that can be returned is limited by the query’s max_matches setting. In case dynamic max_matches (limiting max_matches to offset+per page for better performance) is wanted to be implemented, it must be taken in account that a too low max_matches value can hurt the number of facet values. In this case, a minimum max_matches value should be used good enough to cover the number of facet values.

  • SQL
  • JSON
  • PHP
  • Python
  • Javascript
  • Java

SQL JSON PHP Python Javascript Java

  1. SELECT * FROM facetdemo
  2. FACET brand_name BY brand_id ORDER BY FACET() ASC LIMIT 0,1
  3. FACET brand_name BY brand_id ORDER BY brand_name ASC LIMIT 2,4
  4. FACET brand_name BY brand_id order BY COUNT(*) DESC LIMIT 4;
  1. POST /search -d '
  2. {
  3. "index" : "facetdemo",
  4. "query" : {"match_all" : {} },
  5. "limit": 5,
  6. "aggs" :
  7. {
  8. "group_property" :
  9. {
  10. "terms" :
  11. {
  12. "field":"price",
  13. "size":1,
  14. }
  15. },
  16. "group_brand_id" :
  17. {
  18. "terms" :
  19. {
  20. "field":"brand_id",
  21. "size":3
  22. }
  23. }
  24. }
  25. }
  26. '
  1. $index->setName('facetdemo');
  2. $search = $index->search('');
  3. $search->limit(5);
  4. $search->facet('price','price',1);
  5. $search->facet('brand_id','group_brand_id',3);
  6. $results = $search->get();
  7. print_r($results->getFacets());
  1. res =searchApi.search({"index":"facetdemo","query":{"match_all":{}},"limit":5,"aggs":{"group_property":{"terms":{"field":"price","size":1,}},"group_brand_id":{"terms":{"field":"brand_id","size":3}}}})
  1. res = await searchApi.search({"index":"facetdemo","query":{"match_all":{}},"limit":5,"aggs":{"group_property":{"terms":{"field":"price","size":1,}},"group_brand_id":{"terms":{"field":"brand_id","size":3}}}});
  1. searchRequest = new SearchRequest();
  2. aggs = new HashMap<String,Object>(){{
  3. put("group_property", new HashMap<String,Object>(){{
  4. put("sizes", new HashMap<String,Object>(){{
  5. put("field","price");
  6. put("size",1);
  7. }});
  8. }});
  9. put("group_brand_id", new HashMap<String,Object>(){{
  10. put("sizes", new HashMap<String,Object>(){{
  11. put("field","brand_id");
  12. put("size",3);
  13. }});
  14. }});
  15. }};
  16. searchRequest.setIndex("facetdemo");
  17. searchRequest.setLimit(5);
  18. query = new HashMap<String,Object>();
  19. query.put("match_all",null);
  20. searchRequest.setQuery(query);
  21. searchRequest.setAggs(aggs);
  22. searchResponse = searchApi.search(searchRequest);

Response

  1. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  2. | id | price | brand_id | title | brand_name | property | j | categories |
  3. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  4. | 1 | 306 | 1 | Product Ten Three | Brand One | Six_Ten | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11 |
  5. ...
  6. | 20 | 31 | 9 | Product Four One | Brand Nine | Ten_Four | {"prop1":79,"prop2":42,"prop3":"One"} | 12,13,14 |
  7. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  8. 20 rows in set (0.01 sec)
  9. +-------------+----------+
  10. | brand_name | count(*) |
  11. +-------------+----------+
  12. | Brand One | 1013 |
  13. +-------------+----------+
  14. 1 rows in set (0.01 sec)
  15. +-------------+----------+
  16. | brand_name | count(*) |
  17. +-------------+----------+
  18. | Brand Four | 994 |
  19. | Brand Nine | 944 |
  20. | Brand One | 1013 |
  21. | Brand Seven | 965 |
  22. +-------------+----------+
  23. 4 rows in set (0.01 sec)
  24. +-------------+----------+
  25. | brand_name | count(*) |
  26. +-------------+----------+
  27. | Brand Six | 1039 |
  28. | Brand Eight | 1033 |
  29. | Brand Three | 1016 |
  30. +-------------+----------+
  31. 3 rows in set (0.01 sec)
  1. {
  2. "took": 3,
  3. "timed_out": false,
  4. "hits": {
  5. "total": 10000,
  6. "hits": [
  7. {
  8. "_id": "1",
  9. "_score": 1,
  10. "_source": {
  11. "price": 197,
  12. "brand_id": 10,
  13. "brand_name": "Brand Ten",
  14. "categories": [
  15. 10
  16. ]
  17. }
  18. },
  19. ...
  20. {
  21. "_id": "5",
  22. "_score": 1,
  23. "_source": {
  24. "price": 805,
  25. "brand_id": 7,
  26. "brand_name": "Brand Seven",
  27. "categories": [
  28. 11,
  29. 12,
  30. 13
  31. ]
  32. }
  33. }
  34. ]
  35. },
  36. "aggregations": {
  37. "group_property": {
  38. "buckets": [
  39. {
  40. "key": 1000,
  41. "doc_count": 11
  42. }
  43. ]
  44. },
  45. "group_brand_id": {
  46. "buckets": [
  47. {
  48. "key": 10,
  49. "doc_count": 1019
  50. },
  51. {
  52. "key": 9,
  53. "doc_count": 954
  54. },
  55. {
  56. "key": 8,
  57. "doc_count": 1021
  58. }
  59. ]
  60. }
  61. }
  62. }
  1. Array
  2. (
  3. [price] => Array
  4. (
  5. [buckets] => Array
  6. (
  7. [0] => Array
  8. (
  9. [key] => 1000
  10. [doc_count] => 11
  11. )
  12. )
  13. )
  14. [group_brand_id] => Array
  15. (
  16. [buckets] => Array
  17. (
  18. [0] => Array
  19. (
  20. [key] => 10
  21. [doc_count] => 1019
  22. )
  23. [1] => Array
  24. (
  25. [key] => 9
  26. [doc_count] => 954
  27. )
  28. [2] => Array
  29. (
  30. [key] => 8
  31. [doc_count] => 1021
  32. )
  33. )
  34. )
  35. )
  1. {'aggregations': {u'group_brand_id': {u'buckets': [{u'doc_count': 1019,
  2. u'key': 10},
  3. {u'doc_count': 954,
  4. u'key': 9},
  5. {u'doc_count': 1021,
  6. u'key': 8}]},
  7. u'group_property': {u'buckets': [{u'doc_count': 11,
  8. u'key': 1000}]}},
  9. 'hits': {'hits': [{u'_id': u'1',
  10. u'_score': 1,
  11. u'_source': {u'brand_id': 10,
  12. u'brand_name': u'Brand Ten',
  13. u'categories': [10],
  14. u'price': 197,
  15. u'property': u'Six',
  16. u'title': u'Product Eight One'}},
  17. {u'_id': u'2',
  18. u'_score': 1,
  19. u'_source': {u'brand_id': 6,
  20. u'brand_name': u'Brand Six',
  21. u'categories': [12, 13, 14],
  22. u'price': 671,
  23. u'property': u'Four',
  24. u'title': u'Product Nine Seven'}},
  25. {u'_id': u'3',
  26. u'_score': 1,
  27. u'_source': {u'brand_id': 3,
  28. u'brand_name': u'Brand Three',
  29. u'categories': [13, 14, 15],
  30. u'price': 92,
  31. u'property': u'Six',
  32. u'title': u'Product Five Four'}},
  33. {u'_id': u'4',
  34. u'_score': 1,
  35. u'_source': {u'brand_id': 10,
  36. u'brand_name': u'Brand Ten',
  37. u'categories': [11],
  38. u'price': 713,
  39. u'property': u'Five',
  40. u'title': u'Product Eight Nine'}},
  41. {u'_id': u'5',
  42. u'_score': 1,
  43. u'_source': {u'brand_id': 7,
  44. u'brand_name': u'Brand Seven',
  45. u'categories': [11, 12, 13],
  46. u'price': 805,
  47. u'property': u'Two',
  48. u'title': u'Product Ten Three'}}],
  49. 'max_score': None,
  50. 'total': 10000},
  51. 'profile': None,
  52. 'timed_out': False,
  53. 'took': 0}
  1. {"took":0,"timed_out":false,"hits":{"total":10000,"hits":[{"_id":"1","_score":1,"_source":{"price":197,"brand_id":10,"brand_name":"Brand Ten","categories":[10],"title":"Product Eight One","property":"Six"}},{"_id":"2","_score":1,"_source":{"price":671,"brand_id":6,"brand_name":"Brand Six","categories":[12,13,14],"title":"Product Nine Seven","property":"Four"}},{"_id":"3","_score":1,"_source":{"price":92,"brand_id":3,"brand_name":"Brand Three","categories":[13,14,15],"title":"Product Five Four","property":"Six"}},{"_id":"4","_score":1,"_source":{"price":713,"brand_id":10,"brand_name":"Brand Ten","categories":[11],"title":"Product Eight Nine","property":"Five"}},{"_id":"5","_score":1,"_source":{"price":805,"brand_id":7,"brand_name":"Brand Seven","categories":[11,12,13],"title":"Product Ten Three","property":"Two"}}]}}
  1. class SearchResponse {
  2. took: 0
  3. timedOut: false
  4. aggregations: {group_property={buckets=[{key=1000, doc_count=11}]}, group_brand_id={buckets=[{key=10, doc_count=1019}, {key=9, doc_count=954}, {key=8, doc_count=1021}]}}
  5. hits: class SearchResponseHits {
  6. maxScore: null
  7. total: 10000
  8. hits: [{_id=1, _score=1, _source={price=197, brand_id=10, brand_name=Brand Ten, categories=[10], title=Product Eight One, property=Six}}, {_id=2, _score=1, _source={price=671, brand_id=6, brand_name=Brand Six, categories=[12, 13, 14], title=Product Nine Seven, property=Four}}, {_id=3, _score=1, _source={price=92, brand_id=3, brand_name=Brand Three, categories=[13, 14, 15], title=Product Five Four, property=Six}}, {_id=4, _score=1, _source={price=713, brand_id=10, brand_name=Brand Ten, categories=[11], title=Product Eight Nine, property=Five}}, {_id=5, _score=1, _source={price=805, brand_id=7, brand_name=Brand Seven, categories=[11, 12, 13], title=Product Ten Three, property=Two}}]
  9. }
  10. profile: null
  11. }

Returned result set

When using SQL, a search with facets returns a multiple result sets response. The MySQL client/library/connector used must have support (most do) for multiple result sets in order to be able to access the facet result sets.

Performance

Internally, the FACET is a shorthand for executing a multi-query where the first query contains the main search query and the rest of the queries in the batch have each a clustering. As in the case of multi-query, the common query optimization can kick-in for a faceted search, meaning the search query is executed only once and the facets operates on the search query result, each facet adding only a fraction of time to the total query time.

To check if the faceted search ran in an optimized mode can be seen in query log, where all the logged queries will contain a xN string, where N is the number of queries that ran in the optimized group or checking the output of SHOW META statement which will exhibit a multiplier metric:

  • SQL

SQL

  1. SELECT * FROM facetdemo FACET brand_id FACET price FACET categories;
  2. SHOW META LIKE 'multiplier';

Response

  1. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  2. | id | price | brand_id | title | brand_name | property | j | categories |
  3. +------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
  4. | 1 | 306 | 1 | Product Ten Three | Brand One | Six_Ten | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11 |
  5. ...
  6. +----------+----------+
  7. | brand_id | count(*) |
  8. +----------+----------+
  9. | 1 | 1013 |
  10. ...
  11. +-------+----------+
  12. | price | count(*) |
  13. +-------+----------+
  14. | 306 | 7 |
  15. ...
  16. +------------+----------+
  17. | categories | count(*) |
  18. +------------+----------+
  19. | 10 | 2436 |
  20. ...
  21. +---------------+-------+
  22. | Variable_name | Value |
  23. +---------------+-------+
  24. | multiplier | 4 |
  25. +---------------+-------+
  26. 1 row in set (0.00 sec)

Geo search

One of the greatest features of Manticore Search is the ability to combine full-text searching with geo location. For example a retailer can offer a search where user looks for a product and the result set can tell which is the closest shop that has the product in stock so user can go in store and pick it up. A travel site can provide results based on a search limited to a certain area and results to be sorted by the distance from a point (‘search museums near a hotel’ for example).

To perform geo searching, a document needs to contain pairs of latitude/longitude coordinates. The coordinates can be stored as float attributes. If the document has multiple locations, it may be convenient to use a json attribute to store coordinate pairs.

  1. table myrt
  2. {
  3. ...
  4. rt_attr_float = lat
  5. rt_attr_float = lon
  6. ...
  7. }

The coordinates can be stored as degrees or radians.

Performing distance calculation

To find out the distance between two points the GEODIST()) function can be used. GEODIST requires two pairs of coordinates as first four parameters.

The 5th parameter in a simplified JSON format can configure certain aspects of the function. By default, GEODIST expects coordinates to be in radians, but in=degrees can be added to allow using degrees at input. The coordinates for which we perform the geo distance must have same time (degrees or radians) as the ones stored in the table, otherwise results will be misleading.

The calculated distance is by default in meters, but with out option it can be transformed to kilometers, feets or miles. Lastly, by default a calculation method called adaptive is used. An alternative method based on haversine algorithm is available, however this one is slower and less precise.

The result of the function - the distance - can be used in ORDER BY clause to sort the results

  1. SELECT *,GEODIST(40.7643929, -73.9997683, lat,lon, {in=degrees, out=miles}) AS distance FROM myindex WHERE MATCH('...') ORDER BY distance ASC, WEIGHT() DESC;

Or to limit the results to a radial area around the point:

  1. SELECT *,GEODIST(40.7643929, -73.9997683, lat,lon, {in=degrees, out=miles}) AS distance FROM myindex WHERE MATCH('...') AND distance <1000 ORDER BY WEIGHT(), DISTANCE ASC;

Searching in polygons

Another geo search functionality is the ability to check if a location belongs to an area. A special function will construct a polygon object which is used in another function that test if a set of coordinates belongs to it or not.

For creating the polygon two functions are available:

  • GEOPOLY2D()) - creates a polygon that takes in account the Earth’s curvature
  • POLY2D()) - creates a simple polygon in plain space

POLY2D can be used for geo searches if the area has sides shorter than 500km (for 3-4 sides, for polygons with more sides lower values should be considered). For areas with longer sides usage of GEOPOLY2D is mandatory for keeping results accurate. GEOPOLY2D also expects coordinates as latitude/longitude pairs in degrees, using radians will provide results in plain space (like POLY2D).

CONTAINS()) expects at input a polygon and a set of coordinates and output 1 if the point is inside the polygon or 0 otherwise.

  1. SELECT *,CONTAINS(GEOPOLY2D(40.76439, -73.9997, 42.21211, -73.999, 42.21211, -76.123, 40.76439, -76.123), 41.5445, -74.973) AS inside FROM myindex WHERE MATCH('...') AND inside=1;