Rank feature query
Boosts the relevance score of documents based on the numeric value of a rank_feature or rank_features field.
The rank_feature query is typically used in the should clause of a bool query so its relevance scores are added to other scores from the bool query.
With positive_score_impact set to false for a rank_feature or rank_features field, we recommend that every document that participates in a query has a value for this field. Otherwise, if a rank_feature query is used in the should clause, it adds nothing to the score of a document with a missing value, but adds some boost to a document containing the feature. This is the opposite of the intent: because these features are negative, documents containing them should rank lower than documents missing them.
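To make the effect concrete, the following sketch compares two hypothetical documents under a negative-impact feature. It assumes the pivot / (S + pivot) form described under Saturation, a should clause that simply adds its score to a base score, and made-up values (a base score of 1.0 and a pivot of 40). None of these numbers come from Elasticsearch.

```python
from typing import Optional


def negative_saturation(value: Optional[float], pivot: float) -> float:
    """Score contribution of a negative-impact rank feature clause.

    Assumption: a document without the field does not match the clause,
    so the clause contributes 0 to its score.
    """
    if value is None:
        return 0.0
    return pivot / (value + pivot)


BASE_SCORE = 1.0  # hypothetical score from the other query clauses
PIVOT = 40.0      # hypothetical pivot, chosen only for illustration

with_feature = BASE_SCORE + negative_saturation(47.0, PIVOT)
without_feature = BASE_SCORE + negative_saturation(None, PIVOT)

print(f"has url_length=47:  {with_feature:.3f}")
print(f"missing url_length: {without_feature:.3f}")
```

The document that carries the negative feature ends up with the higher score, which is why every participating document should have a value for the field.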
Unlike the function_score query or other ways to change relevance scores, the rank_feature query efficiently skips non-competitive hits when the track_total_hits parameter is not true. This can dramatically improve query speed.
Rank feature functions
To calculate relevance scores based on rank feature fields, the rank_feature query supports the following mathematical functions:
Saturation
Logarithm
Sigmoid
Linear
If you don’t know where to start, we recommend using the saturation function. If no function is provided, the rank_feature query uses the saturation function by default.
Example request
Index setup
To use the rank_feature query, your index must include a rank_feature or rank_features field mapping. To see how you can set up an index for the rank_feature query, try the following example.
Create a test index with the following field mappings:
pagerank, a rank_feature field which measures the importance of a website
url_length, a rank_feature field which contains the length of the website’s URL. For this example, a long URL correlates negatively to relevance, indicated by a positive_score_impact value of false.
topics, a rank_features field which contains a list of topics and a measure of how well each document is connected to this topic
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
Index several documents to the test index.
PUT /test/_doc/1?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}
PUT /test/_doc/2?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}
PUT /test/_doc/3?refresh
{
  "url": "https://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
Example query
The following query searches for 2016 and boosts relevance scores based on pagerank, url_length, and the sports topic.
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
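If you assemble this request from application code, the body can be built as a plain dictionary and serialized to JSON. The sketch below is one possible way to do that. The helper names rank_feature_clause and build_query are hypothetical and not part of any Elasticsearch client. Sending the body to GET /test/_search is left to whichever HTTP or client library you use.

```python
import json
from typing import Any, Dict, Optional


def rank_feature_clause(field: str, boost: Optional[float] = None) -> Dict[str, Any]:
    """Build one rank_feature clause; boost is omitted when not given."""
    clause: Dict[str, Any] = {"field": field}
    if boost is not None:
        clause["boost"] = boost
    return {"rank_feature": clause}


def build_query(text: str) -> Dict[str, Any]:
    """Build the bool query from the example: match on content, boost by features."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "should": [
                    rank_feature_clause("pagerank"),
                    rank_feature_clause("url_length", boost=0.1),
                    rank_feature_clause("topics.sports", boost=0.4),
                ],
            }
        }
    }


body = build_query("2016")
print(json.dumps(body, indent=2))
```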
Top-level parameters for rank_feature
field
(Required, string) rank_feature or rank_features field used to boost relevance scores.
boost
(Optional, float) Floating point number used to decrease or increase relevance scores. Defaults to 1.0.
Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
saturation
(Optional, function object) Saturation function used to boost relevance scores based on the value of the rank feature field. If no function is provided, the rank_feature query defaults to the saturation function. See Saturation for more information.
Only one of the saturation, log, sigmoid, or linear functions can be provided.
log
(Optional, function object) Logarithmic function used to boost relevance scores based on the value of the rank feature field. See Logarithm for more information.
Only one of the saturation, log, sigmoid, or linear functions can be provided.
sigmoid
(Optional, function object) Sigmoid function used to boost relevance scores based on the value of the rank feature field. See Sigmoid for more information.
Only one of the saturation, log, sigmoid, or linear functions can be provided.
linear
(Optional, function object) Linear function used to boost relevance scores based on the value of the rank feature field. See Linear for more information.
Only one of the saturation, log, sigmoid, or linear functions can be provided.
Notes
Saturation
The saturation function gives a score equal to S / (S + pivot), where S is the value of the rank feature field and pivot is a configurable pivot value. The result is less than 0.5 if S is less than pivot and greater than 0.5 if S is greater than pivot. Scores always fall in the range (0, 1).
If the rank feature has a negative score impact, the function is computed as pivot / (S + pivot), which decreases as S increases.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
If a pivot value is not provided, Elasticsearch computes a default value equal to the approximate geometric mean of all rank feature values in the index. We recommend using this default value if you haven’t had the opportunity to train a good pivot value.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
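As a rough illustration of the saturation formula, the sketch below evaluates it outside Elasticsearch. The function name and the input validation are assumptions made for this example, and it does not reproduce the default pivot estimation or any precision loss from indexing.

```python
def saturation(value: float, pivot: float, positive_score_impact: bool = True) -> float:
    """Evaluate S / (S + pivot), or pivot / (S + pivot) for negative impact."""
    # Assumption for this sketch: both the feature value and the pivot are positive.
    if value <= 0 or pivot <= 0:
        raise ValueError("value and pivot must be positive")
    if positive_score_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


# pagerank values below, at, and above a pivot of 8
for s in (4.0, 8.0, 50.3):
    print(s, round(saturation(s, pivot=8.0), 4))

# With negative impact, a longer URL yields a smaller score.
print(saturation(47.0, pivot=8.0, positive_score_impact=False))
print(saturation(37.0, pivot=8.0, positive_score_impact=False))
```

A value equal to the pivot scores exactly 0.5, which is a convenient way to reason about where to place the pivot for a given feature.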
Logarithm
The log function gives a score equal to log(scaling_factor + S), where S is the value of the rank feature field and scaling_factor is a configurable scaling factor. Scores are unbounded.
This function only supports rank features that have a positive score impact.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
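The logarithm formula can be sketched in a few lines. This example assumes the natural logarithm, since the text above does not state the base, and the function name log_score is hypothetical.

```python
import math


def log_score(value: float, scaling_factor: float) -> float:
    """Evaluate log(scaling_factor + S); natural log is an assumption here."""
    return math.log(scaling_factor + value)


# The score keeps growing with the feature value, only more slowly.
for s in (1.0, 50.3, 5000.0):
    print(s, round(log_score(s, scaling_factor=4.0), 4))
```

Because the result has no upper bound, a very large feature value can outweigh the other clauses of a bool query more easily than it can with saturation or sigmoid.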
Sigmoid
The sigmoid function is an extension of saturation which adds a configurable exponent. Scores are computed as S^exp / (S^exp + pivot^exp). As with the saturation function, pivot is the value of S that gives a score of 0.5, and scores fall in the range (0, 1).
The exponent must be positive and is typically in [0.5, 1]. A good value should be computed via training. If you don’t have the opportunity to do so, we recommend you use the saturation function instead.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
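To see how the exponent changes the shape of the curve, the following sketch evaluates the sigmoid formula directly. The function name and the validation are assumptions for illustration. With an exponent of 1 it reduces to the saturation formula.

```python
def sigmoid(value: float, pivot: float, exponent: float) -> float:
    """Evaluate S^exp / (S^exp + pivot^exp)."""
    # Assumption for this sketch: all three inputs are positive.
    if value <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("value, pivot and exponent must be positive")
    v = value ** exponent
    p = pivot ** exponent
    return v / (v + p)


# Same pivot, different exponents: a smaller exponent flattens the curve.
for exp in (0.5, 0.6, 1.0):
    print(exp, round(sigmoid(50.3, pivot=7.0, exponent=exp), 4))
```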
Linear
The linear function is the simplest function, and gives a score equal to the indexed value of S, where S is the value of the rank feature field. If a rank feature field is indexed with "positive_score_impact": true, its indexed value is equal to S, rounded to preserve only 9 significant bits of precision. If a rank feature field is indexed with "positive_score_impact": false, its indexed value is equal to 1/S, rounded to preserve only 9 significant bits of precision.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "linear": {}
    }
  }
}
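The effect of keeping only 9 significant bits can be approximated as follows. This sketch assumes the value is held as a 32-bit float and that the low 15 bits of its 23-bit stored mantissa are simply cleared (truncation). The exact rounding behavior used by Elasticsearch is not specified above, so treat this as an illustration of the precision loss and not as the actual encoding. Both function names are hypothetical.

```python
import struct


def keep_9_significant_bits(value: float) -> float:
    """Truncate a float32 to 1 implicit + 8 stored mantissa bits (assumed scheme)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    bits &= 0xFFFF8000  # clear the low 15 of the 23 stored mantissa bits
    (truncated,) = struct.unpack(">f", struct.pack(">I", bits))
    return truncated


def linear(value: float, positive_score_impact: bool = True) -> float:
    """Score equal to the indexed value: S, or 1/S for negative impact."""
    indexed = value if positive_score_impact else 1.0 / value
    return keep_9_significant_bits(indexed)


print(linear(50.3))         # pagerank loses a little precision
print(linear(42.0))         # small integers survive unchanged
print(linear(42.0, False))  # url_length is scored as roughly 1/42
```

Under this assumption the relative error stays below 2^-8 (about 0.4%), so two feature values that differ by less than that may receive the same score.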