Deploy and manage Learning To Rank models
Deploy and manage Learning To Rank models
This feature was introduced in version 8.12.0 and is only available to certain subscription levels. For more information, see https://www.elastic.co/subscriptions.
Train and deploy a model using Eland
Typically, the XGBoost model training process uses standard Python data science tools like Pandas and scikit-learn.
We have developed an example notebook available in the elasticsearch-labs
repo. This interactive Python notebook details an end-to-end model training and deployment workflow.
We highly recommend using eland in your workflow, because it provides important functionalities for working with LTR in Elasticsearch. Use eland to:
- Configure feature extraction
- Extract features for training
- Deploy the model in Elasticsearch
Configure feature extraction in Eland
Feature extractors are defined using templated queries. Eland provides the eland.ml.ltr.QueryFeatureExtractor
to define these feature extractors directly in Python:
from eland.ml.ltr import QueryFeatureExtractor
feature_extractors=[
# We want to use the BM25 score of the match query for the title field as a feature:
QueryFeatureExtractor(
feature_name="title_bm25",
query={"match": {"title": "{{query}}"}}
),
# We want to use the the number of matched terms in the title field as a feature:
QueryFeatureExtractor(
feature_name="title_matched_term_count",
query={
"script_score": {
"query": {"match": {"title": "{{query}}"}},
"script": {"source": "return _termStats.matchedTermsCount();"},
}
},
),
# We can use a script_score query to get the value
# of the field rating directly as a feature:
QueryFeatureExtractor(
feature_name="popularity",
query={
"script_score": {
"query": {"exists": {"field": "popularity"}},
"script": {"source": "return doc['popularity'].value;"},
}
},
),
# We extract the number of terms in the query as feature.
QueryFeatureExtractor(
feature_name="query_term_count",
query={
"script_score": {
"query": {"match": {"title": "{{query}}"}},
"script": {"source": "return _termStats.uniqueTermsCount();"},
}
},
),
]
Tern statistics as features
It is very common for an LTR model to leverage raw term statistics as features. To extract this information, you can use the term statistics feature provided as part of the script_score query.
Once the feature extractors have been defined, they are wrapped in an eland.ml.ltr.LTRModelConfig
object for use in later training steps:
from eland.ml.ltr import LTRModelConfig
ltr_config = LTRModelConfig(feature_extractors)
Extracting features for training
Building your dataset is a critical step in the training process. This involves extracting relevant features and adding them to your judgment list. We recommend using Eland’s eland.ml.ltr.FeatureLogger
helper class for this process.
from eland.ml.ltr import FeatureLogger
# Create a feature logger that will be used to query {es} to retrieve the features:
feature_logger = FeatureLogger(es_client, MOVIE_INDEX, ltr_config)
The FeatureLogger provides an extract_features
method which enables you to extract features for a list of specific documents from your judgment list. At the same time, you can pass query parameters to the feature extractors defined earlier:
feature_logger.extract_features(
query_params={"query": "foo"},
doc_ids=["doc-1", "doc-2"]
)
Our example notebook explains how to use the FeatureLogger
to build a training dataset, by adding features to a judgment list.
Notes on feature extraction
- We strongly advise against implementing feature extraction on your own. It’s crucial to maintain consistency in feature extraction between the training environment and inference in Elasticsearch. By using eland tooling, which is developed and tested in tandem with Elasticsearch, you can ensure that they function together consistently.
- Feature extraction is performed by executing queries on the Elasticsearch server. This could put a lot of stress on your cluster, especially when your judgment list contains a lot of examples or you have many features. Our feature logger implementation is designed to minimize the number of search requests sent to the server and reduce load. However, it might be best to build your training dataset using an Elasticsearch cluster that is isolated from any user-facing, production traffic.
Deploy your model into Elasticsearch
Once your model is trained you will be able to deploy it in your Elasticsearch cluster. You can use Eland’s MLModel.import_ltr_model method
:
from eland.ml import MLModel
LEARNING_TO_RANK_MODEL_ID="ltr-model-xgboost"
MLModel.import_ltr_model(
es_client=es_client,
model=ranker,
model_id=LEARNING_TO_RANK_MODEL_ID,
ltr_model_config=ltr_config,
es_if_exists="replace",
)
This method will serialize the trained model and the Learning To Rank configuration (including feature extraction) in a format that Elasticsearch can understand. The model is then deployed to Elasticsearch using the Create Trained Models API.
The following types of models are currently supported for LTR with Elasticsearch:
More model types will be supported in the future.
Learning To Rank model management
Once your model is deployed in Elasticsearch you can manage it using the trained model APIs. You’re now ready to work with your LTR model as a rescorer at search time.