Logging feature scores

Feature values need to be logged in order to train a model. This is a crucial component of the Learning to Rank plugin: as you search, feature values from the feature sets are logged so that they can be used for training. This allows you to discover models that effectively predict relevance using those features.

sltr query

The sltr query is the primary method for running features and evaluating models. When logging, an sltr query is used to execute each feature query and retrieve the feature scores. A feature set structure that works with the hello-ltr demo schema is shown in the following example request:

  PUT _ltr/_featureset/more_movie_features
  {
    "name": "more_movie_features",
    "features": [
      {
        "name": "body_query",
        "params": [
          "keywords"
        ],
        "template": {
          "match": {
            "overview": "{{keywords}}"
          }
        }
      },
      {
        "name": "title_query",
        "params": [
          "keywords"
        ],
        "template": {
          "match": {
            "title": "{{keywords}}"
          }
        }
      }
    ]
  }


Common use cases

Common use cases for logging feature sets are described in the following sections.

Joining feature values with a judgment list

If the judgment list is already available, you can join feature values for each keyword/document pair to create a complete training set. For example, consider the following judgment list:

  grade,keywords,docId
  4,rambo,7555
  3,rambo,1370
  3,rambo,1369
  4,rocky,4241


The feature values need to be retrieved for every document that has a judgment, one search term at a time. For example, starting with the rambo search, a filter can be created for the associated documents as follows:

  {
    "filter": [
      {
        "terms": {
          "_id": ["7555", "1370", "1369"]
        }
      }
    ]
  }

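If you are scripting this step, you can group the judgment list by keyword to build one filter per query. The following is a minimal Python sketch; the judgments.csv file name is an assumption matching the judgment list format shown previously:

  import csv
  from collections import defaultdict

  # Group judged document IDs by keyword so that one filtered
  # logging query can be issued per search term.
  doc_ids_by_keyword = defaultdict(list)
  with open("judgments.csv") as f:       # assumed file name
      for row in csv.DictReader(f):      # columns: grade,keywords,docId
          doc_ids_by_keyword[row["keywords"]].append(row["docId"])

  # doc_ids_by_keyword["rambo"] == ["7555", "1370", "1369"]
  for keywords, doc_ids in doc_ids_by_keyword.items():
      print(keywords, {"terms": {"_id": doc_ids}})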

The Learning to Rank plugin must be pointed at the features to be logged. The sltr query, which is part of the plugin, serves this purpose. It carries a _name (using the named queries feature) so that it can be referenced later, refers to the previously created feature set more_movie_features, and passes the search keyword rambo along with any other required parameters, as shown in the following example query:

  {
    "sltr": {
      "_name": "logged_featureset",
      "featureset": "more_movie_features",
      "params": {
        "keywords": "rambo"
      }
    }
  }


Searching with LTR provides an sltr query for executing a model. Here, the same sltr query instead directs the Learning to Rank plugin to the feature set that requires logging.

To avoid influencing the score, the sltr query is injected as a filter, as shown in the following example:

  {
    "query": {
      "bool": {
        "filter": [
          {
            "terms": {
              "_id": ["7555", "1370", "1369"]
            }
          },
          {
            "sltr": {
              "_name": "logged_featureset",
              "featureset": "more_movie_features",
              "params": {
                "keywords": "rambo"
              }
            }
          }
        ]
      }
    }
  }


Executing this query returns the three expected hits. The next step is to enable feature logging and point it at the sltr query to be logged.

The logging identifies the sltr query, runs the feature set’s queries, scores each document, and returns those scores as computed fields for each document, as shown in the following example logging structure:

  1. "ext": {
  2. "ltr_log": {
  3. "log_specs": {
  4. "name": "log_entry1",
  5. "named_query": "logged_featureset"
  6. }
  7. }
  8. }


The log extension supports the following arguments:

  • name: The name of the log entry to fetch from each document.
  • named_query: The named query that corresponds to an sltr query.
  • rescore_index: If the sltr query is in a rescore phase, then this is the index of the query in the rescore list.
  • missing_as_zero: Produces a 0 for missing features (when the feature does not match). Default is false.

To allow the log to locate an sltr query, whether in the normal query phase or during rescoring, you must set either named_query or rescore_index.
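
For example, a log spec that targets an sltr query in the first rescore phase and logs missing features as 0 combines these arguments as follows (shown as a Python dict for illustration):

  # Log spec referencing an sltr query by rescore position rather
  # than by name; unmatched features are logged as 0 instead of omitted.
  ext = {
      "ltr_log": {
          "log_specs": {
              "name": "log_entry1",
              "rescore_index": 0,        # first rescore query in the list
              "missing_as_zero": True
          }
      }
  }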

The full example request is as follows:

  POST tmdb/_search
  {
    "query": {
      "bool": {
        "filter": [
          {
            "terms": {
              "_id": ["7555", "1370", "1369"]
            }
          },
          {
            "sltr": {
              "_name": "logged_featureset",
              "featureset": "more_movie_features",
              "params": {
                "keywords": "rambo"
              }
            }
          }
        ]
      }
    },
    "ext": {
      "ltr_log": {
        "log_specs": {
          "name": "log_entry1",
          "named_query": "logged_featureset"
        }
      }
    }
  }


Each document now contains a log entry, as shown in the following example:

  {
    "_index": "tmdb",
    "_type": "movie",
    "_id": "1370",
    "_score": 20.291,
    "_source": {
      ...
    },
    "fields": {
      "_ltrlog": [
        {
          "log_entry1": [
            {
              "name": "title_query",
              "value": 9.510193
            },
            {
              "name": "body_query",
              "value": 10.7808075
            }
          ]
        }
      ]
    },
    "matched_queries": [
      "logged_featureset"
    ]
  }


The judgment list can now be joined with the feature values to produce a training set. For the judgment list line corresponding to document 1370 and the keyword rambo, the following training line is produced:

  1. > 4 qid:1 1:9.510193 2:10.7808075


Repeat this process for all of your queries.
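
A minimal Python sketch of this join is shown below. It assumes the parsed JSON of one logging response (response), a grades dict mapping document IDs to judgment grades, and a qid assigned per keyword; the feature ordinals follow the order of the feature set:

  # Convert one logging response into RankLib-style training lines.
  def to_training_lines(response, grades, qid):
      lines = []
      for hit in response["hits"]["hits"]:
          log_entry = hit["fields"]["_ltrlog"][0]["log_entry1"]
          features = " ".join(
              f"{i}:{feature.get('value', 0.0)}"   # ordinals are 1-based
              for i, feature in enumerate(log_entry, start=1)
          )
          lines.append(f"{grades[hit['_id']]} qid:{qid} {features}")
      return lines

  # For the rambo query (qid 1):
  # to_training_lines(response, {"7555": 4, "1370": 3, "1369": 3}, 1)
  # produces lines such as "3 qid:1 1:9.510193 2:10.7808075".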

For large judgment lists, it is recommended to batch the logging queries for multiple search terms. You can use the multi-search (_msearch) capabilities for this purpose.
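
A sketch of this batching using the _msearch endpoint is shown below; the cluster URL is an assumption, and doc_ids_by_keyword comes from the earlier grouping sketch:

  import json
  import requests

  def logging_query(keywords, doc_ids):
      # One logging query per keyword, mirroring the full example above.
      return {
          "query": {
              "bool": {
                  "filter": [
                      {"terms": {"_id": doc_ids}},
                      {"sltr": {
                          "_name": "logged_featureset",
                          "featureset": "more_movie_features",
                          "params": {"keywords": keywords}
                      }}
                  ]
              }
          },
          "ext": {
              "ltr_log": {
                  "log_specs": {
                      "name": "log_entry1",
                      "named_query": "logged_featureset"
                  }
              }
          }
      }

  # _msearch bodies are newline-delimited JSON: a metadata line,
  # then a query line, for each search, with a trailing newline.
  body = ""
  for keywords, doc_ids in doc_ids_by_keyword.items():
      body += json.dumps({"index": "tmdb"}) + "\n"
      body += json.dumps(logging_query(keywords, doc_ids)) + "\n"

  responses = requests.post(
      "http://localhost:9200/_msearch",   # assumed local cluster
      data=body,
      headers={"Content-Type": "application/x-ndjson"}
  ).json()["responses"]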

Logging values for a live feature set

If you are running in production with a model executed within an sltr query, a live query may appear similar to the following example request:

  POST tmdb/_search
  {
    "query": {
      "match": {
        "_all": "rambo"
      }
    },
    "rescore": {
      "query": {
        "rescore_query": {
          "sltr": {
            "params": {
              "keywords": "rambo"
            },
            "model": "my_model"
          }
        }
      }
    }
  }


See Searching with LTR for information about model execution.

To log the feature values for the query, apply the appropriate logging spec to reference the sltr query, as shown in the following example:

  1. "ext": {
  2. "ltr_log": {
  3. "log_specs": {
  4. "name": "log_entry1",
  5. "rescore_index": 0
  6. }
  7. }
  8. }


This logs the feature values in the response, enabling you to retrain the model later using the same feature set.
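
Combining the live query with this log spec, the complete request can be composed as in the following sketch; the cluster URL is an assumption, and the index and model names follow the examples above:

  import requests

  # Live model query plus the rescore_index log spec shown above.
  request_body = {
      "query": {"match": {"_all": "rambo"}},
      "rescore": {
          "query": {
              "rescore_query": {
                  "sltr": {
                      "params": {"keywords": "rambo"},
                      "model": "my_model"
                  }
              }
          }
      },
      "ext": {
          "ltr_log": {
              "log_specs": {
                  "name": "log_entry1",
                  "rescore_index": 0   # points at the first rescore query
              }
          }
      }
  }

  response = requests.post(
      "http://localhost:9200/tmdb/_search",   # assumed local cluster
      json=request_body
  ).json()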

Modifying and logging an existing feature set

Feature sets can be expanded. For example, if a new feature, such as user_rating, needs to be incorporated, it can be added to the existing feature set more_movie_features, as shown in the following example request:

  POST _ltr/_featureset/more_movie_features/_addfeatures
  {
    "features": [
      {
        "name": "user_rating",
        "params": [],
        "template_language": "mustache",
        "template": {
          "function_score": {
            "functions": [
              {
                "field_value_factor": {
                  "field": "vote_average",
                  "missing": 0
                }
              }
            ],
            "query": {
              "match_all": {}
            }
          }
        }
      }
    ]
  }


See Working with features for more information.

When logging is performed, the new feature is included in the output, as shown in the following example:

  {
    "log_entry1": [
      {
        "name": "title_query",
        "value": 9.510193
      },
      {
        "name": "body_query",
        "value": 10.7808075
      },
      {
        "name": "user_rating",
        "value": 7.8
      }
    ]
  }


Logging values for a proposed feature set

You can create a completely new feature set for experimental purposes, for example, other_movie_features, as shown in the following example request:

  PUT _ltr/_featureset/other_movie_features
  {
    "name": "other_movie_features",
    "features": [
      {
        "name": "cast_query",
        "params": [
          "keywords"
        ],
        "template": {
          "match": {
            "cast.name": "{{keywords}}"
          }
        }
      },
      {
        "name": "genre_query",
        "params": [
          "keywords"
        ],
        "template": {
          "match": {
            "genres.name": "{{keywords}}"
          }
        }
      }
    ]
  }


The feature set, other_movie_features, can be logged alongside the live production set, more_movie_features, by appending it as another filter, as shown in the following example request:

  POST tmdb/_search
  {
    "query": {
      "bool": {
        "filter": [
          {
            "sltr": {
              "_name": "logged_featureset",
              "featureset": "other_movie_features",
              "params": {
                "keywords": "rambo"
              }
            }
          },
          {
            "match": {
              "_all": "rambo"
            }
          }
        ]
      }
    },
    "rescore": {
      "query": {
        "rescore_query": {
          "sltr": {
            "params": {
              "keywords": "rambo"
            },
            "model": "my_model"
          }
        }
      }
    }
  }


You can continue adding as many feature sets as needed for logging.
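
Note that to actually capture the feature values for other_movie_features, you attach the same kind of log spec shown earlier, referencing the named sltr filter. As a sketch (shown as a Python dict for illustration):

  # Log spec referencing the named sltr filter from the request above;
  # add this "ext" section to that request to log the proposed
  # feature set's values.
  ext = {
      "ltr_log": {
          "log_specs": {
              "name": "log_entry1",
              "named_query": "logged_featureset"
          }
      }
  }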

Logging scenarios

Once you have covered the basics, you can consider some real-life feature logging scenarios.

First, logging is used to develop judgment lists from user analytics, capturing the exact value of a feature at the precise time of interaction. For instance, you may want to know the recency, title score, and other values at the moment a user interacts with a result. This helps you analyze which features or factors contributed to relevance during training. To achieve this, build a comprehensive feature set for future experimentation.

Second, logging can be used to retrain a model in which you already have confidence. You may want to keep your models up to date with a shifting index because models can lose their effectiveness over time. You may have A/B testing in place or be monitoring business metrics and notice gradual degradation in model performance.

Third, logging is used during model development. You may have a judgment list but want to iterate heavily with a local copy of OpenSearch. This allows for extensive experimentation with new features, adding and removing them from the feature sets as needed. While this process may result in being slightly out of sync with the live index, the goal is to arrive at a set of satisfactory model parameters. Once this is achieved, the model can be trained with production data to confirm that the level of performance remains acceptable.

Next steps

Learn more about training models in the Uploading a trained model documentation.