Working with features

The following sections describe the specific functionality provided by the Learning to Rank plugin. This information will help you build and upload features for your learning to rank (LTR) system. See ML ranking core concepts and Scope of the plugin for more information about the Learning to Rank plugin’s roles and functionality.

Understanding the role of features in the Learning to Rank plugin

The Learning to Rank plugin defines a feature as an OpenSearch query. When you execute that query using your search terms and other relevant parameters, the resulting relevance score becomes the feature value you can use in your training data. For example, a feature may be a basic match query on a field such as title:

  {
    "query": {
      "match": {
        "title": "{{keywords}}"
      }
    }
  }

In addition to simple query-based features, you can also use document properties, such as popularity, as features. For example, you can use a function score query to get the average movie rating:

  {
    "query": {
      "function_score": {
        "functions": [
          {
            "field_value_factor": {
              "field": "vote_average"
            }
          }
        ],
        "query": {
          "match_all": {}
        }
      }
    }
  }

Another example is a query based on location, such as a geodistance filter:

  {
    "query": {
      "bool": {
        "must": {
          "match_all": {}
        },
        "filter": {
          "geo_distance": {
            "distance": "200km",
            "pin.location": {
              "lat": "{{users_lat}}",
              "lon": "{{users_lon}}"
            }
          }
        }
      }
    }
  }

These types of queries are the building blocks that the ranking function you are training combines mathematically to determine a relevance score.

Using Mustache templates in LTR queries

The features in LTR queries use Mustache templates. This allows you to insert variables into your search queries. For example, you could have a query that uses {{keywords}} to insert your search terms. Or you could use {{users_lat}} and {{users_lon}} to include the location. This gives you the flexibility to personalize your search.
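
As a simple illustration of the substitution, if you supply rambo as the keywords parameter (an example value) when logging features or running a search, the title feature template shown earlier renders to a concrete match query:

  {
    "match": {
      "title": "rambo"
    }
  }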

Uploading and naming features

The Learning to Rank plugin enables you to create and modify features. After you define your features, you can log them for use in model training. By combining the logged feature data with your judgment list, you can train a model. Once the model is ready, you can upload it and then apply it to your search queries.
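
For example, after a model has been uploaded, you can apply it at search time with the plugin's sltr query. The following is a minimal sketch that assumes a model named my_linear_model (a hypothetical name) has already been trained and uploaded; in practice, the sltr query is usually run in a rescore phase over the top results rather than as the main query:

  POST tmdb/_search
  {
    "query": {
      "sltr": {
        "params": {
          "keywords": "rambo"
        },
        "model": "my_linear_model"
      }
    }
  }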

Initializing the default feature store

The Learning to Rank plugin uses a feature store to store metadata about your features and models. Typically, there is one feature store per major search implementation, for example, Wikipedia as compared to Wikitravel.

For most use cases, you can use the default feature store and avoid managing multiple feature stores. To initialize the default feature store, run the following request:

  PUT _ltr

If you need to start again from the beginning, you can delete the default feature store by using the following operation:

  DELETE _ltr

Deleting the feature store removes all existing feature and model data.

The default feature store is used throughout the rest of this guide.
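
If you do maintain separate search implementations, the plugin also supports named feature stores. The following is a minimal sketch that creates a store named wikipedia (a hypothetical name used only for illustration); feature set requests can then include that store name in the path:

  PUT _ltr/wikipedia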

Working with features and feature sets

A feature set is a collection of features that have been grouped together. You can use feature sets to log multiple feature values for offline training. When creating a new model, you copy the relevant feature set into the model definition.

Creating feature sets

To create a feature set, you can send a POST request. When creating the feature set, you provide a name and an optional list of features, as shown in the following example request:

  POST _ltr/_featureset/more_movie_features
  {
    "featureset": {
      "features": [
        {
          "name": "title_query",
          "params": [
            "keywords"
          ],
          "template_language": "mustache",
          "template": {
            "match": {
              "title": "{{keywords}}"
            }
          }
        },
        {
          "name": "title_query_boost",
          "params": [
            "some_multiplier"
          ],
          "template_language": "derived_expression",
          "template": "title_query * some_multiplier"
        },
        {
          "name": "custom_title_query_boost",
          "params": [
            "some_multiplier"
          ],
          "template_language": "script_feature",
          "template": {
            "lang": "painless",
            "source": "params.feature_vector.get('title_query') * (long)params.some_multiplier",
            "params": {
              "some_multiplier": "some_multiplier"
            }
          }
        }
      ]
    }
  }

Managing feature sets

To fetch a specific feature set, you can use the following request:

  GET _ltr/_featureset/more_movie_features

To see a list of all defined feature sets, you can use the following request:

  GET _ltr/_featureset

If you have many feature sets, you can filter the list by using a prefix, as shown in the following example request:

  GET _ltr/_featureset?prefix=mor

This returns only the feature sets with names starting with mor.

If you need to start over, you can delete a feature set using the following request:

  DELETE _ltr/_featureset/more_movie_features

Validating features

When adding new features, you should validate that they work as expected. You can do this by adding a validation block to your feature creation request. This allows the Learning to Rank plugin to run the query before adding the feature, catching any issues early. If you skip this validation, you may not discover until later that a feature, while valid JSON, contains a malformed OpenSearch query.

To run validation, you can specify the test parameters and the index to use, as shown in the following example validation block:

  1. "validation": {
  2. "params": {
  3. "keywords": "rambo"
  4. },
  5. "index": "tmdb"
  6. },

Place the validation block alongside your feature set definition. In the following example, the match query is malformed (the closing curly braces are missing from the Mustache variable). The validation fails, returning an error:

  {
    "validation": {
      "params": {
        "keywords": "rambo"
      },
      "index": "tmdb"
    },
    "featureset": {
      "features": [
        {
          "name": "title_query",
          "params": [
            "keywords"
          ],
          "template_language": "mustache",
          "template": {
            "match": {
              "title": "{{keywords"
            }
          }
        }
      ]
    }
  }

Expanding feature sets

You may not initially know which features are the most useful. In these cases, you can later add new features to an existing feature set for logging and model evaluation. For example, if you want to create a user_rating feature, you can use the Feature Set Append API, as shown in the following example request:

  POST /_ltr/_featureset/my_featureset/_addfeatures
  {
    "features": [
      {
        "name": "user_rating",
        "params": [],
        "template_language": "mustache",
        "template": {
          "function_score": {
            "functions": [
              {
                "field_value_factor": {
                  "field": "vote_average"
                }
              }
            ],
            "query": {
              "match_all": {}
            }
          }
        }
      }
    ]
  }

Enforcing unique feature names

The Learning to Rank plugin enforces unique names for each feature. This is because some model training libraries refer to features by name. In the preceding example, you could not add a new user_rating feature without causing an error because that feature name is already in use.

Treating feature sets as lists

Feature sets are more like ordered lists than simple sets. Each feature has both a name and an ordinal position. Some LTR training applications, such as RankLib, refer to features by their ordinal position (for example, 1st feature, 2nd feature). Others may use the feature name. When working with logged features, you may need to handle both the ordinal and the name because the ordinal is preserved to maintain the list order.
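
For example, training data in RankLib's format refers to each feature only by its ordinal position, so the first and second features in a feature set appear as 1: and 2: in every row. The grades, query IDs, document IDs, and feature values below are made-up placeholders shown only to illustrate the format:

  4 qid:1 1:12.31 2:9.91 # doc_a rambo
  0 qid:1 1:0.0 2:1.25 # doc_b rambo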

Next steps

Learn about feature engineering and advanced functionality.