Inference processor
Uses a pre-trained data frame analytics model to infer against the data that is being ingested in the pipeline.
Table 25. Inference Options
Name | Required | Default | Description |
---|---|---|---|
model_id | yes | - | (String) The ID or alias for the trained model. |
target_field | no | ml.inference.<processor_tag> | (String) Field added to incoming documents to contain results objects. |
field_map | no | The model’s default field map, if defined | (Object) Maps the document field names to the known field names of the model. This mapping takes precedence over any default mappings provided in the model configuration. |
inference_config | no | The default settings defined in the model | (Object) Contains the inference type and its options. There are two types: regression and classification. |
description | no | - | Description of the processor. Useful for describing the purpose of the processor or its configuration. |
if | no | - | Conditionally execute the processor. See Conditionally run a processor. |
ignore_failure | no | false | Ignore failures for the processor. See Handling pipeline failures. |
on_failure | no | - | Handle failures for the processor. See Handling pipeline failures. |
tag | no | - | Identifier for the processor. Useful for debugging and metrics. |
{
  "inference": {
    "model_id": "flight_delay_regression-1571767128603",
    "target_field": "FlightDelayMin_prediction_infer",
    "field_map": {
      "your_field": "my_field"
    },
    "inference_config": { "regression": {} }
  }
}
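A processor definition like the one above normally sits inside the processors list of an ingest pipeline body. The following is a minimal Python sketch of assembling such a body as a plain dictionary; the helper name build_inference_pipeline and the pipeline description text are hypothetical, and sending the body to a cluster is left out.

```python
import json


def build_inference_pipeline(model_id, target_field, field_map, inference_config):
    """Wrap a single inference processor in an ingest pipeline body.

    Hypothetical helper for illustration only; it builds a dict and does
    not talk to Elasticsearch.
    """
    processor = {
        "inference": {
            "model_id": model_id,
            "target_field": target_field,
            "field_map": field_map,
            "inference_config": inference_config,
        }
    }
    return {
        "description": "Hypothetical pipeline for illustration",
        "processors": [processor],
    }


pipeline = build_inference_pipeline(
    model_id="flight_delay_regression-1571767128603",
    target_field="FlightDelayMin_prediction_infer",
    field_map={"your_field": "my_field"},
    inference_config={"regression": {}},
)

# Serialize to JSON, for example to paste into a create-pipeline request.
print(json.dumps(pipeline, indent=2))
```

Keeping the processor options as function arguments makes it easy to reuse the same wrapper for a classification model by passing a different inference_config.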
Regression configuration options
Regression configuration for inference.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the results_field value of the data frame analytics job that was used to train the model, which defaults to <dependent_variable>_prediction.
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. By default, it is zero and no feature importance calculation occurs.
Classification configuration options
Classification configuration for inference.
num_top_classes
(Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. By default, it is zero and no feature importance calculation occurs.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the results_field value of the data frame analytics job that was used to train the model, which defaults to <dependent_variable>_prediction.
top_classes_results_field
(Optional, string) Specifies the field to which the top classes are written. Defaults to top_classes.
prediction_field_type
(Optional, string) Specifies the type of the predicted field to write. Acceptable values are: string, number, boolean. When boolean is provided, 1.0 is transformed to true and 0.0 to false.
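As a rough illustration of how the classification options fit together, the Python sketch below assembles an inference_config object and rejects a prediction_field_type outside the three documented values. The function name classification_config and the client-side validation are assumptions made for this example; Elasticsearch performs its own validation when the pipeline is created.

```python
# Acceptable values for prediction_field_type, as listed in the options above.
ALLOWED_PREDICTION_FIELD_TYPES = {"string", "number", "boolean"}


def classification_config(
    num_top_classes=0,
    num_top_feature_importance_values=0,
    results_field=None,
    top_classes_results_field="top_classes",
    prediction_field_type=None,
):
    """Build a classification inference_config dict (hypothetical helper)."""
    if (
        prediction_field_type is not None
        and prediction_field_type not in ALLOWED_PREDICTION_FIELD_TYPES
    ):
        raise ValueError(
            f"prediction_field_type must be one of {sorted(ALLOWED_PREDICTION_FIELD_TYPES)}"
        )

    options = {
        "num_top_classes": num_top_classes,
        "num_top_feature_importance_values": num_top_feature_importance_values,
        "top_classes_results_field": top_classes_results_field,
    }
    # Leave results_field and prediction_field_type out when unset so that
    # the defaults stored with the model apply.
    if results_field is not None:
        options["results_field"] = results_field
    if prediction_field_type is not None:
        options["prediction_field_type"] = prediction_field_type
    return {"classification": options}


config = classification_config(
    num_top_classes=2,
    results_field="prediction",
    top_classes_results_field="probabilities",
    prediction_field_type="boolean",
)
print(config)
```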
inference_config examples
"inference": {
  "model_id": "my_model_id",
  "inference_config": {
    "regression": {
      "results_field": "my_regression"
    }
  }
}
This configuration specifies a regression inference. The results are written to the my_regression field contained in the target_field results object.
"inference": {
  "model_id": "my_model_id",
  "inference_config": {
    "classification": {
      "num_top_classes": 2,
      "results_field": "prediction",
      "top_classes_results_field": "probabilities"
    }
  }
}
This configuration specifies a classification inference. The number of categories for which the predicted probabilities are reported is 2 (num_top_classes). The result is written to the prediction field and the top classes to the probabilities field. Both fields are contained in the target_field results object.
Refer to the language identification trained model documentation for a full example.
Feature importance object mapping
To get the full benefit of aggregating and searching for feature importance, update the index mapping of the feature importance result field as shown below:
"ml.inference.feature_importance": {
  "type": "nested",
  "dynamic": true,
  "properties": {
    "feature_name": {
      "type": "keyword"
    },
    "importance": {
      "type": "double"
    }
  }
}
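Once the field is mapped as nested, its entries can be matched with a nested query. The Python sketch below builds such a search body as a dictionary; the helper name feature_importance_query and the feature name DistanceKilometers are hypothetical, and the default path assumes the ml.inference.feature_importance field used in the mapping.

```python
def feature_importance_query(feature_name, path="ml.inference.feature_importance"):
    """Build a search body matching documents that carry a feature importance
    entry for the given feature (hypothetical helper, builds a dict only)."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    # feature_name is mapped as keyword, so a term query
                    # matches it exactly.
                    "term": {f"{path}.feature_name": feature_name}
                },
            }
        }
    }


body = feature_importance_query("DistanceKilometers")
print(body)
```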
The mapping field name for feature importance (in the mapping example above, ml.inference.feature_importance) is composed as follows:
<ml.inference.target_field>.<inference.tag>.feature_importance
<ml.inference.target_field>: defaults to ml.inference.
<inference.tag>: if it is not provided in the processor definition, it is not part of the field path.
For example, if you provide a tag foo in the definition as shown below:
{
"tag": "foo",
...
}
Then, the feature importance value is written to the ml.inference.foo.feature_importance field.
You can also specify the target field as follows:
{
"tag": "foo",
"target_field": "my_field"
}
In this case, feature importance is exposed in the my_field.foo.feature_importance field.
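The path rules in this section can be summarized in a few lines of Python. This is only a sketch of the naming scheme described here, and the function name feature_importance_path is hypothetical; it is useful mainly for deriving the field name to put in an index mapping.

```python
def feature_importance_path(target_field="ml.inference", tag=None):
    """Return the dotted field path where feature importance is written.

    target_field defaults to ml.inference; the tag segment is included
    only when a tag is set on the processor.
    """
    parts = [target_field]
    if tag:
        parts.append(tag)
    parts.append("feature_importance")
    return ".".join(parts)


print(feature_importance_path())                                # ml.inference.feature_importance
print(feature_importance_path(tag="foo"))                       # ml.inference.foo.feature_importance
print(feature_importance_path(target_field="my_field", tag="foo"))  # my_field.foo.feature_importance
```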