Create trained models API
Creates a trained model.
Models created in version 7.8.0 are not backwards compatible with older node versions. In a mixed cluster environment, all nodes must be at least version 7.8.0 to use a model stored by a 7.8.0 node.
Request
PUT _ml/trained_models/<model_id>
Prerequisites
Requires the manage_ml cluster privilege. This privilege is included in the machine_learning_admin built-in role.
Description
The create trained model API enables you to supply a trained model that is not created by data frame analytics.
Path parameters
<model_id>
(Required, string) The unique identifier of the trained model.
Query parameters
defer_definition_decompression
(Optional, boolean) If set to true and a compressed_definition is provided, the request defers definition decompression and skips relevant validations. This deferral is useful for systems or users that know a good byte size estimate for their model and know that their model is valid and likely won’t fail during inference.
Request body
compressed_definition
(Required, string) The compressed (GZipped and Base64 encoded) inference definition of the model. If compressed_definition is specified, then definition cannot be specified.
definition
(Required, object) The inference definition for the model. If definition is specified, then compressed_definition cannot be specified.
Properties of definition
preprocessors
(Optional, object) Collection of preprocessors. See Preprocessor examples.
Properties of preprocessors
frequency_encoding
(Required, object) Defines a frequency encoding for a field.
Properties of frequency_encoding
feature_name
(Required, string) The name of the resulting feature.
field
(Required, string) The field name to encode.
frequency_map
(Required, object map of string:double) Object that maps the field value to the frequency encoded value.
custom
(Optional, Boolean) Boolean value indicating if the analytics job created the preprocessor or if a user provided it. This adjusts the feature importance calculation. When true, the feature importance calculation returns importance for the processed feature. When false, the total importance of the original field is returned. Default is false.
one_hot_encoding
(Required, object) Defines a one hot encoding map for a field.
Properties of one_hot_encoding
field
(Required, string) The field name to encode.
hot_map
(Required, object map of strings) String map of “field_value: one_hot_column_name”.
custom
(Optional, Boolean) Boolean value indicating if the analytics job created the preprocessor or if a user provided it. This adjusts the feature importance calculation. When true, the feature importance calculation returns importance for the processed feature. When false, the total importance of the original field is returned. Default is false.
target_mean_encoding
(Required, object) Defines a target mean encoding for a field.
Properties of target_mean_encoding
default_value
(Required, double) The feature value if the field value is not in the target_map.
feature_name
(Required, string) The name of the resulting feature.
field
(Required, string) The field name to encode.
target_map
(Required, object map of string:double) Object that maps the field value to the target mean value.
custom
(Optional, Boolean) Boolean value indicating if the analytics job created the preprocessor or if a user provided it. This adjusts the feature importance calculation. When true, the feature importance calculation returns importance for the processed feature. When false, the total importance of the original field is returned. Default is false.
trained_model
(Required, object) The definition of the trained model.
Properties of trained_model
tree
(Required, object) The definition for a binary decision tree.
Properties of tree
classification_labels
(Optional, string) An array of classification labels (used for classification).
feature_names
(Required, string) Features expected by the tree, in their expected order.
target_type
(Required, string) String indicating the model target type; regression or classification.
tree_structure
(Required, object) An array of tree_node objects. The nodes must be in ordinal order by their tree_node.node_index value.
tree_node
(Required, object) The definition of a node in a tree.
There are two major types of nodes: leaf nodes and non-leaf nodes.
- Leaf nodes only need node_index and leaf_value defined.
- All other nodes need split_feature, left_child, right_child, threshold, decision_type, and default_left defined.
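The node rules above can be checked on the client before a request is sent. The following Python sketch uses a hypothetical helper, validate_tree_structure, which is not part of any Elasticsearch client. It takes a parsed tree_structure array, treats any node with a leaf_value as a leaf, and reports nodes that are out of ordinal order or non-leaf nodes that are missing one of the listed fields. It follows the list above literally, so it may be stricter than the server, which documents defaults for decision_type and default_left.

```python
# Hypothetical client-side check; not part of any Elasticsearch client.
SPLIT_FIELDS = {"split_feature", "left_child", "right_child",
                "threshold", "decision_type", "default_left"}


def validate_tree_structure(nodes: list) -> list:
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    for position, node in enumerate(nodes):
        # Nodes must appear in ordinal order by their node_index value.
        if node.get("node_index") != position:
            problems.append(
                f"node at position {position} has node_index {node.get('node_index')}")
        # A leaf node only needs node_index and leaf_value.
        if "leaf_value" in node:
            continue
        # Every other node needs all of the split fields listed above.
        missing = SPLIT_FIELDS - set(node)
        if missing:
            problems.append(f"node {position} is missing {sorted(missing)}")
    return problems


tree_structure = [
    {"node_index": 0, "split_feature": 0, "threshold": 1.0, "decision_type": "lte",
     "default_left": True, "left_child": 1, "right_child": 2},
    {"node_index": 1, "leaf_value": 10.0},
    {"node_index": 2, "leaf_value": 20.0},
]
print(validate_tree_structure(tree_structure))  # prints []
```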
Properties of tree_node
decision_type
(Optional, string) Indicates the positive value (in other words, when to choose the left node) decision type. Supported values are lt, lte, gt, and gte. Defaults to lte.
default_left
(Optional, Boolean) Indicates whether to default to the left when the feature is missing. Defaults to true.
leaf_value
(Optional, double) The leaf value of the node, if the node is a leaf (in other words, it has no children).
left_child
(Optional, integer) The index of the left child.
node_index
(Integer) The index of the current node.
right_child
(Optional, integer) The index of the right child.
split_feature
(Optional, integer) The index of the feature value in the feature array.
split_gain
(Optional, double) The information gain from the split.
threshold
(Optional, double) The decision threshold with which to compare the feature value.
ensemble
(Optional, object) The definition for an ensemble model. See Model examples.
Properties of ensemble
aggregate_output
(Required, object) An aggregated output object that defines how to aggregate the outputs of the trained_models. Supported objects are weighted_mode, weighted_sum, logistic_regression, and exponent. See Aggregated output example.
Properties of aggregate_output
logistic_regression
(Optional, object) This aggregate_output type works with binary classification (classification for values [0, 1]). It multiplies the outputs (in the case of the ensemble model, the inference model values) by the supplied weights. The resulting vector is summed and passed to a sigmoid function. The result of the sigmoid function is considered the probability of class 1 (P_1); consequently, the probability of class 0 is 1 - P_1. The class with the highest probability (either 0 or 1) is then returned. For more information about logistic regression, see the Wikipedia article on logistic regression.
Properties of logistic_regression
weights
(Required, double) The weights to multiply by the input values (the inference values of the trained models).
weighted_sum
(Optional, object) This aggregate_output type works with regression. It returns the weighted sum of the input values.
Properties of weighted_sum
weights
(Required, double) The weights to multiply by the input values (the inference values of the trained models).
weighted_mode
(Optional, object) This aggregate_output type works with regression or classification. It takes a weighted vote of the input values. The most common input value (taking the weights into account) is returned.
Properties of weighted_mode
weights
(Required, double) The weights to multiply by the input values (the inference values of the trained models).
exponent
(Optional, object) This aggregate_output type works with regression. It takes a weighted sum of the input values and passes the result to an exponential function (e^x, where x is the sum of the weighted values).
Properties of exponent
weights
(Required, double) The weights to multiply by the input values (the inference values of the trained models).
classification_labels
(Optional, string) An array of classification labels.
feature_names
(Optional, string) Features expected by the ensemble, in their expected order.
target_type
(Required, string) String indicating the model target type; regression or classification.
trained_models
(Required, object) An array of trained_model objects. Supported trained models are tree and ensemble.
description
(Optional, string) A human-readable description of the inference trained model.
estimated_heap_memory_usage_bytes
(Optional, integer) Deprecated in 7.16.0. Replaced by model_size_bytes.
estimated_operations
(Optional, integer) The estimated number of operations to use the trained model during inference. This property is supported only if defer_definition_decompression is true or the model definition is not supplied.
inference_config
(Required, object) The default configuration for inference. This can be either a regression or classification configuration. It must match the target_type of the underlying definition.trained_model.
Properties of inference_config
regression
(Optional, object) Regression configuration for inference.
Properties of regression inference
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. By default, it is zero and no feature importance calculation occurs.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
classification
(Optional, object) Classification configuration for inference.
Properties of classification inference
num_top_classes
(Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. By default, it is zero and no feature importance calculation occurs.
prediction_field_type
(Optional, string) Specifies the type of the predicted field to write. Acceptable values are: string, number, boolean. When boolean is provided, 1.0 is transformed to true and 0.0 to false.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
top_classes_results_field
(Optional, string) Specifies the field to which the top classes are written. Defaults to top_classes.
input
(Required, object) The input field names for the model definition.
Properties of input
field_names
(Required, string) An array of input field names for the model.
metadata
(Optional, object) An object map that contains metadata about the model.
model_size_bytes
(Optional, integer) The estimated memory usage in bytes to keep the trained model in memory. This property is supported only if defer_definition_decompression is true or the model definition is not supplied.
tags
(Optional, string) An array of tags to organize the model.
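The request body rules above (definition and compressed_definition are mutually exclusive, inference_config and input are required, and model_size_bytes and estimated_operations are only supported when decompression is deferred or no definition is supplied) can be summarized in a small Python sketch. compress_definition and build_put_trained_model are hypothetical helpers, not functions of an Elasticsearch client, and the sketch assumes that compressed_definition is the GZipped, Base64-encoded JSON of the same object you would otherwise send as definition. It only assembles the method, path, and body; sending the request is left to whatever HTTP client you use.

```python
import base64
import gzip
import json


def compress_definition(definition: dict) -> str:
    """GZip the JSON-serialized definition, then Base64-encode the bytes."""
    raw = json.dumps(definition).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def build_put_trained_model(model_id: str, body: dict,
                            defer_definition_decompression: bool = False):
    """Return (method, path, body) after checking the body rules listed above."""
    for key in ("inference_config", "input"):
        if key not in body:
            raise ValueError(f"{key} is required")
    if "definition" in body and "compressed_definition" in body:
        raise ValueError("specify definition or compressed_definition, not both")
    has_definition = "definition" in body or "compressed_definition" in body
    for key in ("model_size_bytes", "estimated_operations"):
        # Only supported when decompression is deferred or no definition is sent.
        if key in body and has_definition and not defer_definition_decompression:
            raise ValueError(f"{key} needs defer_definition_decompression=true "
                             "or no model definition")
    path = f"_ml/trained_models/{model_id}"
    if defer_definition_decompression:
        path += "?defer_definition_decompression=true"
    return "PUT", path, body


# Illustrative single-leaf model; "my-model" is a made-up model ID.
definition = {"trained_model": {"tree": {
    "feature_names": ["DistanceKilometers"],
    "tree_structure": [{"node_index": 0, "leaf_value": 1.0}],
    "target_type": "regression"}}}
method, path, body = build_put_trained_model(
    "my-model",
    {"compressed_definition": compress_definition(definition),
     "inference_config": {"regression": {}},
     "input": {"field_names": ["DistanceKilometers"]},
     "model_size_bytes": 1024},
    defer_definition_decompression=True)
print(method, path)  # PUT _ml/trained_models/my-model?defer_definition_decompression=true
```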
Examples
Preprocessor examples
The example below shows a frequency_encoding preprocessor object:
{
"frequency_encoding":{
"field":"FlightDelayType",
"feature_name":"FlightDelayType_frequency",
"frequency_map":{
"Carrier Delay":0.6007414737092798,
"NAS Delay":0.6007414737092798,
"Weather Delay":0.024573576178086153,
"Security Delay":0.02476631010889467,
"No Delay":0.6007414737092798,
"Late Aircraft Delay":0.6007414737092798
}
}
}
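As an illustration of what this preprocessor computes, the Python sketch below (frequency_encode is a hypothetical helper) reads the value of field, looks it up in frequency_map, and writes the result to feature_name. Falling back to 0.0 for values that are not in frequency_map, and leaving documents without the field unchanged, are assumptions of this sketch rather than documented behavior.

```python
def frequency_encode(doc: dict, processor: dict) -> dict:
    """Apply a frequency_encoding object to one document and return a new dict."""
    cfg = processor["frequency_encoding"]
    out = dict(doc)
    value = doc.get(cfg["field"])
    if value is not None:
        # Assumption: values not present in frequency_map encode as 0.0.
        out[cfg["feature_name"]] = cfg["frequency_map"].get(str(value), 0.0)
    return out


# Two entries copied from the example above.
processor = {"frequency_encoding": {
    "field": "FlightDelayType",
    "feature_name": "FlightDelayType_frequency",
    "frequency_map": {"Weather Delay": 0.024573576178086153,
                      "No Delay": 0.6007414737092798}}}
print(frequency_encode({"FlightDelayType": "Weather Delay"}, processor))
```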
The next example shows a one_hot_encoding preprocessor object:
{
"one_hot_encoding":{
"field":"FlightDelayType",
"hot_map":{
"Carrier Delay":"FlightDelayType_Carrier Delay",
"NAS Delay":"FlightDelayType_NAS Delay",
"No Delay":"FlightDelayType_No Delay",
"Late Aircraft Delay":"FlightDelayType_Late Aircraft Delay"
}
}
}
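A Python sketch of the same idea (one_hot_encode is a hypothetical helper): every entry in hot_map produces one output column, set to 1 when the document's field value equals the entry's key and 0 otherwise. With the example above, a value that is not in hot_map, such as "Weather Delay", would produce 0 in every column under this sketch's assumptions; how the server treats such values is not stated here.

```python
def one_hot_encode(doc: dict, processor: dict) -> dict:
    """Apply a one_hot_encoding object to one document and return a new dict."""
    cfg = processor["one_hot_encoding"]
    out = dict(doc)
    value = doc.get(cfg["field"])
    if value is not None:
        # One column per hot_map entry: 1 for the matching value, 0 otherwise.
        for field_value, column_name in cfg["hot_map"].items():
            out[column_name] = 1 if str(value) == field_value else 0
    return out


processor = {"one_hot_encoding": {
    "field": "FlightDelayType",
    "hot_map": {"Carrier Delay": "FlightDelayType_Carrier Delay",
                "NAS Delay": "FlightDelayType_NAS Delay",
                "No Delay": "FlightDelayType_No Delay",
                "Late Aircraft Delay": "FlightDelayType_Late Aircraft Delay"}}}
encoded = one_hot_encode({"FlightDelayType": "NAS Delay"}, processor)
print(encoded["FlightDelayType_NAS Delay"], encoded["FlightDelayType_No Delay"])  # 1 0
```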
This example shows a target_mean_encoding preprocessor object:
{
"target_mean_encoding":{
"field":"FlightDelayType",
"feature_name":"FlightDelayType_targetmean",
"target_map":{
"Carrier Delay":39.97465788139886,
"NAS Delay":39.97465788139886,
"Security Delay":203.171206225681,
"Weather Delay":187.64705882352948,
"No Delay":39.97465788139886,
"Late Aircraft Delay":39.97465788139886
},
"default_value":158.17995752420433
}
}
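The lookup this object describes can be sketched in Python as follows (target_mean_encode is a hypothetical helper). Field values found in target_map are replaced by their target mean, and any other value falls back to default_value, as the default_value description states. Leaving documents that lack the field unchanged is an assumption of this sketch, and "Diverted" is a made-up value used only to show the fallback.

```python
def target_mean_encode(doc: dict, processor: dict) -> dict:
    """Apply a target_mean_encoding object to one document and return a new dict."""
    cfg = processor["target_mean_encoding"]
    out = dict(doc)
    value = doc.get(cfg["field"])
    if value is not None:
        # Values missing from target_map fall back to default_value.
        out[cfg["feature_name"]] = cfg["target_map"].get(str(value), cfg["default_value"])
    return out


# Two target_map entries and default_value copied from the example above.
processor = {"target_mean_encoding": {
    "field": "FlightDelayType",
    "feature_name": "FlightDelayType_targetmean",
    "target_map": {"Security Delay": 203.171206225681,
                   "No Delay": 39.97465788139886},
    "default_value": 158.17995752420433}}
print(target_mean_encode({"FlightDelayType": "Security Delay"}, processor))
print(target_mean_encode({"FlightDelayType": "Diverted"}, processor))
```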
Model examples
The first example shows a trained_model object:
{
"tree":{
"feature_names":[
"DistanceKilometers",
"FlightTimeMin",
"FlightDelayType_NAS Delay",
"Origin_targetmean",
"DestRegion_targetmean",
"DestCityName_targetmean",
"OriginAirportID_targetmean",
"OriginCityName_frequency",
"DistanceMiles",
"FlightDelayType_Late Aircraft Delay"
],
"tree_structure":[
{
"decision_type":"lt",
"threshold":9069.33437193022,
"split_feature":0,
"split_gain":4112.094574306927,
"node_index":0,
"default_left":true,
"left_child":1,
"right_child":2
},
...
{
"node_index":9,
"leaf_value":-27.68987349695448
},
...
],
"target_type":"regression"
}
}
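To show how the tree_node fields fit together, the Python sketch below walks a tree_structure from node 0 to a leaf. It reads decision_type as the comparison that selects the left child (for lt, go left when the feature value is less than threshold), uses default_left when the feature is missing, and relies on the nodes being in ordinal order so that a list index equals node_index. infer_tree is a hypothetical helper, not the server's implementation. The root node is copied from the example above; the two leaves and the single-feature list are made up because the rest of that tree is elided.

```python
import operator

# decision_type -> comparison that decides whether to take the left child.
DECISIONS = {"lt": operator.lt, "lte": operator.le,
             "gt": operator.gt, "gte": operator.ge}


def infer_tree(tree: dict, features: list) -> float:
    """Walk tree_structure from node 0 to a leaf and return its leaf_value.

    features must follow the order of tree["feature_names"]; None marks a
    missing feature.
    """
    nodes = tree["tree_structure"]  # ordinal order, so list index == node_index
    node = nodes[0]
    while "leaf_value" not in node:
        value = features[node["split_feature"]]
        if value is None:
            go_left = node.get("default_left", True)
        else:
            compare = DECISIONS[node.get("decision_type", "lte")]
            go_left = compare(value, node["threshold"])
        node = nodes[node["left_child"] if go_left else node["right_child"]]
    return node["leaf_value"]


tree = {
    "feature_names": ["DistanceKilometers"],
    "tree_structure": [
        {"decision_type": "lt", "threshold": 9069.33437193022, "split_feature": 0,
         "node_index": 0, "default_left": True, "left_child": 1, "right_child": 2},
        {"node_index": 1, "leaf_value": -1.0},
        {"node_index": 2, "leaf_value": 1.0},
    ],
    "target_type": "regression",
}
print(infer_tree(tree, [500.0]), infer_tree(tree, [10000.0]), infer_tree(tree, [None]))  # -1.0 1.0 -1.0
```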
The following example shows an ensemble model object:
"ensemble":{
"feature_names":[
...
],
"trained_models":[
{
"tree":{
"feature_names":[],
"tree_structure":[
{
"decision_type":"lte",
"node_index":0,
"leaf_value":47.64069875778043,
"default_left":false
}
],
"target_type":"regression"
}
},
...
],
"aggregate_output":{
"weighted_sum":{
"weights":[
...
]
}
},
"target_type":"regression"
}
Aggregated output example
Example of a logistic_regression object:
"aggregate_output" : {
"logistic_regression" : {
"weights" : [2.0, 1.0, 0.5, -1.0, 5.0, 1.0, 1.0]
}
}
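The logistic_regression aggregation described in the request body section can be sketched in Python as follows. The function name is hypothetical, the model outputs are made-up values, and the weights are the ones from the example above. Breaking a tie (P_1 exactly 0.5) in favor of class 1 is an assumption of this sketch; the sigmoid is computed in two branches only so that large negative sums do not overflow math.exp.

```python
import math


def logistic_regression(outputs: list, weights: list):
    """Weight and sum the model outputs, apply a sigmoid; return (class, P_1)."""
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per trained model output")
    z = sum(w * o for w, o in zip(weights, outputs))
    # Numerically stable sigmoid: the probability of class 1.
    if z >= 0:
        p1 = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        p1 = e / (1.0 + e)
    # The more probable class wins; P_0 is 1 - P_1.
    return (1 if p1 >= 0.5 else 0), p1


weights = [2.0, 1.0, 0.5, -1.0, 5.0, 1.0, 1.0]
print(logistic_regression([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], weights))
```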
Example of a weighted_sum object:
"aggregate_output" : {
"weighted_sum" : {
"weights" : [1.0, -1.0, 0.5, 1.0, 5.0]
}
}
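For comparison, a minimal Python sketch of weighted_sum (the function name is hypothetical): each trained model's output is multiplied by its weight and the products are added. The weights come from the example above; the five model outputs are made-up values.

```python
def weighted_sum(outputs: list, weights: list) -> float:
    """Multiply each model output by its weight and add the results."""
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per trained model output")
    return sum(w * o for w, o in zip(weights, outputs))


print(weighted_sum([10.0, 4.0, 2.0, 0.0, 1.0], [1.0, -1.0, 0.5, 1.0, 5.0]))  # 12.0
```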
Example of a weighted_mode object:
"aggregate_output" : {
"weighted_mode" : {
"weights" : [1.0, 1.0, 1.0, 1.0, 1.0]
}
}
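The weighted vote behind weighted_mode can be sketched in Python like this (the function name is hypothetical): every model votes for its own output with its weight, and the value with the largest total wins. With the equal weights from the example above, this reduces to a plain majority vote. The model outputs are made up, and returning the first value seen when totals tie is an assumption of this sketch.

```python
from collections import defaultdict


def weighted_mode(outputs: list, weights: list):
    """Return the output value whose weights add up to the largest total."""
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per trained model output")
    totals = defaultdict(float)
    for value, weight in zip(outputs, weights):
        totals[value] += weight  # each model votes for its output with its weight
    # max() returns the first value seen among equal totals.
    return max(totals, key=totals.get)


weights = [1.0, 1.0, 1.0, 1.0, 1.0]
print(weighted_mode([0, 1, 1, 2, 1], weights))  # 1
```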
Example of an exponent object:
"aggregate_output" : {
"exponent" : {
"weights" : [1.0, 1.0, 1.0, 1.0, 1.0]
}
}
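Finally, a minimal Python sketch of the exponent aggregation (the function name is hypothetical): the weighted sum x of the model outputs is passed to e^x. The weights are the ones from the example above and the outputs are made-up values that sum to 1.0, so the result is e.

```python
import math


def exponent(outputs: list, weights: list) -> float:
    """Return e^x, where x is the weighted sum of the model outputs."""
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per trained model output")
    x = sum(w * o for w, o in zip(weights, outputs))
    return math.exp(x)


weights = [1.0, 1.0, 1.0, 1.0, 1.0]
print(exponent([0.5, 0.25, 0.25, 0.0, 0.0], weights))  # about 2.718
```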
Trained models JSON schema
For the full JSON schema of trained models, click here.