metrics
Useful metrics for training
Training metrics
Metrics for training fastai models are simply functions that take input and target tensors, and return some metric of interest for training. You can write your own metrics by defining a function of that type and passing it to Learner in the metrics parameter, or use one of the following predefined functions.
Predefined metrics:
accuracy
accuracy(input:Tensor, targs:Tensor) → Rank0Tensor
Tests found for accuracy:
pytest -sv tests/test_metrics.py::test_accuracy
pytest -sv tests/test_vision_train.py::test_accuracy
Computes accuracy with targs when input is bs * n_classes.
Warning: This metric is intended for classification of objects belonging to a single class.
preds = tensor([0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.6, 0.4], [0.9, 0.1]) # bs = 5, n = 2
ys = tensor([1], [0], [1], [0], [1])
accuracy(preds, ys)
tensor(0.6000)
accuracy_thresh
accuracy_thresh(y_pred:Tensor, y_true:Tensor, thresh:float=0.5, sigmoid:bool=True) → Rank0Tensor
Tests found for accuracy_thresh:
pytest -sv tests/test_metrics.py::test_accuracy_thresh
Computes accuracy when y_pred and y_true are the same size.
Predictions are compared to thresh after a sigmoid is applied (if sigmoid=True). The metric is then the fraction of thresholded predictions that match the targets.
Note: This function is intended for one-hot-encoded targets (often in a multi-label classification problem).
preds = tensor([0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.6, 0.4], [0.9, 0.1])
ys = tensor([0, 1], [1, 0], [0, 1], [1, 0], [0, 1])
accuracy_thresh(preds, ys, thresh=0.65, sigmoid=False)
tensor(0.4000)
top_k_accuracy
top_k_accuracy(input:Tensor, targs:Tensor, k:int=5) → Rank0Tensor
Tests found for top_k_accuracy:
pytest -sv tests/test_metrics.py::test_top_k_accuracy
Computes the top-k accuracy (the target is among the k highest-scoring predictions).
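To make the definition concrete, here is a minimal pure-Python sketch of top-k accuracy on plain lists. The helper name top_k_accuracy_sketch and the sample data are hypothetical, and this is only an illustration of the idea, not fastai's tensor implementation.

```python
def top_k_accuracy_sketch(scores, targets, k=5):
    """Fraction of rows whose target index is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores in this row
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)

# Three samples, three classes (hypothetical scores)
scores = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]
targets = [2, 1, 2]

top1 = top_k_accuracy_sketch(scores, targets, k=1)  # only the last row is a hit
top2 = top_k_accuracy_sketch(scores, targets, k=2)  # first and last rows are hits
```

With k=1 this reduces to plain accuracy, and the score can only grow as k increases.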
dice
dice(input:Tensor, targs:Tensor, iou:bool=False, eps:float=1e-08) → Rank0Tensor
Tests found for dice:
pytest -sv tests/test_metrics.py::test_dice
pytest -sv tests/test_metrics.py::test_dice_iou
Dice coefficient metric for a binary target. If iou=True, returns the IoU (intersection over union) metric, which is classic for segmentation problems.
$$dice = \frac{2(TP)}{2(TP) + FP + FN}$$
where TP, FP and FN are the number of true positives, false positives and false negatives.
preds = tensor([0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.6, 0.4], [0.9, 0.1])
ys = tensor([1], [0], [1], [0], [1])
dice(preds, ys) # TP = 2, FP = 1, FN = 1
tensor(0.6667)
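The relationship between the Dice coefficient and IoU can be sketched directly from the counts used in the example above (TP = 2, FP = 1, FN = 1). The helper names are hypothetical, and the sketch ignores the eps term that the library adds for numeric stability.

```python
def dice_from_counts(tp, fp, fn):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

def iou_from_counts(tp, fp, fn):
    """Intersection over union: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)

dice_value = dice_from_counts(2, 1, 1)
iou_value = iou_from_counts(2, 1, 1)
```

For the same counts, IoU is never larger than Dice, since Dice counts the intersection twice.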
error_rate
error_rate(input:Tensor, targs:Tensor) → Rank0Tensor
Tests found for error_rate:
pytest -sv tests/test_metrics.py::test_error_rate
pytest -sv tests/test_vision_train.py::test_error_rate
Computes the error rate, 1 - accuracy.
mean_squared_error
mean_squared_error(pred:Tensor, targ:Tensor) → Rank0Tensor
Tests found for mean_squared_error:
pytest -sv tests/test_metrics.py::test_mse
Mean squared error between pred and targ.
mean_absolute_error
mean_absolute_error(pred:Tensor, targ:Tensor) → Rank0Tensor
Tests found for mean_absolute_error:
pytest -sv tests/test_metrics.py::test_mae
Mean absolute error between pred and targ.
mean_squared_logarithmic_error
mean_squared_logarithmic_error(pred:Tensor, targ:Tensor) → Rank0Tensor
Tests found for mean_squared_logarithmic_error:
pytest -sv tests/test_metrics.py::test_msle
Mean squared logarithmic error between pred and targ.
exp_rmspe
exp_rmspe(pred:Tensor, targ:Tensor) → Rank0Tensor
Tests found for exp_rmspe:
pytest -sv tests/test_metrics.py::test_exp_rmspe
pytest -sv tests/test_metrics.py::test_exp_rmspe_num_of_ele
Exp RMSPE (root mean squared percentage error) between the exponentials of pred and targ.
root_mean_squared_error
root_mean_squared_error(pred:Tensor, targ:Tensor) → Rank0Tensor
Tests found for root_mean_squared_error:
pytest -sv tests/test_metrics.py::test_rmse
Root mean squared error between pred and targ.
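The regression metrics above share the same shape, so a small pure-Python sketch may help compare them side by side. All helper names here are hypothetical. The MSLE and exp RMSPE definitions are assumptions (MSE on log(1 + x), and percentage error on exponentiated values, respectively), so check the library source before relying on exact values.

```python
import math

def mse(pred, targ):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, targ)) / len(pred)

def mae(pred, targ):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, targ)) / len(pred)

def rmse(pred, targ):
    """Root mean squared error."""
    return math.sqrt(mse(pred, targ))

def msle(pred, targ):
    # Assumption: MSE computed on log(1 + x), the usual MSLE definition
    return mse([math.log1p(p) for p in pred], [math.log1p(t) for t in targ])

def exp_rmspe_sketch(pred, targ):
    # Assumption: pred and targ are logs of the quantity of interest,
    # so both are exponentiated before taking the percentage error
    pct = [(math.exp(t) - math.exp(p)) / math.exp(t) for p, t in zip(pred, targ)]
    return math.sqrt(sum(e ** 2 for e in pct) / len(pct))

pred = [1.0, 2.0, 3.0]
targ = [1.0, 2.0, 5.0]
mse_value = mse(pred, targ)    # one error of 2 over three elements: 4/3
mae_value = mae(pred, targ)    # 2/3
rmse_value = rmse(pred, targ)  # sqrt(4/3)
```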
fbeta
fbeta(y_pred:Tensor, y_true:Tensor, thresh:float=0.2, beta:float=2, eps:float=1e-09, sigmoid:bool=True) → Rank0Tensor
Tests found for fbeta:
pytest -sv tests/test_metrics.py::test_fbeta
Computes the F-beta score between y_pred and y_true. beta determines which F-beta score is computed (beta=1 gives the F1 score), and eps is there for numeric stability. If sigmoid=True, a sigmoid is applied to the predictions before comparing them to thresh and then to the targets. See the F1 score Wikipedia page for details on the fbeta score.
$$F_\beta = (1+\beta^2)\frac{precision \cdot recall}{(\beta^2 \cdot precision) + recall}$$
preds = tensor([0.6, 0.8, 0.2, 0.4, 0.9]).view(1, 5) # TP =2, FP = 1, FN = 1
ys = tensor([1, 0, 0, 1, 1]).view(1, 5)
fbeta(preds, ys, thresh=0.5, sigmoid=False)
tensor(0.6667)
Note: This function is intended for one-hot-encoded targets (often in a multi-label classification problem).
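To see how beta shifts the balance between precision and recall, here is a minimal sketch that evaluates the F-beta formula from raw counts. The helper name fbeta_from_counts is hypothetical, and the sketch omits the eps term, so it assumes at least one positive prediction and one positive target.

```python
def fbeta_from_counts(tp, fp, fn, beta=2.0):
    """F-beta from true positives, false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Counts from the example above: TP = 2, FP = 1, FN = 1
f2_example = fbeta_from_counts(2, 1, 1, beta=2)

# High recall (1.0) but low precision (0.5): beta > 1 rewards this, beta < 1 penalizes it
f2_recall_heavy = fbeta_from_counts(2, 2, 0, beta=2)
f_half_recall_heavy = fbeta_from_counts(2, 2, 0, beta=0.5)
```

When precision and recall are equal, as in the first case, every choice of beta gives the same score.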
explained_variance
explained_variance(pred:Tensor, targ:Tensor) → Rank0Tensor
Tests found for explained_variance:
pytest -sv tests/test_metrics.py::test_explained_variance
Explained variance between pred and targ.
$$Explained\ Variance = 1 - \frac{Var(targ - pred)}{Var(targ)}$$
preds = tensor([0.10, .20, .30, .40, .50])
ys = tensor([0.12, .17, .25, .44, .56]) # predictions are close to the truth
explained_variance(preds, ys)
tensor(0.9374)
r2_score
r2_score(pred:Tensor, targ:Tensor) → Rank0Tensor
Tests found for r2_score:
pytest -sv tests/test_metrics.py::test_r2_score
R2 score (coefficient of determination) between pred and targ.
$$R^2 = 1 - \frac{\sum (targ - pred)^2}{\sum (targ - \overline{targ})^2}$$
where $\overline{targ}$ is the mean of the targ tensor.
r2_score(preds, ys)
tensor(0.9351)
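The two formulas look alike, and the following pure-Python sketch works through both on the same preds and ys values used in the examples. The helper names are hypothetical and this is not the library's tensor code. The scores differ because explained variance subtracts the mean of the residuals before squaring, while the R2 score does not.

```python
def variance(xs):
    """Population variance (the normalization cancels in the ratio)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def explained_variance_sketch(pred, targ):
    residuals = [t - p for p, t in zip(pred, targ)]
    return 1 - variance(residuals) / variance(targ)

def r2_score_sketch(pred, targ):
    mean_t = sum(targ) / len(targ)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, targ))
    ss_tot = sum((t - mean_t) ** 2 for t in targ)
    return 1 - ss_res / ss_tot

preds = [0.10, 0.20, 0.30, 0.40, 0.50]
ys = [0.12, 0.17, 0.25, 0.44, 0.56]

ev = explained_variance_sketch(preds, ys)  # approximately 0.9374
r2 = r2_score_sketch(preds, ys)            # approximately 0.9351
```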
The following metrics are classes; don’t forget to instantiate them when you pass them to a Learner.
class RMSE
RMSE() :: RegMetrics
Computes the root mean squared error.
class ExpRMSPE
ExpRMSPE() :: RegMetrics
Computes the root mean squared percentage error on the exponentials of the predictions and targets.
class Precision
Precision(average:Optional[str]='binary', pos_label:int=1, eps:float=1e-09) :: CMScores
Computes the Precision.
class Recall
Recall(average:Optional[str]='binary', pos_label:int=1, eps:float=1e-09) :: CMScores
Computes the Recall.
class FBeta
FBeta(average:Optional[str]='binary', pos_label:int=1, eps:float=1e-09, beta:float=2) :: CMScores
Computes the Fbeta score.
class R2Score
R2Score() :: RegMetrics
Computes the R2 score (coefficient of determination).
class ExplainedVariance
ExplainedVariance() :: RegMetrics
Computes the explained variance.
class MatthewsCorreff
MatthewsCorreff() :: ConfusionMatrix
Computes the Matthews correlation coefficient.
Ref.: https://github.com/scikit-learn/scikit-learn/blob/bac89c2/sklearn/metrics/classification.py
class KappaScore
KappaScore(weights:Optional[str]=None) :: ConfusionMatrix
Computes the rate of agreement (Cohen’s kappa).
Ref.: https://github.com/scikit-learn/scikit-learn/blob/bac89c2/sklearn/metrics/classification.py
KappaScore supports linear and quadratic weights on the off-diagonal cells in the ConfusionMatrix, in addition to the default unweighted calculation, which treats all misclassifications as equally weighted. Leaving KappaScore’s weights attribute as None returns the unweighted Kappa score. Setting weights to “linear” means off-diagonal ConfusionMatrix elements are weighted in linear proportion to their distance from the diagonal; “quadratic” means the weights are proportional to the square of that distance. To use linear or quadratic weights, first create an instance of the metric and then update its weights attribute, as follows:
kappa = KappaScore()
kappa.weights = "quadratic"
learn = cnn_learner(data, model, metrics=[error_rate, kappa])
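To illustrate what the weights change, here is a minimal pure-Python sketch of weighted Cohen's kappa computed from a confusion matrix given as nested lists. The function name cohen_kappa_sketch and the sample matrices are hypothetical, and the sketch follows the standard textbook definition rather than quoting the library's code.

```python
def cohen_kappa_sketch(cm, weights=None):
    """Cohen's kappa from a square confusion matrix (rows: targets, cols: predictions)."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]

    def weight(i, j):
        # Penalty for predicting class j when the target is class i
        if weights is None:
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return float(abs(i - j))
        if weights == "quadratic":
            return float((i - j) ** 2)
        raise ValueError("weights must be None, 'linear' or 'quadratic'")

    observed = expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            observed += weight(i, j) * cm[i][j]
            # Expected count under chance agreement
            expected += weight(i, j) * row_sums[i] * col_sums[j] / total
    return 1 - observed / expected

# One mistake each: off by one class (near) versus off by two classes (far)
cm_near = [[4, 1, 0], [0, 5, 0], [0, 0, 5]]
cm_far = [[4, 0, 1], [0, 5, 0], [0, 0, 5]]

unweighted_near = cohen_kappa_sketch(cm_near)
unweighted_far = cohen_kappa_sketch(cm_far)
quadratic_near = cohen_kappa_sketch(cm_near, weights="quadratic")
quadratic_far = cohen_kappa_sketch(cm_far, weights="quadratic")
```

The unweighted score treats both mistakes the same, while the quadratic weights penalize the far miss more heavily, which is usually what you want for ordinal classes.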
class ConfusionMatrix
ConfusionMatrix() :: Callback
Computes the confusion matrix.
class MultiLabelFbeta
MultiLabelFbeta(beta=2, eps=1e-15, thresh=0.3, sigmoid=True, average='micro') :: Callback
Computes the fbeta score for multilabel classification.
MultiLabelFbeta implements a multilabel classification fbeta score similar to scikit-learn’s, as a LearnerCallback. Average options: [“micro”, “macro”, “weighted”, “none”]. Intended for use with one-hot encoded targets with 1s and 0s.
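The difference between the “micro” and “macro” options can be sketched from per-class counts. This assumes the usual scikit-learn-style meaning of the two averages (pooling counts versus averaging per-class scores); the helper names and counts are hypothetical, and the library's eps handling is left out.

```python
def fbeta_counts(tp, fp, fn, beta=2.0):
    """F-beta written directly in terms of counts."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

# (TP, FP, FN) per label: a frequent, easy label and a rare, hard one
per_class = [(8, 2, 0), (1, 0, 3)]

# Micro: pool the counts over all labels, then compute one score
micro = fbeta_counts(
    sum(c[0] for c in per_class),
    sum(c[1] for c in per_class),
    sum(c[2] for c in per_class),
)

# Macro: compute one score per label, then take the unweighted mean
macro = sum(fbeta_counts(*c) for c in per_class) / len(per_class)
```

Micro averaging is dominated by the frequent label, while macro averaging gives the rare label equal say, so the macro score is lower here.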
auc_roc_score
auc_roc_score(input:Tensor, targ:Tensor)
Computes the area under the receiver operating characteristic (ROC) curve using the trapezoid method. Restricted to binary classification tasks.
auc_roc_score computes the AUC score for the ROC curve similarly to scikit-learn using the trapezoid method, effectively summarizing the curve information in a single number. See Wikipedia’s page for more information on this.
Note: Instead of passing this method to the learner’s metrics directly, make use of the AUROC() class.
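As a rough illustration of the trapezoid method, the following pure-Python sketch builds the ROC points by sweeping a threshold over the sorted scores and then integrates them. The names roc_points and auc_trapezoid are hypothetical and this is not the library's implementation; it assumes binary labels with at least one positive and one negative sample.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) lists for binary labels (1 = positive, 0 = negative)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this threshold before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
fpr, tpr = roc_points(scores, labels)
auc = auc_trapezoid(fpr, tpr)
```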
roc_curve
roc_curve(input:Tensor, targ:Tensor)
Computes the receiver operating characteristic (ROC) curve by determining the true positive rate (TPR) and false positive rate (FPR) for various classification thresholds. Restricted to binary classification tasks.
roc_curve generates the ROC curve similarly to scikit-learn. See Wikipedia’s page for more information on the ROC curve.
Note: Instead of passing this method to the learner’s metrics directly, make use of the AUROC() class.
class AUROC
AUROC() :: Callback
Computes the area under the curve (AUC) score based on the receiver operating characteristic (ROC) curve. Restricted to binary classification tasks.
AUROC creates a Callback for computing the AUC score for the ROC curve with auc_roc_score at the end of each epoch, since averaging over batches is incorrect in the case of the AUROC. See Wikipedia’s page for more information on the AUROC.
Creating your own metric
Creating a new metric can be as simple as creating a new function. If your metric is an average over the total number of elements in your dataset, just write the function that will compute it on a batch (taking pred and targ as arguments). It will then be automatically averaged over the batches (taking their different sizes into account).
Sometimes metrics aren’t simple averages, however. If we take the example of precision, for instance, we have to divide the number of true positives by the number of predictions we made for that class. This isn’t an average over the number of elements we have in the dataset; we only consider those where we made a positive prediction for a specific class. Computing the precision for each batch and then averaging will yield a result that may be close to the real value, but won’t be exact (and it really depends on how you deal with the special case of 0 positive predictions).
This is why in fastai, every metric is implemented as a callback. If you pass a regular function, the library transforms it to a proper callback called AverageMetric. The callback metrics are only called during the validation phase, and only for the following events:
- on_epoch_begin (for initialization)
- on_batch_begin (if we need to have a look at the input/target and maybe modify them)
- on_batch_end (to analyze the last results and update our computation)
- on_epoch_end (to wrap up the final result that should be added to last_metrics)
As an example, the following code is the exact implementation of the AverageMetric callback that transforms a function like accuracy into a metric callback.
class AverageMetric(Callback):
    "Wrap a `func` in a callback for metrics computation."
    def __init__(self, func):
        # If it's a partial, use func.func
        name = getattr(func,'func',func).__name__
        self.func, self.name = func, name
    def on_epoch_begin(self, **kwargs):
        "Set the inner value to 0."
        self.val, self.count = 0.,0
    def on_batch_end(self, last_output, last_target, **kwargs):
        "Update metric computation with `last_output` and `last_target`."
        if not is_listy(last_target): last_target=[last_target]
        self.count += last_target[0].size(0)
        val = self.func(last_output, *last_target)
        self.val += last_target[0].size(0) * val.detach().cpu()
    def on_epoch_end(self, last_metrics, **kwargs):
        "Set the final result in `last_metrics`."
        return add_metrics(last_metrics, self.val/self.count)
Here add_metrics is a convenience function that will return the proper dictionary for us:
{'last_metrics': last_metrics + [self.val/self.count]}
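The size-weighted accumulation that AverageMetric performs can be mimicked in a few lines of plain Python. The helper name weighted_batch_average and the batch values are hypothetical; the point of the sketch is only that weighting by batch size recovers the dataset-level average, while a naive mean over batches does not when the last batch is smaller.

```python
def weighted_batch_average(batch_values, batch_sizes):
    """Average per-batch values, weighting each batch by its number of elements."""
    total = sum(value * size for value, size in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)

# Two validation batches: 4 elements at 50% accuracy, then 2 elements at 100%
batch_accuracies = [0.5, 1.0]
batch_sizes = [4, 2]

weighted = weighted_batch_average(batch_accuracies, batch_sizes)  # 4 correct out of 6
naive = sum(batch_accuracies) / len(batch_accuracies)             # ignores batch sizes
```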
And here is another example that properly computes the precision for a given class.
class Precision(Callback):
    def on_epoch_begin(self, **kwargs):
        self.correct, self.total = 0, 0
    def on_batch_end(self, last_output, last_target, **kwargs):
        preds = last_output.argmax(1)
        self.correct += ((preds==0) * (last_target==0)).float().sum()
        self.total += (preds==0).float().sum()
    def on_epoch_end(self, last_metrics, **kwargs):
        return add_metrics(last_metrics, self.correct/self.total)
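To see why accumulating counts matters, the following plain-Python sketch compares the mean of per-batch precisions with the precision computed from counts accumulated over the whole epoch, as the callback does. The data and the helper name precision_counts are hypothetical.

```python
def precision_counts(preds, targs, cls=0):
    """Return (correct, total) positive predictions for class `cls` in one batch."""
    correct = sum(1 for p, t in zip(preds, targs) if p == cls and t == cls)
    total = sum(1 for p in preds if p == cls)
    return correct, total

# (predicted labels, target labels) for two batches; class 0 is the class of interest
batches = [
    ([0, 1, 1, 1], [0, 1, 1, 0]),  # 1 prediction of class 0, 1 correct
    ([0, 0, 0, 0], [0, 1, 1, 1]),  # 4 predictions of class 0, 1 correct
]

counts = [precision_counts(p, t) for p, t in batches]

# Naive: average the per-batch precisions
batch_mean = sum(c / n for c, n in counts) / len(counts)

# Accumulated: sum the counts first, divide once at the end of the epoch
epoch_precision = sum(c for c, _ in counts) / sum(n for _, n in counts)
```

Here the naive mean is 0.625 while the true precision over the epoch is 0.4, because the two batches made very different numbers of positive predictions.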
The following custom callback class example measures peak RAM usage during each epoch:
import tracemalloc
class TraceMallocMetric(Callback):
    def __init__(self):
        super().__init__()
        self.name = "peak RAM"
    def on_epoch_begin(self, **kwargs):
        tracemalloc.start()
    def on_epoch_end(self, last_metrics, **kwargs):
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return add_metrics(last_metrics, torch.tensor(peak))
To deploy it, you need to pass an instance of this custom metric in the metrics argument:
learn = cnn_learner(data, model, metrics=[accuracy, TraceMallocMetric()])
learn.fit_one_cycle(3, max_lr=1e-2)
And then the output changes to:
Total time: 00:54
epoch train_loss valid_loss accuracy peak RAM
1 0.333352 0.084342 0.973800 2395541.000000
2 0.096196 0.038386 0.988300 2342145.000000
3 0.048722 0.029234 0.990200 2342680.000000
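The peak RAM column reports bytes as measured by Python's tracemalloc module. Outside of any training loop, the start / get_traced_memory / stop sequence used by the callback behaves as in this small standalone sketch (the buffer sizes are arbitrary):

```python
import tracemalloc

tracemalloc.start()

# Allocate roughly 1 MB of Python-managed memory, then release it
buffers = [bytearray(100_000) for _ in range(10)]
del buffers

# current: bytes still allocated now; peak: the maximum since start()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

Note that tracemalloc only sees allocations made through Python's memory allocators, so memory held by native libraries or on the GPU is not included in these numbers.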
As mentioned earlier, using the metrics argument with a custom metrics class is limited in the number of phases of the callback system it can access. It can only return one numerical value and, as you can see, its output is hardcoded to have 6 points of precision, even if the number is an int.
To overcome these limitations, callback classes should be used instead.
For example, the following class:
- uses phases not available for the metric classes
- reports 3 columns, instead of just one
- reports ints in its columns, instead of floats
import tracemalloc
class TraceMallocMultiColMetric(LearnerCallback):
    _order=-20 # Needs to run before the recorder
    def __init__(self, learn):
        super().__init__(learn)
        self.train_max = 0
    def on_train_begin(self, **kwargs):
        self.learn.recorder.add_metric_names(['used', 'max_used', 'peak'])
    def on_batch_end(self, train, **kwargs):
        # track max memory usage during the train phase
        if train:
            current, peak = tracemalloc.get_traced_memory()
            self.train_max = max(self.train_max, current)
    def on_epoch_begin(self, **kwargs):
        tracemalloc.start()
    def on_epoch_end(self, last_metrics, **kwargs):
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return add_metrics(last_metrics, [current, self.train_max, peak])
Note that it subclasses LearnerCallback and not Callback, since the former provides extra features not available in the latter.
Also, _order=-20 is crucial: without it the custom columns will not be added. It tells the callback system to run this callback before the recorder system.
To deploy it, you need to pass the class itself (not an instance!) in the callback_fns argument. This is because the learn object doesn’t exist yet, and it’s required to instantiate TraceMallocMultiColMetric. The system will do it for us automatically as soon as the learn object has been created.
learn = cnn_learner(data, model, metrics=[accuracy], callback_fns=TraceMallocMultiColMetric)
learn.fit_one_cycle(3, max_lr=1e-2)
And then the output changes to:
Total time: 00:53
epoch train_loss valid_loss accuracy used max_used peak
1 0.321233 0.068252 0.978600 156504 2408404 2419891
2 0.093551 0.032776 0.988500 79343 2408404 2348085
3 0.047178 0.025307 0.992100 79568 2408404 2342754
Another way to do the same is by using learn.callbacks.append, and this time we need to instantiate TraceMallocMultiColMetric with the learn object, which we now have, since the call happens after learn was created:
learn = cnn_learner(data, model, metrics=[accuracy])
learn.callbacks.append(TraceMallocMultiColMetric(learn))
learn.fit_one_cycle(3, max_lr=1e-2)
Configuring the custom metrics in the learn object sets them to run in all future fit-family calls. However, if you’d like to configure it for just one call, you can configure it directly inside fit or fit_one_cycle:
learn = cnn_learner(data, model, metrics=[accuracy])
learn.fit_one_cycle(3, max_lr=1e-2, callbacks=TraceMallocMultiColMetric(learn))
And to stress the differences:
- the callback_fns argument expects a class name or a list of those
- the callbacks argument expects an instance of a class or a list of those
- learn.callbacks.append expects a single instance of a class
For more examples, look inside the fastai codebase and its test suite, and search for classes that subclass either Callback or LearnerCallback, and for subclasses of those two.
Finally, while the above examples all add to the metrics, it’s not a requirement. A callback can do anything it wants and it is not required to add its outcomes to the metrics printout.
©2021 fast.ai. All rights reserved.
Site last generated: Jan 5, 2021