what metrics can be used in keras #2607

Closed
lqj1990 opened this issue May 4, 2016 · 30 comments

Comments

lqj1990 commented May 4, 2016

Most examples use metrics=['accuracy'], but accuracy is not always suitable for every task.

  1. Are there any other metrics, such as precision, recall, and so on?
  2. If there are, what should I write in the metrics list in order to use them?
  3. If I have just one output, can I use multiple metrics to evaluate it from different aspects?
@RaffEdwardBAH

One of the doc pages says that accuracy is the only metric implemented right now. There really should be a metrics section in the docs that states this and can be expanded later.

@sloth2012

I found that function names like 'mae' or 'mean_absolute_error' from keras.metrics can be used in the metrics list, just like the loss parameter. It seems the metrics are only used for logging and are not involved in training.
By the way, the documentation really needs to state which metrics are supported.

eli7 commented Nov 15, 2016

Precision, Recall and F1-score were added by someone:

https://github.com/fchollet/keras/blob/master/keras/metrics.py

Example usage:

model.compile(loss='binary_crossentropy',
              optimizer=adam,
              metrics=['binary_accuracy', 'fmeasure', 'precision', 'recall'])

@gregmcinnes

After updating I still get this error:
Exception: Invalid metric: precision

eli7 commented Nov 16, 2016

Hey Greg,

As of now, the latest Keras package doesn't contain this yet.

You can download the metrics code from GitHub, then copy it over your current one:

wget https://raw.githubusercontent.com/fchollet/keras/master/keras/metrics.py
sudo cp metrics.py /usr/local/lib/python2.7/dist-packages/keras/

@gregmcinnes

Thanks! That worked great

kevingo commented Dec 18, 2016

I think the document is already updated? https://keras.io/metrics/

@wqp89324

What is the difference between loss (objectives) and metrics?

jhli973 commented Mar 26, 2017

@wqp89324
A metric is a function that is used to judge the performance of your model. A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model. You can find more here: https://keras.io/metrics/

neverfox commented Apr 9, 2017

@wqp89324 Another way to put it, expanding on @jhli973's answer, is that the evaluation metric is what you as the researcher will use to judge the model's performance (on training, test, and/or evaluation data); it's the bottom line number that you would publish. The loss function is what the network will use to try to improve itself, hopefully in a way that leads to improved evaluation for the researcher's sake. For example, in a binary classification problem, the network might train using a binary crossentropy loss function with gradient descent, whereas the modeler's goal is to design a network to improve binary category accuracy on hold-out data.
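To make the distinction concrete, here is a minimal sketch (the optimizer, metric choice, and input size are illustrative, not prescribed by anyone above): the loss is what the gradient updates minimize, while the metric is only computed and logged.

from keras.models import Sequential
from keras.layers import Dense

# Binary classifier: binary_crossentropy is minimized during training,
# while binary_accuracy is only computed and reported for monitoring.
model = Sequential([Dense(1, activation='sigmoid', input_dim=20)])
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['binary_accuracy'])

# history.history will contain 'loss' (optimized) and 'binary_accuracy'
# (reported only), plus their val_ counterparts when validation data is given.
# history = model.fit(x_train, y_train, epochs=5, validation_split=0.1)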

@brannondorsey

It looks like many of the helpful metrics that used to be supported were removed in Keras 2.0. I'm working on a classification problem where F-score would be much more valuable to me than accuracy. Is there a way I can use that as a metric, or am I encouraged to use metrics.categorical_accuracy instead? If so, why? And how does that differ from metrics.sparse_categorical_accuracy? Cheers!

@dattanchu

I resolved my problem by getting the old code from https://github.com/fchollet/keras/blob/53e541f7bf55de036f4f5641bd2947b96dd8c4c3/keras/metrics.py

Maybe someone would put together a keras-contrib package.
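For reference, the batchwise precision metric from that older metrics.py looked roughly like the sketch below (reconstructed from memory, so treat it as an approximation of the original code). As discussed later in this thread, evaluating it per batch only approximates the dataset-level precision.

from keras import backend as K

def precision(y_true, y_pred):
    # Per-batch precision: true positives / predicted positives.
    # K.epsilon() guards against division by zero when nothing is predicted positive.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + K.epsilon())

# Usage (batchwise approximation only):
# model.compile(loss='binary_crossentropy', optimizer='adam',
#               metrics=['accuracy', precision])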

apacha commented May 18, 2017

I agree with @brannondorsey. As @fchollet explained in #5794, these metrics were intentionally removed in version 2.0 because batchwise evaluation only approximates them. Unfortunately, there seems to be no evidence (#6002, #5705) that anyone is working on a global measurement.

Probably the best thing to do currently is to store the predictions and then use scikit-learn to calculate global measurements. For me the following worked out quite well on a classification task:

  1. Predict classes

import numpy
from keras.preprocessing.image import ImageDataGenerator
from sklearn import metrics

test_generator = ImageDataGenerator()
test_data_generator = test_generator.flow_from_directory(
    "test_directory",
    batch_size=32,
    shuffle=False)  # keep order so predictions line up with .classes
test_steps_per_epoch = numpy.math.ceil(test_data_generator.samples / test_data_generator.batch_size)

predictions = model.predict_generator(test_data_generator, steps=test_steps_per_epoch)
# Get most likely class
predicted_classes = numpy.argmax(predictions, axis=1)

  2. Get ground-truth classes and class labels

true_classes = test_data_generator.classes
class_labels = list(test_data_generator.class_indices.keys())

  3. Use scikit-learn to get statistics

report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
print(report)

sxs4337 commented Jun 14, 2017

@apacha
Thanks for the detailed explanation. This is very helpful. I have a follow up question.

While using "predict_generator", How to ensure that the prediction is done on all test samples once.

For example-
predictions = model.predict_generator(
test_generator,
steps=int(test_generator.samples/float(batch_size)), # all samples once
verbose = 1,
workers = 2,
max_q_size=10,
pickle_safe=True
)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = test_generator.classes

So the shapes of predicted_classes and true_classes are different, since the total number of samples is not divisible by the batch size.

The size of my test set is not consistent, so the number of steps in predict_generator would change each time depending on the batch size. I am using flow_from_directory and cannot use predict_on_batch since my data is organized in a directory structure.

One solution is running with a batch size of 1, but that makes it very slow.

I hope my question is clear. Thanks in advance.

apacha commented Jun 21, 2017

@sxs4337 I am happy to tell you that you don't have to worry about that when using the ImageDataGenerator, as it automatically takes care of the last batch if your number of samples is not divisible by the batch size. For example, if you have 10 samples and a minibatch size of 4, test_generator will create batches of the following sizes: 4, 4, 2. Consecutive next() calls will repeat the sequence from the beginning.

By using test_steps_per_epoch = numpy.math.ceil(test_data_generator.samples / test_data_generator.batch_size) you will automatically get 3 batches for the example above, which results in a total of 10 predictions.

sxs4337 commented Jun 21, 2017

@apacha
Thank you for the reply. I did try that and it seems to miss out on the last few test samples. I may be missing something very obvious here.

I have 505 test samples and tried running with a batch size of 4.

Below is my code snippet:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from sklearn import metrics

test_datagen = ImageDataGenerator(preprocessing_function=vgg_preprocess)
test_generator = test_datagen.flow_from_directory(
    'dataset_toy/test_toy',
    target_size=(img_rows, img_cols),
    batch_size=4,
    shuffle=False,
    class_mode='categorical')
predictions = model.predict_generator(
    test_generator,
    steps=np.math.ceil(test_generator.samples / test_generator.batch_size),
    verbose=1,
    workers=2,
    max_q_size=10,
    pickle_safe=True)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = test_generator.classes
class_labels = list(test_generator.class_indices.keys())
report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
accuracy = metrics.accuracy_score(true_classes, predicted_classes)

Here is the error:

Found 505 images belonging to 10 classes.
124/126 [============================>.] - ETA: 0s
Traceback (most recent call last):
File "keras_finetune_vgg16_landmarks10k.py", line 201, in
(report, accuracy) = test_mode(model_path)
File "keras_finetune_vgg16_landmarks10k.py", line 177, in test_mode
report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
File "/usr/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 1384, in classification_report
sample_weight=sample_weight)
File "/usr/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 956, in precision_recall_fscore_support
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "/usr/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 72, in _check_targets
check_consistent_length(y_true, y_pred)
File "/usr/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 176, in check_consistent_length
"%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [504 505]

So the prediction has 504 values, whereas the ground truth has 505 values.

Thanks again and I appreciate the help.

apacha commented Jun 21, 2017

Maybe this is a bug when using more than one worker? Try it with workers=1 to see if the problem remains. You can also check len(predicted_classes) or run test_generator.next() a couple of times to see what it reports.

If all that fails, I'm afraid I can't help you. If you think this is a Keras bug, create an issue with detailed steps to reproduce it.

sxs4337 commented Jun 21, 2017

@apacha
It has the same issue with workers=1. I put a debugger after model.predict_generator to check the shapes. The prediction gets just 504 samples out of 505 with a batch size of 4.

Found 505 images belonging to 10 classes.
126/126 [==============================] - 34s

/home/shagan/maya/landmark/keras_finetune_vgg16_landmarks10k.py(170)test_mode()
-> predicted_classes = np.argmax(predictions, axis=1)
(Pdb) predictions.shape
(504, 10)
(Pdb) test_generator.classes.shape
(505,)
(Pdb)

BTW, my Keras version is 2.0.5.
Thanks.

apacha commented Jun 21, 2017 via email

sxs4337 commented Jun 21, 2017

Yes. That was the issue. Thanks a lot!
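(A note for readers hitting the same 504-vs-505 mismatch: the emailed fix is not quoted above, but given the Python 2.7 traceback, a plausible culprit is integer division in the steps calculation. A minimal sketch of that assumption:)

import math

samples, batch_size = 505, 4

# Python 2: 505 / 4 floors to 126, so ceil(126) == 126 steps and
# 126 * 4 == 504 predictions -- one sample is silently dropped.
steps_floored = math.ceil(samples / batch_size)             # 126 under Python 2

# Casting to float (true division) gives 127 steps -> 505 predictions.
steps_fixed = int(math.ceil(samples / float(batch_size)))   # 127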

NourozR commented Aug 18, 2017

What are the available metrics if I'm doing time series prediction (regression) in Keras?

@brannondorsey

@NourozR am I correct in assuming that you are using a mean squared error loss function? If so, popular metrics include mean absolute error (mae) and accuracy (acc). From the metrics documentation page:

model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['mae', 'acc'])

stale bot commented Nov 16, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot added the stale label Nov 16, 2017
@damienrj

Why are mean absolute error (mae) and accuracy (acc) not listed in the available metrics section? Are there any other hidden metrics?

@stale stale bot removed the stale label Nov 20, 2017
mimoralea commented Dec 8, 2017

@damienrj, nothing is hidden if you look at the code: https://github.com/fchollet/keras/blob/master/keras/metrics.py

If you look deep enough, you'll see that many loss functions are also available as metrics. Then look at the losses page: https://keras.io/losses/

mushahrukhkhan commented Apr 19, 2018

Is there any way to calculate precision@k and recall@k using the above-mentioned code? @mimoralea

@ZER-0-NE

@apacha How can I extend your code to work for multiclass classification?
The predictions I get are all 1, but I need a list like [0,0,0,1,0,0,0,0,0,0,0,0] since I have 12 classes. How can I get that?

apacha commented Jun 20, 2018

As far as I know, scikit-learn's classification_report does support multiclass cases, but I am not sure if we are talking about the same thing. What exactly do you mean by multiclass classification: one object potentially belonging to multiple classes, or just having 12 different classes in total? Maybe you need some one-hot encoding for the ground truth before computing the metrics. Apart from that, I'm afraid I can't help you unless you give more details, but I don't think this is the right place to answer such questions. Preferably, you should ask such questions on Stack Overflow.
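To make the above suggestion concrete, here is a minimal sketch for 12 mutually exclusive classes (the helper name and dummy data are purely illustrative); the one-hot conversion is only needed if your ground truth is stored as one-hot vectors rather than class indices:

import numpy as np
from sklearn.metrics import classification_report

# predictions: (n_samples, 12) array of class probabilities from the model
# y_true:      class indices of shape (n_samples,) or one-hot of shape (n_samples, 12)
def report_multiclass(predictions, y_true, class_labels):
    predicted_classes = np.argmax(predictions, axis=1)
    if y_true.ndim == 2:  # one-hot ground truth -> class indices
        y_true = np.argmax(y_true, axis=1)
    return classification_report(y_true, predicted_classes,
                                 target_names=class_labels)

# Example with dummy data for 12 classes:
# print(report_multiclass(np.random.rand(100, 12),
#                         np.random.randint(0, 12, size=100),
#                         [str(i) for i in range(12)]))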

@tobigithub

@NourozR
For regression in Keras you can use mean absolute error (MAE) and mean squared error (MSE) out of the box; R² (r_square) and root mean squared error (RMSE) can be added as custom metrics. See here: #7947
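For the two that are not built in, one common approach (roughly in the spirit of #7947, though the exact code there may differ) is to define them as custom backend functions and pass them to compile; a minimal sketch:

from keras import backend as K

def rmse(y_true, y_pred):
    # Root mean squared error, computed per batch.
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

def r_square(y_true, y_pred):
    # Coefficient of determination (R^2), computed per batch.
    ss_res = K.sum(K.square(y_true - y_pred))
    ss_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - ss_res / (ss_tot + K.epsilon())

# model.compile(loss='mean_squared_error', optimizer='adam',
#               metrics=['mae', 'mse', rmse, r_square])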

@dynamicwebpaige

Closing, as the metrics docs have been updated on both keras.io and tensorflow.org. 🙂
