what metrics can be used in keras #2607

Closed
lqj1990 opened this issue May 4, 2016 · 30 comments

Comments

lqj1990 commented May 4, 2016

Most examples use metrics=['accuracy'], but accuracy is not always suitable for every task.

  1. Are there any other metrics, such as precision, recall, and so on?
  2. If there are, what should I write in the metrics list in order to use them?
  3. If I have just one output, can I use multiple metrics to evaluate it from different aspects?
@RaffEdwardBAH

One of the doc pages says that accuracy is the only metric implemented right now. There really should be a metrics section in the docs that states this and can be expanded later.

@sloth2012

I found that function names like 'mae' or 'mean_absolute_error' from keras.metrics can be used in the metrics list, just like the loss parameter. It seems the metrics are only used for logging and are not involved in training.
By the way, the documentation really needs to state which metrics are supported.

eli7 commented Nov 15, 2016

Precision, Recall and F1-score were added by someone:

https://github.com/fchollet/keras/blob/master/keras/metrics.py

Example usage:

model.compile(loss='binary_crossentropy',
              optimizer=adam,
              metrics=['binary_accuracy', 'fmeasure', 'precision', 'recall'])

@gregmcinnes

After updating I still get this error:
Exception: Invalid metric: precision

eli7 commented Nov 16, 2016

Hey Greg,

As of now, the latest Keras package doesn't contain this yet.

You can download the metrics code from GitHub, then copy it over your current one:

wget https://raw.githubusercontent.com/fchollet/keras/master/keras/metrics.py
sudo cp metrics.py /usr/local/lib/python2.7/dist-packages/keras/

@gregmcinnes

Thanks! That worked great

kevingo commented Dec 18, 2016

I think the document is already updated? https://keras.io/metrics/

@wqp89324

What is the difference between loss (objectives) and metrics?

jhli973 commented Mar 26, 2017

@wqp89324
A metric is a function that is used to judge the performance of your model. A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model. You can find more here: https://keras.io/metrics/

neverfox commented Apr 9, 2017

@wqp89324 Another way to put it, expanding on @jhli973's answer, is that the evaluation metric is what you as the researcher will use to judge the model's performance (on training, test, and/or evaluation data); it's the bottom line number that you would publish. The loss function is what the network will use to try to improve itself, hopefully in a way that leads to improved evaluation for the researcher's sake. For example, in a binary classification problem, the network might train using a binary crossentropy loss function with gradient descent, whereas the modeler's goal is to design a network to improve binary category accuracy on hold-out data.
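To make the distinction concrete, here is a minimal sketch (the optimizer, metric choice, and input size are illustrative, not prescribed by anyone above): the loss is what the gradient updates minimize, while the metric is only computed and logged.

from keras.models import Sequential
from keras.layers import Dense

# Binary classifier: binary_crossentropy is minimized during training,
# while binary_accuracy is only computed and reported for monitoring.
model = Sequential([Dense(1, activation='sigmoid', input_dim=20)])
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['binary_accuracy'])

# history.history will contain 'loss' (optimized) and 'binary_accuracy'
# (reported only), plus their val_ counterparts when validation data is given.
# history = model.fit(x_train, y_train, epochs=5, validation_split=0.1)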

@brannondorsey

It looks like many of the helpful metrics that used to be supported were removed in Keras 2.0. I'm working on a classification problem where F-score would be much more valuable to me than accuracy. Is there a way I can use that as a metric, or am I encouraged to use metrics.categorical_accuracy instead? If so, why? And how does that differ from metrics.sparse_categorical_accuracy? Cheers!

@dattanchu

I resolved my problem by getting the old code from https://github.com/fchollet/keras/blob/53e541f7bf55de036f4f5641bd2947b96dd8c4c3/keras/metrics.py

Maybe someone would put together a keras-contrib package.
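For reference, the batchwise precision metric from that older metrics.py looked roughly like the sketch below (reconstructed from memory, so treat it as an approximation of the original code). As discussed later in this thread, evaluating it per batch only approximates the dataset-level precision.

from keras import backend as K

def precision(y_true, y_pred):
    # Per-batch precision: true positives / predicted positives.
    # K.epsilon() guards against division by zero when nothing is predicted positive.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + K.epsilon())

# Usage (batchwise approximation only):
# model.compile(loss='binary_crossentropy', optimizer='adam',
#               metrics=['accuracy', precision])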

apacha commented May 18, 2017

I agree with @brannondorsey. As @fchollet explained in #5794, these metrics were intentionally removed in version 2.0 because batchwise evaluation only approximates them. Unfortunately, there seems to be no evidence (#6002, #5705) that anyone is working on a global measurement.

Probably the best thing to do currently is to store the predictions and then use scikit-learn to calculate global measurements. For me the following worked out quite well on a classification task:

  1. Predict classes

import numpy
from keras.preprocessing.image import ImageDataGenerator
from sklearn import metrics

test_generator = ImageDataGenerator()
test_data_generator = test_generator.flow_from_directory(
    "test_directory",
    batch_size=32,
    shuffle=False)  # keep order so predictions line up with .classes
test_steps_per_epoch = numpy.math.ceil(test_data_generator.samples / test_data_generator.batch_size)

predictions = model.predict_generator(test_data_generator, steps=test_steps_per_epoch)
# Get most likely class
predicted_classes = numpy.argmax(predictions, axis=1)

  2. Get ground-truth classes and class labels

true_classes = test_data_generator.classes
class_labels = list(test_data_generator.class_indices.keys())

  3. Use scikit-learn to get statistics

report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
print(report)

sxs4337 commented Jun 14, 2017

@apacha
Thanks for the detailed explanation. This is very helpful. I have a follow up question.

While using "predict_generator", How to ensure that the prediction is done on all test samples once.

For example-
predictions = model.predict_generator(
test_generator,
steps=int(test_generator.samples/float(batch_size)), # all samples once
verbose = 1,
workers = 2,
max_q_size=10,
pickle_safe=True
)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = test_generator.classes

So the shapes of predicted_classes and true_classes are different, since the total number of samples is not divisible by the batch size.

The size of my test set is not consistent, so the number of steps in predict_generator would change each time depending on the batch size. I am using flow_from_directory and cannot use predict_on_batch since my data is organized in a directory structure.

One solution is running with a batch size of 1, but that makes it very slow.

I hope my question is clear. Thanks in advance.

apacha commented Jun 21, 2017

@sxs4337 I am happy to tell you that you don't have to worry about that when using the ImageDataGenerator, as it automatically takes care of the last batch if your number of samples is not divisible by the batch size. For example, if you have 10 samples and a minibatch size of 4, test_generator will create batches of the following sizes: 4, 4, 2. Consecutive next() calls will repeat the sequence from the beginning.

By using test_steps_per_epoch = numpy.math.ceil(test_data_generator.samples / test_data_generator.batch_size) you will automatically get 3 batches for the example above, which results in a total of 10 predictions.

sxs4337 commented Jun 21, 2017

@apacha
Thank you for the reply. I did try that and it seems to miss out on the last few test samples. I may be missing something very obvious here.

I have 505 test samples and tried running with a batch size of 4.

Below is my code snippet:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from sklearn import metrics

test_datagen = ImageDataGenerator(preprocessing_function=vgg_preprocess)
test_generator = test_datagen.flow_from_directory(
    'dataset_toy/test_toy',
    target_size=(img_rows, img_cols),
    batch_size=4,
    shuffle=False,
    class_mode='categorical')
predictions = model.predict_generator(
    test_generator,
    steps=np.math.ceil(test_generator.samples / test_generator.batch_size),
    verbose=1,
    workers=2,
    max_q_size=10,
    pickle_safe=True)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = test_generator.classes
class_labels = list(test_generator.class_indices.keys())
report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
accuracy = metrics.accuracy_score(true_classes, predicted_classes)

Here is the error:

Found 505 images belonging to 10 classes.
124/126 [============================>.] - ETA: 0s
Traceback (most recent call last):
File "keras_finetune_vgg16_landmarks10k.py", line 201, in
(report, accuracy) = test_mode(model_path)
File "keras_finetune_vgg16_landmarks10k.py", line 177, in test_mode
report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
File "/usr/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 1384, in classification_report
sample_weight=sample_weight)
File "/usr/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 956, in precision_recall_fscore_support
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "/usr/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 72, in _check_targets
check_consistent_length(y_true, y_pred)
File "/usr/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 176, in check_consistent_length
"%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [504 505]

So the prediction has 504 values, whereas the ground truth has 505 values.

Thanks again and I appreciate the help.

apacha commented Jun 21, 2017

Maybe this is a bug when using more than one worker? Try it with workers=1 to see if the problem remains. You can also check len(predicted_classes) or run test_generator.next() a couple of times to see what it reports.

If all that fails, I'm afraid I can't help you. If you think this is a Keras bug, create an issue with detailed steps to reproduce it.

sxs4337 commented Jun 21, 2017

@apacha
It has the same issue with workers=1. I put a debugger after model.predict_generator to check the shapes. The prediction gets just 504 samples out of 505 with a batch size of 4.

Found 505 images belonging to 10 classes.
126/126 [==============================] - 34s

/home/shagan/maya/landmark/keras_finetune_vgg16_landmarks10k.py(170)test_mode()
-> predicted_classes = np.argmax(predictions, axis=1)
(Pdb) predictions.shape
(504, 10)
(Pdb) test_generator.classes.shape
(505,)
(Pdb)

BTW, my Keras version is 2.0.5.
Thanks.

apacha commented Jun 21, 2017 via email

sxs4337 commented Jun 21, 2017

Yes. That was the issue. Thanks a lot!
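(A note for readers hitting the same 504-vs-505 mismatch: the emailed fix is not quoted above, but given the Python 2.7 traceback, a plausible culprit is integer division in the steps calculation. A minimal sketch of that assumption:)

import math

samples, batch_size = 505, 4

# Python 2: 505 / 4 floors to 126, so ceil(126) == 126 steps and
# 126 * 4 == 504 predictions -- one sample is silently dropped.
steps_floored = math.ceil(samples / batch_size)             # 126 under Python 2

# Casting to float (true division) gives 127 steps -> 505 predictions.
steps_fixed = int(math.ceil(samples / float(batch_size)))   # 127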

NourozR commented Aug 18, 2017

What are the available metrics if I'm doing time series prediction (regression) in Keras?

@brannondorsey

@NourozR am I correct in assuming that you are using a mean squared error loss function? If so, popular metrics include mean absolute error (mae) and accuracy (acc). From the metrics documentation page:

model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['mae', 'acc'])

stale bot commented Nov 16, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot added the stale label Nov 16, 2017
@damienrj

Why are mean absolute error (mae) and accuracy (acc) not listed in the available metrics section? Are there any other hidden metrics?

@stale stale bot removed the stale label Nov 20, 2017
mimoralea commented Dec 8, 2017

@damienrj, nothing is hidden if you look at the code: https://github.com/fchollet/keras/blob/master/keras/metrics.py

If you look deep enough, you'll see that many loss functions are also available as metrics. Then look at the losses page: https://keras.io/losses/

mushahrukhkhan commented Apr 19, 2018

Is there any way to calculate precision@k and recall@k using the above-mentioned code? @mimoralea

@ZER-0-NE

@apacha How can I extend your code to work for multiclass classification?
The predictions I get are all 1, but I need a list like [0,0,0,1,0,0,0,0,0,0,0,0] since I have 12 classes. How can I get that?

apacha commented Jun 20, 2018

As far as I know, scikit-learn's classification_report does support multiclass cases, but I am not sure if we are talking about the same thing. What exactly do you mean by multiclass classification: one object potentially belonging to multiple classes, or just having 12 different classes in total? Maybe you need some one-hot encoding for the ground truth before computing the metrics. Apart from that, I'm afraid I can't help you unless you give more details, but I don't think this is the right place to answer such questions. Preferably, you should ask such questions on Stack Overflow.
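To make the above suggestion concrete, here is a minimal sketch for 12 mutually exclusive classes (the helper name and dummy data are purely illustrative); the one-hot conversion is only needed if your ground truth is stored as one-hot vectors rather than class indices:

import numpy as np
from sklearn.metrics import classification_report

# predictions: (n_samples, 12) array of class probabilities from the model
# y_true:      class indices of shape (n_samples,) or one-hot of shape (n_samples, 12)
def report_multiclass(predictions, y_true, class_labels):
    predicted_classes = np.argmax(predictions, axis=1)
    if y_true.ndim == 2:  # one-hot ground truth -> class indices
        y_true = np.argmax(y_true, axis=1)
    return classification_report(y_true, predicted_classes,
                                 target_names=class_labels)

# Example with dummy data for 12 classes:
# print(report_multiclass(np.random.rand(100, 12),
#                         np.random.randint(0, 12, size=100),
#                         [str(i) for i in range(12)]))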

@tobigithub

@NourozR
For regression in Keras you can use mean absolute error (MAE) and mean squared error (MSE) out of the box; R² (r_square) and root mean squared error (RMSE) can be added as custom metrics. See here: #7947
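For the two that are not built in, one common approach (roughly in the spirit of #7947, though the exact code there may differ) is to define them as custom backend functions and pass them to compile; a minimal sketch:

from keras import backend as K

def rmse(y_true, y_pred):
    # Root mean squared error, computed per batch.
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

def r_square(y_true, y_pred):
    # Coefficient of determination (R^2), computed per batch.
    ss_res = K.sum(K.square(y_true - y_pred))
    ss_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - ss_res / (ss_tot + K.epsilon())

# model.compile(loss='mean_squared_error', optimizer='adam',
#               metrics=['mae', 'mse', rmse, r_square])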

@dynamicwebpaige

Closing, as the metrics docs have been updated on both keras.io and tensorflow.org. 🙂
