Adding the MultiLabelConfusionMatrix for computing multi-label confusion matrices. #1613
Conversation
@touqir14 thanks a lot for a quick and nice PR! Overall, the code looks good; later I can try to check the implementation vs tfa. Maybe we could add a comment in the code crediting the original implementation if it is just a ported tfa implementation. I also wonder, for the tests, if we could compare the results with something like sklearn?
This was not ported from tfa, but since you mentioned it, a simple transpose operation will ensure that its output matches that of tfa's implementation. I think it is reasonable to test against scikit-learn's implementation.
Btw, should we aim to keep the output consistent with tfa, or just keep this as it is, @vfdev-5?
Let's keep it consistent with sklearn and mention the difference vs tfa and what to do to make it consistent with tfa...
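For reference, a minimal sketch of converting between the two layouts (assuming, per the discussion above, that the sklearn-style per-class matrices are laid out as [[TN, FP], [FN, TP]] and that tfa's layout is simply their transpose):

import torch

# per-class 2x2 confusion matrices in the sklearn-style layout, shape (num_classes, 2, 2)
cm_sklearn_style = torch.tensor([[[1, 1], [0, 2]], [[1, 2], [0, 1]], [[1, 0], [1, 2]]])
# transposing the last two dims of each per-class block swaps FP and FN,
# which would give the (assumed) tfa-style layout
cm_tfa_style = cm_sklearn_style.transpose(1, 2)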
Added some tests, let me know what you think.
conf_mtrx = mtr.compute()
correct_conf_mtrx = torch.tensor([[[1, 1], [0, 2]], [[1, 2], [0, 1]], [[1, 0], [1, 2]]])

TestCase.assertTrue(
In general we use pytest and simply do assert something, err_message, so let's do it here in the same way.
About the tests, I think it would be worth putting them into a separate file: test_multilabel_confusion_matrix.py. Maybe the same for the code of MultiLabelConfusionMatrix?
In the tests, you can simply generate inputs, compute the result and compare it against the sklearn result, as is done for ConfusionMatrix (see the sketch after this comment).
Do you think it would make sense to ensure that MultiLabelConfusionMatrix works, and then test it on inputs of shape like (batch_size, classes, dim1, ...) for predictions/targets?
Another point to address a bit later is to ensure that it works with DDP. In this case, we have to replicate what is done here:
ignite/tests/ignite/metrics/test_confusion_matrix.py
Lines 783 to 790 in f1cc9fb
@pytest.mark.distributed
@pytest.mark.skipif(not idist.has_native_dist_support, reason="Skip if no native dist support")
@pytest.mark.skipif(torch.cuda.device_count() < 1, reason="Skip if no GPU")
def test_distrib_gpu(local_rank, distributed_context_single_node_nccl):
    device = torch.device(f"cuda:{local_rank}")
    _test_distrib_multiclass_images(device)
    _test_distrib_accumulator_device(device)
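A rough sketch of the suggested pytest-style comparison against sklearn (assumptions: MultiLabelConfusionMatrix is importable from ignite.metrics once merged and follows the usual update((y_pred, y)) / compute() metric API; sklearn's multilabel_confusion_matrix returns per-class 2x2 matrices):

import numpy as np
import torch
from sklearn.metrics import multilabel_confusion_matrix
from ignite.metrics import MultiLabelConfusionMatrix  # assumed import path

def test_multilabel_confusion_matrix_vs_sklearn():
    num_classes = 3
    # random binary indicator matrices of shape (batch_size, num_classes)
    y_true = torch.randint(0, 2, size=(50, num_classes))
    y_pred = torch.randint(0, 2, size=(50, num_classes))
    mlcm = MultiLabelConfusionMatrix(num_classes=num_classes)
    mlcm.update((y_pred, y_true))
    expected = multilabel_confusion_matrix(y_true.numpy(), y_pred.numpy())
    assert np.all(expected == mlcm.compute().numpy()), "ignite CM does not match sklearn CM"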
Regarding the input format of (batch_size, classes, dim1, ...), is there any special use case? The tfa implementation and sklearn's function take 2D matrices.
We can imagine, for example, image segmentation with overlapping classes...
I see. So in the general case we can think of each class as having num_batches x dim_1 x ... x dim_n entries to compare between the predictions and the ground truth, whereas in the 2D case each class had num_batches entries.
Yes, it can be seen like that. In the ConfusionMatrix implementation, however, we do this reshape internally in some sense and can accept inputs like (B, C, H, W).
So in the test cases we can simply compare the output of ignite's implementation with sklearn's, feeding the latter with prediction and ground-truth arrays reshaped to [num_batches x dim_1 x ... x dim_n, num_classes].
Yes, exactly! Transposed and reshaped.
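A small sketch of that transpose-and-reshape step for an ND input (assuming the class dimension is dim 1, as in (B, C, H, W); element order does not matter for the confusion-matrix counts):

import torch

num_classes = 3
y = torch.randint(0, 2, size=(4, num_classes, 6, 8))     # e.g. (B, C, H, W)
# move the class dim to the end, then flatten everything else
y_2d = torch.movedim(y, 1, -1).reshape(-1, num_classes)  # (B * H * W, num_classes), ready for sklearn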
@vfdev-5, I have been a little busy, so this commit got a bit delayed. Let me know what you think.
@touqir14 thanks for the updates, and no worries about the delay! I'll take a look at the code later and comment.
Thanks for the update @touqir14!
I haven't yet looked at the tests in detail. There are some comments to address for the PR, and it seems that code formatting is not passing either...
@vfdev-5, feel free to review the tests. I can push updates addressing your current suggestions and any future suggestions together.
@touqir14 thanks again for the PR!
I have a few comments on the implementation and tests. Let me know what you think.
if (
    not isinstance(output, Sequence)
    or len(output) < 2
    or not isinstance(output[0], torch.Tensor)
    or not isinstance(output[1], torch.Tensor)
):
    raise ValueError(
        (r"Argument must consist of a Python Sequence of two tensors such that the first is the predicted"
         r" tensor and the second is the ground-truth tensor")
    )
This is a correct check for output, but we do not perform such a check in any other metric, as this is a sort of documented convention. Maybe we can remove it.
Or maybe, we could include this check in other metrics as we see fit? Either one is fine by me.
Let's remove it here; maybe we could add this check in the Metric class in another PR.
if y.dtype not in valid_types:
    raise ValueError(f"y must be of any type: {valid_types}")

if y_pred.numel() != ((y_pred == 0).sum() + (y_pred == 1).sum()).item():
Previously, we were checking for binary input with torch.equal(x, x ** 2) (only 0 and 1 satisfy x == x ** 2). I tried to compare the times of these two implementations:

import torch
y_pred = torch.randint(0, 2, size=(32, 10))

%%timeit
y_pred.numel() == ((y_pred == 0).sum() + (y_pred == 1).sum()).item()
> 50.3 µs ± 96.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
torch.equal(y_pred, y_pred ** 2)
> 9.74 µs ± 23.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Probably, we can keep torch.equal(y_pred, y_pred ** 2).
That's a clever optimization! I will replace it.
if y_pred.numel() != ((y_pred == 0).sum() + (y_pred == 1).sum()).item():
    raise ValueError("y_pred must be a binary tensor")

if y.numel() != ((y == 0).sum() + (y == 1).sum()).item():
Same here
ignite_CM = mlcm.compute().numpy()
assert np.all(sklearn_CM.astype(np.int64) == ignite_CM.astype(np.int64))

return
return
num_classes = 3
cm = MultiLabelConfusionMatrix(num_classes=num_classes, device=metric_device)

y_true, y_pred = get_y_true_y_pred()
We have to generate data a bit differently depending on distributed rank.
Do you have any suggestions?
Oh, I see, our tests for confusion matrix have the same remark. I think we should rework them. Let me propose better tests for our confusion matrix, which could be adapted for this PR as well. We can take inspiration from here:
ignite/tests/ignite/metrics/test_recall.py
Lines 784 to 785 in e17acc7
y_true = torch.randint(0, 2, size=(offset * idist.get_world_size(), n_classes, 6, 8)).to(device)
y_preds = torch.randint(0, 2, size=(offset * idist.get_world_size(), n_classes, 6, 8)).to(device)
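A rough sketch of what the rank-dependent data generation could look like (hypothetical helper, following the test_recall.py pattern quoted above; idist is ignite.distributed):

import torch
import ignite.distributed as idist

def _get_rank_data(n_classes=3, offset=10, device="cpu"):
    # every rank builds the same global tensors from a fixed seed...
    torch.manual_seed(12)
    y_true = torch.randint(0, 2, size=(offset * idist.get_world_size(), n_classes, 6, 8)).to(device)
    y_preds = torch.randint(0, 2, size=(offset * idist.get_world_size(), n_classes, 6, 8)).to(device)
    # ...and each rank then updates the metric only with its own slice
    rank = idist.get_rank()
    return y_true[rank * offset : (rank + 1) * offset], y_preds[rank * offset : (rank + 1) * offset]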
There seems to be a test failure in the ...
def test_simple_ND_input():

    num_iters = 100
Let's change it to 5:
num_iters = 100
→ num_iters = 5
def test_simple_batched():

    num_iters = 100
Same here:
num_iters = 100
→ num_iters = 5
- The classes present in M are indexed as 0, ..., num_classes-1 as can be inferred from above.

Args:
    num_classes (int): Number of classes, should be > 1. See notes for more details.
num_classes (int): Number of classes, should be > 1. See notes for more details.
→ num_classes (int): Number of classes, should be > 1.
@touqir14 let's do it like that for this PR: let's comment out the distributed tests and fix all the remaining nits asked below. Then we can merge it. After that, in a follow-up PR, we can rework both distrib tests for CM and MLCM. What do you think?
That sounds good. I will push a commit next addressing the remaining issues.
Just pushed a commit. I think all the issues have been addressed. Let me know if I missed something.
Looks good @touqir14! Thanks!
Just fix the code formatting issue and it is good to go.
Another point from the checklist is not yet done:
We have to add an entry here as well: https://github.com/pytorch/ignite/blob/master/docs/source/metrics.rst
Could you please also merge current master into your branch, as "This branch is out-of-date with the base branch". Thanks
For the docs part here: https://github.com/pytorch/ignite/blob/master/docs/source/metrics.rst, I should just add ...
Yes, just that. You can see the docs preview here as well: https://deploy-preview-1613--pytorch-ignite-preview.netlify.app/
Reformatted the docstring, let me know how it looks.
Looks better :) https://deploy-preview-1613--pytorch-ignite-preview.netlify.app/metrics.html#ignite.metrics.MultiLabelConfusionMatrix
The description of the M[i, 0, 0] values is inlined, but it could go...
It's better to put the ...
Yes, I agree. I'm not sure whether rst/sphinx would accept it simply like that anyway; it's better to try it locally :)
Looks good to me.
Fixes #1609
Description:
This is an in-progress PR that adds functionality for computing confusion matrices for multi-label, multi-class classification problems. I will add tests and docstrings once I get initial feedback, to accommodate any necessary changes. The test_confusion_matrix.py file tests the ConfusionMatrix class quite thoroughly; I was wondering how thorough the tests for MultiLabelConfusionMatrix need to be.
The current version only works with prediction and ground-truth tensors of shape [batch_size, num_classes]. The tensor values need to be binary. The example below illustrates its use. cc'ing @vfdev-5
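The PR's original inline example is not reproduced here; a minimal usage sketch under the constraints described above (with an assumed ignite.metrics import path) might look like:

import torch
from ignite.metrics import MultiLabelConfusionMatrix  # assumed import path

mlcm = MultiLabelConfusionMatrix(num_classes=3)
y_pred = torch.tensor([[1, 0, 1], [0, 1, 0]])  # binary values, shape [batch_size, num_classes]
y_true = torch.tensor([[1, 0, 0], [0, 1, 1]])
mlcm.update((y_pred, y_true))
print(mlcm.compute())  # per-class 2x2 confusion matrices, shape (num_classes, 2, 2)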
Checklist: