This package provides highly-efficient pairwise metrics for PyTorch.
- v0.1.1: Added SNR distance (`torchpairwise.snr_distances`) presented in https://arxiv.org/abs/1904.02616.
`torchpairwise` is a collection of general-purpose pairwise metric functions that behave similarly to `torch.cdist` (which only implements the p-norm distance), with counterparts for metrics found in `scipy.spatial.distance` and `sklearn.metrics.pairwise`.
For task-specific metrics (e.g. for evaluation of classification, regression, clustering, ...), you are in the
wrong place; please head to the TorchMetrics repo.
Written in `torch`'s C++ API, the main differences are that our metrics:
- are all (except some boolean distances) differentiable, with backward formulas manually derived, implemented, and verified with `torch.autograd.gradcheck`.
- are batched and can exploit GPU parallelization.
- can be integrated seamlessly within PyTorch-based projects; all functions are `torch.jit.script`-able.
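
To illustrate what these properties mean in practice, here is a minimal sketch in plain PyTorch. The function `sq_euclidean_distances` is a hypothetical stand-in written for this example, not one of the package's ops; it only shows how `torch.autograd.gradcheck` and `torch.jit.script` can be applied to a pairwise function.

```python
import torch


def sq_euclidean_distances(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Pairwise squared Euclidean distances between the rows of x1 and x2."""
    # Broadcast (n, 1, d) against (1, m, d) to get an (n, m, d) difference tensor.
    diff = x1.unsqueeze(1) - x2.unsqueeze(0)
    return diff.pow(2).sum(dim=-1)


# gradcheck compares analytical and numerical gradients; it expects float64 inputs.
x1 = torch.rand(4, 3, dtype=torch.float64, requires_grad=True)
x2 = torch.rand(5, 3, dtype=torch.float64, requires_grad=True)
grad_ok = torch.autograd.gradcheck(sq_euclidean_distances, (x1, x2))

# The same function can also be compiled with TorchScript.
scripted = torch.jit.script(sq_euclidean_distances)
same_output = torch.allclose(scripted(x1, x2), sq_euclidean_distances(x1, x2))
```

Here autograd derives the backward pass automatically; the package instead ships manually derived backward formulas, which this sketch does not reproduce.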
These metrics are usually used to compute kernels for machine learning algorithms.
torchpairwise ops | Equivalences in other libraries | Differentiable |
---|---|---|
`linear_kernel` | `sklearn.metrics.pairwise.linear_kernel` | ✔️ |
`polynomial_kernel` | `sklearn.metrics.pairwise.polynomial_kernel` | ✔️ |
`sigmoid_kernel` | `sklearn.metrics.pairwise.sigmoid_kernel` | ✔️ |
`rbf_kernel` | `sklearn.metrics.pairwise.rbf_kernel` | ✔️ |
`laplacian_kernel` | `sklearn.metrics.pairwise.laplacian_kernel` | ✔️ |
`cosine_similarity` | `sklearn.metrics.pairwise.cosine_similarity` | ✔️ |
`additive_chi2_kernel` | `sklearn.metrics.pairwise.additive_chi2_kernel` | ✔️ |
`chi2_kernel` | `sklearn.metrics.pairwise.chi2_kernel` | ✔️ |
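
As a rough reference for what one of these kernels computes, the sketch below evaluates an RBF kernel in plain PyTorch, following scikit-learn's definition `K(x, y) = exp(-gamma * ||x - y||^2)` with `gamma` defaulting to `1 / n_features`. The name `rbf_kernel_reference` is hypothetical, and this is not how the package implements it.

```python
import torch


def rbf_kernel_reference(x1, x2, gamma=None):
    """Reference RBF kernel between the rows of x1 (n, d) and x2 (m, d)."""
    if gamma is None:
        # scikit-learn's default when gamma is not given.
        gamma = 1.0 / x1.shape[1]
    # (n, 1, d) - (1, m, d) -> (n, m, d), then reduce over the feature axis.
    sq_dists = (x1.unsqueeze(1) - x2.unsqueeze(0)).pow(2).sum(dim=-1)
    return torch.exp(-gamma * sq_dists)


x1 = torch.rand(10, 5)
x2 = torch.rand(12, 5)
K = rbf_kernel_reference(x1, x2)      # shape (10, 12), entries in (0, 1]
K_self = rbf_kernel_reference(x1, x1)  # diagonal is exactly 1
```

The broadcasted difference materializes an `(n, m, d)` tensor, so this form is only suitable for small inputs.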
Furthermore, we provide a convenient wrapper function analogous to `torch.cdist`, except that it takes a string
`metric: str = "minkowski"` as the third argument to indicate the desired metric,
and extra metric-specific arguments are passed as keywords.
```python
import torch, torchpairwise

# directed_hausdorff_distances is a pairwise 2d metric
x1 = torch.rand(10, 6, 3)
x2 = torch.rand(8, 5, 3)

generator = torch.Generator().manual_seed(1)
output = torchpairwise.cdist(x1, x2,
                             metric="directed_hausdorff",
                             shuffle=True,  # kwargs exclusive to directed_hausdorff
                             generator=generator)
```
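
The string-keyed wrapper pattern described above can be sketched in plain PyTorch as follows. Everything here (`cdist_sketch`, `_METRICS`, and the two helper functions) is hypothetical and only illustrates dispatching on a `metric` string while forwarding extra keyword arguments; the package's actual wrapper may be organized differently.

```python
import torch


def _minkowski_distances(x1, x2, p=2.0):
    return torch.cdist(x1, x2, p=p)


def _cosine_distances(x1, x2):
    x1n = torch.nn.functional.normalize(x1, dim=-1)
    x2n = torch.nn.functional.normalize(x2, dim=-1)
    return 1.0 - x1n @ x2n.T


# Hypothetical registry mapping metric names to pairwise functions.
_METRICS = {"minkowski": _minkowski_distances, "cosine": _cosine_distances}


def cdist_sketch(x1, x2, metric="minkowski", **kwargs):
    """Look up the metric by name and forward metric-specific keyword arguments."""
    try:
        fn = _METRICS[metric]
    except KeyError:
        raise ValueError(f"unsupported metric: {metric!r}") from None
    return fn(x1, x2, **kwargs)


x1 = torch.rand(10, 4)
x2 = torch.rand(8, 4)
manhattan = cdist_sketch(x1, x2, metric="minkowski", p=1.0)  # p is metric-specific
cosine = cdist_sketch(x1, x2, metric="cosine")
```

Passing a keyword that the selected metric does not accept raises a `TypeError` from the underlying function, which keeps metric-specific options from being silently ignored.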
Note that the pairwise metrics in the second table are currently not allowed as keys for `cdist`
because they are not distances.
We have a similar plan for `pdist` (which is equivalent to calling `cdist(x1, x1)` but avoids storing duplicated
positions).
However, that requires a total overhaul of the existing C++/CUDA kernels and won't be available soon.
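
The relationship between the two forms can be checked with PyTorch's built-in Euclidean versions, `torch.cdist` and `torch.pdist`. This sketch uses only stock PyTorch, not the package: the condensed output holds the strict upper triangle of the full matrix in row-major order.

```python
import torch

x = torch.rand(6, 3)
n = x.shape[0]

full = torch.cdist(x, x)    # (n, n): symmetric, zero diagonal
condensed = torch.pdist(x)  # (n * (n - 1) / 2,): each unordered pair once

# Row-major indices (i < j) of the strict upper triangle.
i, j = torch.triu_indices(n, n, offset=1)
same = torch.allclose(full[i, j], condensed, atol=1e-5)
```

For `n = 6` the condensed form stores 15 values instead of 36, which is the saving from skipping the diagonal and the mirrored lower triangle.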
- Add more metrics (contact me or create a feature request issue).
- Add memory-efficient `argkmin` for retrieving pairwise neighbors' distances and indices without storing the whole pairwise distance matrix.
- Add an equivalent of `torch.pdist` with a `metric: str = "minkowski"` argument.
- (Unlikely) Support sparse layouts.
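
To make the `argkmin` item concrete, one simple way to avoid holding the whole distance matrix is to process the query rows in chunks and keep only the `k` smallest entries per row. The sketch below does this with stock PyTorch and Euclidean distance; `argkmin_sketch` and its `chunk_size` parameter are hypothetical and are not the planned implementation.

```python
import torch


def argkmin_sketch(x1, x2, k, chunk_size=1024):
    """For each row of x1, return the k smallest distances to rows of x2 and their indices."""
    dists, indices = [], []
    for start in range(0, x1.shape[0], chunk_size):
        # Only a (chunk, m) block of the distance matrix exists at any time.
        block = torch.cdist(x1[start:start + chunk_size], x2)
        d, idx = torch.topk(block, k, dim=1, largest=False)
        dists.append(d)
        indices.append(idx)
    return torch.cat(dists), torch.cat(indices)


x1 = torch.rand(100, 8)
x2 = torch.rand(50, 8)
knn_dists, knn_idx = argkmin_sketch(x1, x2, k=3, chunk_size=32)
```

Peak memory for the intermediate distances is `chunk_size * m` values rather than `n * m`; a fused kernel could go further by never materializing even the block.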
`torch>=2.1.0` (`torch>=1.9.0` if compiled from source)
To install prebuilt wheels from torchpairwise, simply run:
pip install torchpairwise
Note that the Linux and Windows wheels on PyPI are compiled with `torch==2.1.0` and CUDA 12.1.
We only perform non-strict version checking, and a warning is raised if the CUDA versions of `torch` and `torchpairwise` do not match.
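
A non-strict check of this kind might look like the sketch below, which warns on a mismatch instead of raising. The helper `cuda_versions_match` and the way the extension's version string is obtained are assumptions made for illustration; the package's actual check is not shown in this document.

```python
import warnings


def cuda_versions_match(torch_cuda, ext_cuda):
    """Hypothetical non-strict check: warn instead of raising on a mismatch."""
    if torch_cuda is None or ext_cuda is None:
        # At least one side is a CPU-only build, so there is nothing to compare.
        return True
    if torch_cuda != ext_cuda:
        warnings.warn(
            f"torch was built with CUDA {torch_cuda} but the extension was built "
            f"with CUDA {ext_cuda}; this may or may not cause problems.",
            RuntimeWarning,
        )
        return False
    return True


ok = cuda_versions_match("12.1", "12.1")
```

In practice the first argument could come from `torch.version.cuda`, which is `None` on CPU-only builds of PyTorch.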
Make sure your machine has a C++17 compiler and a CUDA compiler installed, then clone the repo and run:
pip install .
Basic usage is straightforward if you are familiar with `sklearn.metrics.pairwise` and `scipy.spatial.distance`:
scikit-learn:

```python
import numpy as np
import sklearn.metrics.pairwise as sklearn_pairwise

x1 = np.random.rand(10, 5)
x2 = np.random.rand(12, 5)

output = sklearn_pairwise.cosine_similarity(x1, x2)
print(output)
```

TorchPairwise:

```python
import torch
import torchpairwise

x1 = torch.rand(10, 5, device='cuda')
x2 = torch.rand(12, 5, device='cuda')

output = torchpairwise.cosine_similarity(x1, x2)
print(output)
```

SciPy:

```python
import numpy as np
import scipy.spatial.distance as distance

x1 = np.random.binomial(
    1, p=0.6, size=(10, 5)).astype(np.bool_)
x2 = np.random.binomial(
    1, p=0.7, size=(12, 5)).astype(np.bool_)

output = distance.cdist(x1, x2, metric='jaccard')
print(output)
```

TorchPairwise:

```python
import torch
import torchpairwise

x1 = torch.bernoulli(
    torch.full((10, 5), fill_value=0.6, device='cuda')).to(torch.bool)
x2 = torch.bernoulli(
    torch.full((12, 5), fill_value=0.7, device='cuda')).to(torch.bool)

output = torchpairwise.jaccard_distances(x1, x2)
print(output)
```
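
For readers who want to see what the Jaccard example computes, here is a small CPU-only reference in plain PyTorch: the number of positions where exactly one input is `True`, divided by the number where at least one is. The function `jaccard_reference` is hypothetical, and returning 0 when both rows are all `False` is an assumption of this sketch, since libraries have differed on that edge case.

```python
import torch


def jaccard_reference(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Pairwise Jaccard distance between boolean rows of x1 (n, d) and x2 (m, d)."""
    a = x1.unsqueeze(1)  # (n, 1, d)
    b = x2.unsqueeze(0)  # (1, m, d)
    mismatches = (a ^ b).sum(dim=-1).to(torch.float32)
    union = (a | b).sum(dim=-1).to(torch.float32)
    # Assumption: the distance is 0 when neither row has any True entry.
    return torch.where(union > 0,
                       mismatches / union.clamp(min=1.0),
                       torch.zeros_like(union))


x1 = torch.tensor([[True, False, True],
                   [False, False, False]])
x2 = torch.tensor([[True, True, False],
                   [True, False, True],
                   [False, False, False]])
output = jaccard_reference(x1, x2)
# First row: 2/3 (two mismatches over a union of three), 0 (identical), 1 (disjoint).
```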
Please check the tests folder where we will add more examples.
The code is released under the MIT license. See `LICENSE.txt` for details.
Footnotes
- These metrics are not pairwise, but a pairwise form can be computed by calling `scipy.spatial.distance.cdist(x1, x2, metric="[metric_name_or_callable]")`.
- These are boolean distances. `hamming_distances` can be applied to floating point inputs but involves comparison.