Error reporting for dist matrix #94

Closed
bdpedigo opened this issue Jun 29, 2020 · 0 comments · Fixed by #152
Labels
documentation Improvements or additions to documentation

Error reporting for distance matrix input is unclear.

There are two problems here. 1) sklearn's pairwise_distances returns a matrix whose diagonal entries are not exactly 0 when you pass in a pandas DataFrame of floats, which is obviously not your problem. 2) The error message reports a shape mismatch when it shouldn't; it should report that the trace of one of the distance matrices is not 0.
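To make point 2 concrete, here is a rough sketch of how the validation could report each failure separately. The function name `check_xy_distmat_sketch` and the message wording are made up for illustration; this is not hyppo's actual code, only a variant of the combined condition shown in the traceback further down.

```python
import numpy as np


def check_xy_distmat_sketch(x, y):
    """Hypothetical variant of the check with one error message per failure."""
    for name, mat in (("x", np.asarray(x)), ("y", np.asarray(y))):
        # Squareness is checked first so the shape message is only raised
        # when the shape really is the problem.
        if mat.ndim != 2 or mat.shape[0] != mat.shape[1]:
            raise ValueError(
                f"Shape mismatch, {name} must be a distance matrix of "
                f"shape [n, n]; got shape {mat.shape}."
            )
        # Same strict test as the original condition (trace exactly 0),
        # but reported with its own message.
        trace = np.trace(mat)
        if trace != 0:
            raise ValueError(
                f"{name} has a nonzero trace ({trace!r}); a distance matrix "
                "must have zeros on its diagonal."
            )
```

With a check like this, the example in this issue would fail with a message about the trace of x rather than about its shape.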

However, I am wondering whether it also makes sense to change these to soft checks (close to 0 as opposed to exactly 0). I don't have a strong feeling about that either way.
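For reference, a soft check could look something like the sketch below. The helper name `diag_is_close_to_zero` and the default tolerance `atol=1e-6` are assumptions picked for illustration, not anything hyppo defines; the right tolerance would be up to the maintainers.

```python
import numpy as np


def diag_is_close_to_zero(mat, atol=1e-6):
    """Return True if every diagonal entry is within atol of 0.

    Hypothetical helper: atol=1e-6 is an arbitrary illustrative default.
    """
    # With a target of 0 the relative tolerance term vanishes, so
    # np.allclose reduces to |d_ii| <= atol for every diagonal entry.
    return bool(np.allclose(np.diag(np.asarray(mat)), 0.0, atol=atol))
```

Under this tolerance a diagonal of about 2e-08, like the one printed in the example, would pass, while a diagonal that is clearly nonzero would still be rejected.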

Reproducing code example:

import numpy as np
import pandas as pd
from hyppo.independence import MGC
from sklearn.metrics import pairwise_distances

X = np.random.uniform(size=(100, 2))
Y = np.random.normal(size=(100, 2))
X = pd.DataFrame(X)
X_dist = pairwise_distances(X)
Y_dist = pairwise_distances(Y)
print(np.diag(X_dist).max())
MGC(None).test(X_dist, Y_dist)

Error message


2.1073424255447017e-08
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/mnt/c/Users/t-bpedig/code/graph-embedding-methods/GraphEmbeddingMethods/sandbox/notebooks/2020-06-29-v2-maggot-hemisphere-single-embed.py in 
     163 Y_dist = pairwise_distances(Y)
     164 print(np.diag(X_dist).max())
---> 165 MGC(None).test(X_dist, Y_dist)

~/miniconda3/envs/embed/lib/python3.7/site-packages/hyppo/independence/mgc.py in test(self, x, y, reps, workers)
    217 
    218         if self.is_distance:
--> 219             check_xy_distmat(x, y)
    220 
    221         # using our joblib implementation instead of multiprocessing backend in

~/miniconda3/envs/embed/lib/python3.7/site-packages/hyppo/_utils.py in check_xy_distmat(x, y)
     80     if nx != px or ny != py or np.trace(x) != 0 or np.trace(y) != 0:
     81         raise ValueError(
---> 82             "Shape mismatch, x and y must be distance matrices "
     83             "have shape [n, n] and [n, n]."
     84         )

ValueError: Shape mismatch, x and y must be distance matrices have shape [n, n] and [n, n].
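
In the meantime, a possible workaround on the caller's side is to force the diagonal to exactly 0 before passing the matrices in. This is only a sketch: `zero_diagonal` is a made-up helper name, and it assumes the nonzero diagonal entries are numerical noise that is safe to discard.

```python
import numpy as np


def zero_diagonal(dist):
    """Return a float copy of a square distance matrix with an exact-zero diagonal.

    Hypothetical helper; assumes the diagonal values are numerical noise.
    """
    out = np.array(dist, dtype=float, copy=True)  # leave the input untouched
    np.fill_diagonal(out, 0.0)  # in-place on the copy
    return out
```

Applying it to X_dist and Y_dist from the example would make the trace of each exactly 0, so the strict `np.trace(x) != 0` condition in the traceback would no longer trigger.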

Version information

  • OS: Ubuntu 20.04
  • Python Version 3.7.3
  • Package Version 0.1.2
@sampan501 added the bug and documentation labels and removed the bug label on Jun 30, 2020.