Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge Dcorr/Energy/MMD/Hsic into scipy #98

Open
sampan501 opened this issue Jul 31, 2020 · 10 comments
Open

Merge Dcorr/Energy/MMD/Hsic into scipy #98

sampan501 opened this issue Jul 31, 2020 · 10 comments
Assignees
Labels
help wanted Extra attention is needed ndd Issues for NeuroData Design

Comments

@sampan501
Copy link
Member

No description provided.

@sampan501 sampan501 added the ndd Issues for NeuroData Design label Jul 31, 2020
@jovo jovo changed the title Merge Hsic into scipy Merge Dcorr/Energy/MMD/Hsic into scipy Aug 4, 2020
@bstraus1
Copy link

Can I be assigned to this issue?

@sampan501 sampan501 added help wanted Extra attention is needed and removed ndd Issues for NeuroData Design labels Mar 23, 2021
@bstraus1
Copy link

bstraus1 commented May 4, 2021

I've added the distance_correlation function (and its helpers) to my fork of SciPy in the ben-master branch.

The distance_correlation function is modeled off of the multiscale_graphcorr function which is already merged into SciPy and the helpers are from the hyppo repo (primarily the dcorr function).

I've also added a simple test (more to see that the code runs and returns an expected result rather than for accuracy). The function passes the test. This link goes directly to the distance_correlation function which relies on multiple other functions in the _stats.pyx file.

See attached screenshot of the test passed. Tests are in the same branch here. @sampan501

Screen Shot 2021-05-04 at 5 33 52 PM

Suggested future steps include:

  • Add more tests to assess the accuracy of the distance_correlation function to ensure that it is the same as Sambit's implementation.
  • Add a doc_string to the distance_correlation function. Currently the docstring matches the multiscale_graphcorr docstring since that is what the distance_correlation function was modeled off of. The docstring should be changed to refer specifically to distance_correlation.
  • Merge these changes into SciPy

@sampan501
Copy link
Member Author

sampan501 commented May 6, 2021

None of the functions that you added have been Cythonized. All the functions added use def which is calling pure Python

@bstraus1
Copy link

bstraus1 commented May 6, 2021

Then I don’t really understand why cythonizing is necessary if pure python works. @sampan501

@sampan501
Copy link
Member Author

Then I don’t really understand why cythonizing is necessary if pure python works.

Pure python is slow. With large datasets, it will long time to run.

@bstraus1
Copy link

bstraus1 commented May 6, 2021

Oh I see. I wasn't aware of that being an essential part for any reason beyond making the code run-able in SciPy.

@bstraus1
Copy link

bstraus1 commented May 12, 2021

Updates:

  1. _dcorr has been cythonized
  2. _center_distance_matrix has been changed to work for biased and unbiased data and will now work for mgc (multi-scale graph correlation) and dcorr (distance correlation).
  3. Bug Fix: The distance_correlationfunction now uses _dcorr in the pvalue computation function _perm_test to compute the test statistic rather than _mgc_stat
  4. Documentation: Added more comments throughout the distance_correlation function call tree

@sampan501

@bstraus1
Copy link

Documentation for the distance_correlation function still needs to be changed (it is currently just copied from multiscale_graphcorr.

@sampan501 sampan501 added the ndd Issues for NeuroData Design label Aug 25, 2021
@kfangster
Copy link

Can I be assigned this issue?

@sampan501
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed ndd Issues for NeuroData Design
Projects
None yet
Development

No branches or pull requests

3 participants