Makes sense and sounds like an awesome speedup! ESS is currently one of the slowest estimators, but the same strategy might work for others. Please feel free to open a PR; joblib is already a dependency since it is used by sklearn. We can probably replace multiprocessing with joblib everywhere.
I will use ESS as an example, since it is a pretty slow estimator. Since it is a `LocalEstimator`, when `fit` is called kNNs are first computed (if not already provided)

scikit-dimension/skdim/_commonfuncs.py, line 419 in b9e8845

and this in turn calls into the sklearn library, which properly parallelizes the computation based on the `n_jobs` parameter that we can provide.

Second, a call to `self.fit` in the `ESS` class performs a simple, single-threaded `for` loop over the datapoints.

scikit-dimension/skdim/id/_ESS.py, lines 75 to 78 in b9e8845

The computations are "embarrassingly parallel". I have locally implemented parallelization and seen at least a 6x speedup in ESS computation on datasets of size 5,000 - 50,000.

I can open a PR with these changes as an example if you are willing to add `joblib` as a dependency.
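For illustration, here is a minimal sketch of the kind of change described, not the actual skdim code. It assumes each datapoint's estimate depends only on its own neighborhood. The names `local_estimate`, `fit_serial` and `fit_parallel` are hypothetical, and the per-point function is a placeholder rather than the real ESS formula. Only `joblib.Parallel` and `joblib.delayed` are real library calls.

```python
import math

from joblib import Parallel, delayed


def local_estimate(neighborhood):
    """Hypothetical per-point computation (placeholder, NOT the ESS formula).

    Returns the mean distance of the neighbors from their centroid.
    """
    n = len(neighborhood)
    dim = len(neighborhood[0])
    centroid = [sum(p[d] for p in neighborhood) / n for d in range(dim)]
    return sum(math.dist(p, centroid) for p in neighborhood) / n


def fit_serial(neighborhoods):
    # Single-threaded loop over the datapoints, as in the current behaviour.
    return [local_estimate(nb) for nb in neighborhoods]


def fit_parallel(neighborhoods, n_jobs=2):
    # Each neighborhood is independent, so the loop body can be dispatched
    # to joblib workers; results come back in input order.
    return Parallel(n_jobs=n_jobs)(
        delayed(local_estimate)(nb) for nb in neighborhoods
    )


# Toy data: 20 neighborhoods of 5 two-dimensional points each.
neighborhoods = [
    [(float(i + j), float(i * j)) for j in range(5)] for i in range(20)
]
serial = fit_serial(neighborhoods)
parallel = fit_parallel(neighborhoods, n_jobs=2)
```

On toy data this small, process start-up overhead will likely outweigh any gain. A speedup would only be expected when the per-point work is substantial, as with the dataset sizes mentioned above.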