Inconsistent results when using multiprocessing in fit_pw method #15

ben300694 · 2023-05-16T12:31:56Z

Dear authors,

thank you for your work on this project and for making the code available to the community!
I have encountered an issue with the fit_pw method in the lPCA class when using multiprocessing. The issue is that the output dimensions are not the same when using different numbers of jobs (n_jobs), while they should be consistent. Below is a code sample to reproduce the problem:

import numpy as np
import skdim

# set the random seed
np.random.seed(42)

lpca = skdim.id.lPCA(ver="FO", alphaFO=0.05, verbose=True)

# create a random dataset
X_np = np.random.randn(100, 10)

print("X_np.shape: ", X_np.shape)
print(X_np)

lpca.fit_pw(X=X_np, n_neighbors=20, n_jobs=1, smooth=False)
lpca_dimensions_pw_n_jobs_1 = lpca.dimension_pw_
print(f"lpca.dimension_pw_: {lpca.dimension_pw_}")

lpca.fit_pw(X=X_np, n_neighbors=20, n_jobs=2, smooth=False)
lpca_dimensions_pw_n_jobs_2 = lpca.dimension_pw_
print(f"lpca.dimension_pw_: {lpca.dimension_pw_}")

# Check that the results are the same
assert np.allclose(lpca_dimensions_pw_n_jobs_1, lpca_dimensions_pw_n_jobs_2)

I have identified that the problem is related to the management of class instances and their state when using multiprocessing. In the current implementation, the worker processes do not share memory with the main process, so modifications to the instances within the worker processes are not reflected in the main process. The dimension is being calculated and stored in the _dimension attribute of the instances within the worker processes, but this information is not being propagated back to the main process.

A possible solution is to change the fit_pw function to return the computed dimension values instead of storing them in the _dimension attribute of the instances. Then, use the apply_async function to asynchronously apply the fit function to each data point and collect the results in the main process. Here's an example of a modified fit_pw function that addresses this issue:

def fit_pw(self, X, precomputed_knn=None, smooth=False, n_neighbors=100, n_jobs=1):
    # ...
    if n_jobs > 1:
        with mp.Pool(n_jobs) as pool:
            # Asynchronously apply the `fit` function to each data point and collect the results
            results = [pool.apply_async(self.fit, (X[i, :],)) for i in knnidx]
            # Retrieve the computed dimensions
            self.dimension_pw_ = np.array([r.get().dimension_ for r in results])
    # ...

With this modification, the computed dimensions are correctly returned and stored in the main process, and the dimensions are consistent when running the code with different numbers of jobs.
Would it be possible to consider this change or a similar approach to address the issue with multiprocessing in the fit_pw method? This problem likely also affects the other multiprocessing dimension estimates, and not just the lPCA class.

The text was updated successfully, but these errors were encountered:

j-bac · 2023-05-16T20:39:01Z

thanks a lot for spotting this bug and for the detailed solution ! I implemented your fix in a new version '0.3.3'

j-bac closed this as completed May 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent results when using multiprocessing in fit_pw method #15

Inconsistent results when using multiprocessing in fit_pw method #15

ben300694 commented May 16, 2023

j-bac commented May 16, 2023 •

edited

Loading

Inconsistent results when using multiprocessing in fit_pw method #15

Inconsistent results when using multiprocessing in fit_pw method #15

Comments

ben300694 commented May 16, 2023

j-bac commented May 16, 2023 • edited Loading

j-bac commented May 16, 2023 •

edited

Loading