Memory-Sharing Multiprocessing Tool for Parallelizing Function Application on Large Python Objects (e.g., np.array / pd.DataFrame)

jeffrey82221/mem_shared_multiprocess

Intro

Memory-Sharing Multiprocessing Tool for Parallelizing Function Application on Large Python Objects (e.g., np.array / pd.DataFrame)

TODO:

  • Allow sharing of NumPy arrays between processes forked by os.fork
  • Allow sharing of NumPy arrays between processes created by billiard
  • Allow sharing of pandas DataFrames between processes created by billiard
    • SharedDataFrame class implemented (pd_share.py)
    • Use SharedDataFrame with billiard
  • Explain the code of this repo in the README (installation / how to run)
  • Build a multi-threaded version of the parallelism code in Rust
  • Shared memory for AutoML (parallel computation)
    • Make sure sharedmem works with sklearn models (in grid_search)
    • Make sure sharedmem works with all models (sklearn + XGBoost + LightGBM) (see ml_models.py)
    • Make sure SharedDataFrame works with XGBoost!
    • Build my own version of parallel grid search!
  • Experiment with Ray Tune
    • Run the tutorial
    • Study the usage
    • Fix compatibility
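The first two TODO items are about letting several worker processes read one large NumPy array without pickling a copy into each of them. The sketch below is not this repository's implementation; it is a minimal illustration that assumes the standard library's `multiprocessing.shared_memory` module (Python 3.8+) and a `fork`-based pool, which matches the `os.fork` item. The helper names (`parallel_row_sums`, `_init_worker`, `_row_sum`) are hypothetical. billiard is a fork of `multiprocessing`, so a similar pattern may carry over to it, but that is an untested assumption here.

```python
import multiprocessing as mp
from multiprocessing import shared_memory

import numpy as np

# Per-worker globals, filled in once by the pool initializer (hypothetical names).
_shm = None
_arr = None


def _init_worker(shm_name, shape, dtype):
    """Attach this worker to the existing shared block (no pickled copy of the data)."""
    global _shm, _arr
    _shm = shared_memory.SharedMemory(name=shm_name)
    _arr = np.ndarray(shape, dtype=dtype, buffer=_shm.buf)


def _row_sum(i):
    # Reads row i directly from the shared pages; only the index i is pickled.
    return float(_arr[i].sum())


def parallel_row_sums(data, n_workers=2):
    """Sum each row of a numeric `data` array in forked workers that share one copy of it."""
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    try:
        view = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
        view[:] = data  # copy the data into the shared block once
        del view  # drop the parent's view so shm.close() can release the buffer
        ctx = mp.get_context("fork")  # POSIX only, mirrors the os.fork TODO item
        with ctx.Pool(n_workers, initializer=_init_worker,
                      initargs=(shm.name, data.shape, data.dtype)) as pool:
            sums = pool.map(_row_sum, range(data.shape[0]))
    finally:
        shm.close()
        shm.unlink()  # free the block once the pool has been shut down
    return sums
```

Only the block's name, shape and dtype travel to the workers; each worker maps the same physical pages. The `fork` start method is unavailable on Windows, where this sketch would need `spawn` and module-level, importable worker functions.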
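The `SharedDataFrame` class mentioned in the list lives in `pd_share.py`, which is not reproduced here, so the following is an independent, hypothetical sketch of the same idea rather than that class. It assumes the frame's columns are all numeric (`DataFrame.to_numpy()` upcasts mixed numeric dtypes to one common dtype; object/string columns cannot be laid out in a flat shared buffer this way). The names `SharedFrameHandle`, `share_frame`, `attach_frame` and `parallel_column_means` are illustrative only.

```python
import multiprocessing as mp
from dataclasses import dataclass
from multiprocessing import shared_memory

import numpy as np
import pandas as pd


@dataclass
class SharedFrameHandle:
    """Small, picklable description of a frame living in shared memory (hypothetical)."""
    shm_name: str
    shape: tuple
    dtype: str
    columns: list
    index: list


def share_frame(df):
    """Copy a numeric DataFrame's values into a new shared-memory block."""
    values = df.to_numpy()
    if not np.issubdtype(values.dtype, np.number):
        raise TypeError("this sketch only handles numeric frames")
    shm = shared_memory.SharedMemory(create=True, size=values.nbytes)
    view = np.ndarray(values.shape, dtype=values.dtype, buffer=shm.buf)
    view[:] = values
    del view  # release the parent's view so shm.close() can succeed later
    handle = SharedFrameHandle(shm.name, values.shape, values.dtype.str,
                               list(df.columns), list(df.index))
    return shm, handle


def attach_frame(handle):
    """Rebuild a DataFrame on top of the shared block."""
    shm = shared_memory.SharedMemory(name=handle.shm_name)
    arr = np.ndarray(handle.shape, dtype=np.dtype(handle.dtype), buffer=shm.buf)
    # copy=False asks pandas to wrap the buffer rather than duplicate it.
    return shm, pd.DataFrame(arr, columns=handle.columns, index=handle.index, copy=False)


_worker_state = {}


def _init_worker(handle):
    # Attach once per worker and keep the mapping alive for the worker's lifetime.
    _worker_state["shm"], _worker_state["df"] = attach_frame(handle)


def _column_mean(col):
    return float(_worker_state["df"][col].mean())


def parallel_column_means(df, n_workers=2):
    """Compute per-column means in forked workers that all read one shared frame."""
    shm, handle = share_frame(df)
    try:
        ctx = mp.get_context("fork")  # POSIX only
        with ctx.Pool(n_workers, initializer=_init_worker, initargs=(handle,)) as pool:
            means = pool.map(_column_mean, handle.columns)
    finally:
        shm.close()
        shm.unlink()
    return dict(zip(handle.columns, means))
```

Keeping the attached frame in a per-worker global avoids closing the mapping while pandas still holds a view of it; the parent owns the block and unlinks it after the pool exits.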
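For the "build my own version of parallel grid search" item, one possible shape (assumed here, not taken from this repository) is to place the training data in shared memory once and let each worker score one parameter combination, so each task only pickles a small parameter dict. To stay dependency-free, the sketch scores a hypothetical closed-form ridge regression in NumPy with an assumed 80/20 train/validation split, standing in for the sklearn / XGBoost / LightGBM models listed above; it is not sklearn's `GridSearchCV`. All function names are illustrative.

```python
import itertools
import multiprocessing as mp
from multiprocessing import shared_memory

import numpy as np

_shared = {}  # per-worker cache: key -> (SharedMemory, ndarray view)


def _to_shared(arr):
    """Copy `arr` into a new shared block; return the block and a picklable spec."""
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)[:] = arr  # temporary view
    return shm, (shm.name, arr.shape, arr.dtype.str)


def _init_worker(specs):
    for key, (name, shape, dtype) in specs.items():
        shm = shared_memory.SharedMemory(name=name)
        _shared[key] = (shm, np.ndarray(shape, dtype=np.dtype(dtype), buffer=shm.buf))


def ridge_val_mse(X, y, alpha):
    """Hypothetical scorer: fit ridge on the first 80% of rows, return MSE on the rest."""
    n_train = int(0.8 * len(X))
    Xtr, ytr = X[:n_train], y[:n_train]
    w = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(X.shape[1]), Xtr.T @ ytr)
    resid = X[n_train:] @ w - y[n_train:]
    return float(np.mean(resid ** 2))


def _evaluate(params):
    # Each task receives only a parameter dict; X and y come from shared memory.
    return params, ridge_val_mse(_shared["X"][1], _shared["y"][1], **params)


def parallel_grid_search(X, y, grid, n_workers=2):
    """Score every combination in `grid` in parallel; return (best_params, best_mse)."""
    combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
    blocks, specs = {}, {}
    try:
        for key, arr in (("X", X), ("y", y)):
            blocks[key], specs[key] = _to_shared(arr)
        with mp.get_context("fork").Pool(  # POSIX only
            n_workers, initializer=_init_worker, initargs=(specs,)
        ) as pool:
            results = pool.map(_evaluate, combos)
    finally:
        for shm in blocks.values():
            shm.close()
            shm.unlink()
    return min(results, key=lambda item: item[1])
```

Swapping `ridge_val_mse` for a real model fit keeps the same structure: the data is shared once, and only the grid is distributed across workers.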
