Tags: modin-project/modin
Modin 0.32.0

This release introduces support for the Polars API, a new query compiler for small data, more functions that can use dynamic partitioning, and several bug fixes.

Key Features and Updates Since 0.31.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#0000: Fix type hint (#7343)
  * FIX-#7113: Fix docstring overrides for subclasses. (#7354)
  * FIX-#7134: Use a separate docstring class for BasePandasDataset. (#7353)
  * FIX-#7329: Do not sort columns on df.update (#7330)
  * FIX-#7351: Add ipython method calls to non-lookup list (#7352)
  * FIX-#7355: Cpu count would be set incorrectly on a cluster (#7356)
  * FIX-#7357: Fix `NoAttributeError` on `DataFrame.copy` (#7358)
  * FIX-#7371: Fix inserting datelike values into a DataFrame (#7372)
  * FIX-#7373: Try a previous version of `motoserver/moto` service, pin to 5.0.13 (#7374)
  * FIX-#7379: Fix `__imul__` performing addition instead of multiplication (#7380)
  * FIX-#7387: Limit the number of pytest workers for tests with Ray engine on Windows (#7388)
  * FIX-#7389: Fix uploading artifacts (#7390)
* Refactor Codebase
  * REFACTOR-#0000: Update copyright date (#7333)
* Documentation improvements
  * DOCS-#0000: Update RunLLM Ask AI widget script path (#7345)
  * DOCS-#7335: Fix broken links in Modin Usage Examples page (#7336)
  * DOCS-#7382: Add documentation on how to use Modin Native query compiler (#7386)
* New Features
  * FEAT-#4605: Add native query compiler (#7259)
  * FEAT-#7308: Interoperability between query compilers (#7376)
  * FEAT-#7331: Initial Polars API (#7332)
  * FEAT-#7337: Using dynamic partitioning in `broadcast_apply` (#7338)
  * FEAT-#7340: Add more granular lazy flags to query compiler (#7348)
  * FEAT-#7368: Add a new environment variable for using dynamic partitioning (#7369)

Contributors
------------
@MortalHappiness @Retribution98 @YarShev @ZhipengXue97 @anmyachev @arunjose696 @devin-petersohn @likawind @sfc-gh-joshi @sfc-gh-mvashishtha
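For readers unfamiliar with the class of bug named in FIX-#7379, the sketch below shows how Python's in-place operators are expected to behave. The `Scaled` class is a hypothetical toy written for this illustration; it is not Modin's code and says nothing about how the actual fix was implemented.

```python
class Scaled:
    """Hypothetical toy container; unrelated to Modin's internals."""

    def __init__(self, values):
        self.values = list(values)

    def __iadd__(self, other):
        # Invoked by `obj += other`: add `other` to every element in place.
        self.values = [v + other for v in self.values]
        return self

    def __imul__(self, other):
        # Invoked by `obj *= other`: multiply every element in place.
        # Accidentally reusing the addition logic here would make `*=`
        # silently behave like `+=`.
        self.values = [v * other for v in self.values]
        return self


s = Scaled([1, 2, 3])
s *= 3
print(s.values)  # [3, 6, 9]
```

Each in-place method returns `self` so that the name on the left-hand side stays bound to the same object after the operation.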
Modin 0.31.0

First release compatible with NumPy 2.0.

Key Features and Updates Since 0.30.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7138: Stop reloading modules for custom docstrings. (#7307)
  * FIX-#7263: Empty docstrings should not be inherited (#7264)
  * FIX-#7272: Remove HDK engine (#7275)
  * FIX-#7277: Remove Cudf storage format as unmaintained (#7290)
  * FIX-#7278: Make sure `enable_logging` decorator preserve type hints (#7279)
  * FIX-#7292: Prepare Modin code to NumPy 2.0 (#7293)
  * FIX-#7295: Unpin numexpr to allow versions >= 2.8.4 to match pandas (#7296)
  * FIX-#7309: Update versioneer with `versioneer install --vendor` (#7311)
  * FIX-#7320: Bump the github-actions group with 3 updates (#7319)
  * FIX-#7321: Using 'C' engine instead of 'pyarrow' for getting metadata in 'read_csv' (#7322)
* Performance enhancements
  * PERF-#7299: Avoid using `synchronize_labels` for `combine` function (#7300)
* Refactor Codebase
  * REFACTOR-#7271: Remove `instance_type` attribute of axis partitions (#7268)
  * REFACTOR-#7273: Remove deprecated functions from utils.py, accessor.py and io.py (#7274)
  * REFACTOR-#7285: Remove deprecated configs (#7286)
  * REFACTOR-#7294: Reduce access of methods `_modin_frame` methods from `_query_compiler` (#7297)
  * REFACTOR-#7313: Add similar methods as in #7294 for operating on columns (#7314)
* Update testing suite
  * TEST-#0000: Add a Dependabot config to auto-update GitHub action versions (#7318)
  * TEST-#7316: Run a subset of CI tests with python 3.10 and 3.11 on a scheduled basis (#7289)
* Documentation improvements
  * DOCS-#0000: Adds RunLLM widget to docs (#7326)
  * DOCS-#7287: Update Modin on Dask documentation (#7288)
* New Features
  * FEAT-#6574: UserWarning no longer displayed when Series/DataFrames are small (#7323)
  * FEAT-#7249: Add `reload_modin` feature (#7280)
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)
  * FEAT-#7283: Introduce MinRowPartitionSize and MinColumnPartitionSize (#7284)
  * FEAT-#7310: NumPy 2.0 support (#7312)

Contributors
------------
@Jayson729 @Retribution98 @YarShev @anmyachev @arunjose696 @kurtmckee @sfc-gh-dpetersohn @vsreekanti
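FIX-#7278 in the 0.31.0 notes concerns a logging decorator that should not hide the type hints of the function it wraps. The sketch below shows one common way to write such a decorator in plain Python. The `log_calls` decorator and the `add` function are hypothetical names made up for this illustration, and the approach is an assumption about the general technique, not a description of Modin's `enable_logging`.

```python
import functools
import inspect
import logging
from typing import Callable, TypeVar, cast

# A TypeVar bound to Callable tells static type checkers that the decorator
# returns a callable of exactly the same type it received.
Fn = TypeVar("Fn", bound=Callable)

logger = logging.getLogger("example")


def log_calls(func: Fn) -> Fn:
    """Hypothetical decorator that logs each call and keeps type hints."""

    @functools.wraps(func)  # copies __name__, __doc__, __annotations__, ...
    def wrapper(*args, **kwargs):
        logger.debug("calling %s", func.__name__)
        return func(*args, **kwargs)

    return cast(Fn, wrapper)


@log_calls
def add(x: int, y: int = 1) -> int:
    """Add two integers."""
    return x + y


print(add(2, 3))               # 5
print(inspect.signature(add))  # (x: int, y: int = 1) -> int
```

`functools.wraps` keeps the runtime metadata (name, docstring, annotations, and the signature seen by `inspect`), while the `Fn -> Fn` annotation keeps the signature visible to static type checkers.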
Modin 0.29.1

This release pins numpy<2.

Key Features and Updates Since 0.29.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7302: Pin numpy<2 (072453b)
* New Features
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors
------------
@anmyachev @sfc-gh-dpetersohn
Modin 0.28.3

This release pins numpy<2.

Key Features and Updates Since 0.28.2
-------------------------------------
* Stability and Bugfixes
  * FIX-#7302: Pin numpy<2 (072453b)
* New Features
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors
------------
@anmyachev @sfc-gh-dpetersohn
Modin 0.27.1

This release pins numpy<2.

Key Features and Updates Since 0.27.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#6968: Align API with pandas (#6969)
  * FIX-#7302: Pin numpy<2 (072453b)
* New Features
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors
------------
@anmyachev @dchigarev @sfc-gh-dpetersohn
Modin 0.30.0

This release introduces support for the DataFrame API standard, a distributed implementation of right merge/join, a more efficient implementation of internal operators (which gives a performance boost to almost all distributed Modin functions), improved compatibility with pandas on the pyarrow backend, and type hints for the pandas API to improve UX.

Key Features and Updates Since 0.29.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#0000: Fix badge in README.md (#7213)
  * FIX-#0000: Make merge tests more stable by sorting results (#7266)
  * FIX-#6967: Remove read_pickle_distributed/to_pickle_distributed functions as deprecated (#7258)
  * FIX-#7093: Make sure 'idxmax' and 'idxmin' can work with string columns (#7193)
  * FIX-#7102: Remove `enable_api_only` mode in modin logging (#7194)
  * FIX-#7103: Move lower-level functionality logging to debug (#7184)
  * FIX-#7143: Constructing a DataFrame from a Modin Series with tuple name should produce MultiIndex columns (#7214)
  * FIX-#7185: Add extra check for some config classes (#7189)
  * FIX-#7201: Update docs on how to enable Modin logs for high-level API and low-level API (#7209)
  * FIX-#7206: Make sure df.melt handle duplicate value_vars correctly (#7208)
  * FIX-#7219: Pin dataframe-api-compat>=0.2.7 (#7220)
  * FIX-#7221: Don't use 'use_legacy_dataset=False' for 'ParquetDataset' (#7222)
  * FIX-#7224: Importing modin.pandas.api.extensions overwrites re-export of pandas.api submodules (#7225)
  * FIX-#7233: Display property name in default_to_pandas error messages (#7269)
  * FIX-#7234: Deprecate HDK engine (#7235)
  * FIX-#7238: Fix docstring inheritance for `cached_property` and use it (#7239)
  * FIX-#7240: Allow `doc_checker.py` works with `functools.cached_property` (#7241)
  * FIX-#7246: Pin pyarrow>=10.0.1 as pandas 2.2.* does (#7247)
  * FIX-#7248: Make sure '_validate_dtypes_sum_prod_mean' works correctly with datetime types (#7237)
  * FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" (#7251)
* Performance enhancements
  * PERF-#7227: Call 'modin_frame.combine()' for merge and join only when necessary (#7228)
  * PERF-#7230: Don't preserve bad partition for 'merge' (#7229)
* Refactor Codebase
  * REFACTOR-#7242: Add type hints for `modin/core/dataframe/algebra/` (#7243)
  * REFACTOR-#7260: Use `extract_dtype` internal function in more places (#7261)
* Update testing suite
  * TEST-#7049: Add some sanity tests with pyarrow-backed pandas dataframes (#7199)
  * TEST-#7191: Fix ASV after changing default branch (#7190)
* Documentation improvements
  * DOCS-#0000: Fix a typo with MODIN_CPUS number (#7198)
  * DOCS-#0000: Supplement Optimization Notes with a link to configs (#7197)
  * DOCS-#7217: Update docs as to when Modin operators work best (#7218)
  * DOCS-#7255: Update docs as to from_* functions (#7256)
* New Features
  * FEAT-#5394: Reduce amount of remote calls for Map operator (#7136)
  * FEAT-#5394: Reduce amount of remote calls for TreeReduce and GroupByReduce operators (#7245)
  * FEAT-#6492: Add `from_map` feature to create dataframe (#7215)
  * FEAT-#6498: Make Fold operator more flexible (#7257)
  * FEAT-#6808: Implement '__arrow_array__' for Series (#7200)
  * FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
  * FEAT-#7139: Use ray-core instead of ray-default (#6955)
  * FEAT-#7187: Change "master" branch to "main" (#7188)
  * FEAT-#7202: Use custom resources for Ray (#7205)
  * FEAT-#7203: Make sure Modin works correctly with pandas, which uses pyarrow as a backend (#7204)
  * FEAT-#7207: Add the ability to assign a df to a columns selection without d2p (#7210)
  * FEAT-#7252: Add type hints for `base.py` (#7253)
  * FEAT-#7254: Support right merge/join (#7226)

Contributors
------------
@Retribution98 @YarShev @anmyachev @arunjose696 @noloerino @sfc-gh-jkew
Modin 0.29.0

This release introduces the `modin.pandas.testing` and `modin.pandas.arrays` modules, a faster (range-partitioning) implementation of the `pivot_table`, `unique`, `drop_duplicates`, `nunique`, and `df.resample` functions, new functions to interact with Dask (`to/from_dask`), a distributed implementation of `Series.case_when`, and an optimization for the `astype` function with a scalar dtype.

Key Features and Updates Since 0.28.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#6227: Make sure `Series.unique()` with pyarrow dtype returns `ArrowExtensionArray` (#7042)
  * FIX-#6793: Use 'pandas_dtype' instead of 'np.dtype' for some more places in Modin code (#6794)
  * FIX-#7039: Pass scalar dtype as is to astype query compiler (#7152)
  * FIX-#7051: Update exception message for 'astype' function (#7052)
  * FIX-#7054: Update exception message for `shift` function (#7055)
  * FIX-#7056: Update exception message for `iloc/loc` functions (#7057)
  * FIX-#7058: Update exception message for `insert` function (#7059)
  * FIX-#7060: Fix 'pivot' when index or columns are of Index type (#7061)
  * FIX-#7062: Update exception message for `aggregate` function (#7063)
  * FIX-#7072: Replace MaterializationHook with the materialized object on serialization. (#7075)
  * FIX-#7088: Make sure `rank` raises `No axis named None...` exception (#7089)
  * FIX-#7115: Exclude Ray 2.10.0 from deps installation (#7116)
  * FIX-#7135: Fix appending a new row (#7172)
  * FIX-#7153: Fix 'Series.corr' with 'method != pearson' (#7158)
  * FIX-#7157: Make sure `quantile` function works with `numeric_only=True` (#7160)
  * FIX-#7170: Don't use `MinPartitionSize` configuration variable in remote context (#7177)
* Performance enhancements
  * PERF-#5296: Partition parquet file if it has too few row groups (#7016)
  * PERF-#7068: Provide shape_hint="column" for some more operations with Series (#7069)
  * PERF-#7123: Preserve shape_hint for dropna (#7124)
  * PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True (#7131)
  * PERF-#7132: Preserve partition lengths in apply_full_axis with keep_partitioning=False (#7133)
  * PERF-#7150: Reduce peak memory consumption (#7149)
* Refactor Codebase
  * REFACTOR-#3257: Move logging and caching to the `gen_data` internal function (#7046)
  * REFACTOR-#7105: Deprecate 'cfg.RangePartitioningGroupby' (#7161)
  * REFACTOR-#7106: Rename from/to_ray_dataset to from/to_ray (#7107)
  * REFACTOR-#7109: Remove the outdated aws_example.yaml file. (#7110)
* Update testing suite
  * TEST-#3622: Centralize tests in Modin (#7137)
  * TEST-#6016: Make sure `eval_general` doesn't expect exceptions by default (#6954)
  * TEST-#7064: Explicitly check for exceptions in `test_groupby.py` (#7065)
  * TEST-#7066: Explicitly check for exceptions in `test_io.py` (#7067)
  * TEST-#7073: Explicitly check for exceptions in `test_default.py` (#7074)
  * TEST-#7076: Explicitly check for exceptions in `test_map_metadata.py` (#7077)
  * TEST-#7082: Explicitly check for exceptions in 'test_series.py' (#7083)
  * TEST-#7084: Explicitly check for exceptions in 'test_indexing.py' (#7085)
  * TEST-#7086: Explicitly check for exceptions in `test_reduce.py` (#7087)
  * TEST-#7094: Rename 'raising_exceptions' argument of 'eval_general' testing function (#7095)
  * TEST-#7125: Explicitly install modin in ci tests (#7126)
  * TEST-#7165: Add codecov token to fix CI on master (#7175)
  * TEST-