[SPARK-37668][PYTHON] 'Index' object has no attribute 'levels' in pyspark.pandas.frame.DataFrame.insert

### What changes were proposed in this pull request?

This PR proposes to address an unexpected error in `pyspark.pandas.frame.DataFrame.insert`.

Assigning a tuple as a column name is currently only supported for MultiIndex columns in pandas API on Spark:

```python
# MultiIndex columns
>>> psdf
   x
   y
0  1
1  2
2  3
>>> psdf[('a', 'b')] = [4, 5, 6]
>>> psdf
   x  a
   y  b
0  1  4
1  2  5
2  3  6

# However, this is not supported for non-MultiIndex columns
>>> psdf
   A
0  1
1  2
2  3
>>> psdf[('a', 'b')] = [4, 5, 6]
Traceback (most recent call last):
...
KeyError: 'Key length (2) exceeds index depth (1)'
```

So we should raise a proper error message rather than `AttributeError: 'Index' object has no attribute 'levels'` when users try to insert a tuple-named column.

**Before**
```python
>>> psdf.insert(0, ("a", "b"), 10)
Traceback (most recent call last):
...
AttributeError: 'Index' object has no attribute 'levels'
```

**After**
```python
>>> psdf.insert(0, ("a", "b"), 10)
Traceback (most recent call last):
...
NotImplementedError: Assigning column name as tuple is only supported for MultiIndex columns for now.
```

### Why are the changes needed?

To let users know the proper usage.

### Does this PR introduce _any_ user-facing change?

Yes, the exception message is changed as described in **After**.

### How was this patch tested?

Unittests.

Closes apache#34957 from itholic/SPARK-37668.

Authored-by: itholic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
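For illustration, the validation described above can be sketched in plain pandas (so it runs without a Spark cluster). The `insert_column` helper below is hypothetical, not the actual pyspark.pandas implementation; it only mirrors the idea of the check this PR adds: reject a tuple column name unless the existing columns form a MultiIndex.

```python
import pandas as pd

def insert_column(df, loc, column, value):
    # Hypothetical helper mirroring the check this PR adds:
    # a tuple column name only makes sense when the existing
    # columns are a MultiIndex, so reject it otherwise.
    if isinstance(column, tuple) and not isinstance(df.columns, pd.MultiIndex):
        raise NotImplementedError(
            "Assigning column name as tuple is only supported for "
            "MultiIndex columns for now."
        )
    df.insert(loc, column, value)

# Non-MultiIndex columns: tuple name is rejected with a clear message.
df = pd.DataFrame({"A": [1, 2, 3]})
try:
    insert_column(df, 0, ("a", "b"), 10)
except NotImplementedError as e:
    print(e)

# MultiIndex columns: tuple name is accepted.
mdf = pd.DataFrame({("x", "y"): [1, 2, 3]})
insert_column(mdf, 0, ("a", "b"), 10)
print(list(mdf.columns))
```

The point of the check is to fail fast with an actionable `NotImplementedError` instead of letting the tuple reach MultiIndex-only code paths and surface as an unrelated `AttributeError`.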