fix: `EvaluationResult` serialization changes dataframes #4906

tstadel · 2023-05-12T15:28:28Z

Related Issues

fixes Dataframes of EvalResult are not the same after (de)serializing #4905

Proposed Changes:

ensure None values stay None after (de)serialization instead of turning into np.nan
ensure the dataframe index is set correctly when loading
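To illustrate the two problems listed above, here is a minimal sketch of a CSV round trip using plain pandas. The toy dataframe and its column names are assumptions for illustration and do not reflect the real `EvaluationResult` schema. The sketch maps NaN back to None with `astype(object).where(...)`, which is a substitute for the `df.replace(...)` call in this PR, chosen because it behaves the same across pandas versions.

```python
import io

import pandas as pd

# Toy stand-in for one node's evaluation dataframe (not Haystack's real schema).
original = pd.DataFrame(
    {"answer": ["Berlin", "Paris"], "context": [None, None]},
    index=[3, 7],
)

csv_text = original.to_csv()

# Naive read: the index comes back as a regular column and None becomes NaN.
naive = pd.read_csv(io.StringIO(csv_text))

# Restored read: use the first column as the index, then map NaN back to None.
# Casting to object first keeps pandas from coercing None back to NaN.
loaded = pd.read_csv(io.StringIO(csv_text), index_col=0)
restored = loaded.astype(object).where(loaded.notna(), None)

print(list(naive.columns), naive["context"].tolist())
print(list(restored.index), restored["context"].tolist())
```

Comparing the two printed lines shows why a dataframe read back without these steps differs from the one that was saved.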

How did you test it?

extended existing tests

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added tests that demonstrate the correct behavior of the change
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issue

coveralls · 2023-05-12T15:47:04Z

Coverage: 38.151% (+0.02%) from 38.126% when pulling b33278a on fix/evalresult_dfs_serialization into 37cadd7 on main.

julian-risch

The changes look good to me and are ready to be merged. 👍 However, I don't think the changed handling of np.nan is covered by any test. The PR could be improved by adding a test case that loads an evaluation result from disk that contains np.nan in one of its dataframes. The test could confirm that the loaded dataframe contains None rather than np.nan.
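A test along these lines could look roughly like the sketch below. `load_node_results` is a hypothetical helper that only mimics the loading loop touched by this PR; it is not the actual `EvaluationResult.load` implementation, and the file and column names are made up. As an assumption for portability, it maps NaN to None via `astype(object).where(...)` rather than the `df.replace(...)` call in the diff.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_node_results(load_dir: Path) -> dict:
    """Hypothetical loader: one dataframe per CSV file, with NaN mapped to None."""
    node_results = {}
    for file in sorted(load_dir.glob("*.csv")):
        df = pd.read_csv(file)
        # Cast to object first so that None is not coerced back to NaN.
        node_results[file.stem] = df.astype(object).where(df.notna(), None)
    return node_results


def test_loaded_dataframe_contains_none_not_nan():
    with tempfile.TemporaryDirectory() as tmp:
        # An empty CSV cell is what pandas reads back as NaN.
        saved = pd.DataFrame({"answer": ["Berlin"], "context": [None]})
        saved.to_csv(Path(tmp) / "Generator.csv", index=False)
        loaded = load_node_results(Path(tmp))["Generator"]

    assert loaded["context"].iloc[0] is None
    assert loaded["answer"].tolist() == ["Berlin"]
```

The identity check with `is None` matters here, because `isna()` treats both None and NaN as missing and would not distinguish the two.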

julian-risch · 2023-05-15T12:01:10Z

haystack/schema.py

@@ -1620,6 +1620,7 @@ def safe_literal_eval(x: str) -> Any:
 node_results = {file.stem: pd.read_csv(file, **read_csv_kwargs) for file in csv_files}
 # backward compatibility mappings
 for df in node_results.values():
+ df.replace(to_replace=np.nan, value=None, inplace=True)


The effect of this line is not tested, is it?

tstadel · 2023-05-15

Actually, it is, though only implicitly: the test of the generative pipeline produces None values for the context column, and reading the CSV back gives np.nan values for that column. But I agree, it's not obvious. I'll write a test that makes this behavior explicit.

fix nan and index values

b8f8562

tstadel requested a review from a team as a code owner May 12, 2023 15:28

tstadel requested review from julian-risch and removed request for a team May 12, 2023 15:28

github-actions bot added topic:pipeline topic:tests labels May 12, 2023

add test

0b09584

github-actions bot added the type:documentation Improvements on the docs label May 12, 2023

julian-risch approved these changes May 15, 2023


tstadel and others added 2 commits May 16, 2023 14:53

make test for None values after evalresult read explicit

3159b3c

Merge branch 'main' into fix/evalresult_dfs_serialization

b33278a

tstadel merged commit 7625829 into main May 16, 2023
60 checks passed

tstadel deleted the fix/evalresult_dfs_serialization branch May 16, 2023 14:03
