Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: EvaluationResult serialization changes dataframes #4906

Merged
merged 4 commits into from
May 16, 2023

Conversation

tstadel
Copy link
Member

@tstadel tstadel commented May 12, 2023

Related Issues

Proposed Changes:

  • ensure None values stay the same
  • ensure index is set correctly

How did you test it?

  • extended existing tests

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added tests that demonstrate the correct behavior of the change
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@tstadel tstadel requested a review from a team as a code owner May 12, 2023 15:28
@tstadel tstadel requested review from julian-risch and removed request for a team May 12, 2023 15:28
@coveralls
Copy link
Collaborator

coveralls commented May 12, 2023

Coverage Status

Coverage: 38.151% (+0.02%) from 38.126% when pulling b33278a on fix/evalresult_dfs_serialization into 37cadd7 on main.

@github-actions github-actions bot added the type:documentation Improvements on the docs label May 12, 2023
Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me and they are ready to be merged. 👍 The changed handling of np.nan is not covered by any test though I think. The PR could be improved by adding a test case that loads an evaluation result from disk that contains np.nan in one of the dataframes. The test could confirm that the loaded dataframe contains not np.nan but None.

@@ -1620,6 +1620,7 @@ def safe_literal_eval(x: str) -> Any:
node_results = {file.stem: pd.read_csv(file, **read_csv_kwargs) for file in csv_files}
# backward compatibility mappings
for df in node_results.values():
df.replace(to_replace=np.nan, value=None, inplace=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effect of this line is not tested, is it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is. The test of the generative pipeline produces None values for the context column. When reading the csv, we get np.nan values for this column. But I agree, it's not obvious. I'll write a test that makes this behavior explicit.

@tstadel tstadel merged commit 7625829 into main May 16, 2023
60 checks passed
@tstadel tstadel deleted the fix/evalresult_dfs_serialization branch May 16, 2023 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dataframes of EvalResult are not the same after (de)serializing
3 participants