Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up aggregation #647

Merged
merged 38 commits into from
May 23, 2023
Merged

Conversation

davidhassell
Copy link
Collaborator

@davidhassell davidhassell commented May 3, 2023

Fixes #640, #144

Needs cfunits>=3.3.6

Notes

  • constructlist.py - Protect logger calls.
  • coordinatereference.py - Protect logger calls.
  • flags.py - Protect logger calls.
  • auxiliarycoordinate.py - Protect logger calls.
  • fielddomain.py - Protect logger calls.
  • propertiesdata,py - Protect logger calls. relaxed_units and copy keywords to concatenate.
  • propertiesdatabounds,py - Protect logger calls. relaxed_units and copy keywords to concatenate.
  • formula_terms.py - Protect logger calls.
  • query.py - Protect logger calls.
  • read.py - Protect logger calls.
  • timeduration.py - Protect logger calls.
  • functions.py - Deprecate hash_array now that we have Dask names.
  • fieldlist.py - cull_graph defaults to False. relaxed_units and copy keywords to concatenate.
  • data.py - Deterministic name for the data; new copy keyword to concatenate; cull_graph defaults to False; add potential short-circuit to equals.
  • utils.py - Speed up conform_units
  • dimensioncoordinate.py - Speed up _infer_direction.
  • field.py - Protect logger call; cull_graph defaults to False. relaxed_units and copy keywords to concatenate.
  • netcdfread.py - Don't cache values from field constructs; Unicode updates; fix up element caching.
  • umread.py - Protect logger calls; set array datatypes; avoid unnecessary copies; cache regular coordinate arrays; set cached elements on Data objects; don't build stash2standardnames dictionary on import.
  • cellmethod.py - Protect logger calls, tweak equivalent.
  • aggregate.py - Protect logger calls; Lots of changes that are hopefully sufficiently documented (let me know if not :)). A major change is the relocation of the concatenate commands from def _aggregate_2_fields to def aggregate, for reasons described in the new comments.

@davidhassell davidhassell added this to the 3.15.1 milestone May 3, 2023
@davidhassell davidhassell added performance Relating to speed and memory performance aggregation Rerlating to metadata-based field and domain aggregation bug Something isn't working labels May 4, 2023
Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving an intermediate review, since I've just covered all of the files directly under the cf/ directory (i.e. not within a sub-directory e.g. data, mixin), notably the big one aggregate.py, and it might be useful to give you time to respond to feedback in the meantime. I'll complete the review by the end of the day.

Generally everything looks good, though do note in particular the comment regarding the higher log levels being captured by the protecting keywords. Otherwise just minor comments so far. Thanks.

cf/aggregate.py Outdated Show resolved Hide resolved
cf/aggregate.py Show resolved Hide resolved
cf/aggregate.py Outdated Show resolved Hide resolved
cf/aggregate.py Outdated Show resolved Hide resolved
cf/aggregate.py Outdated Show resolved Hide resolved
cf/aggregate.py Outdated Show resolved Hide resolved
cf/aggregate.py Outdated Show resolved Hide resolved
cf/aggregate.py Outdated Show resolved Hide resolved
cf/aggregate.py Outdated Show resolved Hide resolved
cf/aggregate.py Outdated Show resolved Hide resolved
Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still mid-review. I've just covered the data directory changes, with some further minor feedback as registered below. I'll try to complete the review tonight.

cf/data/data.py Outdated Show resolved Hide resolved
cf/data/data.py Outdated Show resolved Hide resolved
cf/data/data.py Outdated Show resolved Hide resolved
cf/data/data.py Show resolved Hide resolved
cf/data/data.py Outdated Show resolved Hide resolved
cf/data/data.py Outdated Show resolved Hide resolved
cf/data/utils.py Outdated Show resolved Hide resolved
cf/data/utils.py Outdated Show resolved Hide resolved
davidhassell and others added 17 commits May 15, 2023 14:23
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have covered everything but the tests and documentation changes now. Thought I'd leave some more intermediate comments, whilst you appear to be addressing the previous feedback.

cf/field.py Show resolved Hide resolved
cf/field.py Outdated Show resolved Hide resolved
cf/field.py Show resolved Hide resolved
cf/field.py Outdated Show resolved Hide resolved
cf/field.py Outdated Show resolved Hide resolved
cf/read_write/netcdf/netcdfread.py Outdated Show resolved Hide resolved
cf/read_write/netcdf/netcdfread.py Show resolved Hide resolved
cf/read_write/um/umread.py Outdated Show resolved Hide resolved
cf/read_write/um/umread.py Outdated Show resolved Hide resolved
cf/timeduration.py Outdated Show resolved Hide resolved
davidhassell and others added 2 commits May 15, 2023 15:12
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
davidhassell and others added 11 commits May 15, 2023 15:18
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
@sadielbartholomew
Copy link
Member

Thanks for addressing much feedback so far, David. To aid further review, please can you resolve the conflicts here, since quite a list has developed at this stage...

@davidhassell
Copy link
Collaborator Author

resolve the conflicts

Done.

This reverts commit 21737ef, reversing
changes made to 8878747.
@davidhassell
Copy link
Collaborator Author

Oh hell - I've mucked it up. Need to revert all of the last changes.

@davidhassell
Copy link
Collaborator Author

I'm not sure why the merge conflicts have disappeared after I reverted their resolutions, but I think everything looks ok ....

Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I think everything looks ok

The conflict resolution seems fine to me, at this stage.

Just finishing up the review. One question: did you mean to commit the changes under docs other than under docs/source, namely those in docs/_downloads, docs/_images and docs/recipes to touch recipe-related content? I am wondering if that was unintentional since it seems unrelated to this PR...

@davidhassell
Copy link
Collaborator Author

One question: did you mean to commit the changes under docs other than under docs/source, namely those in docs/_downloads, docs/_images and docs/recipes to touch recipe-related content? I am wondering if that was unintentional since it seems unrelated to this PR...

No. Oops. I guess I did something odd and the merged #653 became entrained, somehow. I think it's benign in that it'll all come out in the wash, but if there's a good way of removing these that's also be nice.

@sadielbartholomew
Copy link
Member

I think it's benign in that it'll all come out in the wash, but if there's a good way of removing these that's also be nice.

I'll try some git CLI wizardry (or, tomfoolery, if it goes wrong...) to exclude all but docs/source from docs in this PR. If it looks about right I'll push it up, unless you tell me not to (I assume in the worst case I push up something wrong we can recover via your local branch, so it should be safe?)...

@davidhassell
Copy link
Collaborator Author

I'll try some git CLI wizardry

Thanks, Sadie - sounds like a plan,

@sadielbartholomew
Copy link
Member

sadielbartholomew commented May 22, 2023

Hi David, all done via using git checkout origin/main -- docs/<directory involving recipes touched> to revert any files under those to the state they are in the current (up-to-date) main branch. That's all pushed in the latest commit but it is easier now just to check the full diff/'Files changed' page and see that no recipes changes now feature (I had a brief check to see it did what I hoped it would). (Sorry for the delay, it took a while to work out how to push to this specific branch of yours, from my own copy. Kept trying to push to a new branch under your user account by mistake and it naturally wouldn't let me do that.)

I'll finish the full PR review by checking the testing, within an hour or so. Thanks.

Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All previous feedback addressed, thanks. I just had the testing to look through to finish the review and I've done that now. It all looks good (though we can remove a skip, as commented in-line). I also did a final sanity check on the state of the PR after the feedback changes and after my push to revert the recipe-related changes that made their way into the PR. All tests pass locally.

All good, please merge!

cf/test/test_cfa.py Outdated Show resolved Hide resolved
Co-authored-by: Sadie L. Bartholomew <[email protected]>
@davidhassell
Copy link
Collaborator Author

Hi Sadie - OK, ready to merge! Thanks for your review of this - really appreciated.

@davidhassell davidhassell merged commit 2f76131 into NCAS-CMS:main May 23, 2023
@davidhassell davidhassell linked an issue Jun 15, 2023 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aggregation Rerlating to metadata-based field and domain aggregation bug Something isn't working performance Relating to speed and memory performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aggregation has slowed down after the introduction of dask at 3.14.0 Improve performance of cf.aggregate
2 participants