Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch stabilization #4647

Merged
merged 33 commits into from
Apr 25, 2023
Merged

Batch stabilization #4647

merged 33 commits into from
Apr 25, 2023

Conversation

michaelabuckley
Copy link
Contributor

@michaelabuckley michaelabuckley commented Mar 14, 2023

Rework our batch2 framework for correctness.

  • Move all conditional JobInstance updates into write-locked transactions
  • Change all WorkChunk writes to use event transactions defined in IWorkChunkPersistence
  • Simplify stats calc
  • Clarify error handling during work chunk processing.

@michaelabuckley michaelabuckley changed the title Use java event names for work chunk transitions. Batch stabilization Mar 17, 2023
@michaelabuckley michaelabuckley marked this pull request as draft March 17, 2023 15:48
michaelabuckley and others added 6 commits March 17, 2023 12:11
Avoid fetching work-chunk data (#4622)
* add end time to reduction step

* add changelog

---------

Co-authored-by: Long Ma <[email protected]>
(cherry picked from commit 37f5e59)
Provide error message in cancelled jobs, and avoid transitions in final states.
@codecov
Copy link

codecov bot commented Mar 21, 2023

Codecov Report

Patch coverage has no change and project coverage change: +0.11 🎉

Comparison is base (2f5ffe7) 81.32% compared to head (8c398c9) 81.43%.

❗ Current head 8c398c9 differs from pull request most recent head 41646c1. Consider uploading reports for the commit 41646c1 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4647      +/-   ##
============================================
+ Coverage     81.32%   81.43%   +0.11%     
- Complexity    23650    24869    +1219     
============================================
  Files          1425     1525     +100     
  Lines         86399    91237    +4838     
  Branches      11677    12243     +566     
============================================
+ Hits          70265    74302    +4037     
- Misses        10947    11531     +584     
- Partials       5187     5404     +217     

see 1475 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

…nstance-transactions

# Conflicts:
#	hapi-fhir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/batch2/JpaJobPersistenceImpl.java
#	hapi-fhir-storage-batch2-test-utilities/src/main/java/ca/uhn/hapi/fhir/batch2/test/AbstractIJobPersistenceSpecificationTest.java
#	hapi-fhir-storage-batch2/src/main/java/ca/uhn/fhir/batch2/api/IWorkChunkPersistence.java
#	hapi-fhir-storage-batch2/src/main/java/ca/uhn/fhir/batch2/coordinator/JobCoordinatorImpl.java
#	hapi-fhir-storage-batch2/src/main/java/ca/uhn/fhir/batch2/coordinator/WorkChannelMessageHandler.java
#	hapi-fhir-storage-batch2/src/main/java/ca/uhn/fhir/batch2/model/WorkChunkCreateEvent.java
#	hapi-fhir-storage-batch2/src/test/java/ca/uhn/fhir/batch2/coordinator/JobCoordinatorImplTest.java
@michaelabuckley michaelabuckley marked this pull request as ready for review April 18, 2023 16:57
…pa/test/Batch2JobHelper.java

Co-authored-by: StevenXLi <[email protected]>
@jamesagnew jamesagnew enabled auto-merge (squash) April 25, 2023 21:54
@tadgh tadgh merged commit 8813d9b into master Apr 25, 2023
@tadgh tadgh deleted the mb-job-instance-transactions branch April 25, 2023 22:47
Copy link
Collaborator

@fil512 fil512 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to just before ReductionStepDataSink.accept()

tadgh pushed a commit that referenced this pull request Apr 25, 2023
* Use java event names for work chunk transitions.

* Cherry-pick d5ebd1f from rel_6_4

Avoid fetching work-chunk data (#4622)

* add end time to reduction step (#4640)

* add end time to reduction step

* add changelog

---------

Co-authored-by: Long Ma <[email protected]>
(cherry picked from commit 37f5e59)

* Cancel processing

Provide error message in cancelled jobs, and avoid transitions in final states.

* Apply tx boundary to starting job and first chunk.

* cleanup

* Apply tx boundary to work chunk processing

* Delete BatchWorkChunk

* Introduce events for job create, and chunk dequeue

* Apply tx boundary to chunk handler

* Move instance cancellation to database

* tx boundary around stats collection and completion

* tx boundary around stats collection and completion

* Extend tx boundary to error, fail, and cancel

* Move failure into status calc

* ERROR is not an "ended" state.

* Revert generics cleanup to avoid noise

* Avoid sending gated chunks twice.

* Make no-data path safer.  Cleanup

* Fix mock test for step advance.

* Delete unsafe updateInstace() call

* Cleanup

* Changelog and notes

* Fix cancel boundary.  Cleanups

* Cleanup

* Sort mongo chunks for stable paging.

Other cleanup

* Document error handling

* Cleanup

* Update hapi-fhir-jpaserver-test-utilities/src/main/java/ca/uhn/fhir/jpa/test/Batch2JobHelper.java

Co-authored-by: StevenXLi <[email protected]>

---------

Co-authored-by: longma1 <[email protected]>
Co-authored-by: StevenXLi <[email protected]>
tadgh pushed a commit that referenced this pull request Apr 26, 2023
* Use java event names for work chunk transitions.

* Cherry-pick d5ebd1f from rel_6_4

Avoid fetching work-chunk data (#4622)

* add end time to reduction step (#4640)

* add end time to reduction step

* add changelog

---------

Co-authored-by: Long Ma <[email protected]>
(cherry picked from commit 37f5e59)

* Cancel processing

Provide error message in cancelled jobs, and avoid transitions in final states.

* Apply tx boundary to starting job and first chunk.

* cleanup

* Apply tx boundary to work chunk processing

* Delete BatchWorkChunk

* Introduce events for job create, and chunk dequeue

* Apply tx boundary to chunk handler

* Move instance cancellation to database

* tx boundary around stats collection and completion

* tx boundary around stats collection and completion

* Extend tx boundary to error, fail, and cancel

* Move failure into status calc

* ERROR is not an "ended" state.

* Revert generics cleanup to avoid noise

* Avoid sending gated chunks twice.

* Make no-data path safer.  Cleanup

* Fix mock test for step advance.

* Delete unsafe updateInstace() call

* Cleanup

* Changelog and notes

* Fix cancel boundary.  Cleanups

* Cleanup

* Sort mongo chunks for stable paging.

Other cleanup

* Document error handling

* Cleanup

* Update hapi-fhir-jpaserver-test-utilities/src/main/java/ca/uhn/fhir/jpa/test/Batch2JobHelper.java

Co-authored-by: StevenXLi <[email protected]>

---------

Co-authored-by: longma1 <[email protected]>
Co-authored-by: StevenXLi <[email protected]>
@michaelabuckley michaelabuckley mentioned this pull request Apr 26, 2023
michaelabuckley added a commit that referenced this pull request Apr 28, 2023
Post review batch2 cleanup.

Review fixes and cleanup from #4647

Deleted methods in IJobPersistence unused in prod paths, and replaced their usage in tests.
Lots of docs.
Replace copy-on-write pattern in JobDefinitionRegistry with simple ConcurrentHashMap
Fixed bad mappings from job ERRORED state to external APIs. People think ERRORED is a failed state. ¯\_(ツ)_/¯
Added some spec tests for chunk purging
Deprecated ERRORED. For deleting in 6.8
Lots of plans for 6.8. Too risky for 6.6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants