Add support for running the reindex job across resources in multiple partitions and all partitions #6009

Merged on Jul 6, 2024 (12 commits)

@codeforgreen (Collaborator) commented Jun 13, 2024

#6008

What was done:

  • added a new bean with the interface IJobPartitionProvider, which returns the list of partitions for an operation request. The implementation handles requests configured with RequestPartitionId.allPartitions by returning a list of all partitions, including the default partition. Note that this interface is likely to change, because the partitions a job runs against can also be determined by a list of URLs provided as parameters to these jobs. Further simplification is possible, and I plan to open a separate PR just for that refactoring (bringing UrlPartitioner into the mix, as it also computes partitions for jobs).
  • updated ReindexProvider to use the new bean to compute the partitions
  • made a few simplifications in the existing batch2 jobs: since most of them required partitioning support, some JSON model classes (extending IModelJson) were merged and some of the templating across job steps was made simpler. I removed the "partitioned" prefix from the class names, since there was no use case that did not require partitions and it seemed right. For example, RequestDetails was not extended when partitioning was added.
  • updated the batch2 framework so that jobs based on GenerateRangeChunksStep and JobPartitions can accept a list of partitions. The chunk's partitionId is now the authoritative source of partition information for the subsequent steps that process the chunk.
  • introduced another loop in GenerateRangeChunksStep so that chunks are generated for the cartesian product (urls x partitions) of the values provided in the job parameters.
  • updated the documentation; I may make a few more changes to explain the _ALL tenant.
  • added tests for the batch2 steps that were changed and for the reindex job
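
To illustrate the two ideas above (expanding an "all partitions" request into a concrete list that includes the default partition, and the urls x partitions cartesian product), here is a minimal, self-contained sketch. It is not the code from this PR: the class, record, method names, and the "DEFAULT" label are hypothetical, partitions are modelled as plain strings, and the date-range handling that a range-chunking step would also perform is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: none of these types are HAPI FHIR classes,
// and the names below are assumptions made for this example.
class PartitionChunkSketch {

    // Hypothetical marker standing in for an "all partitions" request.
    static final String ALL_PARTITIONS = "_ALL";
    // Hypothetical label for the default partition.
    static final String DEFAULT_PARTITION = "DEFAULT";

    // One unit of work: a search URL scoped to exactly one partition.
    record Chunk(String url, String partition) {}

    // Expands the requested partitions into a concrete list. An "all partitions"
    // request becomes the default partition plus every configured partition.
    static List<String> resolvePartitions(List<String> requested, List<String> configured) {
        if (requested.contains(ALL_PARTITIONS)) {
            List<String> all = new ArrayList<>();
            all.add(DEFAULT_PARTITION);
            all.addAll(configured);
            return all;
        }
        return List.copyOf(requested);
    }

    // Builds the cartesian product (urls x partitions): the outer loop walks the
    // URLs, the inner loop walks the partitions, and each pair yields one chunk.
    static List<Chunk> generateChunks(List<String> urls, List<String> partitions) {
        List<Chunk> chunks = new ArrayList<>();
        for (String url : urls) {
            for (String partition : partitions) {
                chunks.add(new Chunk(url, partition));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> partitions = resolvePartitions(List.of(ALL_PARTITIONS), List.of("A", "B"));
        List<Chunk> chunks = generateChunks(List.of("Patient?", "Observation?"), partitions);
        chunks.forEach(System.out::println);
    }
}
```

Because every chunk in this sketch carries its own partition, later steps can read the partition from the chunk rather than from the original request, which mirrors the "chunk partitionId is authoritative" point in the description.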

@codeforgreen added the Work In Progress label (work does not need to be reviewed yet, and shouldn't be considered for staleness) Jun 13, 2024

github-actions bot commented Jun 13, 2024

Formatting check succeeded!


codecov bot commented Jun 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.51%. Comparing base (497b9f2) to head (8f7a1aa).
Report is 126 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #6009      +/-   ##
============================================
+ Coverage     83.39%   83.51%   +0.11%     
- Complexity    26927    27405     +478     
============================================
  Files          1681     1706      +25     
  Lines        103965   106104    +2139     
  Branches      13189    13395     +206     
============================================
+ Hits          86702    88613    +1911     
- Misses        11613    11759     +146     
- Partials       5650     5732      +82     


@codeforgreen changed the title from "[Draft] Add support for running batch2 jobs across resources in multiple partitions" to "Add support for running batch2 jobs across resources in multiple partitions" Jun 28, 2024
…PartitionProvider. Add tests and fix one test. Add missing changelog.
@codeforgreen changed the title from "Add support for running batch2 jobs across resources in multiple partitions" to "Add support for running the reindex job across resources in multiple partitions and all partitions" Jun 28, 2024
@codeforgreen removed the Work In Progress label Jun 28, 2024
@codeforgreen merged commit ecef727 into master Jul 6, 2024
66 checks passed
@codeforgreen deleted the 6008-reindex-across-multiple-partitions branch July 6, 2024 05:23
@codeforgreen restored the 6008-reindex-across-multiple-partitions branch July 6, 2024 21:14
2 participants