Major scaling problems when dynamically yielding a huge number of tasks for parallel execution #4

Open
proycon opened this issue Jul 6, 2016 · 2 comments

proycon commented Jul 6, 2016

The Luigi scheduler does not seem to cope well with a huge number of scheduled tasks (300,000 in my test), all of them siblings in the dependency graph. Scheduling becomes the bottleneck, and the load is not distributed over the available workers.

I am attempting to work around this issue by grouping the tasks into batches.
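
A minimal sketch of the batching workaround in plain Python, without Luigi itself. Instead of yielding one dynamic dependency per input, the parent would yield one task per fixed-size chunk, so the scheduler tracks roughly n/size siblings instead of n. The names `chunked`, `plan_batches` and `BATCH_SIZE` are hypothetical illustrations, not part of the Luigi or LuigiNLP API.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical batch size; in practice it would be tuned to the workload.
BATCH_SIZE = 1000


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def plan_batches(inputs: Sequence[str], size: int = BATCH_SIZE) -> List[List[str]]:
    """Group inputs so that one (hypothetical) batch task handles each group.

    In a Luigi workflow, each group would become the parameter of a single
    task, so the scheduler sees len(groups) tasks rather than len(inputs).
    """
    return list(chunked(inputs, size))


if __name__ == "__main__":
    inputs = [f"doc{i}.txt" for i in range(300_000)]
    batches = plan_batches(inputs)
    print(len(inputs), "inputs ->", len(batches), "scheduled tasks")
```

With 300,000 inputs and a batch size of 1000, the scheduler would only have to track 300 sibling tasks; the trade-off is coarser granularity, since a failed batch has to be retried as a whole.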

proycon commented Jan 6, 2017

This has stalled for some time due to other priorities, and it depends on a resolution of spotify/luigi#1750 (non-trivial). I hope to find time to dive into it soon.

proycon added a commit to proycon/luigi that referenced this issue Jan 31, 2017
…ical sort, so next tasks can be obtained from the scheduler in O(1) instead of O(n). Deals with issue spotify#1750 for LanguageMachines/LuigiNLP#4. Still contains lots of debug statements and breaks certain stuff.
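
The commit message above mentions obtaining the next task in O(1) instead of O(n). A hedged sketch of one standard way to get that property is a Kahn-style topological ordering with a ready queue: keep a count of unfinished dependencies per task, and push a task onto a FIFO queue the moment its count drops to zero, so picking the next runnable task is a deque pop rather than a scan over all pending tasks. This is a generic illustration with hypothetical names (`ReadyQueueScheduler`, `next_task`, `mark_done`), not the actual patch to Luigi's scheduler.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional, Set


class ReadyQueueScheduler:
    """Hypothetical scheduler that hands out runnable tasks in O(1)."""

    def __init__(self, deps: Dict[str, Iterable[str]]) -> None:
        # deps maps each task id to the ids of the tasks it depends on.
        # Assumes every dependency also appears as a key in deps.
        self.remaining: Dict[str, int] = {}
        self.dependents: Dict[str, List[str]] = {}
        for task, requires in deps.items():
            required: Set[str] = set(requires)
            self.remaining[task] = len(required)
            for dep in required:
                self.dependents.setdefault(dep, []).append(task)
        # Tasks with no unfinished dependencies are runnable immediately.
        self.ready: Deque[str] = deque(
            t for t, n in self.remaining.items() if n == 0
        )

    def next_task(self) -> Optional[str]:
        """Return a runnable task in O(1), or None if nothing is ready."""
        return self.ready.popleft() if self.ready else None

    def mark_done(self, task: str) -> None:
        """Release dependents; cost is proportional to their number only."""
        for child in self.dependents.get(task, []):
            self.remaining[child] -= 1
            if self.remaining[child] == 0:
                self.ready.append(child)


if __name__ == "__main__":
    # One parent task depending on many sibling tasks, as in this issue.
    siblings = {f"t{i}": [] for i in range(5)}
    graph = {**siblings, "parent": list(siblings)}
    sched = ReadyQueueScheduler(graph)
    order = []
    while (task := sched.next_task()) is not None:
        order.append(task)
        sched.mark_done(task)
    print(order)
```

Building the structure costs O(V + E) once; after that, each runnable task is handed out in O(1), and completing a task costs time proportional to its number of dependents. In the 300,000-sibling case, finishing each sibling only decrements the parent's counter.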

proycon commented Mar 14, 2017

The above patch for Luigi works, but it needs further testing and may break other behaviour. It has not been merged into master yet; I am waiting to see whether further action is needed.

proycon added the waiting label and removed the PRIORITY label Mar 14, 2017