Straggling Task Detection Improvement #53

zhangpengshan · 2014-10-10T05:07:49Z

Currently, if a task/container runs over the threshold three times, it is killed and failover reruns the task on another machine. However, in our (very busy) cluster, a single slow task sometimes still blocks all the other tasks. Better detection is needed to catch this kind of task.

Proposal: instead of having the user set the threshold, collect metrics from all workers on each iteration, and if any worker exceeds the mean by too many standard deviations, kill it.
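A minimal sketch of the idea, not the project's implementation. The function name `find_stragglers`, the `iteration_times` mapping, and the cutoff `k = 2.0` are all assumptions made for illustration. It flags workers whose iteration time is more than `k` population standard deviations above the mean:

```python
from statistics import mean, pstdev


def find_stragglers(iteration_times, k=2.0):
    """Return the ids of workers whose time exceeds mean + k * stddev.

    iteration_times: hypothetical mapping of worker id -> seconds spent
    in the current iteration. k is an assumed cutoff, not a project default.
    """
    if len(iteration_times) < 2:
        # A single worker cannot be compared against anyone else.
        return []

    values = list(iteration_times.values())
    avg = mean(values)
    std = pstdev(values)
    if std == 0:
        # All workers took exactly the same time, so nobody is straggling.
        return []

    # Only workers slower than average count; fast outliers are harmless.
    return sorted(
        worker
        for worker, seconds in iteration_times.items()
        if (seconds - avg) / std > k
    )


if __name__ == "__main__":
    times = {f"worker-{i}": 10.0 for i in range(9)}
    times["worker-9"] = 100.0  # one slow worker among ten
    print(find_stragglers(times))
```

One caveat with this approach: a single outlier inflates the standard deviation itself, so with very few workers no task can ever exceed the cutoff. With n workers the largest possible score is (n - 1) / sqrt(n), which is below 2 for n = 5. A median-based measure may be more robust on small clusters.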

zhangpengshan added the enhancement label Oct 10, 2014

zhangpengshan added this to the 0.7.0 milestone Feb 3, 2015
