Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tune] Hyperband Max Iter Fix #1620

Merged
merged 4 commits into from
Mar 3, 2018
Merged

Conversation

richardliaw
Copy link
Contributor

@richardliaw richardliaw commented Feb 27, 2018

What do these changes do?

Changes semantics of max_iter to be more intuitive.

TODO:

  • Fix tests

Related issue number

#1456

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3993/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3994/
Test PASSed.

self._r *= self._eta
self._r = int((self._r))
if self._cumul_r + self._r > self._max_t_attr:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= min(...)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4048/
Test PASSed.

@ericl
Copy link
Contributor

ericl commented Mar 3, 2018

Test hang looks unrelated, merging.

@ericl ericl merged commit 96d7938 into ray-project:master Mar 3, 2018
royf added a commit to royf/ray that referenced this pull request Apr 22, 2018
* 'master' of https://github.com/royf/ray-private:
  [rllib] Basic regression tests on CartPole (ray-project#1608)
  [autoscaler] [tune] More doc fixes (ray-project#1560)
  [tune] Hyperband Max Iter Fix (ray-project#1620)
krfricke added a commit that referenced this pull request Sep 8, 2023
This PR fixes two flaws in the current hyperband implementation.

#### 1. Bug in the `r` calculation.

#1620 introduced a minimum constraint in the `r` calculation during successive halving with `r = min(r, max_t - prev_r)`. It's unclear to me where this is coming from (cc @richardliaw), but in my opinion this is flawed.

E.g. for `s=1`, `max_t=8` and `eta=2`, we get `r0 = 4`. Then `r = r0 * 2 = 8`. With the current formula, we then get `r1 = min(r, max_t - r0) = min(8, 8-4) = 4`. Thus, `r0 = r1` and the bracket already "finished" after 4 iterations and should terminate all trials. Or in other words, none of the trials in this bracket will ever proceed.

I believe the correct fix here is to set `r = min(r, max_t)`. I couldn't find a reference implementation for comparison, but it logically makes sense and seems to match the formula behavior described in the paper.

#### 2. Stopping of "overstepped" trials.

The first bug revealed a second shortcoming in the current implementation. When a trial reports a timestep that is higher than `r_i` and `r_(i+1)`, it can hang forever. This is because "good" trials are only continued if `bracket.continue_trial(t)` returns `True`. However, if the trial already overstepped `r_(i+1)`, this can return `False` (specifically when `stop_last_trials=True`). In that case, a paused or running trial will not be terminated nor continued, and instead hang forever.

This second case is fixed in this PR by introducing another clause in the processing of "good" trials that checks for this condition.

Signed-off-by: Kai Fricke <[email protected]>
krfricke added a commit to krfricke/ray that referenced this pull request Sep 8, 2023
This PR fixes two flaws in the current hyperband implementation.

#### 1. Bug in the `r` calculation.

ray-project#1620 introduced a minimum constraint in the `r` calculation during successive halving with `r = min(r, max_t - prev_r)`. It's unclear to me where this is coming from (cc @richardliaw), but in my opinion this is flawed.

E.g. for `s=1`, `max_t=8` and `eta=2`, we get `r0 = 4`. Then `r = r0 * 2 = 8`. With the current formula, we then get `r1 = min(r, max_t - r0) = min(8, 8-4) = 4`. Thus, `r0 = r1` and the bracket already "finished" after 4 iterations and should terminate all trials. Or in other words, none of the trials in this bracket will ever proceed.

I believe the correct fix here is to set `r = min(r, max_t)`. I couldn't find a reference implementation for comparison, but it logically makes sense and seems to match the formula behavior described in the paper.

#### 2. Stopping of "overstepped" trials.

The first bug revealed a second shortcoming in the current implementation. When a trial reports a timestep that is higher than `r_i` and `r_(i+1)`, it can hang forever. This is because "good" trials are only continued if `bracket.continue_trial(t)` returns `True`. However, if the trial already overstepped `r_(i+1)`, this can return `False` (specifically when `stop_last_trials=True`). In that case, a paused or running trial will not be terminated nor continued, and instead hang forever.

This second case is fixed in this PR by introducing another clause in the processing of "good" trials that checks for this condition.

Signed-off-by: Kai Fricke <[email protected]>
GeneDer pushed a commit that referenced this pull request Sep 9, 2023
This PR fixes two flaws in the current hyperband implementation.

#### 1. Bug in the `r` calculation.

#1620 introduced a minimum constraint in the `r` calculation during successive halving with `r = min(r, max_t - prev_r)`. It's unclear to me where this is coming from (cc @richardliaw), but in my opinion this is flawed.

E.g. for `s=1`, `max_t=8` and `eta=2`, we get `r0 = 4`. Then `r = r0 * 2 = 8`. With the current formula, we then get `r1 = min(r, max_t - r0) = min(8, 8-4) = 4`. Thus, `r0 = r1` and the bracket already "finished" after 4 iterations and should terminate all trials. Or in other words, none of the trials in this bracket will ever proceed.

I believe the correct fix here is to set `r = min(r, max_t)`. I couldn't find a reference implementation for comparison, but it logically makes sense and seems to match the formula behavior described in the paper.

#### 2. Stopping of "overstepped" trials.

The first bug revealed a second shortcoming in the current implementation. When a trial reports a timestep that is higher than `r_i` and `r_(i+1)`, it can hang forever. This is because "good" trials are only continued if `bracket.continue_trial(t)` returns `True`. However, if the trial already overstepped `r_(i+1)`, this can return `False` (specifically when `stop_last_trials=True`). In that case, a paused or running trial will not be terminated nor continued, and instead hang forever.

This second case is fixed in this PR by introducing another clause in the processing of "good" trials that checks for this condition.

Signed-off-by: Kai Fricke <[email protected]>
jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023
This PR fixes two flaws in the current hyperband implementation.

#### 1. Bug in the `r` calculation.

ray-project#1620 introduced a minimum constraint in the `r` calculation during successive halving with `r = min(r, max_t - prev_r)`. It's unclear to me where this is coming from (cc @richardliaw), but in my opinion this is flawed.

E.g. for `s=1`, `max_t=8` and `eta=2`, we get `r0 = 4`. Then `r = r0 * 2 = 8`. With the current formula, we then get `r1 = min(r, max_t - r0) = min(8, 8-4) = 4`. Thus, `r0 = r1` and the bracket already "finished" after 4 iterations and should terminate all trials. Or in other words, none of the trials in this bracket will ever proceed.

I believe the correct fix here is to set `r = min(r, max_t)`. I couldn't find a reference implementation for comparison, but it logically makes sense and seems to match the formula behavior described in the paper.

#### 2. Stopping of "overstepped" trials.

The first bug revealed a second shortcoming in the current implementation. When a trial reports a timestep that is higher than `r_i` and `r_(i+1)`, it can hang forever. This is because "good" trials are only continued if `bracket.continue_trial(t)` returns `True`. However, if the trial already overstepped `r_(i+1)`, this can return `False` (specifically when `stop_last_trials=True`). In that case, a paused or running trial will not be terminated nor continued, and instead hang forever.

This second case is fixed in this PR by introducing another clause in the processing of "good" trials that checks for this condition.

Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Jim Thompson <[email protected]>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
This PR fixes two flaws in the current hyperband implementation.

#### 1. Bug in the `r` calculation.

ray-project#1620 introduced a minimum constraint in the `r` calculation during successive halving with `r = min(r, max_t - prev_r)`. It's unclear to me where this is coming from (cc @richardliaw), but in my opinion this is flawed.

E.g. for `s=1`, `max_t=8` and `eta=2`, we get `r0 = 4`. Then `r = r0 * 2 = 8`. With the current formula, we then get `r1 = min(r, max_t - r0) = min(8, 8-4) = 4`. Thus, `r0 = r1` and the bracket already "finished" after 4 iterations and should terminate all trials. Or in other words, none of the trials in this bracket will ever proceed.

I believe the correct fix here is to set `r = min(r, max_t)`. I couldn't find a reference implementation for comparison, but it logically makes sense and seems to match the formula behavior described in the paper.

#### 2. Stopping of "overstepped" trials.

The first bug revealed a second shortcoming in the current implementation. When a trial reports a timestep that is higher than `r_i` and `r_(i+1)`, it can hang forever. This is because "good" trials are only continued if `bracket.continue_trial(t)` returns `True`. However, if the trial already overstepped `r_(i+1)`, this can return `False` (specifically when `stop_last_trials=True`). In that case, a paused or running trial will not be terminated nor continued, and instead hang forever.

This second case is fixed in this PR by introducing another clause in the processing of "good" trials that checks for this condition.

Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Victor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants