Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dashboard] Add tune API to support tune tab in new backend #11009

Merged
merged 2 commits into from
Oct 2, 2020

Conversation

mfitton
Copy link
Contributor

@mfitton mfitton commented Sep 24, 2020

Why are these changes needed?

These changes are needed to enable running the existing Tune tab with the new backend. There are a few changes to the tune backend to make it work with async io, but otherwise it is pretty much identical to how it was in the old dashboard backend.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

I tested this alongside many other changes in a branch that contains the full migration code. That said, this is not yet unit tested.

@mfitton mfitton added the dashboard Issues specific to the Ray Dashboard label Sep 24, 2020
@mfitton mfitton added this to the New Dashboard milestone Sep 24, 2020
@simon-mo
Copy link
Contributor

cc @richardliaw

if not self._tensor_board_dir:
tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", str(self._logdir)])
tb.launch()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this eventually get terminated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardliaw I just ran it and tensorboard died when I killed my ray cluster. It seems to fate share with its parent process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm then

from ray.tune import Analysis
from tensorboard import program
except ImportError:
Analysis = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened if the Analysis is None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Analysis is none, the Tune tab availability check will fail, and the tune tab won't be displayed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The self.collect() does not raise an exception? That will be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not, I added a check.

transforms them, and serves them up over an API.

Args
logdir (str): Directory path to save the status information of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Args comments can be removed? Or, can we move the comments to self._logdir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thank you.

try:
experiment = req.query["experiment"]
except KeyError:
return await rest_response(success=False, message="Bad request")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @routes.get decorator catches the handler exception and returns rest_response(success=False, message=<exception stack>). This try ... except ... do the same thing, but makes the response much cleaner rest_response(success=False, message="Bad request").

self._tensor_board_dir = self._logdir

def _get_availability(self):
return {"available": True, "trials_available": self._trials_available}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the _get_availability is only used by the get_availability, is it better to merge them?

self._errors[str(trial)]["trial_id"] = str(trial_id)
if str(trial_id) in self._trial_records.keys():
self._trial_records[str(trial_id)]["error"] = text
self._trial_records[str(trial_id)][
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many str(trail_id), can we just trail_id = str(other_data["trial_id"].values[0])?

"hostname", "node_ip", "time_since_restore",
"timesteps_since_restore", "iterations_since_restore",
"experiment_tag", "trial_id"
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't care the order of the names, can we change the default_names to a set?

timeit.timeit("'iterations_since_restore' in default_names_set", globals=globals())
0.04095575900282711
timeit.timeit("'iterations_since_restore' in default_names", globals=globals())
0.2835074480099138

@rkooo567 rkooo567 assigned fyrestone and unassigned rkooo567 Sep 25, 2020
@mfitton
Copy link
Contributor Author

mfitton commented Oct 2, 2020

@fyrestone @richardliaw could you please take a look to see if your concerns are addressed and approve if so?

@richardliaw
Copy link
Contributor

Great - could you post a screenshot of this working?

@mfitton
Copy link
Contributor Author

mfitton commented Oct 2, 2020

@richardliaw This PR doesn't make this implementation the default dashboard yet. Currently I'm shepherding through multiple PRs to be able to make a PR to switch over to this dashboard by default.

Given that, this branch doesn't even have the front-end changes included for me to show you this working. I did test it on a branch where I had integrated all these changes before making the PRs.

When I make a PR to deprecate the old dashboard and turn this functionality on, I'll tag you in it and provide a screenshot. Is that alright @richardliaw ?

@richardliaw
Copy link
Contributor

Sounds good! No objections for merging from me.

@mfitton
Copy link
Contributor Author

mfitton commented Oct 2, 2020

I'm not authorized to merge. @rkooo567 could you please merge?

@rkooo567 rkooo567 merged commit 6ed8459 into ray-project:master Oct 2, 2020
@mfitton mfitton deleted the ant-dash-transition-tune branch October 2, 2020 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dashboard Issues specific to the Ray Dashboard
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants