Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native MLFlow tracking server support #35

Merged
merged 36 commits into from
Aug 7, 2020
Merged

Conversation

ajslone
Copy link
Collaborator

@ajslone ajslone commented Jul 9, 2020

PR for MLFlow integration.

todo list:

  • see if we can get mflow to use name of script vs. "caliban"
  • set run tags for mode (cpu, gpu, etc)
  • logs for runs
  • set run name vs run_id
  • For Cloud jobs, add a tag that links to the AI Platform job manager page
  • same as above for GKE
  • uv env var prefixing (ENVVAR_*) = tags
  • for wrapper, pass single json
  • experiment name as env_var for UV to pick up MLFLOW_EXPERIMENT_NAME
  • caliban shell when there is a sql proxy entry in .calibanconfig.json

@ajslone ajslone requested a review from sritchie July 9, 2020 22:00
@codecov
Copy link

codecov bot commented Jul 9, 2020

Codecov Report

Merging #35 into master will increase coverage by 0.32%.
The diff coverage is 48.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #35      +/-   ##
==========================================
+ Coverage   51.51%   51.84%   +0.32%     
==========================================
  Files          30       31       +1     
  Lines        2972     3148     +176     
==========================================
+ Hits         1531     1632     +101     
- Misses       1441     1516      +75     
Impacted Files Coverage Δ
caliban/config/__init__.py 100.00% <ø> (ø)
caliban/platform/gke/cluster.py 29.75% <9.09%> (-0.31%) ⬇️
caliban/platform/gke/cli.py 21.72% <14.28%> (-0.12%) ⬇️
caliban/docker/build.py 33.17% <18.42%> (-0.33%) ⬇️
caliban/platform/cloud/core.py 23.30% <30.00%> (+0.06%) ⬆️
caliban/resources/cloud_sql_proxy.py 50.00% <33.33%> (-2.00%) ⬇️
caliban/resources/caliban_launcher.py 33.84% <33.84%> (ø)
caliban/platform/run.py 28.44% <46.15%> (+2.73%) ⬆️
caliban/util/metrics.py 100.00% <100.00%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5377b00...85fda3e. Read the comment docs.

@ajslone
Copy link
Collaborator Author

ajslone commented Jul 9, 2020

todo list:

  • see if we can get mflow to use name of script vs. "caliban"
  • set run tags for mode (cpu, gpu, etc)
  • logs for runs?
  • set run name vs run_id
  • For GKE or Cloud jobs, add a tag that links to the AI Platform / GKE job manager page

@googlebot
Copy link

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@ajslone
Copy link
Collaborator Author

ajslone commented Jul 10, 2020

@googlebot I consent.

@sritchie
Copy link
Collaborator

@ajslone , here's an idea for how to handle the wrapper script in a slightly cleaner way.

from pkg_resources import resource_filename

def wrapper_script_path():
  return resource_filename('caliban.resources', FILEPATH)

this is from a different project I'm working on; but if you modify the script to take arguments instead of actually templating it, then you can put the script into caliban/resources/script.sh and use caliban.util.fs.TempCopy to get a local, temporary copy into the user's directory so it can get copied into the build.

the advantage is that we won't print the entire script out to stdout as the image is building.

I think this solves another issue for us too... the issue of injecting run-time environment variables into the container. Since we can send in kv pairs of env variable / value, have this wrapper script take them and set them, and then pass the rest of its flags into the user's script, this should solve the problem of injecting run and experiment IDs into the container.

(I think if we do this it's worth promoting an --env command up in caliban. right now you have to send in env variables using --docker_run_args, but this is cleaner, I think, since it solves the problem for AI Platform too)

@ajslone
Copy link
Collaborator Author

ajslone commented Jul 24, 2020

@sritchie agree this is a far better way to do this, I'll push a change to implement this way.

@sritchie sritchie changed the title mlflow test for local caliban runs Native MLFlow tracking server support Jul 29, 2020
setup.py Outdated
@@ -56,7 +56,8 @@ def readme():
# dep that commentjson brings in. Delete once this is merged:
# https://github.com/vaidik/commentjson/pull/33/files
'lark-parser>=0.7.1,<0.8.0',
'SQLAlchemy>=1.3.11',
'SQLAlchemy==1.3.11',
'mlflow>=1.9.1',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where we use this dependency, can we remove?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this was a relic from earlier tests where we were calling the mlflow api in caliban. Will remove.

caliban/docker/build.py Outdated Show resolved Hide resolved
caliban/docker/build.py Outdated Show resolved Hide resolved
caliban/docker/build.py Outdated Show resolved Hide resolved
caliban/docker/build.py Outdated Show resolved Hide resolved
caliban/docker/build.py Outdated Show resolved Hide resolved
caliban/resources/caliban_cloud_sql.py Outdated Show resolved Hide resolved
caliban/resources/caliban_wrapper.py Outdated Show resolved Hide resolved
caliban/resources/caliban_wrapper.py Outdated Show resolved Hide resolved
caliban/resources/caliban_wrapper.py Outdated Show resolved Hide resolved
caliban/util/fs.py Outdated Show resolved Hide resolved
caliban/cli.py Outdated Show resolved Hide resolved
caliban/docker/build.py Outdated Show resolved Hide resolved
caliban/docker/build.py Outdated Show resolved Hide resolved
caliban/main.py Outdated Show resolved Hide resolved
@ajslone
Copy link
Collaborator Author

ajslone commented Aug 5, 2020

these latest changes are a WIP, will break this up once I can verify everything is working

setup.py Outdated Show resolved Hide resolved
ajslone pushed a commit that referenced this pull request Aug 6, 2020
This PR:

Adds the ability to inject a launcher script into the container, with the ability to intercept arguments, set environment variables and start up a service that will be available for the duration of the container's task.
This is the conclusion of a port of #35 .

Co-authored-by: Ambrose Slone <[email protected]>
@ajslone ajslone marked this pull request as ready for review August 7, 2020 16:15
@ajslone ajslone merged commit 49fe8ae into master Aug 7, 2020
@ajslone ajslone deleted the aslone/mlflow_tracking branch August 7, 2020 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants