Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/databricks oauth #16155

Merged
merged 2 commits into from
Sep 6, 2023
Merged

Conversation

zyd14
Copy link
Contributor

@zyd14 zyd14 commented Aug 29, 2023

Summary & Motivation

Fixes #16126 about enabling oauth authentication for Databricks service principals

How I Tested These Changes

Unit tests + manually tested that Databricks steps can be launched via the step launcher using either a personal access token or oauth client ID + client secret.

During this manual testing I also uncovered a few bugs around the submit() call to the databricks-sdk which were preventing jobs from being launched (databricks-sdk requires a task_key parameter, and reading a file that doesn't exist now raises a DatabricksError). Currently the step launcher is broken as-is and will not work without these changes, so if this whole PR doesn't get merged before the next release then those two commits should probably at least get cherry-picked into a different PR and merged in. I should also note that it appears that the most recent release 0.20.10 works, and that the breaking changes must have been made since the last release.

@sryza
Copy link
Contributor

sryza commented Aug 30, 2023

@rexledesma are you able to take a look at this one?

@rexledesma
Copy link
Member

@rexledesma are you able to take a look at this one?

@smackesey Could you take a look? Some bugs were uncovered that look like fallout from #14673

@smackesey
Copy link
Collaborator

smackesey commented Aug 30, 2023

Some bugs were uncovered that look like fallout from #14673

I'll take a look

@zyd14
Copy link
Contributor Author

zyd14 commented Aug 31, 2023

@smackesey / @rexledesma - just tested the latest dagster / dagster-databricks release and it did in fact break the step launcher. Attempting to run a job with the step launcher configured gives this error:

databricks.sdk.core.DatabricksError: Task requires key (`task_key`).
  File "/Users/zach/Documents/empirico/projects/etxlib/.venvs/dagster-test-env/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_plan.py", line 273, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/Users/zach/Documents/empirico/projects/etxlib/.venvs/dagster-test-env/lib/python3.9/site-packages/dagster_databricks/databricks_pyspark_step_launcher.py", line 257, in launch_step
    databricks_run_id = self.databricks_runner.submit_run(self.run_config, task)
  File "/Users/zach/Documents/empirico/projects/etxlib/.venvs/dagster-test-env/lib/python3.9/site-packages/dagster_databricks/databricks.py", line 371, in submit_run
    return self.client.workspace_client.jobs.submit(
  File "/Users/zach/Documents/empirico/projects/etxlib/.venvs/dagster-test-env/lib/python3.9/site-packages/databricks/sdk/service/jobs.py", line 3575, in submit
    op_response = self._api.do('POST', '/api/2.1/jobs/runs/submit', body=body)
  File "/Users/zach/Documents/empirico/projects/etxlib/.venvs/dagster-test-env/lib/python3.9/site-packages/databricks/sdk/core.py", line 1003, in do
    raise self._make_nicer_error(status_code=response.status_code, **payload) from None

Would you like me to cherry-pick my fixes into another PR?

smackesey added a commit that referenced this pull request Sep 5, 2023
## Summary & Motivation

[This PR (#14673)](#14673)
broke the Databricks step launcher due to an incomplete implementation
of the new databricks-sdk API. I cherry-picked my fixes from [another
PR](#16155) that seems like it
may have been lost in the ether.

## How I Tested These Changes
Spun up a new environment with dagster / dagster-databricks 1.4.11 and
launched a job with an op that uses the
`databricks_pyspark_step_launcher`

---------

Co-authored-by: Sean Mackesey <[email protected]>
zyd14 added a commit to zyd14/dagster that referenced this pull request Sep 5, 2023
## Summary & Motivation

[This PR (dagster-io#14673)](dagster-io#14673)
broke the Databricks step launcher due to an incomplete implementation
of the new databricks-sdk API. I cherry-picked my fixes from [another
PR](dagster-io#16155) that seems like it
may have been lost in the ether.

## How I Tested These Changes
Spun up a new environment with dagster / dagster-databricks 1.4.11 and
launched a job with an op that uses the
`databricks_pyspark_step_launcher`

---------

Co-authored-by: Sean Mackesey <[email protected]>
@smackesey
Copy link
Collaborator

@zyd14 looks like you merged instead of rebased? If you're having git problems or would otherwise prefer I can just take this over to move it over the line.

@zyd14
Copy link
Contributor Author

zyd14 commented Sep 5, 2023

@smackesey Ah okay, honestly I'm not as familiar with rebase strategies. I thought I clicked through rebasing from master in my IDE but must've messed something up along the way. If you don't mind taking over that's totally fine with me.

zyd14 and others added 2 commits September 6, 2023 11:57
…to the DatabricksClient. Throwing an exception if the `api_client` or `client` properties are access when oauth credentials are used, as those clients do not support oauth logins.
@rexledesma rexledesma removed their request for review September 6, 2023 16:27
@smackesey smackesey merged commit 1eed77f into dagster-io:master Sep 6, 2023
1 check passed
@zyd14 zyd14 deleted the feature/databricks-oauth branch September 6, 2023 17:31
zyd14 added a commit to zyd14/dagster that referenced this pull request Sep 15, 2023
## Summary & Motivation
Fixes dagster-io#16126 about enabling oauth authentication for Databricks service
principals
## How I Tested These Changes
Unit tests + manually tested that Databricks steps can be launched via
the step launcher using either a personal access token or oauth client
ID + client secret.

During this manual testing I also uncovered a few bugs around the
`submit()` call to the `databricks-sdk` which were preventing jobs from
being launched (databricks-sdk requires a [task_key
parameter](dagster-io@7d2466c),
and reading a file that doesn't exist now raises a
[DatabricksError](dagster-io@fd91538)).
Currently the step launcher is broken as-is and will not work without
these changes, so if this whole PR doesn't get merged before the next
release then those two commits should probably at least get
cherry-picked into a different PR and merged in. I should also note that
it appears that the most recent release 0.20.10 works, and that the
breaking changes must have been made since the last release.

---------

Co-authored-by: Sean Mackesey <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable oauth authentication flow for Databricks jobs launched with dagster-databricks
4 participants