Fix BigQueryInsertJobOperator cancel_on_kill #25342

Merged
9 commits merged on Aug 4, 2022

lidalei
Contributor

@lidalei lidalei commented Jul 27, 2022

This PR fixes an issue with BigQueryInsertJobOperator. If the task hits the timeout set by the task's execution_timeout, on_kill is called but self.job_id is still None, so the running job cannot be cancelled. This happens because _submit_job is a blocking call and self.job_id is only set after it returns. This PR is heavily inspired by #22955.
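
The ordering problem described above can be illustrated with a small, self-contained sketch. It is not the PR's code and uses no Airflow or BigQuery APIs: `FakeJob` and `SketchOperator` are hypothetical stand-ins, and the only point is that recording `job_id` before the blocking wait lets `on_kill` find the job to cancel.

```python
import threading


class FakeJob:
    """Hypothetical stand-in for a job handle (not a BigQuery class)."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.cancelled = False
        self._done = threading.Event()

    def result(self, timeout=None):
        # Blocks until the job is finished or cancelled, or the timeout passes.
        self._done.wait(timeout)

    def cancel(self):
        self.cancelled = True
        self._done.set()


class SketchOperator:
    """Hypothetical operator that records job_id before blocking."""

    def __init__(self):
        self.job = None
        self.job_id = None
        self.submitted = threading.Event()

    def execute(self):
        job = FakeJob("abc_test")  # submit without waiting for completion
        self.job = job
        self.job_id = job.job_id   # set BEFORE the blocking wait
        self.submitted.set()
        job.result(timeout=5)      # the blocking part

    def on_kill(self):
        # Cancelling is only possible if job_id is already known.
        if self.job_id is None:
            return False
        self.job.cancel()
        return True


op = SketchOperator()
worker = threading.Thread(target=op.execute)
worker.start()
op.submitted.wait()    # the job is submitted and execute() is now blocking
killed = op.on_kill()  # simulates the timeout firing while execute() waits
worker.join()
```

If the assignment to `self.job_id` were moved after `job.result(...)`, `on_kill` would see `None` during the wait and return without cancelling anything, which is the behaviour the PR description reports.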

@lidalei lidalei requested a review from turbaszek as a code owner July 27, 2022 14:40
@boring-cyborg boring-cyborg bot added the area:providers and provider:google (Google (including GCP) related issues) labels Jul 27, 2022
@lidalei lidalei marked this pull request as draft July 27, 2022 14:44
@lidalei lidalei marked this pull request as ready for review July 29, 2022 08:50
Member

@potiuk potiuk left a comment

That looks cool

@potiuk
Member

potiuk commented Jul 29, 2022

Some errors though

@lidalei
Contributor Author

lidalei commented Jul 29, 2022

> Some errors though

Fixed the failing test case. The Conflict exception is raised when we call _begin with a job ID that already exists:

Traceback (most recent call last):
  File "/Users/dalei/.pyenv/versions/3.8.10/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/Users/dalei/go/src/github.com/lidalei/airflow/airflow/providers/google/common/hooks/base_google.py", line 463, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/Users/dalei/go/src/github.com/lidalei/airflow/airflow/providers/google/cloud/hooks/bigquery.py", line 1542, in insert_job
    job._begin()
  File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1298, in _begin
    super(QueryJob, self)._begin(client=client, retry=retry, timeout=timeout)
  File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/cloud/bigquery/job/base.py", line 510, in _begin
    api_response = client._call_api(
  File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 759, in _call_api
    return call()
  File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
    return retry_target(
  File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
  File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Conflict: 409 POST https://bigquery.googleapis.com/bigquery/v2/projects/xxx/jobs?prettyPrint=false: Already Exists: Job xxx:EU.abc_test
Location: EU
Job ID: abc_test
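
As a rough illustration of the 409 above, the sketch below shows one way a caller could react when a job ID is already taken. It does not use the real BigQuery client: `Conflict` here is a local stand-in for `google.api_core.exceptions.Conflict`, and `FakeClient` and `begin_or_fetch` are hypothetical names invented for this example.

```python
class Conflict(Exception):
    """Local stand-in for google.api_core.exceptions.Conflict (HTTP 409)."""


class FakeClient:
    """Hypothetical in-memory client that rejects duplicate job IDs."""

    def __init__(self):
        self.jobs = {}

    def begin(self, job_id, query):
        # Mirrors the "Already Exists" failure seen in the traceback.
        if job_id in self.jobs:
            raise Conflict(f"Already Exists: Job {job_id}")
        self.jobs[job_id] = query

    def get(self, job_id):
        return self.jobs[job_id]


def begin_or_fetch(client, job_id, query):
    """Start a job, or look up the existing one if the ID is taken."""
    try:
        client.begin(job_id, query)
        return "started"
    except Conflict:
        client.get(job_id)  # the existing job could be inspected here
        return "existing"


client = FakeClient()
first = begin_or_fetch(client, "abc_test", "SELECT 1")
second = begin_or_fetch(client, "abc_test", "SELECT 1")
```

The first call starts the job; the second call hits the conflict path because the ID `abc_test` is already registered.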

@lidalei lidalei requested review from josh-fell and potiuk July 29, 2022 15:01
@potiuk potiuk merged commit e84d753 into apache:main Aug 4, 2022
@boring-cyborg

boring-cyborg bot commented Aug 4, 2022

Awesome work, congrats on your first merged pull request!
