[Data] Check if BigQuery dataset exists before creation (#41630)
We found that a user may not have been granted permission to create a Google BigQuery dataset when using the Ray Data `write_bigquery()` API. In that case the `client.create_dataset` call fails with a permission error:

```
Access Denied: Project ...: User does not have bigquery.datasets.create permission in project ...
```

This change first checks whether the BigQuery dataset already exists, and only tries to create it if it does not.

Signed-off-by: Cheng Su <[email protected]>
c21 committed Dec 6, 2023
1 parent e92f015 commit 9516630
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions python/ray/data/datasource/bigquery_datasink.py
@@ -45,13 +45,14 @@ def on_write_start(self) -> None:
         client = bigquery.Client(project=self.project_id)
         dataset_id = self.dataset.split(".", 1)[0]
         try:
-            client.create_dataset(f"{self.project_id}.{dataset_id}", timeout=30)
-            logger.info("Created dataset " + dataset_id)
-        except exceptions.Conflict:
+            client.get_dataset(dataset_id)
             logger.info(
                 f"Dataset {dataset_id} already exists. "
                 "The table will be overwritten if it already exists."
             )
+        except exceptions.NotFound:
+            client.create_dataset(f"{self.project_id}.{dataset_id}", timeout=30)
+            logger.info("Created dataset " + dataset_id)
 
         # Delete table if it already exists
         client.delete_table(f"{self.project_id}.{self.dataset}", not_found_ok=True)
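
The check-then-create flow in this change can be sketched in isolation. The snippet below is a minimal illustration, not the code in `bigquery_datasink.py`: `NotFound` and `InMemoryClient` are hypothetical stand-ins for `exceptions.NotFound` and `bigquery.Client` so the example runs without Google Cloud credentials, and `ensure_dataset` is a helper name invented here. Unlike the diff, it passes the fully qualified ID to both calls to keep the stand-in simple.

```python
import logging

logger = logging.getLogger(__name__)


class NotFound(Exception):
    """Stand-in for the NotFound exception caught in the diff (assumption)."""


class InMemoryClient:
    """Hypothetical stand-in for bigquery.Client; tracks dataset IDs in a set."""

    def __init__(self, existing=()):
        self.datasets = set(existing)
        self.create_calls = 0

    def get_dataset(self, dataset_id):
        # Mirrors the lookup-or-raise behaviour the diff relies on.
        if dataset_id not in self.datasets:
            raise NotFound(dataset_id)
        return dataset_id

    def create_dataset(self, dataset_id, timeout=None):
        # Counts calls so we can see when creation was actually attempted.
        self.create_calls += 1
        self.datasets.add(dataset_id)
        return dataset_id


def ensure_dataset(client, project_id, dataset_id):
    """Return True if the dataset had to be created, False if it existed."""
    full_id = f"{project_id}.{dataset_id}"
    try:
        # Read-only lookup first: needs no create permission.
        client.get_dataset(full_id)
        logger.info("Dataset %s already exists.", dataset_id)
        return False
    except NotFound:
        # Only reached when the dataset is missing.
        client.create_dataset(full_id, timeout=30)
        logger.info("Created dataset %s", dataset_id)
        return True


if __name__ == "__main__":
    client = InMemoryClient(existing={"my_project.my_dataset"})
    ensure_dataset(client, "my_project", "my_dataset")
    print(client.create_calls)
```

Because the lookup comes first, a user who lacks `bigquery.datasets.create` but writes into an existing dataset never reaches the `create_dataset` call, which is the failure this commit addresses.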