Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DLP job trigger on bigquery cannot use timespan config #14096

Closed
ayferkrd opened this issue Mar 26, 2023 · 14 comments · Fixed by GoogleCloudPlatform/magic-modules#11035, hashicorp/terraform-provider-google-beta#7669 or #18621

Comments

@ayferkrd
Copy link

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.4.2
on darwin_amd64

  • provider registry.terraform.io/hashicorp/archive v2.3.0
  • provider registry.terraform.io/hashicorp/google v4.57.0

Affected Resource(s)

  • google_data_loss_prevention_job_trigger

Terraform Configuration Files

resource "google_data_loss_prevention_job_trigger" "bigquery_row_limit_timestamp" {
  parent       = "projects/${var.project_id}"
  description  = "BigQuery DLP Job Trigger with timestamplimit"
  display_name = "bigquery-dlp-job-trigger-limit-timestamp"

  triggers {
    schedule {
      recurrence_period_duration ="86400s"
    }
  }

  inspect_job {
    inspect_template_name = "projects/test/locations/global/inspectTemplates/6425492983381733900" 
 storage_config {
      big_query_options {
        table_reference {
          project_id = var.project_id
          dataset_id = "dataset"
          table_id   = "table"
        }
      }

      timespan_config {
        start_time = "2023-01-01T00:00:23Z"
        timestamp_field {
          name = "timestamp"
        }
        enable_auto_population_of_timespan_config = true
      }
    }

    actions {
      save_findings {
        output_config {
          table {
            project_id = var.project_id
            dataset_id = "output"
          }
        }
      }
    }
  }
}

Debug Output

Error: Error creating JobTrigger: googleapi: Error 400: rows_limit or rows_limit_percent must be greater than zero when sample method is specified.

│ with module.bq_dlp_trigger.google_data_loss_prevention_job_trigger.bigquery_row_limit_timestamp,
│ on modules/bq_dlp_trigger/dlp_job_trigger.tf line 1, in resource "google_data_loss_prevention_job_trigger" "bigquery_row_limit_timestamp":
│ 1: resource "google_data_loss_prevention_job_trigger" "bigquery_row_limit_timestamp" {

Panic Output

Expected Behavior

Should create a DLP job trigger with timestamp configuration to run through the BigQuery.

Actual Behavior

Cannot create DLP job trigger

Steps to Reproduce

  1. terraform apply

Important Factoids

rows_limit or rows_limit_percent cannot be used in conjunction with TimespanConfig.

References

[<!---
Information about referencing Github Issues: https://help.github.com/articles/basic-writing-and-formatting-syntax/#referencing-issues-and-pull-requests

Are there any other GitHub issues (open or closed) or pull requests that should be linked here? Vendor documentation? For example:
--->](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/data_loss_prevention_job_trigger#attributes-reference)

  • #0000
@ayferkrd ayferkrd added the bug label Mar 26, 2023
@edwardmedia edwardmedia self-assigned this Mar 27, 2023
@edwardmedia
Copy link
Contributor

@ayferkrd it seems to be fine with me if I use below config. Do you see any difference?

resource "google_data_loss_prevention_job_trigger" "bigquery_row_limit_percentage" {
    parent = "projects/myproject"
    description = "Description"
    display_name = "Displayname"

    triggers {
        schedule {
            recurrence_period_duration = "86400s"
        }
    }

    inspect_job {
        inspect_template_name = "fake"
        actions {
            save_findings {
                output_config {
                    table {
                        project_id = "project"
                        dataset_id = "dataset"
                    }
                }
            }
        }
        storage_config {
            big_query_options {
                table_reference {
                    project_id = "project"
                    dataset_id = "dataset"
                    table_id = "table_to_scan"
                }

                rows_limit_percent = 50
                sample_method = "RANDOM_START"
            }
            
            timespan_config {
                start_time = "2023-01-01T00:00:23Z"
                timestamp_field {
                    name = "timestamp"
                }
                enable_auto_population_of_timespan_config = true
            }
        }
    }

}

@ayferkrd
Copy link
Author

rows_limit_percent or rows_limit cannot be used with timespan_config:
Error: Error creating JobTrigger: googleapi: Error 400: Both TimespanConfig and RowsLimit may not be set at the same time.

│ with module.bq_dlp_trigger.google_data_loss_prevention_job_trigger.bigquery_row_limit_timestamp["test_table"],
│ on modules/bq_dlp_trigger/dlp_job_trigger.tf line 1, in resource "google_data_loss_prevention_job_trigger" "bigquery_row_limit_timestamp":
│ 1: resource "google_data_loss_prevention_job_trigger" "bigquery_row_limit_timestamp" {

@edwardmedia
Copy link
Contributor

@ayferkrd help me understand what the issue is. Below error comes from the api. Do you think this is wrong? The config works fine with me.

Error 400: Both TimespanConfig and RowsLimit may not be set at the same time

@ayferkrd
Copy link
Author

@edwardmedia above config has sample_method with rows_limit_percent with timespan_config which is not possible

image

@edwardmedia
Copy link
Contributor

@ayferkrd Interesting. I see what you mean. I am not able to hit the error with my config, which contradicts below api doc.
Wonder under what condition you can hit the error. Could you share your debug log?

b/275540727

https://cloud.google.com/dlp/docs/reference/rest/v2/InspectJobConfig#BigQueryOptions

@ayferkrd
Copy link
Author

@edwardmedia I think even it doesn't hit the error, I believe the behaviour is not correct with the above configuration because timestamp configuration cannot work with sampling, it should scan the whole table.

@edwardmedia
Copy link
Contributor

edwardmedia commented Mar 29, 2023

Create inspect job with both rows_limit and timespan_config:

  • expected: fails
  • actual: fails

Create inspect job with both rows_limit_percent and timespan_config:

  • expected: fails
  • actual: succeeds

@slevenick
Copy link
Collaborator

Looks like this is an upstream API issue. We don't control this in the Terraform provider, and it seems like the documentation is wrong.

According to the docs, the API should be rejecting the request when both of timespan_config & rows_limit_percent are set, but they are incorrectly accepting it. We may be able to add ConflictsWith to block this on the provider side, but we probably don't want to make that decision for the API.

I'm going to mark this as upstream because the core of the issue is an API error

@ayferkrd
Copy link
Author

@edwardmedia, the real problem here is there is a default value set for sample_method, which blocks me from using timespan_config. Other than that, yes, both terraform and API has some problems independent of each other related to job trigger implementation.

@edwardmedia edwardmedia removed their assignment Apr 15, 2023
@trodge
Copy link
Collaborator

trodge commented May 1, 2023

b/280337903

@soumya92
Copy link

soumya92 commented May 1, 2023

I'm going to mark this as upstream because the core of the issue is an API error

Not entirely correct. As pointed out in #14096 (comment), this is an issue caused by the default value for sample_method which conflicts with Timespan filtering. Either setting timespan filtering should clear the sampling value, or the default for sample method should be unset.

@tlines2016
Copy link

I'm going to mark this as upstream because the core of the issue is an API error

Not entirely correct. As pointed out in #14096 (comment), this is an issue caused by the default value for sample_method which conflicts with Timespan filtering. Either setting timespan filtering should clear the sampling value, or the default for sample method should be unset.

I've come across this same issue where the provider is defaultiing sample_method to top within bigquery_options. In the meantime I'm just setting the default value for "rows_limit_percent" to 99 if rows_limit isn't defined as a sort of workaround for the issue of the provider defaulting to sample_method = "top" if we don't pass a value.

Quick question though, is it the case where we should open a new Issue for this, where it provides all of the details specific to the default value for sample_method? Or should we stick with this Issue for now?

@patrickmoy
Copy link

patrickmoy commented Jun 20, 2024

I attempted to use a custom encoder to remove sample_method if timespan_config is set, which would be a workaround for this case, but my update tests won't pass as the test plan is not empty. I'm not sure when the default_value: :TOP gets applied, but ImportStateVerifyIgnore doesn't fix my test either — the plan still expects TOP.

I've done some additional testing though, and it looks like setting sample_method = "" will pass, without the drawback of the 99% row limit percent workaround. I'll update the documentation with an example using timespan_config with an empty string for sample_method and consider this fixed.

Copy link

github-actions bot commented Aug 1, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.