Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingest): clarify s3/s3a requirements and platform defaults #4263

Merged
merged 4 commits into from
Mar 2, 2022

Conversation

kevinhu
Copy link
Contributor

@kevinhu kevinhu commented Feb 26, 2022

Adds a few sentences to the docs clarifying the need to grant S3A permissions when running profiling on files in AWS S3. Also sets the platform to s3 or file depending on the base_path if platform is unspecified.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@github-actions
Copy link

github-actions bot commented Feb 26, 2022

Unit Test Results (build & test)

  71 files  ±0    71 suites  ±0   20m 28s ⏱️ -12s
618 tests ±0  559 ✔️ ±0  59 💤 ±0  0 ±0 

Results for commit 963f1ef. ± Comparison against base commit bcabff8.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Feb 26, 2022

Unit Test Results (metadata ingestion)

       5 files         5 suites   47m 38s ⏱️
   342 tests    342 ✔️   0 💤 0
1 557 runs  1 526 ✔️ 31 💤 0

Results for commit 963f1ef.

♻️ This comment has been updated with latest results.

def read_file_spark(self, file: str, is_aws: bool) -> Optional[DataFrame]:

if is_aws:
file = f"s3a:https://{file}"
Copy link
Collaborator

@jjoyce0510 jjoyce0510 Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: s3:https://{file}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or is the "a" intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a typo—Spark uses the s3a protocol rather than s3 (see https://spark.apache.org/docs/latest/cloud-integration.html)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay got it.. Thanks !

@@ -40,6 +41,15 @@ def ensure_profiling_pattern_is_passed_to_profiling(
profiling.allow_deny_patterns = values["profile_patterns"]
return values

@pydantic.validator("platform", always=True)
def validate_platform(cls, value: str, values: Dict[str, Any]) -> Optional[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is "values" coming from here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the rest of the config values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values are the previously-validated fields here since this is a Pydantic validator — see https://pydantic-docs.helpmanual.io/usage/validators/

Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, otherwise LGTM

Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shirshanka shirshanka merged commit b2b8826 into datahub-project:master Mar 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants