Merge branch 'datahub-project:master' into bigquery-profiling
MugdhaHardikar-GSLab authored Jul 6, 2022
2 parents b307981 + 4b515e0 commit 32c3446
Showing 26 changed files with 7,038 additions and 103 deletions.
1 change: 1 addition & 0 deletions datahub-web-react/src/app/preview/DefaultPreviewCard.tsx
@@ -85,6 +85,7 @@ const TagContainer = styled.div`
display: inline-flex;
margin-left: 0px;
margin-top: 3px;
flex-wrap: wrap;
`;

const TagSeparator = styled.div`
6 changes: 6 additions & 0 deletions datahub-web-react/src/images/logo-salesforce.svg
2 changes: 1 addition & 1 deletion docs/how/updating-datahub.md
@@ -5,7 +5,7 @@ This file documents any backwards-incompatible changes in DataHub and assists pe
## Next

### Breaking Changes

- The `should_overwrite` flag in `csv-enricher` has been replaced with `write_semantics` to match the format used for other sources. See the [documentation](https://datahubproject.io/docs/generated/ingestion/sources/csv/) for more details.
### Potential Downtime

### Deprecations
30 changes: 30 additions & 0 deletions metadata-ingestion/docs/sources/salesforce/salesforce.md
@@ -0,0 +1,30 @@
### Prerequisites

In order to ingest metadata from Salesforce, you will need one of the following (the sketch below shows how each maps onto a simple-salesforce session):

- a Salesforce username, password, and [security token](https://developer.Salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_concepts_security.htm), OR
- a Salesforce instance URL and an access token/session ID (suitable for one-shot ingestion only, as the access token typically expires after 2 hours of inactivity)
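
Under the hood the connector authenticates through [simple-salesforce](https://pypi.org/project/simple-salesforce/), so the two credential sets correspond roughly to the two ways that library can open a session. A minimal sketch with placeholder values (this is not the connector's own code):

```python
from simple_salesforce import Salesforce

# Option 1: username + password + security token (long-lived credentials)
sf = Salesforce(
    username="user@company",
    password="password_for_user",
    security_token="security_token_for_user",
)

# Option 2: instance URL + access token / session id (short-lived; one-shot runs)
sf = Salesforce(
    instance_url="https://mydomain.my.salesforce.com",
    session_id="<access token or session id>",
)
```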

## Integration Details

This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance.
The Python library [simple-salesforce](https://pypi.org/project/simple-salesforce/) is used to authenticate against and call the [Salesforce REST API](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/intro_what_is_rest_api.htm) to retrieve details from the Salesforce instance.

### REST API Resources used in this integration
- [Versions](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_versions.htm)
- [Tooling API Query](https://developer.salesforce.com/docs/atlas.en-us.api_tooling.meta/api_tooling/intro_rest_resources.htm) on the EntityDefinition, EntityParticle, CustomObject, and CustomField objects (see the sketch after this list)
- [Record Count](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_record_count.htm)
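
For orientation only, here is a hedged sketch of how these resources can be reached through simple-salesforce; the exact queries the connector issues are not shown here, and the credentials, fields, and limits below are illustrative:

```python
from urllib.parse import urlencode

import requests
from simple_salesforce import Salesforce

# Versions resource: unversioned, unauthenticated GET against the instance
versions = requests.get("https://mydomain.my.salesforce.com/services/data/").json()

sf = Salesforce(
    username="user@company",
    password="password_for_user",
    security_token="security_token_for_user",
)

# Tooling API query over EntityDefinition (object-level metadata)
q = "SELECT DurableId, QualifiedApiName, Label FROM EntityDefinition LIMIT 10"
entity_definitions = sf.toolingexecute(f"query/?{urlencode({'q': q})}")

# Record Count resource: approximate row counts per object
record_counts = sf.restful("limits/recordCount", params={"sObjects": "Account,Opportunity"})
```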

### Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

| Source Concept  | DataHub Concept                                            | Notes                     |
| --------------- | ---------------------------------------------------------- | ------------------------- |
| `Salesforce`    | [Data Platform](../../metamodel/entities/dataPlatform.md)  |                           |
| Standard Object | [Dataset](../../metamodel/entities/dataset.md)             | subtype "Standard Object" |
| Custom Object   | [Dataset](../../metamodel/entities/dataset.md)             | subtype "Custom Object"   |
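
To make the mapping concrete, a Standard Object such as `Account` and a hypothetical Custom Object such as `Invoice__c` would each surface as a Dataset on the `salesforce` platform. A minimal sketch using DataHub's URN helper (the actual dataset name may also carry the configured `platform_instance` prefix):

```python
from datahub.emitter.mce_builder import make_dataset_urn

account_urn = make_dataset_urn(platform="salesforce", name="Account", env="PROD")     # Standard Object
invoice_urn = make_dataset_urn(platform="salesforce", name="Invoice__c", env="PROD")  # hypothetical Custom Object

print(account_urn)  # urn:li:dataset:(urn:li:dataPlatform:salesforce,Account,PROD)
```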

### Caveats
- This connector has only been tested with Salesforce Developer Edition.
- This connector currently supports only table-level profiling (row and column counts). Row counts are approximate, as returned by the [Salesforce RecordCount REST API](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_record_count.htm).
- This integration does not support ingesting Salesforce [External Objects](https://developer.Salesforce.com/docs/atlas.en-us.object_reference.meta/object_reference/sforce_api_objects_external_objects.htm).
25 changes: 25 additions & 0 deletions metadata-ingestion/docs/sources/salesforce/salesforce_recipe.yml
@@ -0,0 +1,25 @@
pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed

    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"

    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"

sink:
  type: "datahub-rest"
  config:
    server: "https://localhost:8080"
@@ -1,4 +1,4 @@
resource,subresource,glossary_terms,tags,owners
"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",,[urn:li:glossaryTerm:SavingAccount],[urn:li:tag:Legacy],[urn:li:corpuser:datahub|urn:li:corpuser:jdoe]
"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",field_foo,[urn:li:glossaryTerm:AccountBalance],,
"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",field_bar,,[urn:li:tag:Legacy],
resource,subresource,glossary_terms,tags,owners,ownership_type,description
"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",,[urn:li:glossaryTerm:CustomerAccount],[urn:li:tag:Legacy],[urn:li:corpuser:datahub|urn:li:corpuser:jdoe],TECHNICAL_OWNER,new description
"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",field_foo,[urn:li:glossaryTerm:AccountBalance],,,,field_foo!
"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",field_bar,,[urn:li:tag:Legacy],,,field_bar?
@@ -3,8 +3,8 @@
source:
type: "csv-enricher"
config:
filename: "/Users/adityaradhakrishnan/code/datahub-fork/metadata-ingestion/examples/demo_data/csv_enricher_demo_data.csv"
should_overwrite: false
filename: "./examples/demo_data/csv_enricher_demo_data.csv"
write_semantics: "PATCH"
delimiter: ","
array_delimiter: "|"

3 changes: 3 additions & 0 deletions metadata-ingestion/setup.py
@@ -262,6 +262,7 @@ def get_long_description():
"redshift": sql_common | redshift_common,
"redshift-usage": sql_common | usage_common | redshift_common,
"sagemaker": aws_common,
"salesforce":{"simple-salesforce"},
"snowflake": snowflake_common,
"snowflake-usage": snowflake_common
| usage_common
@@ -366,6 +367,7 @@ def get_long_description():
"starburst-trino-usage",
"powerbi",
"vertica",
"salesforce"
# airflow is added below
]
for dependency in plugins[plugin]
@@ -509,6 +511,7 @@ def get_long_description():
"vertica = datahub.ingestion.source.sql.vertica:VerticaSource",
"presto-on-hive = datahub.ingestion.source.sql.presto_on_hive:PrestoOnHiveSource",
"pulsar = datahub.ingestion.source.pulsar:PulsarSource",
"salesforce = datahub.ingestion.source.salesforce:SalesforceSource",
],
"datahub.ingestion.sink.plugins": [
"file = datahub.ingestion.sink.file:FileSink",
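
The `datahub.ingestion.source.plugins` entry point added above is what lets the framework resolve `type: "salesforce"` in a recipe to `SalesforceSource` at runtime, once the package is installed with the new extra (`pip install 'acryl-datahub[salesforce]'`). A hedged sketch of that lookup using the standard library's entry-point API (Python 3.10+):

```python
from importlib.metadata import entry_points

# Sources register themselves under the "datahub.ingestion.source.plugins" group in setup.py
for ep in entry_points(group="datahub.ingestion.source.plugins"):
    if ep.name == "salesforce":
        source_cls = ep.load()  # -> datahub.ingestion.source.salesforce.SalesforceSource
        print(source_cls)
```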