feat(docs): refactor source and sink docs #3031

Merged: 40 commits, Aug 8, 2021
Commits
0b2f343
Begin reorg
kevinhu Jul 27, 2021
0916b75
Add links
kevinhu Jul 27, 2021
2bb1d79
Fix link
kevinhu Jul 27, 2021
487a2b6
Fix glue link
kevinhu Jul 27, 2021
a24dc59
Add module installs to each page
kevinhu Jul 27, 2021
5c6a19a
Consistency
kevinhu Jul 27, 2021
2382c30
Standardize sqlalchemy pattern
kevinhu Jul 27, 2021
34fbccf
Add missing sql options
kevinhu Jul 27, 2021
9808735
More consistent recipes
kevinhu Jul 27, 2021
9af3cab
Finish consistency checks for recipes
kevinhu Jul 27, 2021
9dc365f
As above
kevinhu Jul 28, 2021
9afa393
Typo fixes
kevinhu Jul 28, 2021
c6388cb
More typo fixes
kevinhu Jul 28, 2021
8588cb9
More consistency fixes
kevinhu Jul 28, 2021
63691dd
Fix broken links
kevinhu Jul 28, 2021
f186b49
Merge branch 'master' of github.com:kevinhu/datahub into reorganize-docs
kevinhu Jul 28, 2021
410b9b8
Merge
kevinhu Aug 2, 2021
59623e4
Merge
kevinhu Aug 2, 2021
eef2a62
Note on allow/deny
kevinhu Aug 2, 2021
bee872f
Add questions section
kevinhu Aug 2, 2021
124c0a3
Merge branch 'master' of github.com:kevinhu/datahub into reorganize-docs
kevinhu Aug 2, 2021
6ffd8a1
Fix inconsistencies
kevinhu Aug 3, 2021
ba3cb36
Merge branch 'master' of github.com:kevinhu/datahub into reorganize-docs
kevinhu Aug 3, 2021
8a4de6d
Begin separation of quickstart and config details
kevinhu Aug 3, 2021
8bf27a5
Write generic sqlalchemy options
kevinhu Aug 3, 2021
3dbb736
Up to looker
kevinhu Aug 3, 2021
186235f
Add all config vars
kevinhu Aug 4, 2021
35ecc45
Add source config docs
kevinhu Aug 4, 2021
73a42fd
Clean up quickstart configs
kevinhu Aug 4, 2021
b1bf7e7
Update usage docs
kevinhu Aug 4, 2021
5933f1f
Formatting
kevinhu Aug 4, 2021
bbbe612
Revise capabilities
kevinhu Aug 4, 2021
30f9e6f
Merge branch 'master' of github.com:kevinhu/datahub into reorganize-docs
kevinhu Aug 4, 2021
9cf1acb
Merge
kevinhu Aug 6, 2021
aa608b6
PR fixes
kevinhu Aug 6, 2021
f429324
Add link back to main readme
kevinhu Aug 6, 2021
5fbac7b
Add link back to recipe section
kevinhu Aug 6, 2021
387137f
Add sink config placeholder
kevinhu Aug 6, 2021
34d6c57
Categories
kevinhu Aug 6, 2021
625baa0
Remove sink compatibility
kevinhu Aug 6, 2021
More consistent recipes
kevinhu committed Jul 27, 2021
commit 9808735a4d31827edc1d606d0c9403b0f2ab153b
4 changes: 3 additions & 1 deletion metadata-ingestion/source_docs/bigquery.md
@@ -86,13 +86,15 @@ source:
options:
# See https://googleapis.dev/python/logging/latest/client.html for details.
credentials: ~ # optional - see docs
env: PROD

# Common usage stats options
bucket_duration: "DAY"
start_time: ~ # defaults to the last full day in UTC (or hour)
end_time: ~ # defaults to the last full day in UTC (or hour)

top_n_queries: 10 # number of queries to save for each table

env: PROD
```
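
The recipe comments above say that `start_time` and `end_time` default to the last full day (or hour) in UTC. A minimal sketch of one way to compute such a window follows. The function name `default_usage_window` is hypothetical, and the flooring logic is an assumption about what "last full day" means, not a copy of the plugin's code.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def default_usage_window(
    now: datetime, bucket_duration: str = "DAY"
) -> Tuple[datetime, datetime]:
    """Return (start_time, end_time) for the last full bucket before `now`.

    Hypothetical helper: `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    if bucket_duration == "DAY":
        # Floor to midnight UTC; the window is the whole previous day.
        end = now.replace(hour=0, minute=0, second=0, microsecond=0)
        step = timedelta(days=1)
    elif bucket_duration == "HOUR":
        # Floor to the top of the hour; the window is the previous hour.
        end = now.replace(minute=0, second=0, microsecond=0)
        step = timedelta(hours=1)
    else:
        raise ValueError(f"unsupported bucket_duration: {bucket_duration!r}")
    return end - step, end


now = datetime(2021, 7, 27, 15, 42, tzinfo=timezone.utc)
day_start, day_end = default_usage_window(now, "DAY")
hour_start, hour_end = default_usage_window(now, "HOUR")
```

Under these assumptions, a run at 15:42 UTC on Jul 27 covers all of Jul 26 for `"DAY"` and 14:00 to 15:00 for `"HOUR"`.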

:::note
10 changes: 10 additions & 0 deletions metadata-ingestion/source_docs/dbt.md
@@ -23,11 +23,21 @@ This plugin pulls metadata from dbt's artifact files:
source:
type: "dbt"
config:
# https://docs.getdbt.com/reference/artifacts/manifest-json
manifest_path: "./path/dbt/manifest_file.json"
# https://docs.getdbt.com/reference/artifacts/catalog-json
catalog_path: "./path/dbt/catalog_file.json"
# https://docs.getdbt.com/reference/artifacts/sources-json
sources_path: "./path/dbt/sources_file.json" # (optional, used for freshness checks)

# the platform that dbt is loading onto
target_platform: "postgres" # optional, eg "postgres", "snowflake", etc.

# whether to load schemas of datasets from dbt
# (otherwise, only includes a simple list of tables)
load_schemas: True or False

# regex pattern to allow/deny nodes
node_type_pattern: # optional
deny:
- ^test.*
17 changes: 15 additions & 2 deletions metadata-ingestion/source_docs/glue.md
@@ -27,7 +27,20 @@ source:

extract_transforms: True # whether to ingest Glue jobs, defaults to True

# Filtering patterns for databases and tables to scan
database_pattern: # Optional, to filter databases scanned, same as schema_pattern above.
# Regex filters for databases to scan
database_pattern:
deny:
# Note that the deny patterns take precedence over the allow patterns.
- "bad_database"
- "junk_database"
# Can also be a regular expression
- "(old|used|deprecated)_database"
allow:
- "good_database"
- "excellent_database"
table_pattern: # Optional, to filter tables scanned, same as table_pattern above.
deny:
# ...
allow:
# ...
```
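
The `database_pattern` recipe above notes that deny patterns take precedence over allow patterns. A minimal sketch of that rule follows, using the pattern values from the recipe. The `AllowDenyFilter` class is a hypothetical stand-in rather than the plugin's own implementation, and matching with `re.match` (anchored at the start of the name only) is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class AllowDenyFilter:
    """Hypothetical helper illustrating deny-over-allow filtering."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Deny is checked first, so a deny match wins over any allow match.
        if any(re.match(pattern, name) for pattern in self.deny):
            return False
        # Otherwise the name must match at least one allow pattern.
        return any(re.match(pattern, name) for pattern in self.allow)


database_filter = AllowDenyFilter(
    allow=["good_database", "excellent_database"],
    deny=["bad_database", "junk_database", "(old|used|deprecated)_database"],
)

candidates = ["good_database", "old_database", "other_database"]
kept = [name for name in candidates if database_filter.allowed(name)]
```

In this sketch `old_database` is dropped by the regular-expression deny rule, and `other_database` is dropped because it matches no allow pattern. A name that matched both lists would still be denied.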
1 change: 0 additions & 1 deletion metadata-ingestion/source_docs/hive.md
@@ -60,7 +60,6 @@ source:
allow:
# ...

include_views: True # whether to include views, defaults to True
include_tables: True # whether to include views, defaults to True
```

19 changes: 17 additions & 2 deletions metadata-ingestion/source_docs/kafka.md
@@ -13,9 +13,24 @@ source:
config:
connection:
bootstrap: "broker:9092"
consumer_config: {} # passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.DeserializingConsumer
schema_registry_url: https://localhost:8081
schema_registry_config: {} # passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.schema_registry.SchemaRegistryClient

# Extra schema registry config.
# These options will be passed into Kafka's SchemaRegistryClient.
# See https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html?#schemaregistryclient
schema_registry_config: {}

# Extra consumer config.
# These options will be passed into Kafka's DeserializingConsumer.
# See https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#deserializingconsumer
# and https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md.
consumer_config: {}

# Extra producer config.
# These options will be passed into Kafka's SerializingProducer.
# See https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#serializingproducer
# and https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md.
producer_config: {}
```

The options in the consumer config and schema registry config are passed to the Kafka DeserializingConsumer and SchemaRegistryClient respectively.
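
As a rough illustration of that pass-through, the sketch below assembles the two dictionaries that would be handed to the clients. The function `build_client_configs` is hypothetical, and letting the extra options override the base keys is a choice made for this sketch, not a confirmed behavior of the plugin. The keys `bootstrap.servers`, `security.protocol`, `url`, and `basic.auth.user.info` are standard client settings; the values are placeholders.

```python
from typing import Any, Dict, Tuple


def build_client_configs(
    connection: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: merge the recipe's `connection` block into
    the config dicts for the consumer and the schema registry client."""
    consumer_config = {
        "bootstrap.servers": connection["bootstrap"],
        # Extra options are spread last, so they can override the base key.
        **connection.get("consumer_config", {}),
    }
    schema_registry_config = {
        "url": connection["schema_registry_url"],
        **connection.get("schema_registry_config", {}),
    }
    return consumer_config, schema_registry_config


connection = {
    "bootstrap": "broker:9092",
    "schema_registry_url": "https://localhost:8081",
    "consumer_config": {"security.protocol": "SSL"},
    "schema_registry_config": {"basic.auth.user.info": "user:password"},
}
consumer_config, schema_registry_config = build_client_configs(connection)
```

The resulting dictionaries have the shape that `DeserializingConsumer` and `SchemaRegistryClient` accept in their constructors. The sketch stops short of creating the clients so that it runs without a broker.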