docs(features): update & clean up Features page (datahub-project#5175)
maggiehays committed Jun 16, 2022
1 parent b4bf1d4 commit 63b673b
Showing 7 changed files with 75 additions and 102 deletions.
177 changes: 75 additions & 102 deletions docs/features.md
@@ -6,61 +6,97 @@ title: "Features"

DataHub is a modern data catalog built to enable end-to-end data discovery, data observability, and data governance. This extensible metadata platform is built for developers to tame the complexity of their rapidly evolving data ecosystems, and for data practitioners to leverage the full value of data within their organization.

Here’s an overview of DataHub’s current functionality. Curious about what’s to come? Check out our [roadmap](https://feature-requests.datahubproject.io/roadmap).
Here’s an overview of DataHub’s current functionality. Check out our [roadmap](https://feature-requests.datahubproject.io/roadmap) to see what's to come.

## End-to-end Search and Discovery
---

## Search and Discovery

### **Search All Corners of Your Data Stack**

### Search for assets across databases, datalakes, BI platforms, ML feature stores, workflow orchestration, and more
DataHub's unified search experience surfaces results across databases, datalakes, BI platforms, ML feature stores, orchestration tools, and more.

Here’s an example of searching for assets related to the term `health`: we see results spanning Looker dashboards, BigQuery datasets, and DataHub Tags & Users, and ultimately navigate to the “DataHub Health” Looker dashboard overview ([view in demo site](https://demo.datahubproject.io/dashboard/urn:li:dashboard:(looker,dashboards.11)/Documentation?is_lineage_mode=false))
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-search-all-corners-of-your-datastack.gif"/>
</p>

![](./imgs/feature-search-across-all-entities.gif)
### **Trace End-to-End Lineage**

### Easily understand the end-to-end journey of data by tracing lineage across platforms, datasets, pipelines, charts, and dashboards
Easily understand the end-to-end journey of data by tracing lineage across platforms, datasets, ETL/ELT pipelines, charts, dashboards, and beyond.

Let’s dig into the dependency chain of the “DataHub Health” Looker dashboard. Using the lineage view, we can navigate all upstream dependencies of the Dashboard including Looker Charts, Snowflake and s3 Datasets, and Airflow Pipelines ([view in demo site](https://demo.datahubproject.io/dashboard/urn:li:dashboard:(looker,dashboards.11)/Documentation?is_lineage_mode=true))
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-end-to-end-lineage.png"/>
</p>

![](./imgs/feature-navigate-lineage-vis.gif)
### **Understand the Impact of Breaking Changes on Downstream Dependencies**

### Quickly gain context about related entities as you navigate the lineage graph
Proactively identify which entities may be impacted by a breaking change using Impact Analysis.

As you explore the relationships between entities, it’s easy to view documentation, usage stats, ownership, and more without leaving the lineage graph
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-impact-analysis.gif"/>
</p>
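
Conceptually, impact analysis amounts to walking the lineage graph downstream from the entity that is about to change. The following is a minimal sketch of that idea, not DataHub's implementation: the `DOWNSTREAM` mapping, the entity names, and the `impacted_entities` helper are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the entities listed as its value.
# These names are illustrative and are not taken from a real DataHub instance.
DOWNSTREAM = {
    "s3.raw_events": ["snowflake.events"],
    "snowflake.events": ["looker.chart_daily_events", "airflow.export_events"],
    "looker.chart_daily_events": ["looker.dashboard_health"],
    "airflow.export_events": [],
    "looker.dashboard_health": [],
}


def impacted_entities(start, downstream):
    """Return every entity reachable downstream of `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:  # guard against revisiting shared dependencies
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Calling `impacted_entities("snowflake.events", DOWNSTREAM)` would list the chart, the export pipeline, and the dashboard that depend on that table, which is the set of owners to warn before making a breaking change.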

![](./imgs/feature-view-entitiy-details-via-lineage-vis.gif)
### **View Metadata 360 at a Glance**

### Gain confidence in the accuracy and relevance of datasets
Combine *technical* and *logical* metadata to provide a robust 360º view of your data entities.

DataHub provides dataset profiling and usage statistics for popular data warehousing platforms, making it easy for data practitioners to understand the shape of the data and how it has evolved over time. Query stats give context into how often (and by whom) the data is queried which can act as a strong signal of the trustworthiness of a dataset
Generate **Dataset Stats** to understand the shape & distribution of the data

![](./imgs/feature-table-usage-and-stats.gif)
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-dataset-stats.png"/>
</p>

## Robust Documentation and Tagging
Capture historical **Data Validation Outcomes** from tools like Great Expectations

### Capture and maintain institutional knowledge via API and/or the DataHub UI
<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/44Pr_55Qkik" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</p>

DataHub makes it easy to update and maintain documentation as definitions and use cases evolve. In addition to managing documentation via GMS, DataHub offers rich documentation and support for external links via the UI.
Leverage DataHub's **Schema Version History** to track changes to the physical structure of data over time

![](./imgs/feature-rich-documentation.gif)
<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/IYaV7r5HjZY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</p>

### Create and define new tags via API and/or the DataHub UI
---

## Modern Data Governance

Create and add tags to any type of entity within DataHub via the GraphQL API, or allow your end users to create and define new tags within the UI as use cases evolve over time
### **Govern in Real Time**

![](./imgs/feature-create-new-tag.gif)
[The Actions Framework](./actions/README.md) powers the following real-time use cases:

### Browse and search specific tags to fast-track discovery across entities
* **Notifications:** Generate organization-specific notifications when a change is made on DataHub. For example, send an email to the governance team when a "PII" tag is added to any data asset.
* **Workflow Integration:** Integrate DataHub into your organization's internal workflows. For example, create a Jira ticket when specific Tags or Terms are proposed on a Dataset.
* **Synchronization:** Sync changes made in DataHub to a third-party system. For example, propagate Tag additions made in DataHub to Snowflake.
* **Auditing:** Audit who is making what changes on DataHub through time.
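
The notification use case above can be sketched as a filter over change events. This is a simplified illustration under stated assumptions, not the Actions Framework API: the `TagEvent` record, its fields, and the `pii_notifications` helper are hypothetical, and a real action would receive events from the framework and send the message through an email or chat integration.

```python
from dataclasses import dataclass


@dataclass
class TagEvent:
    """Hypothetical, simplified change event; not the framework's real event schema."""
    entity_urn: str
    tag: str
    operation: str  # "ADD" or "REMOVE"
    actor: str


def pii_notifications(events, watched_tag="PII"):
    """Build one notification message per event that adds the watched tag."""
    messages = []
    for event in events:
        if event.operation == "ADD" and event.tag == watched_tag:
            messages.append(
                f"{event.actor} added tag '{event.tag}' to {event.entity_urn}"
            )
    return messages
```

Events that remove the tag, or that add a different tag, produce no message, so the governance team only hears about new "PII" classifications.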

Seamlessly browse entities associated with a tag or filter search results for a specific tag to find the entities that matter most
<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/yeloymkK5ow" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</p>

![](./imgs/feature-tag-browse.gif)
### **Manage Entity Ownership**
Quickly and easily assign entity ownership to users and/or user groups.

## Data Governance at your fingertips
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-entity-owner.png"/>
</p>

### Quickly assign asset ownership to users and/or user groups
### **Govern with Tags, Glossary Terms, and Domains**
Empower data owners to govern their data entities with:

![](./imgs/feature-add-owners.gif)
1. **Tags:** Informal, loosely controlled labels that serve as a tool for search & discovery. No formal, central management.
2. **Glossary Terms:** A controlled vocabulary with optional hierarchy, commonly used to describe core business concepts and/or measurements.
3. **Domains:** Curated, top-level folders or categories, commonly used in Data Mesh to organize entities by department (e.g., Finance, Marketing) and/or Data Products.

### Manage Fine-Grained Access Control with Policies
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-tags-terms-domains.png"/>
</p>

---
## DataHub Administration

### **Create Users, Groups, & Access Policies**

DataHub admins can create Policies to define who can perform what action against which resource(s). When you create a new Policy, you will be able to define the following:

@@ -69,77 +105,14 @@ DataHub admins can create Policies to define who can perform what action against
* **Privileges** - Choose the set of permissions, such as Edit Owners, Edit Documentation, Edit Links
* **Users and/or Groups** - Assign relevant Users and/or Groups; you can also assign the Policy to Resource Owners, regardless of which Group they belong to
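
One way to picture "who can perform what action against which resource(s)" is as a simple membership check. The sketch below is an assumption-laden illustration rather than DataHub's policy engine: the `Policy` record, its field names, and the `is_allowed` helper are hypothetical, and only the privilege names come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical, simplified policy record; field names are illustrative only."""
    resources: set
    privileges: set
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    applies_to_owners: bool = False  # grant to Resource Owners regardless of group


def is_allowed(policy, user, user_groups, resource, privilege, resource_owners=()):
    """Return True if the policy grants `privilege` on `resource` to `user`."""
    if resource not in policy.resources or privilege not in policy.privileges:
        return False
    if user in policy.users:
        return True
    if policy.groups & set(user_groups):
        return True
    return policy.applies_to_owners and user in resource_owners
```

For example, a policy covering one dataset with the "Edit Documentation" privilege, assigned to a `data-stewards` group and to Resource Owners, would admit a steward or the dataset's owner and reject everyone else.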

![](./imgs/feature-create-policy.gif)

## Metadata quality & usage analytics

Gain a deeper understanding of the health of metadata within DataHub and how end-users are interacting with the platform. The Analytics view provides a snapshot of volume of assets and percentage with assigned ownership, weekly active users, and most common searches & actions ([view in demo site](https://demo.datahubproject.io/analytics)).

![](./imgs/feature-datahub-analytics.png)

## DataHub is a Platform for Developers

DataHub is an API- and stream-first platform, empowering developers to implement an instance tailored to their specific data stack. Our growing set of flexible integration models allow for push and pull metadata ingestion, as well as no-code metadata model extensions to quickly get up and running.

### Dataset Sources
| Source | Status |
|---|:---:|
| Athena | Supported |
| BigQuery | Supported |
| Delta Lake | Planned |
| Druid | Supported |
| Elasticsearch | Supported |
| Hive | Supported |
| Hudi | Planned |
| Iceberg | Planned |
| Kafka Metadata | Supported |
| MongoDB | Supported |
| Microsoft SQL Server | Supported |
| MySQL | Supported |
| Oracle | Supported |
| PostgreSQL | Supported |
| Redshift | Supported |
| s3 | Supported |
| Snowflake | Supported |
| Spark/Databricks | Partially Supported |
| Trino FKA Presto | Supported |

### BI Tools
| Source | Status |
|---|:---:|
| Business Glossary | Supported |
| Looker | Supported |
| Redash | Supported |
| Superset | Supported |
| Tableau | Planned |
| Grafana | Partially Supported |

### ETL / ELT
| Source | Status |
|---|:---:|
| dbt | Supported |
| Glue | Supported |

### Workflow Orchestration
| Source | Status |
|---|:---:|
| Airflow | Supported |
| Prefect | Planned |

### Data Observability
| Source | Status |
|---|:---:|
| Great Expectations | Planned |

### ML Platform
| Source | Status |
|---|:---:|
| Feast | Supported |
| Sagemaker | Supported |

### Identity Management
| Source | Status |
|---|:---:|
| Azure AD | Supported |
| LDAP | Supported |
| Okta | Supported |
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-manage-policies.png"/>
</p>

### **Ingest Metadata from the UI**

Create, configure, schedule, & execute batch metadata ingestion using the DataHub user interface. This makes getting metadata into DataHub easier by minimizing the overhead required to operate custom integration pipelines.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-managed-ingestion-config.png"/>
</p>
Binary file removed docs/imgs/feature-add-owners.gif
Binary file not shown.
Binary file removed docs/imgs/feature-create-policy.gif
Binary file not shown.
Binary file removed docs/imgs/feature-navigate-lineage-vis.gif
Binary file not shown.
Binary file removed docs/imgs/feature-search-across-all-entities.gif
Binary file not shown.
Binary file removed docs/imgs/feature-table-usage-and-stats.gif
Binary file not shown.
Binary file added docs/imgs/feature-validation-timeseries.png
