diff --git a/README.md b/README.md
index b662d6e998..44e824e9fe 100644
--- a/README.md
+++ b/README.md
@@ -1,176 +1,222 @@
-
- - - +
+ + +
- -Marquez is an open source **metadata service** for the **collection**, **aggregation**, and **visualization** of a data ecosystem's metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralization of dataset lifecycle management, and much more. Marquez was released and open sourced by [WeWork](https://www.wework.com). - -## Badges - -[![CircleCI](https://circleci.com/gh/MarquezProject/marquez/tree/main.svg?style=shield)](https://circleci.com/gh/MarquezProject/marquez/tree/main) -[![codecov](https://codecov.io/gh/MarquezProject/marquez/branch/main/graph/badge.svg)](https://codecov.io/gh/MarquezProject/marquez/branch/main) -[![status](https://img.shields.io/badge/status-active-brightgreen.svg)](#status) -[![Slack](https://img.shields.io/badge/slack-chat-blue.svg)](https://join.slack.com/t/marquezproject/shared_invite/zt-29w4n8y45-Re3B1KTlZU5wO6X6JRzGmA) -[![license](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](https://raw.githubusercontent.com/MarquezProject/marquez/main/LICENSE) -[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](CODE_OF_CONDUCT.md) -[![maven](https://img.shields.io/maven-central/v/io.github.marquezproject/marquez-api.svg)](https://search.maven.org/search?q=g:io.github.marquezproject) -[![docker](https://img.shields.io/badge/docker-hub-blue.svg?style=flat)](https://hub.docker.com/r/marquezproject/marquez) -[![Known Vulnerabilities](https://snyk.io/test/github/MarquezProject/marquez/badge.svg)](https://snyk.io/test/github/MarquezProject/marquez) -[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/5160/badge)](https://bestpractices.coreinfrastructure.org/projects/5160) - -## Status - -Marquez is an [LF AI & Data Foundation](https://lfaidata.foundation/projects/marquez) incubation project under active development, and we'd love your help! 
- -## Adopters - -Want to be added? Send a pull request our way! - -* [Astronomer](https://astronomer.io) -* [Datakin](https://datakin.com) -* [Northwestern Mutual](https://www.northwesternmutual.com) - -## Try it! - -[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/MarquezProject/marquez) - -## Quickstart - -Marquez provides a simple way to collect and view _dataset_, _job_, and _run_ metadata using [OpenLineage](https://openlineage.io). The easiest way to get up and running is with Docker. From the base of the Marquez repository, run: - -``` -$ ./docker/up.sh -``` - -> **Tip:** Use the `--build` flag to build images from source, and/or `--seed` to start Marquez with sample lineage metadata. For a more complete example using the sample metadata, please follow our [quickstart](https://marquezproject.github.io/marquez/quickstart.html) guide. - -> **Note:** Port 5000 is now reserved for MacOS. If running locally on MacOS, you can run `./docker/up.sh --api-port 9000` to configure the API to listen on port 9000 instead. Keep in mind that you will need to update the URLs below with the appropriate port number. - -**`WEB UI`** - -You can open [http://localhost:3000](http://localhost:3000) to begin exploring the Marquez Web UI. The UI enables you to discover dependencies between jobs and the datasets they produce and consume via the lineage graph, view run metadata of current and previous job runs, and much more! - -

- +

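Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem's metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralizes dataset lifecycle management, and much more. Marquez was released and open sourced by WeWork.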

+

Badges

+

CircleCI +codecov +status +Slack +license +Contributor Covenant +maven +docker +Known Vulnerabilities +CII Best Practices

+

Status

+

Marquez is an LF AI & Data Foundation incubation project under active development, and we'd love your help!

+

Adopters

+

Want to be added? Send a pull request our way!

+ +

Try it!

+

Open in Gitpod

+

Quickstart

+

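Marquez provides a simple way to collect and view dataset, job, and run metadata using OpenLineage. The easiest way to get up and running is with Docker. From the root of the Marquez repository, run: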

+
$ ./docker/up.sh
+
+ + + + +
+
+

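Tip: Use the `--build` flag to build images from source, and/or `--seed` to start Marquez with sample lineage metadata. For a more complete example using the sample metadata, follow our quickstart guide.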

+
+
+

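Note: Port 5000 is now reserved on macOS. If running locally on macOS, you can run `./docker/up.sh --api-port 9000` to configure the API to listen on port 9000 instead. Keep in mind that you will need to update the URLs below with the appropriate port number.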

+
+

WEB UI

+

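You can open http://localhost:3000 to begin exploring the Marquez Web UI. The UI enables you to discover dependencies between jobs and the datasets they produce and consume via the lineage graph, view run metadata of current and previous job runs, and much more!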

+

+ +

- -**`HTTP API`** - -The Marquez [HTTP API](https://marquezproject.github.io/marquez/openapi.html) listens on port `5000` for all calls and port `5001` for the admin interface. The admin interface exposes helpful endpoints like `/healthcheck` and `/metrics`. To verify the HTTP API server is running and listening on `localhost`, browse to [http://localhost:5001](http://localhost:5001). To begin collecting lineage metadata as OpenLineage events, use the [LineageAPI](https://marquezproject.github.io/marquez/openapi.html#tag/Lineage/paths/~1lineage/post) or an OpenLineage [integration](https://openlineage.io/docs/integrations/about). - -> **Note:** By default, the HTTP API does not require any form of authentication or authorization. - -**`GRAPHQL`** - -To explore metadata via graphql, browse to [http://localhost:5000/graphql-playground](http://localhost:5000/graphql-playground). The graphql endpoint is currently in _beta_ and is located at [http://localhost:5000/api/v1-beta/graphql](http://localhost:5000/api/v1-beta/graphql). - -## Documentation - -We invite everyone to help us improve and keep documentation up to date. Documentation is maintained in this repository and can be found under [`docs/`](https://github.com/MarquezProject/marquez/tree/main/docs). - -> **Note:** To begin collecting metadata with Marquez, follow our [quickstart](https://marquezproject.github.io/marquez/quickstart.html) guide. Below you will find the steps to get up and running from source. - -## Versions and OpenLineage Compatibility - -Versions of Marquez are compatible with OpenLineage unless noted otherwise. We ensure backward compatibility with a newer version of Marquez by recording events with an older OpenLineage specification version. **We strongly recommend understanding how the OpenLineage specification is** [versioned](https://github.com/OpenLineage/OpenLineage/blob/main/spec/Versioning.md) **and published**. 
- -| **Marquez** | **OpenLineage** | **Status** | -|--------------------------------------------------------------------------------------------------|---------------------------------------------------------------|---------------| -| [`UNRELEASED`](https://github.com/MarquezProject/marquez/blob/main/CHANGELOG.md#unreleased) | [`1-0-5`](https://openlineage.io/spec/1-0-5/OpenLineage.json) | `CURRENT` | -| [`0.43.0`](https://github.com/MarquezProject/marquez/blob/0.43.0/CHANGELOG.md#0430---2023-12-15) | [`1-0-5`](https://openlineage.io/spec/1-0-5/OpenLineage.json) | `RECOMMENDED` | -| [`0.42.0`](https://github.com/MarquezProject/marquez/blob/0.42.0/CHANGELOG.md#0420---2023-10-17) | [`1-0-5`](https://openlineage.io/spec/1-0-0/OpenLineage.json) | `MAINTENANCE` | - -> **Note:** The [`openlineage-python`](https://pypi.org/project/openlineage-python) and [`openlineage-java`](https://central.sonatype.com/artifact/io.openlineage/openlineage-java) libraries will a higher version than the OpenLineage [specification](https://github.com/OpenLineage/OpenLineage/tree/main/spec) as they have different version requirements. - -We currently maintain three categories of compatibility: `CURRENT`, `RECOMMENDED`, and `MAINTENANCE`. When a new version of Marquez is released, it's marked as `RECOMMENDED`, while the previous version enters `MAINTENANCE` mode (which gets bug fixes whenever possible). The unreleased version of Marquez is marked `CURRENT` and does not come with any guarantees, but is assumed to remain compatible with OpenLineage, although surprises happen and there maybe rare exceptions. 
- -## Modules - -Marquez uses a _multi_-project structure and contains the following modules: - -* [`api`](https://github.com/MarquezProject/marquez/tree/main/api): core API used to collect metadata -* [`web`](https://github.com/MarquezProject/marquez/tree/main/web): web UI used to view metadata -* [`clients`](https://github.com/MarquezProject/marquez/tree/main/clients): clients that implement the HTTP [API](https://marquezproject.github.io/marquez/openapi.html) -* [`chart`](https://github.com/MarquezProject/marquez/tree/main/chart): helm chart - -> **Note:** The `integrations` module was removed in [`0.21.0`](https://github.com/MarquezProject/marquez/blob/main/CHANGELOG.md#removed), so please use an OpenLineage [integration](https://openlineage.io/integration) to collect lineage events easily. - -## Requirements - -* [Java 17](https://adoptium.net) -* [PostgreSQL 14](https://www.postgresql.org/download) - -> **Note:** To connect to your running PostgreSQL instance, you will need the standard [`psql`](https://www.postgresql.org/docs/9.6/app-psql.html) tool. - -## Building - -To build the entire project run: - -```bash -./gradlew build -``` - -The executable can be found under `api/build/libs/` - -## Configuration - -To run Marquez, you will have to define `marquez.yml`. The configuration file is passed to the application and used to specify your database connection. The configuration file creation steps are outlined below. 
- -### Step 1: Create Database - -When creating your database using [`createdb`](https://www.postgresql.org/docs/12/app-createdb.html), we recommend calling it `marquez`: - -```bash -$ createdb marquez -``` - -### Step 2: Create `marquez.yml` - -With your database created, you can now copy [`marquez.example.yml`](https://github.com/MarquezProject/marquez/blob/main/marquez.example.yml): - -``` -$ cp marquez.example.yml marquez.yml -``` - -You will then need to set the following environment variables (we recommend adding them to your `.bashrc`): `POSTGRES_DB`, `POSTGRES_USER`, and `POSTGRES_PASSWORD`. The environment variables override the equivalent option in the configuration file. - -By default, Marquez uses the following ports: - -* TCP port `8080` is available for the HTTP API server. -* TCP port `8081` is available for the admin interface. - -> **Note:** All of the configuration settings in `marquez.yml` can be specified either in the configuration file or in an environment variable. - -## Running the [HTTP API](https://github.com/MarquezProject/marquez/blob/main/src/main/java/marquez/MarquezApp.java) Server - -```bash -$ ./gradlew :api:runShadow -``` -Marquez listens on port `8080` for all API calls and port `8081` for the admin interface. To verify the HTTP API server is running and listening on `localhost`, browse to [http://localhost:8081](http://localhost:8081). We encourage you to familiarize yourself with the [data model](https://marquezproject.github.io/marquez/#data-model) and [APIs](https://marquezproject.github.io/marquez/openapi.html) of Marquez. To run the web UI, please follow the steps outlined [here](https://github.com/MarquezProject/marquez/tree/main/web#development). - -> **Note:** By default, the HTTP API does not require any form of authentication or authorization. 
- -## Related Projects - -* [`OpenLineage`](https://github.com/OpenLineage/OpenLineage): an open standard for metadata and lineage collection - -## Getting Involved - -* Website: https://marquezproject.ai -* Source: https://github.com/MarquezProject/marquez -* Chat: [MarquezProject Slack](https://join.slack.com/t/marquezproject/shared_invite/zt-29w4n8y45-Re3B1KTlZU5wO6X6JRzGmA) -* Twitter: [@MarquezProject](https://twitter.com/MarquezProject) - -## Contributing - -See [CONTRIBUTING.md](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md) for more details about how to contribute. - -## Reporting a Vulnerability - -If you discover a vulnerability in the project, please open an issue and attach the "security" label. - ----- -SPDX-License-Identifier: Apache-2.0 -Copyright 2018-2023 contributors to the Marquez project. +

HTTP API

+

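The Marquez HTTP API listens on port `5000` for all calls and port `5001` for the admin interface. The admin interface exposes helpful endpoints like `/healthcheck` and `/metrics`. To verify the HTTP API server is running and listening on `localhost`, browse to http://localhost:5001. To begin collecting lineage metadata as OpenLineage events, use the LineageAPI or an OpenLineage integration.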
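
As a rough illustration of sending lineage metadata as an OpenLineage event, the sketch below builds a minimal run event and posts it with Python's standard library. It assumes Marquez was started with `./docker/up.sh` on the default port and that the lineage endpoint is served at `/api/v1/lineage`; the namespace, job name, and `producer` URI are hypothetical placeholders.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request

# Assumption: default quickstart port; adjust if you passed --api-port.
LINEAGE_URL = "http://localhost:5000/api/v1/lineage"


def build_run_event(event_type, namespace, job_name, run_id=None):
    """Build a minimal OpenLineage run event as a plain dict."""
    return {
        "eventType": event_type,  # for example "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],
        "outputs": [],
        # Hypothetical URI identifying the code that emits the event.
        "producer": "https://example.com/my-pipeline",
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
    }


def post_event(event, url=LINEAGE_URL):
    """POST the event as JSON and return the HTTP status code."""
    req = request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

With Marquez running, `post_event(build_run_event("START", "my-namespace", "my-job"))` would send a single event. In practice an OpenLineage integration or client library is likely the better choice, since it handles facets and retries for you.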

+
+

Note: By default, the HTTP API does not require any form of authentication or authorization.

+
+

GRAPHQL

+

To explore metadata via GraphQL, browse to http://localhost:5000/graphql-playground. The GraphQL endpoint is currently in beta and is located at http://localhost:5000/api/v1-beta/graphql.
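
For readers who prefer a script over the playground, here is a minimal sketch that sends a standard GraphQL introspection query to the beta endpoint. It assumes the endpoint accepts the common JSON-over-POST convention (`{"query": ...}`); no Marquez-specific types or fields are assumed, and the helper names are made up for illustration.

```python
import json
from urllib import request

# Assumption: default quickstart port and the beta endpoint named above.
GRAPHQL_URL = "http://localhost:5000/api/v1-beta/graphql"

# Standard GraphQL introspection: lists the top-level query fields.
INTROSPECTION_QUERY = "{ __schema { queryType { fields { name } } } }"


def build_graphql_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL query string in a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_query_fields(url=GRAPHQL_URL):
    """Return the names of the top-level query fields the server exposes."""
    with request.urlopen(build_graphql_request(INTROSPECTION_QUERY, url)) as resp:
        payload = json.load(resp)
    fields = payload["data"]["__schema"]["queryType"]["fields"]
    return [field["name"] for field in fields]
```

Calling `list_query_fields()` against a running instance is a quick way to see what the beta schema offers before writing real queries.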

+

Documentation

+

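We invite everyone to help us improve and keep documentation up to date. Documentation is maintained in this repository and can be found under `docs/`.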

+
+

Note: To begin collecting metadata with Marquez, follow our quickstart guide. Below you will find the steps to get up and running from source.

+
+

Versions and OpenLineage Compatibility

+

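Versions of Marquez are compatible with OpenLineage unless noted otherwise. We ensure backward compatibility with a newer version of Marquez by recording events with an older OpenLineage specification version. We strongly recommend understanding how the OpenLineage specification is versioned and published.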

+ + + + + + + + + + + + + + + + + + + + + + + + + +
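| Marquez      | OpenLineage | Status        |
|--------------|-------------|---------------|
| `UNRELEASED` | `1-0-5`     | `CURRENT`     |
| `0.43.0`     | `1-0-5`     | `RECOMMENDED` |
| `0.42.0`     | `1-0-5`     | `MAINTENANCE` |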
+
+

Note: The `openlineage-python` and `openlineage-java` libraries will have a higher version than the OpenLineage specification, as they have different version requirements.

+
+

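We currently maintain three categories of compatibility: `CURRENT`, `RECOMMENDED`, and `MAINTENANCE`. When a new version of Marquez is released, it is marked as `RECOMMENDED`, while the previous version enters `MAINTENANCE` mode (and gets bug fixes whenever possible). The unreleased version of Marquez is marked `CURRENT` and does not come with any guarantees, but is assumed to remain compatible with OpenLineage, although surprises happen and there may be rare exceptions.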
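
The rotation between categories can be sketched as a small function. This is only an illustration of the rule described above, not project tooling: the version `0.44.0` is hypothetical, and because the text does not say what happens to the version that was previously in `MAINTENANCE`, the sketch simply drops it, which is an assumption.

```python
def release(statuses, new_version):
    """Return a new version -> status mapping after `new_version` ships.

    The new release becomes RECOMMENDED and the previously RECOMMENDED
    version moves to MAINTENANCE. Assumption: anything older is dropped.
    """
    rotated = {"UNRELEASED": "CURRENT", new_version: "RECOMMENDED"}
    for version, status in statuses.items():
        if status == "RECOMMENDED":
            rotated[version] = "MAINTENANCE"
    return rotated


# Statuses as listed in the compatibility table.
current = {
    "UNRELEASED": "CURRENT",
    "0.43.0": "RECOMMENDED",
    "0.42.0": "MAINTENANCE",
}

# Hypothetical next release, used only to show the rotation.
after_release = release(current, "0.44.0")
```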

+

Modules

+

Marquez 采用项目结构并包含以下模块:

+ +
+

Note: The `integrations` module was removed in `0.21.0`, so please use an OpenLineage integration to collect lineage events easily.

+
+

Requirements

+ +
+

Note: To connect to your running PostgreSQL instance, you will need the standard `psql` tool.

+
+

Building

+

To build the entire project, run:

+
./gradlew build
+ + + + +
+

The executable can be found under `api/build/libs/`.

+

Configuration

+

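To run Marquez, you will have to define `marquez.yml`. The configuration file is passed to the application and used to specify your database connection. The steps to create the configuration file are outlined below.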

+

Step 1: Create Database

+

When creating your database using `createdb`, we recommend calling it `marquez`:

+
$ createdb marquez
+ + + + +
+

Step 2: Create `marquez.yml`

+

With your database created, you can now copy `marquez.example.yml`:

+
$ cp marquez.example.yml marquez.yml
+
+ + + + +
+

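You will then need to set the following environment variables (we recommend adding them to your `.bashrc`): `POSTGRES_DB`, `POSTGRES_USER`, and `POSTGRES_PASSWORD`. The environment variables override the equivalent option in the configuration file.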
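
The precedence rule (environment variable first, configuration file second) can be pictured with a short sketch. It only illustrates the override order and is not how Marquez itself loads `marquez.yml`; the `FILE_VALUES` dictionary holds hypothetical stand-ins for values that would come from the file.

```python
import os

# Hypothetical values standing in for what marquez.yml would provide.
FILE_VALUES = {
    "POSTGRES_DB": "marquez",
    "POSTGRES_USER": "marquez",
    "POSTGRES_PASSWORD": "marquez",
}


def resolve_db_settings(file_values, env=None):
    """Prefer an environment variable; fall back to the file value."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in file_values.items()}


# Only POSTGRES_DB is set in this example environment, so only it is overridden.
settings = resolve_db_settings(FILE_VALUES, env={"POSTGRES_DB": "lineage"})
```

Passing `env` explicitly keeps the example deterministic; calling `resolve_db_settings(FILE_VALUES)` with no second argument reads the real process environment instead.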

+

By default, Marquez uses the following ports:

+
    +
  • TCP port `8080` is available for the HTTP API server.
  • +
  • TCP port `8081` is available for the admin interface.
  • +
+
+

Note: All of the configuration settings in `marquez.yml` can be specified either in the configuration file or in an environment variable.

+
+

Running the HTTP API Server

+
$ ./gradlew :api:runShadow
+ + + + +
+

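Marquez listens on port `8080` for all API calls and port `8081` for the admin interface. To verify the HTTP API server is running and listening on `localhost`, browse to http://localhost:8081. We encourage you to familiarize yourself with the data model and APIs of Marquez. To run the web UI, please follow the development steps for the `web` module (https://github.com/MarquezProject/marquez/tree/main/web#development).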
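
If you would rather check the server from a script than a browser, the following sketch polls the admin interface. It assumes the `/healthcheck` endpoint mentioned for the admin interface is available on port `8081` when running from source; the helper names are invented for this example.

```python
from urllib import error, request


def admin_url(path, host="localhost", admin_port=8081):
    """Build a URL on the admin interface, e.g. for /healthcheck or /metrics."""
    return f"http://{host}:{admin_port}/{path.lstrip('/')}"


def is_healthy(url=None, timeout=2.0):
    """Return True if the health check answers with HTTP 200, else False."""
    url = url or admin_url("/healthcheck")
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx responses.
        return False
```

When Marquez runs through Docker instead, `admin_url("/healthcheck", admin_port=5001)` points at the admin port used by that setup.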

+
+

Note: By default, the HTTP API does not require any form of authentication or authorization.

+
+

Related Projects

+
    +
  • OpenLineage: an open standard for metadata and lineage collection
  • +
+

Getting Involved

+ +

Contributing

+

See CONTRIBUTING.md for more details about how to contribute.

+

Reporting a Vulnerability

+

If you discover a vulnerability in the project, please open an issue and attach the "security" label.

+
+

SPDX-License-Identifier: Apache-2.0 Copyright 2018-2023 contributors to the Marquez project.

+