Changelog

0.22.0 - 2022-05-16

Added

  • Add support for LifecycleStateChangeFacet with the ability to soft-delete datasets #1847 @pawel-big-lebowski
  • Enable pod specific annotations in Marquez Helm Chart via marquez.podAnnotations #1945 @wslulciuc
  • Add support for job renaming/redirection via symlink #1947 @collado-mike
  • Add Created by view for dataset versions along with SQL syntax highlighting in web UI #1929 @phixMe
  • Add operationId to openapi spec #1978 @phixMe

Changed

Fixed

0.21.0 - 2022-03-03

Added

  • Add MDC to the LoggingMdcFilter to include API method, path, and request ID @fm100
  • Add Postgres sub-chart to Helm deployment for easier installation option @KevinMellott91
  • GitHub Action workflow to validate changes to Helm chart @KevinMellott91

Changed

  • Upgrade from Java 11 to Java 17 @ucg8j
  • Switch JDK image from alpine to temurin enabling Marquez to run on multiple CPU architectures @ucg8j

Fixed

  • Error when running Marquez on Apple M1 @ucg8j

Removed

  • The /api/v1-beta/lineage endpoint @wslulciuc

  • The marquez-airflow lib. has been removed. Please use the openlineage-airflow library instead. To migrate to using openlineage-airflow, make the following changes @wslulciuc:

    # Update the import in your DAG definitions
    -from marquez_airflow import DAG
    +from openlineage.airflow import DAG
    # Update the following environment variables in your Airflow instance
    -MARQUEZ_URL
    +OPENLINEAGE_URL
    -MARQUEZ_NAMESPACE
    +OPENLINEAGE_NAMESPACE
  • The marquez-spark lib. has been removed. Please use the openlineage-spark library instead. To migrate to using openlineage-spark, make the following changes @wslulciuc:

    SparkSession.builder()
    - .config("spark.jars.packages", "io.github.marquezproject:marquez-spark:0.20.+")
    + .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.2.+")
    - .config("spark.extraListeners", "marquez.spark.agent.SparkListener")
    + .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
      .config("spark.openlineage.host", "https://api.demo.datakin.com")
      .config("spark.openlineage.apiKey", "your datakin api key")
      .config("spark.openlineage.namespace", "<NAMESPACE_NAME>")
    .getOrCreate()

0.20.0 - 2021-12-13

Added

Changed

  • Clarify docs on using OpenLineage for metadata collection @fm100
  • Upgrade to gradle 7.x @wslulciuc
  • Use eclipse-temurin for Marquez API base docker image @fm100

Deprecated

  • The following endpoints have been deprecated and are scheduled to be removed in 0.25.0. Please use the /lineage endpoint when collecting source, dataset, and job metadata @wslulciuc:
    • /sources endpoint to collect source metadata
    • /datasets endpoint to collect dataset metadata
    • /jobs endpoint to collect job metadata
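
As a rough illustration of what replaces the three calls, a single OpenLineage run event carries job and dataset metadata together. The sketch below only builds the request body and does not send it. The `build_run_event` helper, the names passed to it, and the producer URL are hypothetical; the `POST /api/v1/lineage` path is the one named in the 0.17.0 entry.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(namespace, job_name, inputs, outputs, event_type="COMPLETE"):
    """Build a minimal OpenLineage-style run event as a plain dict.

    Hypothetical helper for illustration; not part of any Marquez client.
    """
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        # Datasets travel inside the same event, so no separate
        # /sources, /datasets, or /jobs calls are needed.
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        "producer": "https://example.com/my-producer",  # placeholder value
    }


event = build_run_event(
    "my-namespace", "my-job", ["public.orders"], ["public.orders_daily"]
)
body = json.dumps(event)  # JSON body for POST /api/v1/lineage
```

Sending `body` with any HTTP client and a `Content-Type: application/json` header would complete the call; that step is left out so the sketch stays free of network access.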

Fixed

  • Validation of OpenLineage events on write @collado-mike
  • Increase name column size for the namespaces and sources tables @mmeasic

Security

0.19.1 - 2021-11-05

Fixed

  • URI and URL DB mapper should handle empty string as null @OleksandrDvornik
  • Fix NodeId parsing when dataset name contains struct<> @fm100
  • Add encoding for dataset names in URL construction @collado-mike
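
To show why encoding matters here, the sketch below percent-encodes a namespace and a dataset name before placing them in a URL path. It is a Python illustration of the general technique, not the Marquez implementation, and the `/api/v1/namespaces/{namespace}/datasets/{dataset}` path shape and the `dataset_path` helper are assumptions made for the example.

```python
from urllib.parse import quote


def dataset_path(namespace, dataset):
    """Return a URL path with both segments percent-encoded.

    safe="" makes quote() encode "/" as well, so a namespace such as
    "s3://bucket" stays a single path segment.
    """
    return "/api/v1/namespaces/{}/datasets/{}".format(
        quote(namespace, safe=""), quote(dataset, safe="")
    )


# Characters such as ":", "/", "<" and ">" are escaped rather than
# being interpreted as URL syntax.
path = dataset_path("s3://bucket", "struct<a:int>")
```

Without the encoding step, the slashes in the namespace would be read as extra path segments and the request would resolve to a different resource.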

0.19.0 - 2021-10-21

Added

  • Add simple python client example @wslulciuc
  • Display dataset versions in web UI 🎉 @phixMe
  • Display runs and run facets in web UI 🎉 @phixMe
  • Facet formatting and highlighting as JSON in web UI @phixMe
  • Add option for docker/up.sh to run in the background @rossturk
  • Return totalCount in lists of jobs and datasets @phixMe

Changed

  • Change type column in dataset_fields table to TEXT @wslulciuc
  • Set ZonedDateTime parsing to support optional offsets and default to server timezone @collado-mike
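
The parsing rule in the last item can be sketched as follows: keep the offset when the timestamp has one, otherwise attach the server's zone. This is a Python analogy of the behavior described, not the Java code in Marquez; `parse_with_default_zone` and its `server_tz` parameter are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def parse_with_default_zone(text, server_tz=None):
    """Parse an ISO 8601 timestamp whose UTC offset is optional.

    If the text carries no offset, the result is assigned server_tz,
    which defaults to the local zone of the machine running the code.
    """
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        if server_tz is None:
            # Local zone of this process, standing in for the server zone.
            server_tz = datetime.now().astimezone().tzinfo
        parsed = parsed.replace(tzinfo=server_tz)
    return parsed


with_offset = parse_with_default_zone("2021-10-21T10:00:00+02:00")
without_offset = parse_with_default_zone("2021-10-21T10:00:00", timezone.utc)
```

Passing the zone explicitly, as in the second call, keeps the result independent of where the code runs.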

Fixed

  • Job.location and Source.connectionUrl should be in URI format on write @OleksandrDvornik
  • Z-Index fix for nodes and edges in lineage graph @phixMe
  • Format of the index files for web UI @phixMe
  • Fix OpenLineage API to return correct response codes for exceptions propagated from async calls @collado-mike
  • Stopped overwriting nominal time information with nulls @mobuchowski

Removed

  • WriteOnly clients for Java and Python. Before OpenLineage, we added a WriteOnly implementation to our clients to emit calls to a backend. A backend enabled sending raw HTTP requests to an HTTP endpoint, the console, or a file. This was our way of capturing lineage events that could then be used to automatically create resources on the Marquez backend. We soon began work on a standard that eventually became OpenLineage. OpenLineage removed the need to make individual calls to create a namespace, a source, a dataset, etc.; instead, the backend accepts a single event with metadata that it can process. @wslulciuc

0.18.0 - 2021-09-14

Added

  • New Search API 🎉 @wslulciuc
  • Add .env.example to override variables defined in docker-compose files @wslulciuc

Changed

Fixed

Removed

  • Drop job_versions_io_mapping_inputs and job_versions_io_mapping_outputs tables @OleksandrDvornik

0.17.0 - 2021-08-20

Changed

  • Update Lineage runs query to improve performance, added tests @collado-mike
  • Add POST /api/v1/lineage endpoint to docs and deprecate run endpoints @wslulciuc
  • Drop FieldType enum @wslulciuc

Deprecated

Removed

0.16.1 - 2021-07-13

Fixed

  • dbt packages should look for namespace packages @mobuchowski
  • Add common integration dependency to dbt plugins @mobuchowski
  • DatasetVersionDao queries missing input and output facets @dominiquetipton
  • (De)serialization issue for Run and JobData models @collado-mike
  • Prefix spark openlineage.* configuration parameters with spark.* @collado-mike
  • Parse multi-statement sql in class SqlParser used in Airflow integration @wslulciuc
  • URL-encode namespace on calls to API backend @phixMe

0.16.0 - 2021-07-01

Added

Changed

Fixed

0.15.2 - 2021-06-17

Added

  • Add endpoint to create tags @hanbei

Fixed

  • Fixed build & release process for python marquez-integration-common package @collado-mike
  • Fixed snowflake and bigquery errors when connector libraries not loaded @collado-mike
  • Fixed OpenLineage API not setting Dataset current_version_uuid #1361 @collado-mike

0.15.1 - 2021-06-11

Added

  • Factored out common functionality in Python airflow integration @mobuchowski
  • Added Airflow task run macro to expose task run id @collado-mike

Changed

  • Refactored ValuesAverageExpectationParser to ValuesSumExpectationParser and ValuesCountExpectationParser @collado-mike
  • Updated SparkListener to extend Spark's SparkListener abstract class @collado-mike

Fixed

  • Use current project version in spark openlineage client @mobuchowski
  • Rewrote LineageDao queries and LineageService for performance @collado-mike
  • Updated lineage query to include new jobs that have no job version yet @collado-mike

0.15.0 - 2021-05-24

Added

Changed

  • Augment tutorial instructions & screenshots for Airflow example @rossturk
  • Rewrite correlated subqueries when querying the lineage_events table @collado-mike

Fixed

0.14.2 - 2021-05-06

Changed

  • Unpin requests dep in marquez-airflow integration @wslulciuc
  • Unpin attrs dep in marquez-airflow integration @wslulciuc

0.14.1 - 2021-05-05

Changed

  • Updated dataset lineage query to find most recent job that wrote to it @collado-mike
  • Pin http-proxy-middleware to 0.20.0 @wslulciuc

0.14.0 - 2021-05-03

Added

Changed

0.13.1 - 2021-04-01

Changed

  • Remove unused implementation of SQL parser in marquez-airflow @mobuchowski

Fixed

  • Add inputs and outputs to lineage graph @henneberger
  • Updated NodeId regex to support URIs with scheme and ports @collado-mike

0.13.0 - 2021-03-30

Added

  • Secret support for helm chart @KevinMellott91
  • New seed cmd to populate marquez database with source, dataset, and job metadata allowing users to try out features of Marquez (data lineage, view job run history, etc) 🎉
  • Docs on applying db migrations manually
  • New Lineage API to support data lineage queries 🎉
  • Support for logging errors via sentry
  • New Airflow example with Marquez 🎉

Changed

  • Update OpenLineageDao to stop converting URI structures to contain underscores instead of colons and slashes @collado-mike
  • Bump testcontainers dependency to v1.15.2 @ShakirzyanovArsen
  • Register output datasets for a run lazily @henneberger
  • Refactor spark plan traversal to find input/output datasets from datasources @collado-mike
  • Web UI project settings and default marquez port @phixMe
  • Associate dataset inputs on run start @henneberger

Fixed

0.12.2 - 2021-03-16

Changed

  • Use alpine image for marquez reducing image size by +50% @KevinMellott91
  • Use alpine image for marquez-web reducing image size by +50% @KevinMellott91

Fixed

  • Ensure marquez.DAG is (de)serializable

0.12.0 - 2021-02-08

Added

Changed

  • Drop Source.type enum (now a string type)

Fixed

  • Replace jdbi.getHandle() with jdbi.withHandle() to free DB connections from pool @henneberger
  • Fix RunListener when registering outside of the MarquezContext builder @henneberger

0.11.3 - 2020-11-02

Added

  • Add support for external ID on run creation @julienledem
  • Throw RunAlreadyExistsException on run ID already exists
  • Add BigQuery, Pulsar, and Oracle source types @sreev
  • Add run ID support in job meta; the optional run ID will be used to link a newly created job version to an existing job run, while supporting updating the run state and avoiding having to create another run

Fixed

0.11.2 - 2020-08-21

Changed

  • Always migrate db schema on app start in development config
  • Update default db username / password
  • Use marquez.dev.yml on docker compose up

0.11.1 - 2020-08-19

Added

  • Use shortened name for namespaces in version IDs

  • Add namespace to Dataset and Job models

  • Add ability to deserialize int type to columns @phixMe

  • Add SqlLogger for SQL profiling

  • Add DatasetVersionId.asDatasetId() and JobVersionId.asJobId()

  • Add DatasetService.getBy(DatasetVersionId): Dataset

  • Add JobService.getBy(JobVersionId): Job

  • Allow for run transition override via at=<TIMESTAMP>, where TIMESTAMP is an ISO 8601 timestamp representing the date/time of the state transition. For example:

    POST /jobs/runs/{id}/start?at=<TIMESTAMP>
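
A client could build that request line along these lines. This is a minimal sketch: the `run_start_path` helper is hypothetical, and only the `/jobs/runs/{id}/start` path and the `at` parameter come from the entry above.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def run_start_path(run_id, at):
    """Return the path and query string for starting a run at a given time.

    `at` must be a timezone-aware datetime; it is rendered as ISO 8601
    and percent-encoded so ":" and "+" survive as query values.
    """
    query = urlencode({"at": at.isoformat()})
    return "/jobs/runs/{}/start?{}".format(run_id, query)


transition_time = datetime(2020, 8, 19, 12, 0, tzinfo=timezone.utc)
path = run_start_path("my-run-id", transition_time)
```

Encoding the timestamp matters because an unescaped `+` in a query string is commonly decoded as a space, which would corrupt the offset.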
    

Changed

  • config.yml -> marquez.yml

Fixed

  • Fix dataset version column mappings

0.11.0 - 2020-05-27

Added

Changed

  • Job inputs / outputs are defined as DatasetId
  • Bump to JDK 11

Removed

  • Use of API models under marquez.api.models pkg

Fixed

  • API docs example to show correct SQL key in job context @frankcash

0.10.4 - 2020-01-17

Fixed

  • Fix RunState.isComplete()

0.10.3 - 2020-01-17

Added

  • Add new logo
  • Add JobResource.locationFor()

Fixed

  • Fix dataset field versioning
  • Fix list job runs

0.10.2 - 2020-01-16

Added

  • Added Location header to run creation @nkijak

0.10.1 - 2020-01-11

Changed

  • Rename datasets.last_modified

0.10.0 - 2020-01-08

Changed

  • Rename table dataset_tag_mapping

0.9.2 - 2020-01-07

Added

  • Add Flyway.baselineOnMigrate flag

0.9.1 - 2020-01-06

Added

  • Add redshift data types
  • Add links to dropwizard overrides in config.yml

0.9.0 - 2020-01-05

Added

  • Validate runID when linked to dataset change
  • Add Utils.toUuid()
  • Add tests for class TagDao
  • Add default tags to config
  • Add tagging support for dataset fields
  • Add docker/config.dev.yml
  • Add flyway config support

Changed

  • Replace deprecated App.onFatalError()

Fixed

  • Fix error on tag exists
  • Fix malformed sql in RunDao.findAll()

0.8.0 - 2019-12-12

Added

  • Add Dataset.lastModified
  • Add tags table schema
  • Add GET /tags

Changed

  • Use new Flyway version to fix migration with custom roles
  • Modify args column in table run_args

0.7.0 - 2019-12-05

Added

  • Link dataset versions with run inputs
  • Add schema required by tagging
  • More tests for class common.Utils
  • Add ColumnsTest
  • Add RunDao.insert()
  • Add RunStateDao.insert()
  • Add METRICS.md
  • Add prometheus dep and expose GET /metrics

Fixed

  • Fix dataset field serialization

0.6.0 - 2019-11-29

Added

  • Add Job.latestRun
  • Add debug logging

Changed

  • Adjust class RunResponse property ordering on serialization
  • Update logging on default namespace creation

0.5.1 - 2019-11-20

Added

  • Add dataset field versioning support
  • Add link to web UI
  • Add Job.context

Changed

  • Update semver regex in build-and-push.sh
  • Minor updates to job and dataset versioning functions
  • Make Job.location optional

0.5.0 - 2019-11-04

Added

  • Add lombok.config
  • Add code review guidelines
  • Add JobType
  • Add limit and offset support to NamespaceAPI
  • Add Development section to CONTRIBUTING.md
  • Add class DatasetMeta
  • Add class MorePreconditions
  • Added install instructions for docker

Changed

  • Rename guid column to uuid
  • Use admin ping and health
  • Update owner to ownerName

Removed

  • Remove experimental db table versioning code

Fixed

  • Fix marquez.jar rename on COPY

0.4.0 - 2019-06-04

Added

  • Add quickstart
  • Add GET /namespaces/{namespace}/jobs/{job}/runs

0.3.4 - 2019-05-17

Changed

  • Change DatasetDao.findAll() to order by Dataset.name

0.3.3 - 2019-05-14

Changed

  • Set timestamps to CURRENT_TIMESTAMP

0.3.2 - 2019-05-14

Changed

  • Set job_versions.updated_at to CURRENT_TIMESTAMP

0.3.1 - 2019-05-14

Added

  • Handle Flyway.repair() error

0.3.0 - 2019-05-14

Added

  • Add JobResponse.updatedAt

Changed

  • Return timestamp strings as ISO format

Removed

  • Remove unused tables in db schema

0.2.1 - 2019-04-22

Changed

  • Support dashes (-) in namespace

0.2.0 - 2019-04-15

Added

  • Add @NoArgsConstructor to exceptions
  • Add license to *.java
  • Add column constants
  • Add response/error metrics to API endpoints
  • Add build info to jar manifest
  • Add release steps and plugin
  • Add /jobs/runs/{id}/run
  • Add jdbi metrics
  • Add gitter link
  • Add MarquezServiceException
  • Add -parameters compiler flag
  • Add JSON logging support

Changed

Fixed

  • Fix dataset list error

0.1.0 - 2018-12-18

  • Marquez initial public release.