Changelog

0.31.0 - 2023-02-16

Added

  • UI: add facet view enhancements #2336 @tito12
    Creates a dynamic component with the ability to navigate and search the JSON, expand sections and click on links.
  • UI: highlight selected path on graph and display status of jobs and datasets based on last 14 runs or latest quality facets #2384 @tito12
    Adds highlighting of the visual graph based on upstream and downstream dependencies of selected nodes, and makes the displayed status reflect the last 14 runs in the case of jobs and the latest quality facets in the case of datasets.
  • UI: enable auto-accessibility feature on graph nodes #2388 @merobi-hub
    Adds attributes to the FontAwesomeIcons to enable a built-in accessibility feature.

Fixed

  • API: add index to jobs_fqn table using namespace_name and job_fqn columns #2357 @collado-mike
    Optimizes read queries by adding an index to this table.
  • API: add missing indices to column_lineage, dataset_facets, job_facets tables #2419 @pawel-big-lebowski
    Creates missing indices on reference columns in a number of database tables.
  • Spec: make data version and dataset types the same #2400 @phixme
    Makes the fields property the same for datasets and dataset versions, allowing type-generating systems to treat them the same way.
  • UI: show location button only when link to code exists #2409 @tito12
    Makes the button visible only if the link is not empty.

0.30.0 - 2023-01-31

Added

  • Proposals: add proposal for OL facet tables #2076 @wslulciuc
    Adds the proposal Optimize query performance for OpenLineage facets.
  • UI: display column lineage of a dataset #2293 @pawel-big-lebowski
    Adds a JSON preview of column-level lineage of a selected dataset to the UI.
  • UI: Add soft delete option to UI #2343 @tito12
    Adds option to soft delete a data record with a dialog component and double confirmation.
  • API: split lineage_events table to dataset_facets, run_facets, and job_facets tables #2350, #2355, #2359 @wslulciuc, @pawel-big-lebowski
    Improves the performance of storing and querying facets. The migration procedure requires manual steps if the database has more than 100K lineage events. We highly encourage users to review our migration plan.
  • Docker: add new script for stopping Docker #2380 @rossturk
    Provides a clean way to stop a deployment via docker-compose down.
  • Docker: seed data for column lineage #2381 @rossturk
    Adds some ColumnLineageDatasetFacet JSON snippets to docker/metadata.json to seed data for column-level lineage facets.

Fixed

  • API: validate RunLink and JobLink #2342 @pawel-big-lebowski
    Fixes validation of the ParentRunFacet to avoid NullPointerExceptions in the case of empty run sections.
  • Docker: use docker-compose.web.yml as base compose file #2360 @wslulciuc
    Fixes the Marquez HTTP server set in docker/up.sh so the script uses docker-compose.web.yml with overrides for dev set via docker-compose.web-dev.yml.
  • Docs: update copyright headers #2353 @merobi-hub
    Updates the headers with the current year.
  • Chart: fix Helm chart #2374 @perttus
    Fixes minor issues with the Helm chart.
  • Spec: update dataset version API spec #2389 @phixme
    Adds limit and offset to the openapi.yml spec file as query parameters.
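The limit and offset query parameters mentioned above follow the usual paging pattern. As a minimal sketch (the helper name page_queries is hypothetical and not part of any Marquez client), the query strings needed to walk through a known number of items could be generated like this:

```python
from typing import Iterator
from urllib.parse import urlencode


def page_queries(total: int, limit: int) -> Iterator[str]:
    """Yield 'limit=...&offset=...' query strings that cover `total` items.

    Hypothetical helper for illustration; it only builds query strings and
    does not call any API.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if total < 0:
        raise ValueError("total must be non-negative")
    # Each page starts where the previous one ended.
    for offset in range(0, total, limit):
        yield urlencode({"limit": limit, "offset": offset})
```

Each yielded string can be appended after a "?" to whichever list endpoint accepts these two parameters.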

0.29.0 - 2022-12-19

Added

  • Column-lineage endpoints support point-in-time requests #2265 @pawel-big-lebowski
    Enables requesting the column-lineage endpoint by a dataset version, job version or dataset field of a specific dataset version.
  • Present column lineage of a dataset #2293 @pawel-big-lebowski
    Column lineage of a dataset with a single level of depth can be displayed in the dataset details tab.
  • Add point-in-time requests support to column-lineage endpoints #2265 @pawel-big-lebowski
    Enables requesting column-lineage endpoint by a dataset version, job version or dataset field of a specific dataset version.
  • Add column lineage point-in-time Java client methods #2269 @pawel-big-lebowski
    Java client methods to retrieve point-in-time column-lineage. Please note that the existing methods getColumnLineageByDataset, getColumnLineageByDataset and getColumnLineageByDatasetField are replaced by a single getColumnLineage method taking NodeId as a parameter.
  • Add raw event viewer to UI #2249 @tito12
    A new events page enables filtering events by date and expanding the payload by clicking on each event.
  • Update events page with styling synchronization #2324 @phixMe
    Makes some updates to the new page to make it conform better to the overall design system.
  • Update helm Ingress template to be cross-compatible with recent k8s versions #2275 @jlukenoff
    Certain components of the Ingress schema have changed in recent versions of Kubernetes. This change updates the Ingress helm template to render based on the semantic Kubernetes version.
  • Add delete namespace endpoint doc to OpenAPI docs #2295 @mobuchowski
    Adds a doc about the delete namespace endpoint.
  • Add i18next and language switcher for i18n of UI #2254 @merobi-hub @phixMe
    Adds i18next framework, language switcher, and translations for i18n of UI.
  • Add indexed created_at column to lineage events table #2299 @prachim-collab
    A new timestamp column in the database supports analytics use cases by allowing for identification of incrementally created events (backwards-compatible).

Fixed

  • Allow null column type in column lineage #2272 @pawel-big-lebowski
    The column-lineage endpoint was throwing an exception when no data type of the field was provided. Includes a test.
  • Include error message for JSON processing exception #2271 @pawel-big-lebowski
    In case of JSON processing exceptions, the Marquez API now returns an exception message to a client.
  • Fix column lineage when multiple jobs write to same dataset #2289 @pawel-big-lebowski
    The fix deprecates the way the fields transformationDescription and transformationType are returned. The deprecated way of returning those fields will be removed in 0.30.0.
  • Use raw link for iconSearchArrow.svg #2280 @wslulciuc
    Using a direct link to the events viewer icon fixes a loading issue.
  • Fill run state of parent run when created by child run #2296 @fm100
    Adds a run state to the parent at creation time to address a missing run state issue in Airflow integration.
  • Update migration query to make it work with existing view #2308 @fm100
    Changes the V52 migration query to drop the view before ALTER. Because repeatable migration runs only when its checksum changes, it was necessary to get the view definition first then drop and recreate it.
  • Fix lineage for orphaned datasets #2314 @collado-mike
    Fixes lineage for datasets generated by jobs whose current versions no longer write to the databases in question.
  • Ensure job data in lineage query is not null or empty #2253 @wslulciuc
    Changes the API to return an empty graph in the edge case of a job UUID that has no lineage when calling LineageDao.getLineage() yet is associated with a dataset. This case formerly resulted in an empty set and backend exception. Also includes logging and an API check for a nodeID.
  • Make name and type required for datasets #2305 @wslulciuc
    When generating Typescript from the OpenAPI spec, name and type were not required but should have been.
  • Remove unused filter on RunDao.updateStartState() #2319 @wslulciuc
    Removes the conditions updated_at < transitionedAt and start_run_state_uuid != null to allow for updating the run state.
  • Update linter #2322 @phixMe
    Adds npm run eslint-fix to the CI config so that the build fails if the command does not return exit code 0.
  • Fix asset loading for web #2323 @phixMe
    Fixes the webpack config and allows files to be imported in a modern capacity that enforces the assets exist.

0.28.0 - 2022-11-21

Added

  • Optimize current runs query for lineage API #2211 @prachim-collab
    Add a simpler, alternate getCurrentRuns query that gets only simple runs from the database without the additional data from tables such as run_args, job_context, facets, etc., which required extra table joins.
  • Add Code Quality, DCO and Governance docs to project #2237 #2241 @merobi-hub
    Adds a number of standard governance and procedure docs to the project.
  • Add possibility to soft-delete namespaces #2244 @mobuchowski
    Adds the ability to "hide" inactive namespaces. The namespaces are undeleted when a relevant OL event is received.
  • Add search service proposal #2203 @pawel-big-lebowski
    Proposes using ElasticSearch as a pluggable search service to enhance the search feature in Marquez and adding the ability to turn it off, as well. Includes ideas about what should be indexed and the requirements for the interface.

Fixed

  • Show facets even when dataset has no fields #2214 @JDarDagran
    Changes the logic in the DatasetInfo component to always show facets so that dataset facets are visible in the UI even if no dataset fields have been set.
  • Appreciate column prefix when given for ended_at #2231 @fm100
    The ended_at column was always null when querying if columnPrefix was given for the mapper. Now, columnPrefix is included when checking for column existence.
  • Fix bug keeping jobs from being properly deleted #2244 @mobuchowski
    It wasn't possible to delete jobs created from events that had a ParentRunFacet. Now it's possible.
  • Fix symlink table column length #2217 @pawel-big-lebowski
    The dataset's name column in the dataset_symlinks table was shorter than the column in the datasets table. Changes the existing V48 migration script to allow proper migration for users who did not upgrade yet, and adds an extra migration script to extend the column length for users who did upgrade but did not experience the issues.

0.27.0 - 2022-10-24

Added

  • Implement dataset symlink feature #2066 @pawel-big-lebowski
    Adds support for multiple dataset names and adds edges to the lineage graph based on symlinks.
  • Store column lineage facets in separate table #2096 @mzareba382 @pawel-big-lebowski
    Adds a column-level lineage representation and API endpoint to retrieve column-level lineage data from the Marquez database.
  • Add a lineage graph endpoint for column lineage #2124 @pawel-big-lebowski
    Allows for the storing of column-lineage information from events in the Marquez database and exposes column lineage through a graph endpoint.
  • Enrich returned dataset resource with column lineage information #2113 @pawel-big-lebowski
    Extends the /api/v1/namespaces/{namespace}/datasets endpoint to return the columnLineage facet.
  • Add downstream column lineage #2159 @pawel-big-lebowski
    Extends the recursive query that returns column lineage nodes to traverse the graph for downstream nodes.
  • Implement column lineage within Marquez Java client #2163 @pawel-big-lebowski
    Adds Marquez API client methods for column lineage.
  • Provide dataset_symlinks table for SymlinkDatasetFacet #2087 @pawel-big-lebowski
    Modifies Marquez to handle the new SymlinkDatasetFacet in the OpenLineage spec.
  • Display current run state for job node in lineage graph #2146 @wslulciuc
    Fills job nodes in the lineage graph with the latest run state and makes some minor changes to column names used to display dataset and job metadata.
  • Include column lineage in dataset resource #2148 @pawel-big-lebowski
    Creates a method in ColumnLineageService to enrich Dataset with column lineage information and uses the method in DatasetResource.
  • Add indices on the job table #2161 @phixMe
    Adds indices to the fields we join on inside the lineage query to speed up the join operation in the /lineage query.
  • Add endpoint to get column lineage by a job #2204 @pawel-big-lebowski
    Changes the API to make column lineage available for jobs.
  • Add column lineage methods to Python client #2209 @pawel-big-lebowski
    Implements methods for column lineage in the Python client.

Changed

  • Update insert job function to avoid joining on symlinks for jobs with no symlinks #2144 @collado-mike
    Radically reduces the database compute load in Marquez installations that frequently create a large number of new jobs.
  • Increase size of column-lineage.description column #2205 @pawel-big-lebowski
    VARCHAR(255) was too small for some users.

Fixed

  • Add support for parentRun facet as reported by older Airflow OpenLineage versions #2130 @collado-mike
    Adds a parentRun alias to the LineageEvent RunFacet.
  • Add fix and tests for handling Airflow DAGs with dots and task groups #2126 @collado-mike @wslulciuc
    Fixes a recent change that broke how Marquez handles DAGs with dots and tasks within task groups and adds test cases to validate.
  • Fix version bump in docker/up.sh #2129 @wslulciuc
    Defines a VERSION variable to bump on a release.
  • Use clean when running shadowJar in Dockerfile #2145 @wslulciuc
    Ensures the directory api/build/libs/ is cleaned before building the JAR again and updates .dockerignore to ignore api/build/*.
  • Fix bug that caused a single run event to create multiple jobs #2162 @collado-mike
    Checks to see if a run with the given ID already exists and uses the pre-associated job if so.
  • Fix column lineage returning multiple entries for job run multiple times #2176 @pawel-big-lebowski
    Makes column lineage return a column dependency only once if a job has been run several times.
  • Fix API spec issues #2178 @phixMe
    Fixes issues with type generators in the putDataset API.
  • Fix downstream recursion #2181 @pawel-big-lebowski
    Fixes issue causing same node to be added to recursive table multiple times.
  • Update jobs_current_version_uuid_index and jobs_symlink_target_uuid_index to ignore NULL values #2186 @collado-mike
    Avoids writing to the indices when the indexed values added by #2161 are null.
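One common way to keep NULL values out of an index is a partial index. The sketch below illustrates that idea with SQLite through Python's sqlite3 module as a stand-in for the real database; the reduced jobs table and the exact index definition are assumptions for illustration and are not copied from the migration in #2186.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a jobs table and a partial index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE jobs (
            uuid TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            current_version_uuid TEXT  -- NULL until the job has a version
        );
        -- Partial index: rows whose current_version_uuid is NULL are left out.
        CREATE INDEX jobs_current_version_uuid_index
            ON jobs (current_version_uuid)
            WHERE current_version_uuid IS NOT NULL;
        """
    )
    conn.executemany(
        "INSERT INTO jobs VALUES (?, ?, ?)",
        [("j1", "etl_orders", "v1"), ("j2", "new_job", None), ("j3", "etl_users", "v3")],
    )
    return conn


def index_definition(conn: sqlite3.Connection, index_name: str) -> str:
    """Return the stored CREATE INDEX statement for the given index."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'index' AND name = ?",
        (index_name,),
    ).fetchone()
    return row[0]


def job_for_version(conn: sqlite3.Connection, version_uuid: str) -> Optional[str]:
    """Look up a job name by its current version UUID."""
    row = conn.execute(
        "SELECT name FROM jobs WHERE current_version_uuid = ?",
        (version_uuid,),
    ).fetchone()
    return row[0] if row else None
```

Inserting a job with no version (like new_job here) then adds nothing to the index, while lookups by a concrete version UUID can still use it.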

0.26.0 - 2022-09-15

Added

  • Update FlywayFactory to support an argument to customize the schema programmatically #2055 @collado-mike
    Note: this change does not aim to support custom schemas from configuration.
  • Add steps on proposing changes to Marquez #2065 @wslulciuc
    Adds steps on how to submit a proposal for review along with a design doc template.
  • Add --metadata option to seed backend with ol events #2082 @wslulciuc
    Updates the seed command to load metadata from a file containing an array of OpenLineage events via the --metadata option. (Metadata used in the command was not being defined using the OpenLineage standard.)
  • Improve documentation on nodeId in the spec #2084 @howardyoo
    Adds complete examples of nodeId to the spec.
  • Add metadata cmd #2091 @wslulciuc
    Adds cmd metadata to generate OpenLineage events; generated events will be saved to a file called metadata.json that can be used to seed Marquez via the seed cmd. (We lacked a way to performance test the data model of Marquez with significantly large OL events.)
  • Add possibility to soft-delete datasets and jobs #2032 #2099 #2101 @mobuchowski
    Adds the ability to "hide" inactive datasets and jobs through the UI. (This PR does not include the UI part.) The feature works by adding an is_hidden flag to both datasets and jobs tables. Then, it changes jobs_view and adds datasets_view, which hides rows where the is_hidden flag is set to True. This makes writing proper queries easier since there is no need to do this filtering manually. The soft-delete is reversed if the job or dataset is updated again because the new version reverts the flag.
  • Add raw OpenLineage events API #2070 @mobuchowski
    Adds an API that returns raw OpenLineage events sorted by time and optionally filtered by namespace. Filtering by namespace takes into account both job and dataset namespaces.
  • Create column lineage endpoint proposal #2077 @julienledem @pawel-big-lebowski
    Adds a proposal to implement a column-level lineage endpoint in Marquez to leverage the column-level lineage facet in OpenLineage.
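The soft-delete entry above (#2032, #2099, #2101) describes an is_hidden flag, a view that filters hidden rows, and an update that reverts the flag. A minimal sketch of that pattern follows, using SQLite through Python's sqlite3 module as a stand-in; the single-column datasets table and the function names are assumptions for illustration, not the Marquez schema or code.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a datasets table and a filtering view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE datasets (
            name TEXT PRIMARY KEY,
            is_hidden INTEGER NOT NULL DEFAULT 0
        );
        -- Readers query the view, so hidden rows are filtered in one place.
        CREATE VIEW datasets_view AS
            SELECT name FROM datasets WHERE is_hidden = 0;
        """
    )
    return conn


def upsert_dataset(conn: sqlite3.Connection, name: str) -> None:
    """Insert a dataset, or update it; any update clears the is_hidden flag."""
    conn.execute(
        "INSERT INTO datasets (name) VALUES (?) "
        "ON CONFLICT (name) DO UPDATE SET is_hidden = 0",
        (name,),
    )


def soft_delete_dataset(conn: sqlite3.Connection, name: str) -> None:
    """Hide a dataset without removing its row."""
    conn.execute("UPDATE datasets SET is_hidden = 1 WHERE name = ?", (name,))


def visible_datasets(conn: sqlite3.Connection) -> List[str]:
    """List the dataset names exposed through the view."""
    rows = conn.execute("SELECT name FROM datasets_view ORDER BY name")
    return [row[0] for row in rows]
```

Because every read goes through datasets_view, no query has to repeat the is_hidden filter, which is the benefit the entry points out.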

Changed

  • Update lineage query to only look at jobs with inputs or outputs #2068 @collado-mike
    Changes the lineage query to query the job_versions_io_mapping table and INNER join with the jobs_view so that only jobs that have inputs or outputs are present in the jobs_io CTE. Hence, the table becomes very small and the recursive join in the lineage CTE very fast. (In many environments, a large number of jobs reporting events have no inputs or outputs - e.g., PythonOperators in an Airflow deployment. If a Marquez installation has many of these, the lineage query spends much of its time searching for overlaps with jobs that have no inputs or outputs.)
  • Persist OpenLineage event before updating Marquez model #2069 @fm100
    Switches the order of the code in order to persist the OpenLineage event first and then update the Marquez model. (When the RunTransitionListener was invoked, the OpenLineage event was not persisted to the database. Because the OpenLineage event is the source of truth for all Marquez run transitions, it should be available from RunTransitionListener.)
  • Drop requirement to provide marquez.yml for seed cmd #2094 @wslulciuc
    Use io.dropwizard.cli.Command instead of io.dropwizard.cli.ConfiguredCommand to no longer require passing marquez.yml as an argument to the seed cmd. (The marquez.yml argument is not used in the seed cmd.)

Fixed

  • Fix/rewrite jobs fqn locks #2067 @collado-mike
    Updates the function to only update the table if the job is a new record or if the symlink_target_uuid is distinct from the previous value. (The rewrite_jobs_fqn_table function was inadvertently updating jobs even when no metadata about the job had changed. Under load, this caused significant locking issues, as the jobs_fqn table must be locked for every job update.)
  • Fix enum string types in the OpenAPI spec #2086 @studiosciences
    Changes the type to string. (type: enum was not valid in OpenAPI spec.)
  • Fix incorrect PostgreSQL version #2089 @jabbera
    Corrects the tag for PostgreSQL.
  • Update OpenLineageDao to handle Airflow run UUID conflicts #2097 @collado-mike
    Alleviates the problem for Airflow installations that will continue to publish events with the older OpenLineage library. This checks the namespace of the parent run and verifies that it matches the namespace in the ParentRunFacet. If not, it generates a new parent run ID that will be written with the correct namespace. (The Airflow integration was generating conflicting UUIDs based on the DAG name and the DagRun ID without accounting for different namespaces. In Marquez installations that have multiple Airflow deployments with duplicated DAG names, we generated jobs whose parents have the wrong namespace.)
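The conflict described in #2097 can be illustrated with deterministic UUIDs. The sketch below is only an illustration: the changelog does not say how the IDs are derived, so the use of uuid.uuid5, the key format, and all function names are assumptions.

```python
import uuid


def run_id_without_namespace(dag_name: str, dag_run_id: str) -> uuid.UUID:
    """Deterministic ID from the DAG name and DagRun ID only.

    Two deployments with the same DAG name and run ID get the same UUID.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{dag_name}.{dag_run_id}")


def run_id_with_namespace(namespace: str, dag_name: str, dag_run_id: str) -> uuid.UUID:
    """Deterministic ID that also mixes in the namespace (assumed scheme)."""
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{namespace}.{dag_name}.{dag_run_id}")


def resolve_parent_run_id(
    stored_namespace: str, facet_namespace: str, dag_name: str, dag_run_id: str
) -> uuid.UUID:
    """Keep the reported parent run ID when the namespaces match.

    Otherwise derive a new ID tied to the namespace from the ParentRunFacet.
    """
    if stored_namespace == facet_namespace:
        return run_id_without_namespace(dag_name, dag_run_id)
    return run_id_with_namespace(facet_namespace, dag_name, dag_run_id)
```

With this scheme, two Airflow deployments that both run a DAG called etl no longer share a parent run, because the second deployment's namespace produces a different ID.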

0.25.0 - 2022-08-08

Fixed

0.24.0 - 2022-08-02

Added

  • Add copyright lines to all source files #1996 @merobi-hub
  • Add copyright and license guidelines in CONTRIBUTING.md @wslulciuc
  • Add @FlywayTarget annotation to migration tests to control flyway upgrades #2035 @collado-mike

Changed

Fixed

0.23.0 - 2022-06-16

Added

Changed

  • Set default limit for listing datasets and jobs in UI from 2000 to 25 #2018 @wslulciuc
  • Update OpenLineage write API to be non-transactional and avoid unnecessary locks on records under heavy contention @collado-mike

Fixed

0.22.0 - 2022-05-16

Added

  • Add support for LifecycleStateChangeFacet with an ability to softly delete datasets #1847 @pawel-big-lebowski
  • Enable pod specific annotations in Marquez Helm Chart via marquez.podAnnotations #1945 @wslulciuc
  • Add support for job renaming/redirection via symlink #1947 @collado-mike
  • Add Created by view for dataset versions along with SQL syntax highlighting in web UI #1929 @phixMe
  • Add operationId to openapi spec #1978 @phixMe

Changed

Fixed

0.21.0 - 2022-03-03

Added

  • Add MDC to the LoggingMdcFilter to include API method, path, and request ID @fm100
  • Add Postgres sub-chart to Helm deployment for easier installation option @KevinMellott91
  • GitHub Action workflow to validate changes to Helm chart @KevinMellott91

Changed

  • Upgrade from Java11 to Java17 @ucg8j
  • Switch JDK image from alpine to temurin enabling Marquez to run on multiple CPU architectures @ucg8j

Fixed

  • Error when running Marquez on Apple M1 @ucg8j

Removed

  • The /api/v1-beta/lineage endpoint @wslulciuc

  • The marquez-airflow lib. has been removed. Please use the openlineage-airflow library instead. To migrate to using openlineage-airflow, make the following changes @wslulciuc:

    # Update the import in your DAG definitions
    -from marquez_airflow import DAG
    +from openlineage.airflow import DAG
    # Update the following environment variables in your Airflow instance
    -MARQUEZ_URL
    +OPENLINEAGE_URL
    -MARQUEZ_NAMESPACE
    +OPENLINEAGE_NAMESPACE
  • The marquez-spark lib. has been removed. Please use the openlineage-spark library instead. To migrate to using openlineage-spark, make the following changes @wslulciuc:

    SparkSession.builder()
    - .config("spark.jars.packages", "io.github.marquezproject:marquez-spark:0.20.+")
    + .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.2.+")
    - .config("spark.extraListeners", "marquez.spark.agent.SparkListener")
    + .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
      .config("spark.openlineage.host", "https://api.demo.datakin.com")
      .config("spark.openlineage.apiKey", "your datakin api key")
      .config("spark.openlineage.namespace", "<NAMESPACE_NAME>")
    .getOrCreate()
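The Airflow renames listed above can also be applied mechanically. The following is a rough sketch, assuming DAG files use exactly the import line shown in the diff; the helper names are hypothetical and no such tool ships with Marquez or OpenLineage.

```python
from typing import Dict

# Old and new values are taken from the migration notes above.
IMPORT_OLD = "from marquez_airflow import DAG"
IMPORT_NEW = "from openlineage.airflow import DAG"
ENV_RENAMES = {
    "MARQUEZ_URL": "OPENLINEAGE_URL",
    "MARQUEZ_NAMESPACE": "OPENLINEAGE_NAMESPACE",
}


def migrate_dag_source(source: str) -> str:
    """Rewrite the marquez_airflow import in the text of a DAG file."""
    return source.replace(IMPORT_OLD, IMPORT_NEW)


def migrate_env(env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of env with the MARQUEZ_* variables renamed."""
    return {ENV_RENAMES.get(key, key): value for key, value in env.items()}
```

Imports written in another form (for example, import marquez_airflow) are not handled by this sketch and would need a manual edit.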

0.20.0 - 2021-12-13

Added

Changed

  • Clarify docs on using OpenLineage for metadata collection @fm100
  • Upgrade to gradle 7.x @wslulciuc
  • Use eclipse-temurin for Marquez API base docker image @fm100

Deprecated

  • The following endpoints have been deprecated and are scheduled to be removed in 0.25.0. Please use the /lineage endpoint when collecting source, dataset, and job metadata @wslulciuc:
    • /sources endpoint to collect source metadata
    • /datasets endpoint to collect dataset metadata
    • /jobs endpoint to collect job metadata

Fixed

  • Validation of OpenLineage events on write @collado-mike
  • Increase name column size for tables namespaces and sources @mmeasic

Security

0.19.1 - 2021-11-05

Fixed

  • URI and URL DB mapper should handle empty string as null @OleksandrDvornik
  • Fix NodeId parsing when dataset name contains struct<> @fm100
  • Add encoding for dataset names in URL construction @collado-mike
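Dataset and namespace names often contain characters such as "/" and ":" that change the meaning of a URL path unless they are percent-encoded. A small sketch of the idea follows; the helper name and the /datasets/{dataset} path shape are assumptions for illustration.

```python
from urllib.parse import quote


def dataset_path(namespace: str, dataset: str) -> str:
    """Build a dataset path with both names encoded as single path segments."""
    # safe="" also encodes '/', which quote() leaves untouched by default.
    encoded_namespace = quote(namespace, safe="")
    encoded_dataset = quote(dataset, safe="")
    return f"/api/v1/namespaces/{encoded_namespace}/datasets/{encoded_dataset}"
```

For example, dataset_path("s3://bucket", "raw/orders") keeps each name in its own segment instead of producing extra path levels.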

0.19.0 - 2021-10-21

Added

  • Add simple python client example @wslulciuc
  • Display dataset versions in web UI 🎉 @phixMe
  • Display runs and run facets in web UI 🎉 @phixMe
  • Facet formatting and highlighting as Json in web UI @phixMe
  • Add option for docker/up.sh to run in the background @rossturk
  • Return totalCount in lists of jobs and datasets @phixMe

Changed

  • Change type column in dataset_fields table to TEXT @wslulciuc
  • Set ZonedDateTime parsing to support optional offsets and default to server timezone @collado-mike
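The optional-offset behaviour in the last entry can be pictured with a Python analogue. This is only an illustration of the idea, not the Java ZonedDateTime code in Marquez; the function names are hypothetical.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def parse_timestamp(text: str, default_tz: Optional[tzinfo] = None) -> datetime:
    """Parse an ISO 8601 timestamp whose UTC offset is optional.

    When the text carries no offset, default_tz is applied; if default_tz is
    omitted, the machine's current local offset is used.
    """
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is not None:
        return parsed
    if default_tz is None:
        # Current local offset; it may differ from the offset in force on
        # the parsed date if daylight saving time has changed since then.
        default_tz = datetime.now().astimezone().tzinfo
    return parsed.replace(tzinfo=default_tz)


def to_utc(text: str, default_tz: Optional[tzinfo] = None) -> datetime:
    """Parse a timestamp and normalize it to UTC."""
    return parse_timestamp(text, default_tz).astimezone(timezone.utc)
```

A value such as 2021-10-21T10:00:00+02:00 keeps its offset, while 2021-10-21T10:00:00 is interpreted in the default zone.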

Fixed

  • Job.location and Source.connectionUrl should be in URI format on write @OleksandrDvornik
  • Z-Index fix for nodes and edges in lineage graph @phixMe
  • Format of the index files for web UI @phixMe
  • Fix OpenLineage API to return correct response codes for exceptions propagated from async calls @collado-mike
  • Stopped overwriting nominal time information with nulls @mobuchowski

Removed

  • WriteOnly clients for Java and Python. Before OpenLineage, we added a WriteOnly implementation to our clients to emit calls to a backend. A backend enabled collecting raw HTTP requests to an HTTP endpoint, console, or file. This was our way of capturing lineage events that could then be used to automatically create resources on the Marquez backend. We soon worked on a standard that eventually became OpenLineage. That is, OpenLineage removed the need to make individual calls to create a namespace, a source, a dataset, etc.; instead, the backend accepts an event with metadata that it can process. @wslulciuc

0.18.0 - 2021-09-14

Added

  • Add new Search API 🎉 @wslulciuc
  • Add .env.example to override variables defined in docker-compose files @wslulciuc

Changed

Fixed

Removed

  • Drop job_versions_io_mapping_inputs and job_versions_io_mapping_outputs tables @OleksandrDvornik

0.17.0 - 2021-08-20

Changed

  • Update Lineage runs query to improve performance, added tests @collado-mike
  • Add POST /api/v1/lineage endpoint to docs and deprecate run endpoints @wslulciuc
  • Drop FieldType enum @wslulciuc

Deprecated

Removed

0.16.1 - 2021-07-13

Fixed

  • dbt packages should look for namespace packages @mobuchowski
  • Add common integration dependency to dbt plugins @mobuchowski
  • DatasetVersionDao queries missing input and output facets @dominiquetipton
  • (De)serialization issue for Run and JobData models @collado-mike
  • Prefix spark openlineage.* configuration parameters with spark.* @collado-mike
  • Parse multi-statement sql in class SqlParser used in Airflow integration @wslulciuc
  • URL-encode namespace on calls to API backend @phixMe

0.16.0 - 2021-07-01

Added

Changed

Fixed

0.15.2 - 2021-06-17

Added

  • Add endpoint to create tags @hanbei

Fixed

  • Fixed build & release process for python marquez-integration-common package @collado-mike
  • Fixed snowflake and bigquery errors when connector libraries not loaded @collado-mike
  • Fixed Openlineage API does not set Dataset current_version_uuid #1361 @collado-mike

0.15.1 - 2021-06-11

Added

  • Factored out common functionality in Python airflow integration @mobuchowski
  • Added Airflow task run macro to expose task run id @collado-mike

Changed

  • Refactored ValuesAverageExpectationParser to ValuesSumExpectationParser and ValuesCountExpectationParser @collado-mike
  • Updated SparkListener to extend Spark's SparkListener abstract class @collado-mike

Fixed

  • Use current project version in spark openlineage client @mobuchowski
  • Rewrote LineageDao queries and LineageService for performance @collado-mike
  • Updated lineage query to include new jobs that have no job version yet @collado-mike

0.15.0 - 2021-05-24

Added

Changed

  • Augment tutorial instructions & screenshots for Airflow example @rossturk
  • Rewrite correlated subqueries when querying the lineage_events table @collado-mike

Fixed

0.14.2 - 2021-05-06

Changed

  • Unpin requests dep in marquez-airflow integration @wslulciuc
  • Unpin attrs dep in marquez-airflow integration @wslulciuc

0.14.1 - 2021-05-05

Changed

  • Updated dataset lineage query to find most recent job that wrote to it @collado-mike
  • Pin http-proxy-middleware to 0.20.0 @wslulciuc

0.14.0 - 2021-05-03

Added

Changed

0.13.1 - 2021-04-01

Changed

  • Remove unused implementation of SQL parser in marquez-airflow @mobuchowski

Fixed

  • Add inputs and outputs to lineage graph @henneberger
  • Updated NodeId regex to support URIs with scheme and ports @collado-mike

0.13.0 - 2021-03-30

Added

  • Secret support for helm chart @KevinMellott91
  • New seed cmd to populate marquez database with source, dataset, and job metadata allowing users to try out features of Marquez (data lineage, view job run history, etc) 🎉
  • Docs on applying db migrations manually
  • New Lineage API to support data lineage queries 🎉
  • Support for logging errors via sentry
  • New Airflow example with Marquez 🎉

Changed

  • Update OpenLineageDao to stop converting URI structures to contain underscores instead of colons and slashes @collado-mike
  • Bump testcontainers dependency to v1.15.2 @ShakirzyanovArsen
  • Register output datasets for a run lazily @henneberger
  • Refactor spark plan traversal to find input/output datasets from datasources @collado-mike
  • Web UI project settings and default marquez port @phixMe
  • Associate dataset inputs on run start @henneberger

Fixed

0.12.2 - 2021-03-16

Changed

  • Use alpine image for marquez reducing image size by +50% @KevinMellott91
  • Use alpine image for marquez-web reducing image size by +50% @KevinMellott91

Fixed

  • Ensure marquez.DAG is (de)serializable

0.12.0 - 2021-02-08

Added

Changed

  • Drop Source.type enum (now a string type)

Fixed

  • Replace jdbi.getHandle() with jdbi.withHandle() to free DB connections from pool @henneberger
  • Fix RunListener when registering outside of the MarquezContext builder @henneberger

0.11.3 - 2020-11-02

Added

  • Add support for external ID on run creation @julienledem
  • Throw RunAlreadyExistsException when the run ID already exists
  • Add BigQuery, Pulsar, and Oracle source types @sreev
  • Add run ID support in job meta; the optional run ID will be used to link a newly created job version to an existing job run, while supporting updating the run state and avoiding having to create another run

Fixed

0.11.2 - 2020-08-21

Changed

  • Always migrate db schema on app start in development config
  • Update default db username / password
  • Use marquez.dev.yml on docker compose up

0.11.1 - 2020-08-19

Added

  • Use shortened name for namespaces in version IDs

  • Add namespace to Dataset and Job models

  • Add ability to deserialize int type to columns @phixMe

  • Add SqlLogger for SQL profiling

  • Add DatasetVersionId.asDatasetId() and JobVersionId.asJobId()

  • Add DatasetService.getBy(DatasetVersionId): Dataset

  • Add JobService.getBy(JobVersionId): Job

  • Allow for run transition override via at=<TIMESTAMP>, where TIMESTAMP is an ISO 8601 timestamp representing the date/time of the state transition. For example:

    POST /jobs/runs/{id}/start?at=<TIMESTAMP>
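The at=<TIMESTAMP> override above needs the timestamp to be percent-encoded when it is placed in the query string. A minimal sketch of building such a request path follows; the helper name is hypothetical, and whether the server expects a "+00:00" or "Z" style offset is not stated here.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def start_run_path(run_id: str, at: datetime) -> str:
    """Build the path that marks a run as started at the given time."""
    if at.tzinfo is None:
        raise ValueError("at must be timezone-aware")
    # urlencode percent-encodes ':' and '+' in the ISO 8601 timestamp.
    query = urlencode({"at": at.isoformat()})
    return f"/jobs/runs/{run_id}/start?{query}"


def utc_timestamp(year: int, month: int, day: int, hour: int = 0, minute: int = 0) -> datetime:
    """Convenience constructor for a timezone-aware UTC timestamp."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
```

The resulting path is meant to be sent as a POST request, as in the example above.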
    

Changed

  • config.yml -> marquez.yml

Fixed

  • Fix dataset version column mappings

0.11.0 - 2020-05-27

Added

Changed

  • Job inputs / outputs are defined as DatasetId
  • Bump to JDK 11

Removed

  • Use of API models under marquez.api.models pkg

Fixed

  • API docs example to show correct SQL key in job context @frankcash

0.10.4 - 2020-01-17

Fixed

  • Fix RunState.isComplete()

0.10.3 - 2020-01-17

Added

  • Add new logo
  • Add JobResource.locationFor()

Fixed

  • Fix dataset field versioning
  • Fix list job runs

0.10.2 - 2020-01-16

Added

  • Added Location header to run creation @nkijak

0.10.1 - 2020-01-11

Changed

  • Rename datasets.last_modified

0.10.0 - 2020-01-08

Changed

  • Rename table dataset_tag_mapping

0.9.2 - 2020-01-07

Added

  • Add Flyway.baselineOnMigrate flag

0.9.1 - 2020-01-06

Added

  • Add redshift data types
  • Add links to dropwizard overrides in config.yml

0.9.0 - 2020-01-05

Added

  • Validate runID when linked to dataset change
  • Add Utils.toUuid()
  • Add tests for class TagDao
  • Add default tags to config
  • Add tagging support for dataset fields
  • Add docker/config.dev.yml
  • Add flyway config support

Changed

  • Replace deprecated App.onFatalError()

Fixed

  • Fix error on tag exists
  • Fix malformed sql in RunDao.findAll()

0.8.0 - 2019-12-12

Added

  • Add Dataset.lastModified
  • Add tags table schema
  • Add GET /tags

Changed

  • Use new Flyway version to fix migration with custom roles
  • Modify args column in table run_args

0.7.0 - 2019-12-05

Added

  • Link dataset versions with run inputs
  • Add schema required by tagging
  • More tests for class common.Utils
  • Add ColumnsTest
  • Add RunDao.insert()
  • Add RunStateDao.insert()
  • Add METRICS.md
  • Add prometheus dep and expose GET /metrics

Fixed

  • Fix dataset field serialization

0.6.0 - 2019-11-29

Added

  • Add Job.latestRun
  • Add debug logging

Changed

  • Adjust class RunResponse property ordering on serialization
  • Update logging on default namespace creation

0.5.1 - 2019-11-20

Added

  • Add dataset field versioning support
  • Add link to web UI
  • Add Job.context

Changed

  • Update semver regex in build-and-push.sh
  • Minor updates to job and dataset versioning functions
  • Make Job.location optional

0.5.0 - 2019-11-04

Added

  • Add lombok.config
  • Add code review guidelines
  • Add JobType
  • Add limit and offset support to NamespaceAPI
  • Add Development section to CONTRIBUTING.md
  • Add class DatasetMeta
  • Add class MorePreconditions
  • Added install instructions for docker

Changed

  • Rename guid column to uuid
  • Use admin ping and health
  • Update owner to ownerName

Removed

  • Remove experimental db table versioning code

Fixed

  • Fix marquez.jar rename on COPY

0.4.0 - 2019-06-04

Added

  • Add quickstart
  • Add GET /namespaces/{namespace}/jobs/{job}/runs

0.3.4 - 2019-05-17

Changed

  • Change DatasetDao.findAll() to order by Dataset.name

0.3.3 - 2019-05-14

Changed

  • Set timestamps to CURRENT_TIMESTAMP

0.3.2 - 2019-05-14

Changed

  • Set job_versions.updated_at to CURRENT_TIMESTAMP

0.3.1 - 2019-05-14

Added

  • Handle Flyway.repair() error

0.3.0 - 2019-05-14

Added

  • Add JobResponse.updatedAt

Changed

  • Return timestamp strings as ISO format

Removed

  • Remove unused tables in db schema

0.2.1 - 2019-04-22

Changed

  • Support dashes (-) in namespace

0.2.0 - 2019-04-15

Added

  • Add @NoArgsConstructor to exceptions
  • Add license to *.java
  • Add column constants
  • Add response/error metrics to API endpoints
  • Add build info to jar manifest
  • Add release steps and plugin
  • Add /jobs/runs/{id}/run
  • Add jdbi metrics
  • Add gitter link
  • Add MarquezServiceException
  • Add -parameters compiler flag
  • Add JSON logging support

Changed

Fixed

  • Fix dataset list error

0.1.0 - 2018-12-18

  • Marquez initial public release.

SPDX-License-Identifier: Apache-2.0 Copyright 2018-2023 contributors to the Marquez project.