Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf/improve jobdao query #2609

Merged

Conversation

algorithmy1
Copy link
Contributor

Problem

Hello,

While working with Marquez, I noticed a significant performance bottleneck with a specific SQL query in the JobDao.java file. For the namespaceName "MyNameSpace", the query was originally taking 17 seconds to execute with a limit of 100, and 12 seconds with a limit of 25. Given that this query runs every time the Marquez web UI is accessed, this presented a major user experience challenge.
db.t4g.medium (vCPU: 2, RAM: 4 GB)

See : #2608

Solution

To address this, I've revised the query. The optimized query makes use of Common Table Expressions to fetch the required data more efficiently and before the join. Here's the optimized query:

    WITH jobs_view_page
    AS (
      SELECT
        *
      FROM 
        jobs_view AS j
      WHERE 
        j.namespace_name = :namespaceName
      ORDER BY
        j.name
      LIMIT 
        :limit
      OFFSET 
        :offset
    ),
    job_versions_temp AS (
      SELECT 
        *
      FROM 
        job_versions AS j
      WHERE 
        j.namespace_name = :namespaceName
    ),
    facets_temp AS (
    SELECT 
      run_uuid,
        JSON_AGG(e.facet) AS facets
    FROM (
        SELECT 
          jf.run_uuid,
            jf.facet
        FROM 
          job_facets_view AS jf
        INNER JOIN job_versions_temp jv2 
          ON jv2.latest_run_uuid = jf.run_uuid
        INNER JOIN jobs_view_page j2 
          ON j2.current_version_uuid = jv2.uuid
        ORDER BY 
          lineage_event_time ASC
        ) e
    GROUP BY e.run_uuid
    )
    SELECT 
      j.*,
      f.facets
    FROM 
      jobs_view_page AS j
    LEFT OUTER JOIN job_versions_temp AS jv 
      ON jv.uuid = j.current_version_uuid
    LEFT OUTER JOIN facets_temp AS f
      ON f.run_uuid = jv.latest_run_uuid
    ORDER BY
        j.name

On the same cluster db.t4g.medium (vCPU: 2, RAM: 4 GB), the optimization reduced the execution time from 17 seconds with limit=100 to a mere 300ms. For limit=25, it dropped from 12 seconds to under 100ms.

Furthermore, I believe there's potential for even more optimization. If job_facets_view included the column namespace_name, it might allow for further refinements.

One-line summary: Optimized a critical SQL query in JobDao.java, resulting in a significant reduction in execution time.

Checklist

  • I've signed-off on my work.
  • My changes are accompanied by tests (if relevant).
  • The change contains a small diff and is self-contained.
  • I've updated the relevant documentation (if necessary).
  • I've included a one-line summary of my change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • I've versioned my .sql database schema migration according to Flyway's naming convention (if relevant).
  • I've included the appropriate header in any source code files (if relevant).

Signed-off-by: Abdallah Terrab <[email protected]>
Signed-off-by: Abdallah Terrab <[email protected]>
@boring-cyborg boring-cyborg bot added the api API layer changes label Sep 6, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Sep 6, 2023

Thanks for opening your first pull request in the Marquez project! Please check out our contributing guidelines (https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md).

@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Merging #2609 (0db03bc) into main (3f19508) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2609   +/-   ##
=========================================
  Coverage     83.31%   83.31%           
  Complexity     1289     1289           
=========================================
  Files           243      243           
  Lines          5940     5940           
  Branches        280      280           
=========================================
  Hits           4949     4949           
  Misses          844      844           
  Partials        147      147           
Files Changed Coverage Δ
api/src/main/java/marquez/db/JobDao.java 100.00% <ø> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queries are equivalent and new one is proven to be faster, so the PR deserves an approval 👍 🔢 🥇

I thought that CTEs do just work in a way as copy pasting the syntax to make it more readable, while not affecting the query plan and performance. In this case, it looks like it allowed limit before the joins. Great finding.

@algorithmy1 First-class work. Thank you.

@algorithmy1
Copy link
Contributor Author

algorithmy1 commented Sep 7, 2023

Thank you for the compliment 🥰

By the way we can improve the query without using CETs but we will need to use limit 2 times (filter on the page two times). Moreover this way, the query is more readable.

@pawel-big-lebowski pawel-big-lebowski merged commit 3243a0f into MarquezProject:main Sep 7, 2023
12 checks passed
@boring-cyborg
Copy link

boring-cyborg bot commented Sep 7, 2023

Great job! Congrats on your first merged pull request in the Marquez project!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API layer changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants