
Performance Issue with Query on Large Data Sets in JobDao.java #2608

Closed
algorithmy1 opened this issue Sep 6, 2023 · 0 comments
Contributor
Hello team,

I'd like to raise a performance concern regarding a specific SQL query in the JobDao.java file.

File & Line Reference:

Query:

SELECT j.*, f.facets
FROM jobs_view AS j
LEFT OUTER JOIN job_versions AS jv ON jv.uuid = j.current_version_uuid
LEFT OUTER JOIN (
  SELECT run_uuid, JSON_AGG(e.facet) AS facets
  FROM (
    SELECT jf.run_uuid, jf.facet
    FROM job_facets_view AS jf
    INNER JOIN job_versions jv2 ON jv2.latest_run_uuid = jf.run_uuid
    INNER JOIN jobs_view j2 ON j2.current_version_uuid = jv2.uuid
    WHERE j2.namespace_name = :namespaceName
    ORDER BY lineage_event_time ASC
  ) e
  GROUP BY e.run_uuid
) f ON f.run_uuid = jv.latest_run_uuid
WHERE j.namespace_name = :namespaceName
ORDER BY j.name
LIMIT :limit OFFSET :offset

Problem Description:
For the namespaceName value "MyNameSpace", which covers 854,743 facets across its 15,650 jobs, this query used to take more than 10 minutes to execute due to the large join.

After upgrading our infrastructure to a PostgreSQL cluster on db.t4g.medium (2 vCPUs, 4 GB RAM), execution time improved, but it still takes around 11 seconds with a limit of 25. This remains a concern, especially since the query runs every time I open the Marquez web UI, causing a noticeable delay in accessing the interface.
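One avenue worth checking first is whether the joins in the inner subquery are backed by indexes. This is speculative on my part: `job_facets_view` and `jobs_view` are views, so the index would have to go on the underlying base tables, and the table/column names below (`job_facets`, `job_versions`) are assumptions about the Marquez schema that should be confirmed with `EXPLAIN (ANALYZE, BUFFERS)` before applying anything.

```sql
-- Hypothetical indexes supporting the inner-subquery joins.
-- Verify the actual base-table names in the Marquez schema first.
CREATE INDEX IF NOT EXISTS job_versions_latest_run_uuid_idx
  ON job_versions (latest_run_uuid);

CREATE INDEX IF NOT EXISTS job_facets_run_uuid_idx
  ON job_facets (run_uuid);
```

If the planner is currently doing sequential scans over the facets table for every namespace lookup, indexes like these could account for a large share of the remaining 11 seconds.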

Proposed Solution:
I believe there might be optimization opportunities for this query or, if feasible, a way to cache some of its results, especially if they are frequently accessed and don't change often.
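As a sketch of the caching idea, the expensive part is the per-run facet aggregation, which could be precomputed in a materialized view and refreshed periodically rather than rebuilt on every page load. This is only an illustration under assumed schema names (`job_facets_view` exposing `run_uuid`, `facet`, and `lineage_event_time`), not a drop-in change:

```sql
-- Hypothetical materialized view caching the per-run facet aggregation.
-- JSON_AGG with an ORDER BY preserves the event-time ordering that the
-- original query achieved via the inner subquery's ORDER BY.
CREATE MATERIALIZED VIEW job_facets_agg AS
SELECT jf.run_uuid,
       JSON_AGG(jf.facet ORDER BY jf.lineage_event_time ASC) AS facets
FROM job_facets_view AS jf
GROUP BY jf.run_uuid;

-- A unique index is required for CONCURRENTLY refreshes,
-- and also speeds up the join back to job_versions.
CREATE UNIQUE INDEX ON job_facets_agg (run_uuid);

-- Refresh on a schedule (or after ingesting new lineage events)
-- without blocking readers:
REFRESH MATERIALIZED VIEW CONCURRENTLY job_facets_agg;
```

The main JobDao query could then join `job_facets_agg` directly on `latest_run_uuid`, trading some staleness between refreshes for a much cheaper read path.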

I'd be more than happy to collaborate, provide more information, or help in any way to improve this.

Thank you!


Labels: performance, database


