Problem
Hello,
While working with Marquez, I noticed a significant performance bottleneck with a specific SQL query in the JobDao.java file. For the namespaceName "MyNameSpace", the query was originally taking 17 seconds to execute with a limit of 100, and 12 seconds with a limit of 25. Given that this query runs every time the Marquez web UI is accessed, this presented a major user experience challenge.
Cluster: db.t4g.medium (vCPU: 2, RAM: 4 GB)
See: #2608
Solution
To address this, I've revised the query. The optimized query makes use of Common Table Expressions to fetch the required data more efficiently and before the join. Here's the optimized query:
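As a rough illustration of the general pattern (limiting rows in a CTE before joining), here is a minimal sketch. It is not the query from this PR: the jobs and job_facets tables, their columns, and the data are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3

# Hypothetical schema and data, used only to illustrate the pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, namespace_name TEXT, name TEXT);
    CREATE TABLE job_facets (job_id INTEGER, facet TEXT);
""")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(i, "MyNameSpace" if i % 2 == 0 else "Other", f"job_{i:03d}")
     for i in range(1, 201)],
)
conn.executemany(
    "INSERT INTO job_facets VALUES (?, ?)",
    [(i, f"facet_{k}") for i in range(1, 201) for k in range(3)],
)

# Join first, then filter and limit: the join is written against all jobs.
JOIN_THEN_LIMIT = """
    SELECT j.name, COUNT(f.facet) AS facet_count
    FROM jobs j
    LEFT JOIN job_facets f ON f.job_id = j.id
    WHERE j.namespace_name = :ns
    GROUP BY j.id, j.name
    ORDER BY j.name
    LIMIT :limit
"""

# Select the page of jobs in a CTE first, then join only those rows.
LIMIT_THEN_JOIN = """
    WITH page AS (
        SELECT id, name
        FROM jobs
        WHERE namespace_name = :ns
        ORDER BY name
        LIMIT :limit
    )
    SELECT p.name, COUNT(f.facet) AS facet_count
    FROM page p
    LEFT JOIN job_facets f ON f.job_id = p.id
    GROUP BY p.id, p.name
    ORDER BY p.name
"""

params = {"ns": "MyNameSpace", "limit": 25}
naive_rows = conn.execute(JOIN_THEN_LIMIT, params).fetchall()
cte_rows = conn.execute(LIMIT_THEN_JOIN, params).fetchall()

# Both forms return the same page of results.
assert naive_rows == cte_rows
```

Both forms return the same rows here. Whether the CTE form is actually faster depends on the database and its planner; on PostgreSQL, comparing the two with EXPLAIN ANALYZE would be the way to check.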
On the same cluster, db.t4g.medium (vCPU: 2, RAM: 4 GB), the optimization reduced the execution time from 17 seconds with limit=100 to a mere 300 ms. For limit=25, it dropped from 12 seconds to under 100 ms.
Furthermore, I believe there's potential for even more optimization. If job_facets_view included the column namespace_name, it might allow for further refinements.
One-line summary: Optimized a critical SQL query in JobDao.java, resulting in a significant reduction in execution time.
Checklist
CHANGELOG.md (Depending on the change, this may not be necessary).
.sql database schema migration according to Flyway's naming convention (if relevant).