Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it: This PR addresses a bug reported in #10379 where a cached value did not include the choice of whether to report local, remote (harvested), or all datasets for the /datasets and /datasets/monthly endpoints, resulting in incorrect values when different values were used and a cached value was available. The PR uses the existing mechanism to cache the results with the local/remote/all dataLocation choices to avoid this.
The PR also addresses the issue raised in #10710 (comment) where the old queries to find the latest relevant version were broken if/when the minor version was greater than 9. The new queries rely on ordering the results by both the major and minor version numbers and using DISTINCT ON to pick the first entry (the one with the highest major version, and among those, the highest minor version.
Which issue(s) this PR closes:
Closes #10379
Special notes for your reviewer: I did some checking of the results from the old/new queries in sql: e.g. in simplest form, comparing
select DISTINCT ON (datasetversion.dataset_id) datasetversion.id from datasetversion join dataset on dataset.id = datasetversion.dataset_id where versionstate='RELEASED' order by datasetversion.dataset_id, datasetversion.versionnumber desc, datasetversion.minorversionnumber desc;
with
select datasetversion.dataset_id || ':' || max(datasetversion.versionnumber || datasetversion.minorversionnumber) from datasetversion join dataset on dataset.id = datasetversion.dataset_id where versionstate='RELEASED' group by dataset_id;
and they appear to give the same results. In going through the queries for different metrics, I realized that the string value being returned was never used, so I've simplified the selects to just return just the dataset id (when all we're doing is count(*)) or the datasetversion id (when we're matching against the version to, for example, find filemetadatas in that version).
Suggestions on how to test this:
Compare the results of any/all of the metrics APIs with the changed code. A potential shortcut that covers most endpoints would be to install/run the https://github.com/gdcc/dv-metrics app and view it's webpage and then compare the graphs produced before and after the PR. Whether testing manually or via the app, you need to
truncate metric;
to remove the cached values or call the /clearMetricsCache api call.Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: included.
Additional documentation: