TADA.ComparableDataIdentifier is used for legend labels in TADA_TwoCharacteristicScatterplot even when a different column is used for id_cols #485

hillarymarler · 2024-06-17T16:40:44Z

Describe the bug

TADA.ComparableDataIdentifier is used for the legend labels in TADA_TwoCharacteristicScatterplot even when a different column has been selected for id_cols. This is confusing when comparing the same characteristic between two monitoring locations, for example:

To Reproduce

df <- dplyr::filter(Data_6Tribes_5y_Harmonized, TADA.ComparableDataIdentifier == "TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L")
# Creates a scatterplot including the two specified sites in the same plot:
TADA_TwoCharacteristicScatterplot(df, id_cols = "MonitoringLocationName", groups = c("Upper Red Lake: West", "Upper Red Lake: West-Central"))

Expected behavior

Labels for the legend should reflect the column selected in id_cols.
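
As a rough illustration of this expectation (a minimal base R sketch, not the package's actual implementation), the legend labels can be derived from whichever column is passed as id_cols. The helper name `legend_labels_for` and the toy data frame are hypothetical and exist only for this example.

```r
# Hypothetical helper (not part of EPATADA): return the legend labels for the
# requested groups, taken from the column named in id_cols.
legend_labels_for <- function(df, id_cols, groups) {
  stopifnot(id_cols %in% names(df), length(groups) == 2)
  # Keep only the groups that are actually present in the selected column,
  # preserving the order in which the user supplied them.
  present <- groups[groups %in% unique(df[[id_cols]])]
  as.character(present)
}

# Toy data standing in for a TADA data frame (values are made up).
toy <- data.frame(
  TADA.ComparableDataIdentifier = rep(
    "TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L", 4
  ),
  MonitoringLocationName = c(
    "Upper Red Lake: West", "Upper Red Lake: West",
    "Upper Red Lake: West-Central", "Upper Red Lake: West-Central"
  ),
  TADA.ResultMeasureValue = c(10, 12, 20, 22)
)

labels <- legend_labels_for(
  toy,
  id_cols = "MonitoringLocationName",
  groups = c("Upper Red Lake: West", "Upper Red Lake: West-Central")
)
print(labels)
```

With id_cols = "MonitoringLocationName", the labels are the two site names rather than the shared TADA.ComparableDataIdentifier value.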

Bug fixes should include all the following work:

Create or edit the function/code.
Document all code using line/inline and/or multi-line/block comments to describe what it does.
Create or edit tests in tests/testthat folder to help prevent and/or troubleshoot potential future issues.
Create or edit the function documentation. Include working examples.
Update or add the new functionality to the appropriate vignette (or create new one).
If function/code edits made as part of this issue impact other functions in the package or functionality in the shiny app, ensure those are updated as well.
Run TADA_UpdateAllRefs(), TADA_UpdateExampleData(), styler::style_pkg(), devtools::document(), and devtools::check() and address any new notes or issues before creating a pull request.
Run more robust check for releases: devtools::check(manual = TRUE, remote = TRUE, incoming = TRUE)


hillarymarler · 2024-06-26T16:20:57Z

Should we limit the columns that can be used for id_cols?

wokenny13 · 2024-06-26T18:44:10Z

A limitation on column arguments for id_cols could be good to consider.
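
One possible shape for such a limitation, sketched in base R under stated assumptions: the allow-list below is purely hypothetical (which columns should be permitted has not been decided in this thread), and `check_id_cols` is an illustrative name, not an existing EPATADA function.

```r
# Hypothetical allow-list; the real set of permitted columns is undecided.
allowed_id_cols <- c(
  "TADA.ComparableDataIdentifier",
  "MonitoringLocationName",
  "ATTAINS.assessmentunitname"
)

# Hypothetical validator: stop early with a clear message if id_cols is not a
# single permitted column name or is missing from the data frame.
check_id_cols <- function(df, id_cols, allowed = allowed_id_cols) {
  if (length(id_cols) != 1 || !id_cols %in% allowed) {
    stop("id_cols must be one of: ", paste(allowed, collapse = ", "))
  }
  if (!id_cols %in% names(df)) {
    stop("Column '", id_cols, "' is not present in the data frame.")
  }
  invisible(id_cols)
}

# Toy data frame with a single permitted column (values are made up).
toy <- data.frame(MonitoringLocationName = "Upper Red Lake: West")

ok <- check_id_cols(toy, "MonitoringLocationName")
err <- tryCatch(
  check_id_cols(toy, "ActivityStartDate"),
  error = function(e) conditionMessage(e)
)
print(err)
```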

Just a thought on this function: if a different column, such as the monitoring location name, is being compared rather than two characteristics (as in this example, where only TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L is compared across two monitoring locations), would the name TADA_TwoCharacteristicScatterplot be a bit 'misleading', since only a single characteristic is plotted? Since both y-axes are based on the same characteristic, wouldn't a single y-axis be sufficient when there is only one characteristic? Or should we consider keeping the same scale on both axes when the characteristic is the same (e.g., 0 to 100 MG/L for both y-axis 1 and y-axis 2)?

Above is the view on the legend labels when id_cols are for monitoring location names to try to address this issue.

hillarymarler · 2024-06-26T20:51:55Z

Below is an example of a scatterplot I modified for a demo. See (https://usepa.github.io/EPATADA/articles/TADAWaterSciConWorkshopDemo.html) if you want to see how to create the data set used in the example below.

Maybe it would be possible to conditionally remove the 2nd y-axis if the same characteristic is plotted in both traces? But base the scale on both traces?

What do you think of changing the name to TADA_TwoGroupScatterplot? That might be more descriptive if we are making it flexible enough to accommodate different id_cols inputs.

Or is this starting to get so convoluted that it may make sense to create a separate function for comparing different locations rather than different characteristics? We could discuss tomorrow.

```r
# Create a two-characteristic scatterplot using TADA_TwoCharacteristicScatterplot
twochar_scatter <- TADA_TwoCharacteristicScatterplot(
  data %>%
    dplyr::filter(
      ActivityStartDate > "2014-12-31",
      TADA.ComparableDataIdentifier ==
        "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25c"
    ),
  id_cols = "ATTAINS.assessmentunitname",
  groups = c(
    "San Juan River (Navajo bnd at Hogback to Animas River)",
    "Animas River (San Juan River to Estes Arroyo)"
  )
) %>%
  # Remove default plot features that are not applicable for a location comparison
  plotly::layout(
    yaxis2 = list(overlaying = "y", side = "right", title = "", visible = FALSE),
    title = TADA_InsertBreaks("SPECIFIC CONDUCTANCE for the San Juan and Animas Rivers Over Time")
  )
```
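
The idea raised above (drop the second y-axis when both traces show the same characteristic, but base the scale on both traces) could look roughly like the following. This is only a base R sketch of the decision logic; `axis_plan` is a hypothetical helper and the measurement values are made up.

```r
# Hypothetical helper (not part of EPATADA): decide whether a second y-axis is
# needed and, if not, compute one range that covers both traces.
axis_plan <- function(char1, char2, values1, values2) {
  same_char <- identical(char1, char2)
  if (same_char) {
    # One shared axis whose range spans both traces.
    shared <- range(c(values1, values2), na.rm = TRUE)
    list(use_second_axis = FALSE, y_range = shared, y2_range = NULL)
  } else {
    # Different characteristics keep independent axes.
    list(
      use_second_axis = TRUE,
      y_range = range(values1, na.rm = TRUE),
      y2_range = range(values2, na.rm = TRUE)
    )
  }
}

plan <- axis_plan(
  "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C",
  "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C",
  values1 = c(310, 450, NA, 520),
  values2 = c(280, 610)
)
print(plan$use_second_axis)
print(plan$y_range)
```

The resulting range could then be passed to the plot's primary axis (for example through the `range` entry of `yaxis` in `plotly::layout()`), with `yaxis2` hidden as in the example above.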

wokenny13 · 2024-07-01T17:08:42Z

Made some edits to the function for TADA_TwoCharacteristicScatterplot. Currently, I decided to stick with just one function rather than creating a new function. However, depending on thoughts with this update, a decision can be made if this should be separated into two different functions instead.

Required arguments: id_cols = "TADA.ComparableDataIdentifier" and groups. groups must contain exactly two characteristic names. The same characteristic name may be repeated twice if we only want to view one characteristic but compare it across the values of another column.
id_cols2 and groups2 are now optional arguments. They allow comparing two other groups within a column, for one characteristic name or for two. groups2 requires two group names; likewise, the same group name may be repeated twice if desired.
If a characteristic name is repeated, only one y-axis is used.
If two characteristics and two additional groupings (e.g., ATTAINS.assessmentunitname) are used, the legend will contain four entries, differentiated by color or shape. Input on which colors and shapes to use is welcome.
A characteristic may have no data for one of the additional groupings. In that case the plot should still work, but no trace or legend entry is drawn for that subgroup, i.e. a legend with 1, 2, 3, or 4 entries is possible (needs to be validated).
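
To make the legend behavior described above concrete, here is a minimal base R sketch, assuming the legend gets one entry per characteristic/group pairing that actually has data. `legend_entries` is a hypothetical helper, not the code in the pull request, and the toy data frame is made up.

```r
# Hypothetical helper (not part of EPATADA): list the legend entries for every
# characteristic/group pairing that has at least one row in the data.
legend_entries <- function(df, groups, id_cols2, groups2) {
  stopifnot(length(groups) == 2, length(groups2) == 2)
  # Repeating a value (e.g. the same characteristic twice) collapses to one.
  combos <- expand.grid(
    char = unique(groups), grp = unique(groups2),
    stringsAsFactors = FALSE
  )
  has_rows <- mapply(
    function(ch, g) {
      any(
        df$TADA.ComparableDataIdentifier == ch & df[[id_cols2]] == g,
        na.rm = TRUE
      )
    },
    combos$char, combos$grp,
    USE.NAMES = FALSE
  )
  paste(combos$char[has_rows], combos$grp[has_rows], sep = " | ")
}

cond <- "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"
tds <- "TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L"
san_juan <- "San Juan River (Navajo bnd at Hogback to Animas River)"
animas <- "Animas River (San Juan River to Estes Arroyo)"

# Toy data: TDS has no rows for the Animas River assessment unit.
toy <- data.frame(
  TADA.ComparableDataIdentifier = c(cond, cond, tds),
  ATTAINS.assessmentunitname = c(san_juan, animas, san_juan)
)

entries <- legend_entries(
  toy, c(cond, tds), "ATTAINS.assessmentunitname", c(san_juan, animas)
)
one_char <- legend_entries(
  toy, c(cond, cond), "ATTAINS.assessmentunitname", c(san_juan, animas)
)
print(entries)
print(one_char)
```

In this sketch the number of y-axes would follow from `length(unique(groups))`, so repeating a characteristic name yields a single axis.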

```r
data <- data %>%
  TADA_FindQCActivities(clean = TRUE) %>%
  TADA_FlagMeasureQualifierCode(clean = TRUE)
data <- TADA_IDCensoredData(data)
ATTAINS_data <- TADA_GetATTAINS(data)
data <- ATTAINS_data$TADA_with_ATTAINS %>%
  # Remove geometry to reduce size of data set
  sf::st_drop_geometry() %>%
  # Assign "Other" as name for unnamed assessment units
  dplyr::mutate(ATTAINS.assessmentunitname =
    ifelse(is.na(ATTAINS.assessmentunitname),
      "Other", ATTAINS.assessmentunitname))
```

TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier == "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"), groups = c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C", "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"), id_cols2 = "ATTAINS.assessmentunitname", groups2 = c("San Juan River (Navajo bnd at Hogback to Animas River)", "Animas River (San Juan River to Estes Arroyo)"))

TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier %in% c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C","TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L")), groups = c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C", "TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L"), id_cols2 = "ATTAINS.assessmentunitname", groups2 = c("San Juan River (Navajo bnd at Hogback to Animas River)", "San Juan River (Navajo bnd at Hogback to Animas River)"))

TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier %in% c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C","TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L")), groups = c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C", "TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L"), id_cols2 = "ATTAINS.assessmentunitname", groups2 = c("San Juan River (Navajo bnd at Hogback to Animas River)", "Animas River (San Juan River to Estes Arroyo)"))

TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier %in% c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C","TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L")), groups = c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C","SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"), id_cols2 = "ATTAINS.assessmentunitname", groups2 = c("San Juan River (Navajo bnd at Hogback to Animas River)", "San Juan River (Navajo bnd at Hogback to Animas River)"))

hillarymarler added the Tables&Figures, Good First Issue, Usability, and Module 1 MVP labels Jun 17, 2024

wokenny13 self-assigned this Jun 21, 2024

wokenny13 linked a pull request Jul 1, 2024 that will close this issue

485 tadacomparabledataidentifier is used for legend labels in tada twocharacteristicscatterplot uses a different column for id cols #490

Merged

hillarymarler closed this as completed in #490 Jul 29, 2024
