TADA.ComparableDataIdentifier is used for legend labels when TADA_TwoCharacteristicScatterplot uses a different column for id_cols #485

Closed
8 tasks
hillarymarler opened this issue Jun 17, 2024 · 4 comments · Fixed by #490

@hillarymarler
Collaborator

Describe the bug

TADA.ComparableDataIdentifier is used for the legend labels in TADA_TwoCharacteristicScatterplot even when a different column has been selected for id_cols. This is confusing when comparing the same characteristic between two monitoring locations, for example:

image

To Reproduce

df <- dplyr::filter(Data_6Tribes_5y_Harmonized, TADA.ComparableDataIdentifier == "TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L")
# Create a scatterplot that includes the two specified sites in the same plot:
TADA_TwoCharacteristicScatterplot(df, id_cols = "MonitoringLocationName", groups = c("Upper Red Lake: West", "Upper Red Lake: West-Central"))

Expected behavior

Labels for the legend should reflect the column selected in id_cols.
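
One way to get there, sketched in base R: derive the legend labels from whichever column is passed to id_cols instead of always reading TADA.ComparableDataIdentifier. The helper name `legend_labels_for` and the toy data frame are hypothetical and not part of EPATADA; this only illustrates the expected behavior, not the package's implementation.

```r
# Hypothetical helper (not part of EPATADA): return the legend labels for the
# requested groups, taken from the column named in id_cols.
legend_labels_for <- function(df, id_cols, groups) {
  if (!id_cols %in% names(df)) {
    stop("id_cols must name a column in df: ", id_cols)
  }
  # Keep only the groups that actually occur in the selected column,
  # preserving the order the user supplied.
  groups[groups %in% unique(df[[id_cols]])]
}

# Toy data standing in for a TADA data frame (values are made up).
toy <- data.frame(
  TADA.ComparableDataIdentifier = "TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L",
  MonitoringLocationName = c("Upper Red Lake: West", "Upper Red Lake: West-Central"),
  TADA.ResultMeasureValue = c(31, 27),
  stringsAsFactors = FALSE
)

labels <- legend_labels_for(
  toy,
  id_cols = "MonitoringLocationName",
  groups = c("Upper Red Lake: West", "Upper Red Lake: West-Central")
)
print(labels)
```

Each returned label would then be used as the trace name in the plot, so the legend shows the two monitoring locations rather than the shared characteristic identifier.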

Bug fixes should include all the following work:

  • Create or edit the function/code.

  • Document all code using line/inline and/or multi-line/block comments
    to describe what it does.

  • Create or edit tests in tests/testthat folder to help prevent and/or
    troubleshoot potential future issues.

  • Create or edit the function documentation. Include working
    examples.

  • Update or add the new functionality to the appropriate vignette
    (or create new one).

  • If function/code edits made as part of this issue impact other
    functions in the package or functionality in the shiny app, ensure
    those are updated as well.

  • Run TADA_UpdateAllRefs(), TADA_UpdateExampleData(), styler::style_pkg(),
    devtools::document(), and devtools::check() and address any new notes or
    issues before creating a pull request.

  • Run more robust check for releases: devtools::check(manual = TRUE,
    remote = TRUE, incoming = TRUE)

@wokenny13 wokenny13 self-assigned this Jun 21, 2024
@hillarymarler
Collaborator Author

Should we limit the columns that can be used for id_cols?

@wokenny13
Collaborator

wokenny13 commented Jun 26, 2024

A limitation on column arguments for id_cols could be good to consider.
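
If we go that route, a minimal sketch of such a check could look like the following. The allowed list and the function name `check_id_cols` are hypothetical; the three columns are simply the ones mentioned in this thread, and the real list would need to be agreed on.

```r
# Hypothetical list of allowed columns; the real set would need to be
# decided by the maintainers.
allowed_id_cols <- c(
  "TADA.ComparableDataIdentifier",
  "MonitoringLocationName",
  "ATTAINS.assessmentunitname"
)

# Hypothetical validator: stop with a readable message when id_cols is not
# one of the allowed columns, otherwise return it unchanged.
check_id_cols <- function(id_cols, allowed = allowed_id_cols) {
  if (length(id_cols) != 1 || !id_cols %in% allowed) {
    stop(
      "id_cols must be one of: ", paste(allowed, collapse = ", "),
      call. = FALSE
    )
  }
  id_cols
}

checked <- check_id_cols("MonitoringLocationName")
print(checked)
```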

Just a thought on this function: if a different column, such as the monitoring location name, is being compared rather than two characteristics (as in this example, where only TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L is compared across two monitoring locations), would the name TADA_TwoCharacteristicScatterplot be a bit misleading, since only a single characteristic is plotted? Since both y-axes are based on the same characteristic, wouldn't a single y-axis be sufficient when there is only one characteristic? Or should we consider keeping the same scale on both axes when the characteristic is the same (e.g., 0 to 100 MG/L for both y-axis 1 and y-axis 2)?
newplot

Above is how the legend labels look when id_cols is set to monitoring location names, as an attempt to address this issue.

@hillarymarler
Collaborator Author

Below is an example of a scatterplot I modified for a demo. See (https://usepa.github.io/EPATADA/articles/TADAWaterSciConWorkshopDemo.html) if you want to see how to create the data set used in the example below.

Maybe it would be possible to conditionally remove the 2nd y-axis if the same characteristic is plotted in both traces? But base the scale on both traces?
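
A rough base R sketch of that idea, under the assumption that the function knows which characteristic each trace plots. The helper name `y_axis_setup` and the sample values are hypothetical:

```r
# Hypothetical helper: decide how the y-axes should be laid out for two traces.
# y1 and y2 are the numeric result values of each trace; char1 and char2 are
# the characteristic identifiers plotted in each trace.
y_axis_setup <- function(y1, y2, char1, char2) {
  same_characteristic <- identical(char1, char2)
  if (same_characteristic) {
    # One axis, with a range that covers both traces.
    shared_range <- range(c(y1, y2), na.rm = TRUE)
    list(show_yaxis2 = FALSE, yaxis_range = shared_range)
  } else {
    # Different characteristics keep their own axes and ranges.
    list(
      show_yaxis2 = TRUE,
      yaxis_range = range(y1, na.rm = TRUE),
      yaxis2_range = range(y2, na.rm = TRUE)
    )
  }
}

# Made-up values for two locations measuring the same characteristic.
setup <- y_axis_setup(
  y1 = c(410, 520, 480),
  y2 = c(300, 610, NA),
  char1 = "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C",
  char2 = "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"
)
print(setup)
```

The result could then feed the plot layout, for example by setting the `range` of the first y-axis and the `visible` flag of the second one in plotly::layout().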

What do you think of changing the name to TADA_TwoGroupScatterplot? That might be more descriptive if we are making it flexible enough to accommodate different id_cols inputs.

Or is this starting to get so convoluted that it may make sense to create a separate function for comparing different locations rather than different characteristics? We could discuss tomorrow.

# create two characteristic scatterplot using TADA_TwoCharacteristicScatterplot
twochar_scatter <- TADA_TwoCharacteristicScatterplot(
  data %>%
    dplyr::filter(
      ActivityStartDate > "2014-12-31",
      TADA.ComparableDataIdentifier ==
        "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25c"
    ),
  id_cols = "ATTAINS.assessmentunitname",
  groups = c(
    "San Juan River (Navajo bnd at Hogback to Animas River)",
    "Animas River (San Juan River to Estes Arroyo)"
  )
) %>%
  # remove default plot features that are not applicable for a location comparison
  plotly::layout(
    yaxis2 = list(overlaying = "y", side = "right", title = "", visible = FALSE),
    title = TADA_InsertBreaks("SPECIFIC CONDUCTANCE for the San Juan and Animas Rivers Over Time")
  )

image

@wokenny13
Collaborator

I made some edits to TADA_TwoCharacteristicScatterplot. For now I have kept a single function rather than creating a new one. Depending on feedback on this update, we can decide whether it should be split into two functions instead.

  1. Required arguments: id_cols = "TADA.ComparableDataIdentifier" and groups. groups must contain exactly two characteristic names. The same characteristic name may be repeated twice if we only want to view one characteristic but compare it across the values of another column.
  2. id_cols2 and groups2 are now optional arguments. They allow comparing two groups within another column, for either one or two characteristic names. groups2 requires two group names; as with groups, the same group name may be repeated twice if desired.
  3. If a characteristic name is repeated, only one y-axis is used.
  4. If two characteristics are combined with two additional groupings (e.g., ATTAINS.assessmentunitname), the legend will contain four names, differentiated by color or shape. Input on which colors and shapes to use would be welcome.
  5. A characteristic may have no data for one of the additional groupings. In that case the plot should still work, but no trace or legend entry is returned for that subgroup, so a legend with 1, 2, 3, or 4 names is possible (needs to be validated).
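
To make the rules above concrete, here is a small base R sketch of how the argument checks and the resulting legend names could be worked out. It is not the code from the pull request: `plan_traces`, the " | " label separator, the toy data with shortened names, and the requirement that id_cols2 and groups2 be supplied together are all assumptions made for illustration.

```r
# Hypothetical helper illustrating rules 1-5; not the package implementation.
plan_traces <- function(df, groups, id_cols2 = NULL, groups2 = NULL) {
  if (length(groups) != 2) stop("groups must contain exactly two names")
  # Assumption: id_cols2 and groups2 only make sense as a pair.
  if (xor(is.null(id_cols2), is.null(groups2))) {
    stop("id_cols2 and groups2 must be supplied together")
  }
  if (!is.null(groups2) && length(groups2) != 2) {
    stop("groups2 must contain exactly two names")
  }

  # Repeated names collapse to a single entry (rules 1 and 2).
  chars <- unique(groups)
  if (is.null(id_cols2)) {
    legend_names <- chars[chars %in% df$TADA.ComparableDataIdentifier]
  } else {
    combos <- expand.grid(
      char = chars, group2 = unique(groups2),
      stringsAsFactors = FALSE
    )
    wanted <- paste(combos$char, combos$group2, sep = " | ")
    seen <- paste(df$TADA.ComparableDataIdentifier, df[[id_cols2]], sep = " | ")
    # Rule 5: keep only combinations that actually occur in the data.
    legend_names <- wanted[wanted %in% seen]
  }

  list(
    single_y_axis = length(chars) == 1, # rule 3
    legend_names = legend_names # rule 4: up to four names
  )
}

# Toy data with shortened, made-up names.
toy <- data.frame(
  TADA.ComparableDataIdentifier = c("COND", "COND", "TDS"),
  ATTAINS.assessmentunitname = c("San Juan", "Animas", "San Juan"),
  stringsAsFactors = FALSE
)

plan <- plan_traces(
  toy,
  groups = c("COND", "TDS"),
  id_cols2 = "ATTAINS.assessmentunitname",
  groups2 = c("San Juan", "Animas")
)
print(plan)
```

In this toy data there are no TDS results for the Animas unit, so only three of the four possible legend names come back, which is the situation described in item 5.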

data <- data %>%
  TADA_FindQCActivities(clean = TRUE) %>%
  TADA_FlagMeasureQualifierCode(clean = TRUE)
data <- TADA_IDCensoredData(data)
ATTAINS_data <- TADA_GetATTAINS(data)
data <- ATTAINS_data$TADA_with_ATTAINS %>%
  # Remove geometry to reduce size of data set
  sf::st_drop_geometry() %>%
  # Assign "Other" as name for unnamed assessment units
  dplyr::mutate(
    ATTAINS.assessmentunitname =
      ifelse(is.na(ATTAINS.assessmentunitname),
        "Other", ATTAINS.assessmentunitname
      )
  )

TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier == "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"), groups = c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C", "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"), id_cols2 = "ATTAINS.assessmentunitname", groups2 = c("San Juan River (Navajo bnd at Hogback to Animas River)", "Animas River (San Juan River to Estes Arroyo)"))
Mod3-1 Char-2 location

TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier %in% c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C","TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L")), groups = c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C", "TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L"), id_cols2 = "ATTAINS.assessmentunitname", groups2 = c("San Juan River (Navajo bnd at Hogback to Animas River)", "San Juan River (Navajo bnd at Hogback to Animas River)"))
Mod3-2 Char-1 location

TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier %in% c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C","TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L")), groups = c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C", "TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L"), id_cols2 = "ATTAINS.assessmentunitname", groups2 = c("San Juan River (Navajo bnd at Hogback to Animas River)", "Animas River (San Juan River to Estes Arroyo)"))
Mod3-2 Char-2 location

TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier %in% c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C","TOTAL DISSOLVED SOLIDS_DISSOLVED_NA_UG/L")), groups = c("SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C","SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"), id_cols2 = "ATTAINS.assessmentunitname", groups2 = c("San Juan River (Navajo bnd at Hogback to Animas River)", "San Juan River (Navajo bnd at Hogback to Animas River)"))
Mod3-1 Char-1 location
