Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Can we add a setting to convert all missing character variable to "" instead of NA #2466

Closed
WangLaoK opened this issue Jun 18, 2024 · 9 comments
Labels
enhancement New feature or request programming

Comments

@WangLaoK
Copy link

Feature Idea

By default, admiral package will derive NA for missing character variables.
e.g.: if paramcd/param not mapped for a derive_vars_merged_lookup() function.

However, character variables should be "" instead of NA there.
could we add below code or add some global setting to ensure all characters variable use "" instead of NA.
na2blank <- function(x){x <- ifelse(is.character(x) & is.na(x),"",x)}

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

No response

@WangLaoK WangLaoK added enhancement New feature or request programming labels Jun 18, 2024
@bundfussr
Copy link
Collaborator

Hi @WangLaoK ,

Such a function is already available: convert_na_to_blanks()

Please note that all admiral functions assume that a missing character value (in CDISC sense) is represented as NA. I.e., all input dataset must adhere to that. Otherwise, the results may be wrong.

@WangLaoK
Copy link
Author

Hi @bundfussr ,

I agree with you we treat blanks string and NA as different types in R environment, .

But I think if we convert the data to XPT(For submission), then they will be treated as "" for all. ( XPT don't have NA for object type).

Then this may cause a lot of messy in define.xml rule/derivation and not good for reviewer to see two or more conditions for a missing rule.

For example, when I want to do some post process to handle both NA and "" records from BASEC and derive another variable DTYPE
case_when(
is.na(BASEC) | BASEC == "" ~ "NONBASE", (If we have a setting and put all NA to blank, then only one condition here)
T ~ "BASE")

BR,
Hans

@bms63
Copy link
Collaborator

bms63 commented Jun 20, 2024

Hi @WangLaoK thanks for the Feature Request.

I'm wondering why you would allow NAs and blanks into a variable that represent the same thing? You should just pick one and I really recommend using NAs.

For submission - I would keep the derivations in the define language agnostic so an FDA Reviewer can use the tools in SAS or R to follow along with your derivation.

@bundfussr
Copy link
Collaborator

For example, when I want to do some post process to handle both NA and "" records from BASEC and derive another variable DTYPE
case_when(
is.na(BASEC) | BASEC == "" ~ "NONBASE", (If we have a setting and put all NA to blank, then only one condition here)
T ~ "BASE")

@WangLaoK , if you need to read in a xpt file, I would call convert_blanks_to_na directly after that. Then you can use a single condition (is.na()) and you are in line with the admiral functions.

@WangLaoK
Copy link
Author

WangLaoK commented Jun 21, 2024

Hi @bundfussr and @bms63 thank you for suggestions. But I still prefer to have a setting to avoid different type of missing value when we create ADaM variables.
I found that in source code derive_vars_merged() have a parameter may fit my request.
I try to call this missing_value but it doesn't work.
it seems it is not written in the function, could you please help to add this missing_value to other function?
image

@bundfussr
Copy link
Collaborator

Hi @bundfussr and @bms63 thank you for suggestions. But I still prefer to have a setting to avoid different type of missing value when we create ADaM variables.
I found that in source code derive_vars_merged() have a parameter may fit my request.

Hi @WangLaoK , some of the admiral functions like derive_vars_merged() or derive_var_merge_exist_flag() have a missing_value or missing_values argument. However, the intention of these arguments is different from yours. These arguments allow to specify a value for subjects which are not included in the additional dataset (dataset_add). By default all new variables are set to NA for subjects which are not included in the additional dataset. If these subjects should be included in a summary, NA isn't a good value. The missing_values argument allows to define a non-NA value like "Missing" or "Unknown" for them. In principle you could use the argument to set the value to "" but I would disadvise that for the reasons mentioned before.

I don't like to add a global setting for specifying whether NA or "" should be used as missing value because then we would need to update all functions and templates such that they either take the global setting into account or consider both NA and "" as missing value. This would be a lot of work and would make the code and templates less readable. It would also be an additional burden for the users as they would need to do the same in their code.

Just considering NA as a missing value seems much easier and less error-prone to me.

@pangchaoran
Copy link

Hi @WangLaoK ,

Such a function is already available: convert_na_to_blanks()

Please note that all admiral functions assume that a missing character value (in CDISC sense) is represented as NA. I.e., all input dataset must adhere to that. Otherwise, the results may be wrong.

@bundfussr , in the final R data (e.g. adsl.rds to be used in TFLs), do you also suggest to present missing as NA, which is more compatible with admiral or other pharmaverse packages?
In this case, in the review guidance submitted to HA, we may also need to suggest HA to use convert_blanks_to_na after read_xpt.

@bundfussr
Copy link
Collaborator

Hi @WangLaoK ,
Such a function is already available: convert_na_to_blanks()
Please note that all admiral functions assume that a missing character value (in CDISC sense) is represented as NA. I.e., all input dataset must adhere to that. Otherwise, the results may be wrong.

@bundfussr , in the final R data (e.g. adsl.rds to be used in TFLs), do you also suggest to present missing as NA, which is more compatible with admiral or other pharmaverse packages? In this case, in the review guidance submitted to HA, we may also need to suggest HA to use convert_blanks_to_na after read_xpt.

Yes, I think it makes sense to use the same approach in all programs (SDTMs, ADaMs, and TLFs).

I'm not sure if we need to add anything to the review guidance because it depends on which tools are used by the HA. If they have their own tools which represent a missing value as "", they shouldn't use convert_blanks_to_na(). If they use pharmaverse tools like admiral, they should use it. As this is already described in the documentation (Handling of Missing Values), I wouldn't add it to the review guidance.

@WangLaoK
Copy link
Author

Hi @bundfussr and @bms63 thank you for suggestions. But I still prefer to have a setting to avoid different type of missing value when we create ADaM variables.
I found that in source code derive_vars_merged() have a parameter may fit my request.

Hi @WangLaoK , some of the admiral functions like derive_vars_merged() or derive_var_merge_exist_flag() have a missing_value or missing_values argument. However, the intention of these arguments is different from yours. These arguments allow to specify a value for subjects which are not included in the additional dataset (dataset_add). By default all new variables are set to NA for subjects which are not included in the additional dataset. If these subjects should be included in a summary, NA isn't a good value. The missing_values argument allows to define a non-NA value like "Missing" or "Unknown" for them. In principle you could use the argument to set the value to "" but I would disadvise that for the reasons mentioned before.

I don't like to add a global setting for specifying whether NA or "" should be used as missing value because then we would need to update all functions and templates such that they either take the global setting into account or consider both NA and "" as missing value. This would be a lot of work and would make the code and templates less readable. It would also be an additional burden for the users as they would need to do the same in their code.

Just considering NA as a missing value seems much easier and less error-prone to me.

Hi @bundfussr , thank you so much for your time. I will follow the guidance when I use admiral packages. please close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request programming
Development

No branches or pull requests

4 participants