Language selection in country files #11

Open
nickdickinson opened this issue Apr 9, 2024 · 1 comment

@nickdickinson
Member

@donmezayca
An issue was introduced when scraping data from country files because of the way language selection is implemented. At the moment, the column names are translated automatically when a new language is selected, and it is difficult to switch the language from the R script. Making these files easier to scrape would be good for a number of reasons:

  • ETL scripts used by jmpwashdata, the JMP team, and others should have predictable column names/codes
  • There shouldn't be a dependency on running a heavy piece of software like Excel

At the moment, the possible solutions are:

  • hardcoding column names and ranges in the R script, with the disadvantage that we cannot detect mistakes (for example, when a column has been shifted). This requires coordination with JMP to make sure the assumed fixed columns are indeed correct.
  • adding a raw data sheet that contains only the data and column code names, used as the data source both by the Excel sheet and by ETL scripts (this requires action by JMP).
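
To illustrate why the second option helps, here is a minimal sketch of selecting columns by stable code rather than by position. It is written in Python for illustration only (the ETL script discussed here is in R), and the function name `select_by_code` and the column codes `iso3`, `year`, and `wat_bas_n` are hypothetical, not taken from the country files. The point is that a lookup by code fails loudly when a column is missing, which hardcoded positions cannot do.

```python
def select_by_code(header, rows, expected_codes):
    """Pick columns by their stable code names instead of by position.

    header: list of column codes from the (hypothetical) raw data sheet.
    rows: list of row lists, each the same width as header.
    expected_codes: the codes the ETL script relies on.
    """
    missing = [code for code in expected_codes if code not in header]
    if missing:
        # Fail loudly rather than silently reading the wrong column.
        raise KeyError(f"missing column codes: {missing}")
    positions = [header.index(code) for code in expected_codes]
    return [
        {code: row[pos] for code, pos in zip(expected_codes, positions)}
        for row in rows
    ]


# Hypothetical example data; real country files will differ.
header = ["iso3", "year", "source", "wat_bas_n"]
rows = [["KEN", 2019, "DHS", 58.9]]
records = select_by_code(header, rows, ["iso3", "year", "wat_bas_n"])
```

Because the lookup goes through the code names, inserting or moving a column in the sheet does not change the result, and removing one raises an error instead of shifting the data.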
nickdickinson self-assigned this on Apr 9, 2024
@nickdickinson
Member Author

Interestingly, this is happening because we are extracting the data from "Chart Data" instead of from "Data Summary". In Data Summary the hidden column names are not translated, so the problem is already fixed there, and it must be the source used by the Stata files. The Chart Data column names are translated, perhaps to support translated charts, but this probably isn't needed because the same columns already have readable variable names above them.

For now, the workaround in jmpwashdata is to manually rename the first three columns, which are the only ones translated.
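
A rough sketch of that workaround, again in Python for illustration rather than the R code actually used in jmpwashdata. The stable names in `STABLE_NAMES`, the function `rename_leading_columns`, and the sample headers are all hypothetical; the real translated labels and codes are not given in this thread.

```python
# Hypothetical stable names for the three translated columns.
STABLE_NAMES = ["source", "type", "year"]


def rename_leading_columns(header, stable_names=STABLE_NAMES):
    """Replace the first len(stable_names) labels, leaving the rest untouched."""
    if len(header) < len(stable_names):
        raise ValueError("header has fewer columns than expected")
    return list(stable_names) + list(header[len(stable_names):])


# Hypothetical headers for the same sheet read in two languages.
english = ["Source", "Type", "Year", "wat_imp_n"]
french = ["Source", "Type", "Année", "wat_imp_n"]

# After renaming by position, both headers are identical.
assert rename_leading_columns(english) == rename_leading_columns(french)
```

This renames by position, so it inherits the same weakness as hardcoded ranges: if the leading columns are ever reordered, the names will be applied to the wrong data.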
