Exploring qualitative indicators via text mining methods.
This repository contains the source code. The data used in the analysis can only be accessed by using ActivityInfo with proper permissions.
The website is generated with bookdown. The common steps to generate the analysis are explained below. For more detail on how bookdown works, see the online book:
Yihui Xie (2019). bookdown: Authoring Books and Technical Documents with R Markdown. Chapman and Hall/CRC. ISBN 978-1138700109 https://bookdown.org/yihui/bookdown/
The call below authenticates the current user for the requests that fetch the data from ActivityInfo. See the Notes section for more details.
activityinfo::activityInfoLogin()
Source the etl.R file in the R/ directory to pull the data from the ActivityInfo API and process it so that it is ready for analysis. At the end of the pull, a JSON file containing the data is saved in the data/ directory.
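The ETL step above can be sketched as a small wrapper that sources the script and then reports which JSON files are present in data/. This is only an illustration: run_etl is a hypothetical helper that is not part of the repository, and the name of the JSON file written by etl.R is not assumed here.

```r
# Hypothetical helper (not part of this repository): run the ETL script
# and list the JSON files found in the data directory afterwards.
run_etl <- function(script = file.path("R", "etl.R"), data_dir = "data") {
  if (!file.exists(script)) {
    stop("ETL script not found: ", script)
  }
  # source() evaluates the script; according to the step above,
  # it pulls the data and saves a JSON file in data/.
  source(script)
  list.files(data_dir, pattern = "\\.json$", full.names = TRUE)
}
```

Calling run_etl() from the repository root after authenticating should return at least one path if the pull succeeded.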
Render the site from the R Markdown files, which are placed in the analysis/ directory.
For instance, suppose you create a new file called analysis/new-section.Rmd. Then add this new file name to the rmd_files array inside analysis/_bookdown.yml. The position of the path in the array matters because the array determines the order of the sections in the notebook.
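For illustration, the rmd_files entry might look like the fragment below. Only new-section.Rmd comes from the example above; index.Rmd and other-section.Rmd are hypothetical placeholders, and writing the entries relative to analysis/ is an assumption that should be checked against the existing file.

```yaml
# analysis/_bookdown.yml (fragment)
# File names other than new-section.Rmd are hypothetical placeholders.
rmd_files:
  - index.Rmd
  - new-section.Rmd      # rendered second because it is listed second
  - other-section.Rmd
```

Moving the new-section.Rmd line up or down in this list moves the corresponding section in the rendered notebook.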
Run this command in the R console to render the bookdown site:
source("R/render.R"); render_bookdown()
Rendering does not require any connection to the ActivityInfo API, but the JSON data file must exist in the data/ directory.
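Because rendering depends on the JSON file being present, it can help to check for it before rendering. The sketch below is one possible guard; render_if_data_present is a hypothetical name, and it assumes that sourcing R/render.R defines render_bookdown() as in the command above.

```r
# Hypothetical guard (not part of this repository): render only when
# at least one JSON data file exists in the data directory.
render_if_data_present <- function(data_dir = "data") {
  json_files <- list.files(data_dir, pattern = "\\.json$")
  if (length(json_files) == 0) {
    stop("No JSON data file found in ", data_dir, "/; run the ETL step first.")
  }
  # Assumption: R/render.R defines render_bookdown(), as used above.
  source(file.path("R", "render.R"))
  render_bookdown()
}
```

If the data/ directory is empty or missing, the function stops with a message instead of failing partway through the render.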
After rendering, git status will show a number of new files inside the docs/ folder, because the rendered site is set up to live in the docs/ folder on GitHub.
Once you are happy with the changes locally, commit the rendered files in the docs/ folder and push them to the remote. You can use the following commands in your shell:
git add docs/
git commit -m "Render site"
git push origin master
- The data can be accessed through the ActivityInfo API by using user credentials. The ActivityInfo R Language Client provides good documentation.
- The dplyr package was chosen because it is useful for rapid ad-hoc analyses. Towards the end of the project, the selected analysis code can be rewritten in base R, which is generally more robust and stable for production environments.