The repository provides popular practices to increase the reproducibility of scientific work.

gmu-cil/reproducibility

General Practices for Reproducibility

Myeong Lee ([email protected])

This seminar focuses on popular scripting languages and tools for data analysis (Python, R, and web development platforms) and on managing them with GitHub. It covers only part of the broader topic of reproducibility.

Basic premises

General Considerations for Reproducibility

  • Folder structure matters.
  • Markdown documentation
    • Useful tools: MacDown, MarkdownPad
    • Rmd: R Markdown
    • Jupyter: Supporting Markdown for Python and R
    • GitHub: Markdown is the default format for README files.
    • Using Markdown pages as an index to other resources (e.g., a link in an MD file -> a Google Drive folder)
  • A project introduction webpage using GitHub
  • Docstring
  • Testing
  • Web-based presentations of the project
  • Development environment

General Folder Structure

+src
  -R
  -python
  -jupyter
+doc
  -...Markdown documents
+data
  -input
  -results (empty)
+html
-vm (Vagrant, Docker, or other environment configuration files)

.gitignore (including confidential files and script results)
README.md (providing entry point to other resources and general descriptions)
LICENSE.md
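
The layout above can be scaffolded with a short script. The following is only a sketch: `create_skeleton` is a hypothetical helper, not part of this repository. It assumes that `vm` sits at the top level of the project, and it adds a `.gitkeep` placeholder (a common convention, not something Git requires by name) so that the otherwise empty `data/results` folder can be committed.

```python
from pathlib import Path

# Folder names follow the layout above; placing "vm" at the top level is an assumption.
FOLDERS = [
    "src/R",
    "src/python",
    "src/jupyter",
    "doc",
    "data/input",
    "data/results",
    "html",
    "vm",
]

# Top-level files listed in the layout above.
FILES = [".gitignore", "README.md", "LICENSE.md"]


def create_skeleton(root):
    """Create the folder layout under `root` and return `root` as a Path.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    # Git does not track empty directories, so a placeholder file keeps
    # data/results in the repository while the results themselves stay ignored.
    (root / "data" / "results" / ".gitkeep").touch()
    for name in FILES:
        (root / name).touch(exist_ok=True)
    return root
```

Calling `create_skeleton("my-project")` (the project name is just an example) creates the empty structure. Existing folders and files are left untouched, so running it again is safe.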

R

  • Prefer .Rmd over .R
    • Allows good documentation of each code block.
    • Can export the overall work as an HTML file.
    • When running scripts in the cloud, .R might work better.
  • Make functions if possible
  • R Docstring
  • In-line comments
  • Specify the R version correctly

Python

  • Jupyter for development along with Markdown comments (Anaconda)
  • Virtual environments for switching between different versions of Python (e.g., Anaconda Tutorial)
  • Once a set of functions is complete and ready for distribution, convert it to .py files with docstrings, and save them in a separate location (e.g., src/python/) so they can be used by simply importing the package
  • Python Docstring
  • Automatic testing
  • In-line comments
  • Specify the Python version correctly (2.x and 3.x differ a LOT).
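
As a rough illustration of the docstring, testing, and version points above, here is a minimal sketch of what a module saved under `src/python/` might look like. The function names (`normalize`, `check_python_version`), the NumPy-style docstring layout, and the choice of Python 3 as the required version are all assumptions made for this example, not conventions prescribed by this repository.

```python
import sys

# Assumption: the project targets Python 3; change this to the version you actually used.
REQUIRED_MAJOR = 3


def check_python_version(version_info=sys.version_info):
    """Return True if the given interpreter version has the required major version."""
    return version_info[0] == REQUIRED_MAJOR


def normalize(values):
    """Scale a list of numbers so that they sum to 1.

    Parameters
    ----------
    values : list of float
        Numbers with a positive sum.

    Returns
    -------
    list of float
        The input values divided by their total.

    Examples
    --------
    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    # True division is used here; under Python 2 the same expression would
    # truncate integers, which is one reason to state the version explicitly.
    return [v / total for v in values]
```

The example in the docstring doubles as an automatic test: if the module is saved as, say, `stats_utils.py` (a hypothetical file name), running `python -m doctest -v stats_utils.py` executes the example and reports any mismatch.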

Web Applications

  • Specify PHP, Apache, database, and JavaScript versions correctly
  • Provide step-by-step instructions to set up the development environment.
  • A better way to keep system configurations consistent: virtual machines plus automated configuration
  • It's best if you can provide software architecture docs
    • UML
    • Function documentation.
  • In-line comments

Contributions are Welcome

This seminar covers only part of the reproducibility topics, so pull requests that add further practices or concerns are welcome.
