uclhal/learning_resources

Learning Resources for Clinical Data Science

This is an overview for people from a clinical background who want to begin a career in statistics / data science / machine learning. Retraining in this field requires that you unlearn most of what you have been taught about statistics: forget tests of significance. Most importantly, it requires a significant commitment of time, energy and love. But it's extremely rewarding.

Path to Data Science (Short Version)

Below is a path to learning data science, focused on the R statistical programming language. We chose R because it's powerful, free and has a thriving online community. We've tried a lot of different things and have listed what we found most useful. Everyone is different, so if you found something particularly useful, let us know so we can add it to the list. Ultimately the language doesn't matter, and Python is an equally good choice. Our recommendation is to pick one language and become very proficient in it.

There is no shortcut to advancing in this field. Find a project, something that you feel passionate about, and get stuck in; learn by doing. A fantastic resource to start with is the tidyverse and the book R for Data Science, available for free online here: https://r4ds.had.co.nz

Skill Development (Long Version)

Below is a list of areas that are important to develop on your journey into clinical data science. They are aimed at someone who is interested and motivated but comes from a clinical background, and so is not formally trained in these methods. I have included links to Amazon for books, but mostly just for reference: electronic copies of the majority are in a shared Dropbox folder (ask us for the link), and plenty of others are published online for free. We also keep copies in the lab, which we are happy for you to borrow (please ask first).

Maths

  • The foundation of data science is rooted in probability and statistics. As clinicians, we typically have at most an A-level in maths (often from a very long time ago), which can be extremely limiting in the long run. Key areas to focus on are:
    • Probability
    • Linear algebra
    • Statistical notation
  • It isn't necessary to become fully proficient in these areas on your own. Rather, the focus should be on gaining an intuition for what is going on. Ultimately you want to become an applied researcher, but some core theory is required.
  • Once you have these basic tools, you will be able to understand the language of statistics and make the most of your data (... and get away from hypothesis testing!). All that is needed here is a conceptual overview of how these things work.
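As one illustration of how probability, linear algebra and statistical notation fit together (our own example, not taken from the resources listed here), the familiar linear regression model can be written compactly in matrix notation, and its ordinary least squares estimate is a linear algebra expression. Here y is assumed to be an n x 1 vector of outcomes (e.g. one per patient), X an n x p matrix of predictors, and beta a p x 1 vector of coefficients:

```latex
% Linear regression in matrix notation: n observations, p predictors
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^{2} I_{n})

% Ordinary least squares estimate (assumes X^{\top} X is invertible)
\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```

Being able to read each symbol (a vector versus a matrix, what the transpose and the inverse do, what the distributional statement assumes about the errors) is the kind of conceptual fluency this section is aiming for.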

The following resources are extremely useful:

Other resources for statistics:

Online Courses

There has been an explosion in the availability of online courses. They range from free to expensive, and their quality is often unrelated to cost. Here we compile a list of the best:

| Name | Subject | Website | Cost | Rating |
|---|---|---|---|---|
| Essence of Linear Algebra (3Blue1Brown) | Linear algebra | https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw/playlists | Free | 5/5 |
| The World of Maths (Khan Academy) | General maths at all levels | https://www.khanacademy.org | Free | 5/5 |
| Machine Learning (Coursera) | Machine learning | https://www.coursera.org/learn/machine-learning | £58 | Recommended |
| Maths for Machine Learning (Coursera) | Linear algebra | https://www.coursera.org/learn/linear-algebra-machine-learning/home/welcome | £48 | 4/5 |

R

Python

  • We need some recommendations here

UNIX

Databases

  • We need some recommendations here too

Specific Statistical Topics

  • Machine Learning - "An Introduction to Statistical Learning", James et al., Springer. A solid introduction to the common methods used in machine learning. Accessible, written for R, and with lots of practice questions.
  • Mixed Effects Models - "Mixed Effects Models and Extensions in Ecology with R", Zuur et al., Springer. A brilliant, accessible introduction to modelling correlated data. Be a little cautious of their model selection procedures, however, as they are questionable.

Interesting Reads

  • The Book of Why, Pearl & Mackenzie (causal inference)
  • Observation and Experiment, Rosenbaum (causal inference)
  • The Lady Tasting Tea, Salsburg (history of statistics)

Lab Meetings

  • We meet every Monday morning to discuss ongoing projects. Everyone is welcome to attend, whether to watch or to present. No matter the size, scale or complexity of your work, we encourage you to present it as a way to crowdsource solutions to problems and figure out what the next step might be.

About

The Path to Clinical Data Science
