# Selecting Patients for Diabetes Drug Testing from EHR Data

**Context**: You are a data scientist for an exciting unicorn healthcare startup that has created a groundbreaking diabetes drug that is ready for Phase III clinical trial testing. It is a unique and sensitive drug that must be administered and screened over at least 5-7 days in the hospital, with frequent monitoring/testing and patient medication adherence training with a mobile application. You have been provided a patient dataset from a client partner and are tasked with building a predictive model that can identify which types of patients the company should focus its testing efforts on. Target patients are people who are likely to be in the hospital for this duration and who will not incur significant additional costs for administering and monitoring the drug. To achieve this goal, you must build a regression model that predicts the estimated hospitalization time for a patient and use it to select/filter patients for your study.

**Expected Hospitalization Time Regression Model:** Using a synthetic dataset (denormalized at the line level augmentation) built off of the UCI Diabetes readmission dataset, students will build a regression model that predicts the expected days of hospitalization time and then convert this to a binary prediction of whether to include or exclude that patient from the clinical trial.

This project will demonstrate the importance of building the right data representation at the encounter level, with appropriate filtering and preprocessing/feature engineering of key medical code sets. This project will also require students to analyze and interpret their model for biases across key demographic groups.

### Dataset

Due to healthcare PHI regulations (HIPAA, HITECH), there are a limited number of publicly available datasets, and some datasets require training and approval.
So, for the purpose of this exercise, we are using a dataset from UC Irvine that has been modified for this course. Please note that it is limited in its representation of some key features such as diagnosis codes, which are usually an unordered list in 835s/837s (the ASC X12 standard interchange formats used for claims and remits).

- https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008

## Getting Started

Follow the instructions in `starter_code/student_project.ipynb` and be sure to set up your Anaconda environment to get started!

### Dependencies

Using Anaconda consists of the following:

1. Install [`miniconda`](http://conda.pydata.org/miniconda.html) on your computer by selecting the latest Python version for your operating system. If you already have `conda` or `miniconda` installed, you should be able to skip this step and move on to step 2.
2. Create and activate \* a new `conda` [environment](http://conda.pydata.org/docs/using/envs.html).

\* Each time you wish to work on any exercises, activate your `conda` environment!

---

## 1. Installation

**Download** the latest version of `miniconda` that matches your system.

|        | Linux | Mac | Windows |
|--------|-------|-----|---------|
| 64-bit | [64-bit (bash installer)][lin64] | [64-bit (bash installer)][mac64] | [64-bit (exe installer)][win64] |
| 32-bit | [32-bit (bash installer)][lin32] |  | [32-bit (exe installer)][win32] |

[win64]: https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe
[win32]: https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86.exe
[mac64]: https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
[lin64]: https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
[lin32]: https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86.sh

**Install** [miniconda](http://conda.pydata.org/miniconda.html) on your machine.
Detailed instructions:

- **Linux:** http://conda.pydata.org/docs/install/quick.html#linux-miniconda-install
- **Mac:** http://conda.pydata.org/docs/install/quick.html#os-x-miniconda-install
- **Windows:** http://conda.pydata.org/docs/install/quick.html#windows-miniconda-install

## 2. Create and Activate the Environment

For Windows users, the following commands need to be executed from the **Anaconda prompt** as opposed to a Windows terminal window. For Mac, a normal terminal window will work.

#### Git and version control

These instructions also assume you have `git` installed for working with GitHub from a terminal window. If you do not, you can install it first with the command:

```
conda install git
```

If you'd like to learn more about version control and using `git` from the command line, take a look at our [free course: Version Control with Git](https://www.udacity.com/course/version-control-with-git--ud123).

**Now, we're ready to create our local environment!**

1. Clone the repository, and navigate to the downloaded folder. This may take a minute or two to clone due to the included image data.

   ```
   git clone https://github.com/udacity/nd320-c1-emr-data-starter.git
   cd nd320-c1-emr-data-starter/project/
   ```

2. Create (and activate) a new environment named `udacity-ehr-env` with Python 3.7. If prompted to proceed with the install `(Proceed [y]/n)`, type y.

   - __Linux__ or __Mac__:

     ```
     conda create -n udacity-ehr-env python=3.7
     source activate udacity-ehr-env
     ```

   - __Windows__:

     ```
     conda create --name udacity-ehr-env python=3.7
     activate udacity-ehr-env
     ```

   At this point your command line should look something like: `(udacity-ehr-env) :USER_DIR $`. The `(udacity-ehr-env)` indicates that your environment has been activated, and you can proceed with further package installations.

3. Install a few required pip packages, which are specified in the requirements text file.
Be sure to run the command from the project root directory, since the `requirements.txt` file is there. The second command below installs the environment as a kernel for your notebook, in case this is new for you. You should now be able to find the environment when you select the kernel.

   ```
   pip install -r requirements.txt
   ipython3 kernel install --name udacity-ehr-env --user
   ```

## License

This project is licensed under the MIT License - see the [LICENSE.md]()
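
## Appendix: Sketch of the Regression-to-Binary Conversion

The project context describes converting a predicted hospitalization time into a binary include/exclude decision for the trial. The snippet below is a minimal sketch of that step only, not the project's reference solution. The function name `to_trial_inclusion` and the 5-day default threshold are assumptions made for illustration (the threshold is taken from the "at least 5-7 days" requirement); the notebook may define the cutoff differently.

```python
from typing import List, Sequence


def to_trial_inclusion(predicted_days: Sequence[float], min_days: float = 5.0) -> List[int]:
    """Map predicted hospitalization days to 1 (include) or 0 (exclude).

    `min_days` is an assumed cutoff based on the 5-7 day administration
    window described in the project context.
    """
    return [1 if days >= min_days else 0 for days in predicted_days]


# Hypothetical regression outputs (in days) for four encounters.
predictions = [2.3, 5.0, 6.8, 4.9]
print(to_trial_inclusion(predictions))  # [0, 1, 1, 0]
```

A single fixed threshold keeps the conversion easy to audit, which matters when the resulting labels are later compared across demographic groups for bias.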