I am a dedicated Data Scientist based in Los Angeles with over 12 years of Life Science experience and a passion for solving business problems with Data Science techniques and communicating insights to stakeholders through visualizations. With a systematic and creative approach, I consistently aim to add tangible value to teams, businesses, and end users. I am committed to continuous learning and self-improvement.
Programming: Python, SQL
Tools & Technologies: Git, AWS, Databricks
Data Manipulation & Analysis: Pandas, NumPy, Statsmodels, SciPy
Visualization: Tableau, Seaborn, Matplotlib
Machine Learning & Deep Learning: PySpark, Spark SQL, TensorFlow, XGBoost
Statistical Analysis: A/B Testing, Regression (Linear, Logistic), Classification, Clustering, PCA, Forecasting, Anomaly Detection
Model Interpretation: SHAP, Interpretability Techniques
Model Deployment: Docker, Flask, Streamlit
Soft Skills: Research, Communication, Accountability, Initiative, Collaboration, Critical Thinking, Passion, Presentation, Project Delivery, Idea Generation
- My main portfolio website, which showcases the following data science projects:
- Analyzed the impact of a "Delivery Club" special on sales for a grocery retailer using a Python causal impact library, revealing a 41.1% uplift in sales.
- Developed a predictive model for Parkinson's disease severity using boosted tree models and feature engineering, significantly improving F1 score and recall.
- Analyzed customer buying patterns to uncover product relationships in alcohol retail, revealing insights about customer preferences.
- Predicted customer behavior using historical music sales data, achieving high accuracy through feature space compression and Random Forest.
- Segmented customers for a grocery chain to provide marketing insights based on dietary preferences.
- Created a dashboard for targeted marketing campaigns based on bank customer demographics.
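Causal impact libraries typically estimate a counterfactual ("what sales would have been without the promotion") with a Bayesian structural time-series model and report the relative effect as (observed − counterfactual) / counterfactual. The sketch below illustrates only that idea: it swaps in a plain ordinary least-squares fit of the target on a hypothetical control series during the pre-period. The function names and sales figures are hypothetical and this is not the project's code.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def relative_uplift(target, control, intervention_index):
    """Relative effect of an intervention on `target` (simplified, hypothetical).

    Fits target ~ control on the pre-period, predicts the post-period
    counterfactual from the control series, and returns
    (observed - counterfactual) / counterfactual summed over the post-period.
    """
    intercept, slope = fit_line(control[:intervention_index], target[:intervention_index])
    counterfactual = [intercept + slope * c for c in control[intervention_index:]]
    observed = target[intervention_index:]
    return (sum(observed) - sum(counterfactual)) / sum(counterfactual)


# Hypothetical weekly sales: a control series and the promoted series.
control = [100, 110, 105, 115, 120, 125]
target = [200, 220, 210, 230, 300, 300]  # promotion starts at index 4
print(f"{relative_uplift(target, control, 4):.1%}")  # 22.4%
```

A real causal impact model also yields credible intervals for the effect, which this point estimate does not.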
- Boosted tree models predicting the maximum severity of Parkinson's disease in clinical patients from their protein and peptide mass spectrometry quantities.
- Explainable results via per-patient feature importance and a description of the top proteins.
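To illustrate how a boosted tree model can give per-patient explanations, here is a minimal from-scratch sketch, assuming squared-error gradient boosting with depth-1 trees (stumps). Because each stump splits on a single feature, a patient's prediction decomposes exactly into a baseline plus one contribution per feature. The function names and toy data are hypothetical; the project itself presumably used a boosting library rather than this code.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-1 regression tree that splits on a single feature."""
    feature: int
    threshold: float
    left_value: float   # output when x[feature] <= threshold
    right_value: float  # output when x[feature] > threshold

    def predict(self, x):
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X, residuals):
    """Return the split minimising squared error on the residuals, or None."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = Stump(j, t, lv, rv), err
    return best


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(x) for p, x in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, x):
    base, lr, stumps = model
    return base + sum(lr * s.predict(x) for s in stumps)


def per_feature_contributions(model, x):
    """Split one sample's prediction (minus the base) into per-feature parts."""
    _, lr, stumps = model
    contrib = [0.0] * len(x)
    for s in stumps:
        contrib[s.feature] += lr * s.predict(x)
    return contrib


# Hypothetical toy data: two "protein levels" per patient; only the first drives severity.
X = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.8, 5.0], [0.9, 3.0], [1.0, 4.0]]
y = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]
model = fit_boosted_stumps(X, y)
patient = X[-1]
print(predict(model, patient), per_feature_contributions(model, patient))
```

Deeper trees mix features within a single tree, so their per-sample attributions need a method such as SHAP instead of this exact stump decomposition.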
- Uses an XGBoost classifier to predict how likely an employee is to leave, based on HR data.
- Provides SHAP values to explain the probability score assigned to the employee.
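SHAP values are estimates of Shapley values: each feature's average marginal contribution to the model output across all coalitions of features. The sketch below computes them exactly by brute-force enumeration, assuming a single hypothetical baseline employee as the reference and a made-up logistic scoring function in place of the trained XGBoost model. All names, features, and coefficients are hypothetical; the SHAP library itself uses background data and faster tree-specific algorithms.

```python
import math
from itertools import combinations


def attrition_probability(x):
    """Hypothetical stand-in for a trained classifier (not a real XGBoost model).

    x = [overtime (0/1), years_at_company, low_satisfaction (0/1)], all made up.
    """
    z = -1.0 + 0.8 * x[0] - 0.05 * x[1] + 0.6 * x[2]
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take the baseline values, so the attributions
    sum to f(x) - f(baseline). Cost grows as 2**n in the number of features.
    """
    n = len(x)

    def value(coalition):
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


employee = [1, 2, 1]  # hypothetical employee
baseline = [0, 5, 0]  # hypothetical reference employee
phi = shapley_values(attrition_probability, employee, baseline)
print(phi)
```

The efficiency property (attributions summing to the gap between the employee's score and the baseline score) is what lets SHAP values "explain" a single probability.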
- Used a pre-trained DenseNet-201 network and fine-tuned a single hidden layer with 4096 nodes and 30% dropout for regularization.
- Addressed the imbalanced target distribution by applying class weights during neural network training.
- Tuned hyperparameters (number of hidden layers, nodes, learning rate, batch size, learning rate decay, and momentum) to find optimal values.
- Final test statistics:
  - AUC: 0.990
  - F1: 0.972
  - Recall: 0.971
  - Precision: 0.973
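One common way to derive class weights for imbalanced data is the "balanced" heuristic, n_samples / (n_classes × class_count), which gives rarer classes proportionally larger weights; whether the project used exactly this formula is an assumption. The sketch below also checks that the reported F1 is the harmonic mean of the reported precision and recall. The labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Same formula as scikit-learn's class_weight="balanced"; a dict like this
    can be passed to Keras' Model.fit(class_weight=...).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


labels = [0] * 90 + [1] * 10  # hypothetical 90/10 class imbalance
weights = balanced_class_weights(labels)
print(weights)  # minority class 1 gets weight 5.0
print(round(f1_score(0.973, 0.971), 3))  # 0.972, matching the reported F1
```

With these weights each class contributes the same total weight to the loss (90 × 0.556 ≈ 10 × 5.0 = 50).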