dagartga/README.md

Hi there 👋

I am a Data Scientist based in Los Angeles with over 12 years of Life Science experience. I am passionate about solving business problems with data science techniques and communicating insights to stakeholders through visualizations. I take a systematic and creative approach, aiming to add tangible value to teams, businesses, and end-users, and I am committed to continuous learning and self-improvement.

Technical Skills

Programming: Python, SQL

Tools & Technologies: Git, AWS, Databricks

Data Manipulation & Analysis: Pandas, NumPy, Statsmodels, SciPy

Visualization: Tableau, Seaborn, Matplotlib

Machine Learning & Deep Learning: PySpark, SparkSQL, TensorFlow, XGBoost

Statistical Analysis: A/B Testing, Regression (Linear, Logistic), Classification, Clustering, PCA, Forecasting, Anomaly Detection

Model Interpretation: SHAP, Interpretability Techniques

Model Deployment: Docker, Flask, Streamlit


Soft Skills

Research, Communication, Accountability, Initiative, Collaboration, Critical Thinking, Passion, Presentation, Project Delivery, Idea Generation


GitHub Projects

Data Science Portfolio Website

  • My main portfolio website, which contains the following data science projects:
  • Analyzed the impact of a "Delivery Club" special on sales for a grocery retailer using a Python causal impact library, revealing a 41.1% uplift in sales.
  • Developed a predictive model for Parkinson's severity using boosted tree models with feature engineering, resulting in significant improvement in F1 score and recall.
  • Analyzed customer buying patterns to uncover product relationships in alcohol retail, revealing insights about customer preferences.
  • Predicted customer behavior using historical music sales data, achieving high accuracy through feature space compression and Random Forest.
  • Segmented customers for a grocery chain to provide marketing insights based on dietary preferences.
  • Created a dashboard for targeted marketing campaigns based on bank customer demographics.
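
As a minimal sketch of how a relative uplift figure like the one in the "Delivery Club" analysis is typically derived: a causal impact model predicts the sales that would have occurred without the intervention (the counterfactual), and the uplift is the relative gap between observed and predicted totals. The function name and the numbers below are hypothetical, chosen for illustration only; they are not the project's data or its exact method.

```python
def relative_uplift(observed, counterfactual):
    """Relative difference between total observed and total counterfactual sales."""
    total_observed = sum(observed)
    total_expected = sum(counterfactual)
    if total_expected == 0:
        raise ValueError("counterfactual total must be non-zero")
    return (total_observed - total_expected) / total_expected


# Hypothetical post-intervention sales and model-predicted counterfactual sales.
observed = [120.0, 135.0, 150.0]
counterfactual = [100.0, 100.0, 100.0]

print(f"Relative uplift: {relative_uplift(observed, counterfactual):.1%}")
```

A full analysis would also report a credible interval around this estimate, which a causal impact library usually provides.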

Streamlit Apps

  • Boosted tree models for predicting the maximum severity of Parkinson's disease in clinical patients based on their protein and peptide mass spectrometry quantities.
  • Explainable results using feature importance for each patient, with a description of the top proteins.
  • An XGBoost classifier that predicts how likely an employee is to leave based on HR data.
  • Provides SHAP values to explain the probability score assigned to the employee.

Deep Learning Project

  • Used a pre-trained DenseNet-201 neural network and fine-tuned one hidden layer with 4096 nodes and 30% dropout for regularization.
  • Addressed the imbalanced target distribution by passing class weights to the neural network during training.
  • Tuned the number of hidden layers, nodes, learning rate, batch size, learning rate decay, and momentum to find optimal values.
  • Final Test Statistics:
    • AUC: 0.990
    • F1: 0.972
    • Recall: 0.971
    • Precision: 0.973
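
The class-weighting step above can be sketched as follows. This is a minimal illustration that assumes the common "balanced" heuristic, where each class weight is `n_samples / (n_classes * class_count)`; the project may have used different weights. The helper name and the label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} so that rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 0 = normal, 1 = pneumonia (not the real dataset).
labels = [0] * 25 + [1] * 75
weights = balanced_class_weights(labels)
print(weights)  # the minority class (0) gets the larger weight
```

In Keras, a dictionary like this can be passed to `Model.fit` through its `class_weight` argument so that errors on the minority class contribute more to the loss.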

Pinned

  1. Boosted-Models-for-Parkinsons-Prediction (Public)

    Comparison of XGBoost, CatBoost, and LGBM for Parkinsons Classification

    Jupyter Notebook

  2. data-science-portfolio (Public)

    Projects to showcase my data science skills

    Python

  3. Salifort_Motors_Employee_Retention_Project (Public)

    The Capstone project for the Google Advanced Data Analytics Certificate

    Jupyter Notebook

  4. Transfer_Learning_X-Ray_Classification (Public)

    Compared 4 pre-trained CNN models for Pneumonia Detection. Fine-tuned the DenseNet-201 model.

    Jupyter Notebook