I am a Data Scientist with experience in machine learning, computer science, statistics, mathematics, deep learning, data visualization, and communication across different STEM disciplines.
Ph.D. in Machine Learning applied to Computational Immunology in the Department of Infection and Immunity at UCL; doctorate funded by Microsoft Research Cambridge.
Double background in computer science (M.Sc.) and molecular biology (B.Sc.). Extensive research experience gained in several international research laboratories throughout my academic career. Motivated by challenging and stimulating new goals.
Team Leader Data Scientist at the Department of Health and Social Care (contractor position) (Nov 2020 – Sep 2021)
- Role: Team Leader in the Test Analytics and Modelling
- Team: DHSC/T&T/Test Demand Modelling group Under
- Team leader of the group responsible for creating and maintaining the Python software that enabled a reliable meta-analysis of the COVID-19 tests performed in England, and for producing the weekly testing-prediction reports for the government. My work in the department validated the use of custom-built Python software for the pandemic effort, an approach that was later adopted throughout the DHSC.
Data Scientist at Sensyne Health (contractor position) (March 2020 – July 2020)
- Analysis of an NHS medical database of patients affected by COVID-19, resulting in near-live ML prediction of patient outcomes from lab data. Deployment of the code in two partner hospitals.
- Early risk assessment for COVID-19 patients from emergency department data using machine learning. (Nature Scientific Reports 2021)
- COVID-19 patient outcome assessment using selected features from emergency department data and feed-forward neural networks. (MedAI conference 2020)
- Patent resulting from the above works
Data Scientist at Intellegens (Dec 2019 – March 2020)
- Role: To build the company’s data analysis and feature engineering toolboxes
- ETL and analysis of large databases, and presentation of the results to the company's clients
- Direct interaction with the sales team and clients to design and develop innovative solutions to customer needs
Computational Scientist at Inivata (Jan 2017 – Nov 2019)
- Role: To apply computational and statistical methods to the analysis of cancer genomics data, biomarker development, optimisation of Inivata's technology, and analysis of clinical data in collaboration with Inivata's partners.
- Development of Inivata's liquid biopsy technology, focusing on the analysis of NGS data for lung cancer genomics
- Design, coding and testing of R packages; optimisation of the production pipeline in Python; reporting in R/LaTeX for clinical decision support.
Doctor of Philosophy in Infection & Immunity
University College London (UCL), United Kingdom, 2013 – 2017
Field Of Study: Computational Immunology
- Thesis title: Analysis of murine CDR3β repertoires using machine learning techniques (2018)
- Role: I analysed the role and mechanisms of the CDR3 sequence, a short protein region of the T cell receptor, using state-of-the-art machine learning methods. The aim of my research was to apply and develop methods for classifying CDR3 repertoires and, through the classification process, to identify the amino acids and positions that play a major role in antigen recognition.
Funded by the Microsoft 2013 PhD Scholarship Programme in EMEA. Microsoft scholarships consist of an annual bursary for up to three years, enabling PhD supervisors and students to carry out collaborative research projects with Microsoft Research Cambridge.
- Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence. (Bioinformatics 2014)
- Feature selection using a one dimensional naïve Bayes’ classifier increases the accuracy of support vector machine classification of CDR3 repertoires. (Bioinformatics 2017)
- Specificity, Privacy, and Degeneracy in the CD4 T Cell Receptor Repertoire Following Immunization. (Front Immunol 2017)
Master’s Degree (MSc) in Bioinformatics
Alma Mater Studiorum – University of Bologna, Italy, 2011 – 2013
Field Of Study: Bioinformatics
- Thesis title: Simulating gene co-regulatory networks in the development of B-cells.
Erasmus and thesis completion: University of Tampere, Finland, January–July 2013. Thesis on B-cell genomics modelling using a Random Neural Network, carried out in the Computational Biology Group, Institute of Biomedical Technology, University of Tampere.
The 10th International Workshop on Computational Systems Biology, WCSB 2013, June 10-12, Tampere, Finland
Bachelor’s Degree (BSc) Molecular Biology
Università degli Studi di Padova, Italy, 2009 – 2011 and Università degli Studi dell’Aquila, Italy, 2007 – 2009
Field Of Study: Molecular Biology / Biotechnology
- Thesis title: In Silico study on ADAM22 and its interactions with LGI1 and KV1.1 (Molecular proteomics), in the lab of Professor Tosatto
- SOLID Coding in Python
- Multiple Bayesian Tests in a Row
- Bayes’ Theorem for Medical Tests
- Hidden Markov Models for Biological Sequences
- How to process Bio-Sequences for use in Data Science
- A Tutorial on Luigi, Spotify’s Pipeline Tool
- How Probability Calibration Works
- Best free API for weather records: ERA5!
- Logging in Python
- How to do a Sankey Plot in Python
- How to combine LaTeX and R for report generation
In this Kaggle competition, a set of 5,863 chest X-ray images (anterior-posterior) was selected from retrospective cohorts of pediatric patients aged one to five years in Guangzhou. All chest X-ray imaging was performed as part of the patients' routine clinical care.
All chest radiographs were screened by two expert physicians for quality control, removing all low-quality or unreadable scans.
The picture below shows the three types of chest X-ray present in the dataset:
A fully published Python package
This Python package handles user-created color palettes for use in Python plotting libraries (matplotlib, seaborn, etc.).
Users may want specific colors, chosen to their taste or to match company guidelines, that differ from the matplotlib defaults; mypalette helps to create, store, and use such color palettes.
It accepts either a text file exported from https://coolors.co or a list of hexadecimal codes, and saves the palette in JSON format, storing each color's name, hexadecimal code and RGB code.
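The hex-to-JSON step can be sketched in a few lines of standard-library Python. This is not mypalette's actual API. The function names `hex_to_rgb`, `build_palette` and `save_palette`, the default `color_N` names and the exact JSON layout are all assumptions made for illustration. Parsing the coolors.co text export is left out because its exact format is not described here.

```python
import json
from typing import Dict, List, Optional, Tuple


def hex_to_rgb(hex_code: str) -> Tuple[int, int, int]:
    """Convert '#RRGGBB' or 'RRGGBB' to an (R, G, B) tuple of 0-255 ints."""
    code = hex_code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    # int(..., 16) raises ValueError on non-hex characters.
    r, g, b = (int(code[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def build_palette(hex_codes: List[str], names: Optional[List[str]] = None) -> List[Dict]:
    """Hypothetical helper (not the mypalette API): one record per color."""
    if names is None:
        # Assumed default naming scheme when the user supplies no names.
        names = [f"color_{i}" for i in range(len(hex_codes))]
    if len(names) != len(hex_codes):
        raise ValueError("names and hex_codes must have the same length")
    palette = []
    for name, code in zip(names, hex_codes):
        palette.append({
            "name": name,
            "hex": "#" + code.lstrip("#").lower(),  # normalised hex string
            "rgb": list(hex_to_rgb(code)),          # JSON has no tuples
        })
    return palette


def save_palette(palette: List[Dict], path: str) -> None:
    """Write the palette records to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(palette, fh, indent=2)


if __name__ == "__main__":
    print(json.dumps(build_palette(["#264653", "2A9D8F"], names=["charcoal", "teal"]), indent=2))
```

The stored `hex` strings can be passed directly as colors to matplotlib or seaborn. Keeping RGB as well saves recomputing it for tools that expect numeric channels.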
Example of Data Visualization
If you have spent more than five seconds on r/dataisbeautiful/, you have probably encountered a Sankey plot. People use them to track their expenses, their job search, and all kinds of multi-step processes. Indeed, they are well suited to visualizing the progression of events and their outcomes. And, in my opinion, they look great!
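Under the hood, a Sankey plot is a set of weighted flows between stages. The minimal sketch below assumes each tracked item (for instance, a single job application) is recorded as an ordered list of stage labels. It counts the transitions between consecutive stages and converts them into node labels plus parallel source/target/value lists, the shape Sankey plotting tools typically expect. The function names and the job-search data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Flows = Dict[Tuple[str, str], int]


def sankey_flows(journeys: List[List[str]]) -> Flows:
    """Count how many journeys move from each stage to the next one."""
    counts: Counter = Counter()
    for stages in journeys:
        # Pair each stage with the stage that follows it.
        for src, dst in zip(stages, stages[1:]):
            counts[(src, dst)] += 1
    return dict(counts)


def to_index_lists(flows: Flows) -> Tuple[List[str], List[int], List[int], List[int]]:
    """Map stage labels to integer node ids and return parallel link lists."""
    labels: List[str] = []
    node_id: Dict[str, int] = {}

    def get_id(label: str) -> int:
        # Assign ids in first-seen order so node order is reproducible.
        if label not in node_id:
            node_id[label] = len(labels)
            labels.append(label)
        return node_id[label]

    source: List[int] = []
    target: List[int] = []
    value: List[int] = []
    for (src, dst), count in flows.items():
        source.append(get_id(src))
        target.append(get_id(dst))
        value.append(count)
    return labels, source, target, value


# Hypothetical job-search journeys, one list of stages per application.
journeys = [
    ["Applied", "Interview", "Offer"],
    ["Applied", "Interview", "Rejected"],
    ["Applied", "No reply"],
    ["Applied", "Interview", "Offer"],
]
labels, source, target, value = to_index_lists(sankey_flows(journeys))
print(labels)
print(source, target, value)
```

These parallel lists match the shape used by, for example, the `link` argument of plotly's `go.Sankey` (`source`, `target`, `value`), with `labels` used as node labels. Stage labels are assumed to be unique across steps. A label reused at two different steps would create a loop in the diagram, so prefix it with the step name if needed.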