We have developed an end-to-end pipeline for RNA-Seq analysis and protein structure prediction that uses BigQuery and Vertex AI to store and process terabyte-scale data. We hope it shows how Google Cloud can be applied to computational challenges in modern biology and medicine, and that it helps pave the way for new discoveries.
FASTQ files are sourced from the public NCBI Gene Expression Omnibus (GEO) dataset GSE181830.
The steps of the RNA-Seq pipeline are:
- Adapter and quality trimming with Trim Galore
- Quality control reporting with FastQC
- Estimation of gene and isoform expression with RSEM
- Loading gene and isoform expression data into BigQuery
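The four steps can be sketched as a per-sample driver. This is a minimal illustration, not the pipeline's actual code. It assumes paired-end gzipped FASTQ input, Trim Galore's default `<stem>_val_1` / `<stem>_val_2` output naming (with `--dont_gzip` so RSEM receives plain FASTQ), an RSEM reference already built with `rsem-prepare-reference` for RSEM's default aligner, and a placeholder BigQuery table ID. The helper names (`build_commands`, `read_rsem_results`, `write_to_bigquery`, `run_sample`) are hypothetical.

```python
"""Hypothetical per-sample driver for the RNA-Seq steps (trim, QC, quantify, load).

Assumptions, not taken from the pipeline itself: paired-end FASTQ input,
Trim Galore's default "_val_1"/"_val_2" naming, a pre-built RSEM reference,
and a placeholder BigQuery table ID.
"""
from __future__ import annotations

import csv
import subprocess
from pathlib import Path


def strip_fastq_suffix(name: str) -> str:
    # Remove a common FASTQ extension to recover the read-file stem.
    for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def build_commands(sample: str, r1: str, r2: str, rsem_ref: str,
                   outdir: str = "out", threads: int = 4) -> list[list[str]]:
    """Return the trimming, QC and quantification commands for one sample."""
    trimmed = Path(outdir) / "trimmed"
    qc = Path(outdir) / "fastqc"
    # Assumed Trim Galore output names for paired, uncompressed output.
    t1 = trimmed / f"{strip_fastq_suffix(Path(r1).name)}_val_1.fq"
    t2 = trimmed / f"{strip_fastq_suffix(Path(r2).name)}_val_2.fq"
    return [
        ["trim_galore", "--paired", "--quality", "20", "--dont_gzip",
         "-o", str(trimmed), r1, r2],
        ["fastqc", "-o", str(qc), str(t1), str(t2)],
        ["rsem-calculate-expression", "-p", str(threads), "--paired-end",
         str(t1), str(t2), rsem_ref, str(Path(outdir) / sample)],
    ]


def read_rsem_results(path: str, sample: str) -> list[dict]:
    """Parse an RSEM *.genes.results or *.isoforms.results TSV into row dicts."""
    numeric = {"length", "effective_length", "expected_count", "TPM", "FPKM", "IsoPct"}
    # "transcript_id(s)" is renamed so it is a plain BigQuery column name.
    rename = {"transcript_id(s)": "transcript_ids"}
    rows = []
    with open(path, newline="") as fh:
        for rec in csv.DictReader(fh, delimiter="\t"):
            row = {rename.get(k, k): (float(v) if k in numeric else v)
                   for k, v in rec.items()}
            row["sample"] = sample
            rows.append(row)
    return rows


def write_to_bigquery(rows: list[dict], table_id: str) -> None:
    """Append rows to a BigQuery table with a batch load job (table_id is a placeholder)."""
    # Imported lazily so the rest of the sketch runs without google-cloud-bigquery.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client.load_table_from_json(rows, table_id, job_config=job_config).result()


def run_sample(sample: str, r1: str, r2: str, rsem_ref: str,
               outdir: str = "out", threads: int = 4) -> tuple[list[dict], list[dict]]:
    """Run trimming, FastQC and RSEM, then return parsed gene and isoform rows."""
    for sub in ("trimmed", "fastqc"):
        # FastQC does not create its output directory, so make both up front.
        Path(outdir, sub).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(sample, r1, r2, rsem_ref, outdir, threads):
        subprocess.run(cmd, check=True)
    prefix = Path(outdir) / sample
    return (read_rsem_results(f"{prefix}.genes.results", sample),
            read_rsem_results(f"{prefix}.isoforms.results", sample))
```

Calling `run_sample(...)` and then passing each returned list to `write_to_bigquery` with separate gene and isoform table IDs would cover all four steps for one sample. The sketch uses a batch load job rather than streaming inserts because expression tables are written in bulk once quantification finishes; the real pipeline may make a different choice.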