GitHub - userbz/DeMix-Q

DeMix-Q: Quantification-centered LC-MS/MS data processing

Dependencies:

OpenMS 2.0
- For Windows and macOS: https://www.openms.de/download/openms-binaries
- For Ubuntu 16.10 (yakkety yak): sudo apt-get install openms
Anaconda Python 2.7 or 3.5, with the following packages (install with: pip install numpy pandas lxml pymzml pyteomics)
- numpy
- pandas
- lxml
- pymzml https://pymzml.github.io/
- pyteomics https://pythonhosted.org//pyteomics/
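
Before running the scripts, it can help to confirm that the Python packages listed above are importable in the active environment. The following is a minimal sketch, not part of the DeMix-Q repository; the helper name `missing_packages` is hypothetical.

```python
import importlib

# Packages named in the Dependencies list above.
REQUIRED = ["numpy", "pandas", "lxml", "pymzml", "pyteomics"]


def missing_packages(names):
    """Return the names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Any package reported as missing can be installed with the pip command given above.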

Optional:

msconvert (ProteoWizard) https://proteowizard.sourceforge.net/index.shtml
MS-GF+ https://omics.pnl.gov/software/ms-gf

Procedures:

First of all, in order to enable external tools for XIC extraction:

Copy the consensus2edta.ttd config file to the OpenMS installation path (e.g. C:\Program Files\OpenMS-2.0\share\OpenMS\TOOLS\EXTERNAL).
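
The copy step can also be scripted. This is a minimal sketch under stated assumptions: the helper name `install_ttd` is hypothetical, the source location of consensus2edta.ttd depends on where you checked out the repository, and the destination in the usage comment is simply the example path given above.

```python
import os
import shutil


def install_ttd(ttd_path, openms_external_dir):
    """Copy a .ttd config file into the OpenMS EXTERNAL tools directory.

    Returns the path of the copied file.
    """
    if not os.path.isfile(ttd_path):
        raise IOError("config file not found: " + ttd_path)
    if not os.path.isdir(openms_external_dir):
        raise IOError("OpenMS directory not found: " + openms_external_dir)
    destination = os.path.join(openms_external_dir, os.path.basename(ttd_path))
    shutil.copy(ttd_path, destination)
    return destination


# Example call (paths are assumptions; adjust both to your system):
# install_ttd(
#     "consensus2edta.ttd",
#     r"C:\Program Files\OpenMS-2.0\share\OpenMS\TOOLS\EXTERNAL",
# )
```

Writing into the OpenMS installation directory may require administrator rights.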

After copying the config file to the OpenMS path:

Load the DeMix_PoseCluster.toppas (new version, recommended) or the DeMix.toppas pipeline (old version) in the TOPPAS program.

For DeMix_PoseCluster.toppas:

Prepare centroid MS1 spectra in mzML format and load in "Node 1".
Load the corresponding MS/MS identification files (".idXML" format) in "Node 2". mzid files produced by MS-GF+ can be converted with the Python script mzid2idxml_theomass.py. (Note: the script only reads RT information from spectrum titles generated by DeMix. If you are not working with the DeMix identification workflow and MS-GF+, revise the script before use.)
In "Node 3", load the corresponding ".FeatureXML" files (i.e. individual feature maps) previously generated by FeatureFinderCentroided (for example, from the DeMix workflow).
In "Node 4", load ONE ".FeatureXML" file as the reference for retention time alignment. (We recommend choosing the file from Node 3 that has the largest number of features. The RT scales of the other runs are re-calibrated to the scale of the reference run after alignment.)
In "Node 14" of the pipeline, set the converter_script option to point to the Python script consensus2edta.py.
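
To choose the Node 4 reference, the feature maps can be ranked by feature count. The sketch below is not part of DeMix-Q; it assumes the usual featureXML layout in which top-level `<feature>` elements sit inside a `<featureList>` and subordinate features are nested inside their parent `<feature>`, without XML namespaces. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_features(path):
    """Count top-level <feature> elements in a featureXML file."""
    count = 0
    depth = 0  # nesting depth of <feature>; nested ones are subordinates
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if elem.tag != "feature":
            continue
        if event == "start":
            depth += 1
            if depth == 1:
                count += 1
        else:
            depth -= 1
            if depth == 0:
                elem.clear()  # release the finished feature's children
    return count


def pick_reference_feature_map(paths):
    """Return the file with the most top-level features."""
    return max(paths, key=count_features)
```

Calling `pick_reference_feature_map` on the list of files loaded in Node 3 returns the candidate to load in Node 4.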

Alternatively, for DeMix.toppas:

Prepare centroid MS1 spectra in mzML format and load in "Node 1".
Load the corresponding MS/MS identification files (".idXML" format) in "Node 3". mzid files produced by MS-GF+ can be converted with the Python script mzid2idxml_theomass.py. (Note: the script only reads RT information from spectrum titles generated by DeMix. If you are not working with the DeMix identification workflow and MS-GF+, revise the script before use.)
In "Node 4", load ONE ".idXML" file as the reference for retention time alignment. (We recommend choosing the file from Node 3 that has the largest number of identifications. The RT scales of the other runs are re-calibrated to the scale of the reference run after alignment.)
In "Node 17" of the pipeline, set the converter_script option to point to the Python script consensus2edta.py.
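
For this older pipeline, the reference is chosen by number of identifications. As a rough sketch, again not part of DeMix-Q, the idXML files can be ranked by how many `<PeptideIdentification>` elements they contain. This assumes the idXML element name is used without an XML namespace, and it counts every identification regardless of score; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_identifications(path):
    """Count <PeptideIdentification> elements in an idXML file."""
    count = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "PeptideIdentification":
            count += 1
            elem.clear()  # release the peptide hits already counted
    return count


def pick_reference_idxml(paths):
    """Return the idXML file with the most peptide identifications."""
    return max(paths, key=count_identifications)
```

If your identification files have not been filtered, consider filtering them first so the count reflects confident identifications.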

After configuration,

Execute the pipeline in TOPPAS (by pressing F5).
Collect the two output files produced by the TextExporter and EICExtractor nodes.

Finally,

Run the post-processing Python script DeMixQ_data_processing.py from the scripts folder. Set the parameters as follows:

-consensus   The consensus feature map (default: consensus.csv)
-eic         The extracted ion-chromatography (default: eic.csv)
-n_samples   Number of samples (default: 4)
-n_repeats   Number of replicate experiments per sample (default: 3)
-fdr         Quality cutoff by estimating FDR from decoy extractions (default: 0.05)
-knn_k       K nearest neighbors for abundance correction, set 0 to disable correction (default: 5)
-knn_step    Number of sequential features used for estimating local medians (default: 500)
-out         Output table in csv format (default: peptides.DeMixQ.csv)
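
The parameters above can be assembled into a command line programmatically. This is a minimal sketch, assuming the script accepts each option in the single-dash "-flag value" form listed above; the helper name `build_demixq_command` is hypothetical and the defaults are copied from the list.

```python
import sys


def build_demixq_command(script="DeMixQ_data_processing.py",
                         consensus="consensus.csv", eic="eic.csv",
                         n_samples=4, n_repeats=3, fdr=0.05,
                         knn_k=5, knn_step=500,
                         out="peptides.DeMixQ.csv"):
    """Build the argument list for the post-processing script."""
    options = [
        ("-consensus", consensus),
        ("-eic", eic),
        ("-n_samples", n_samples),
        ("-n_repeats", n_repeats),
        ("-fdr", fdr),
        ("-knn_k", knn_k),
        ("-knn_step", knn_step),
        ("-out", out),
    ]
    command = [sys.executable, script]
    for flag, value in options:
        command.extend([flag, str(value)])
    return command


# Example: 6 samples, 3 replicates each, abundance correction disabled.
command = build_demixq_command(n_samples=6, knn_k=0)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.check_call(command)` from the folder that contains the script and the two input files.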

Alternatively, the scripts folder contains an example IPython notebook that can be executed step by step in Jupyter Notebook (Python 2.7).

Note:
The example workflow was tested with high-resolution Orbitrap (FTMS 70,000) datasets. You may need to adjust the parameters to fit your dataset.

Map RT alignment can be done without MS/MS identifications, or with other tools whose output is then converted into the OpenMS-compatible format (i.e. trafoXML).

For a large dataset (e.g. >10 samples), feature detection and linking may require considerable computing resources: CPU, RAM and disk space.

Download the example data set
https://ki.box.com/s/15kv73pu2cvi4gwakzd6kdxcwch7dbl0

Reference

Zhang, B., Käll, L., & Zubarev, R. A. (2016). DeMix-Q: Quantification-centered Data Processing Workflow. Molecular & Cellular Proteomics, mcp.O115.055475. https://www.ncbi.nlm.nih.gov/pubmed/26729709
Zhang, B., Pirmoradian, M., Chernobrovkin, A., & Zubarev, R. A. (2014). DeMix Workflow for Efficient Identification of Co-fragmented Peptides in High Resolution Data-dependent Tandem Mass Spectrometry. Molecular & Cellular Proteomics. doi:10.1074/mcp.O114.038877 https://www.ncbi.nlm.nih.gov/pubmed/25100859
