A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics. / Luo, Xiyang; Bittremieux, Wout; Griss, Johannes; Deutsch, Eric W; Sachsenberg, Timo; Levitsky, Lev I; Ivanov, Mark V; Bubis, Julia A; Gabriels, Ralf; Webel, Henry; Sanchez, Aniel; Bai, Mingze; Käll, Lukas; Perez-Riverol, Yasset.
In: Journal of Proteome Research, Vol. 21, No. 6, 2022, p. 1566-1574.
RIS
TY - JOUR
T1 - A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics
AU - Luo, Xiyang
AU - Bittremieux, Wout
AU - Griss, Johannes
AU - Deutsch, Eric W
AU - Sachsenberg, Timo
AU - Levitsky, Lev I
AU - Ivanov, Mark V
AU - Bubis, Julia A
AU - Gabriels, Ralf
AU - Webel, Henry
AU - Sanchez, Aniel
AU - Bai, Mingze
AU - Käll, Lukas
AU - Perez-Riverol, Yasset
PY - 2022
Y1 - 2022
N2 - Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The best-identified spectrum (BEST) and spectrum binning (BIN) methods were found to be the most reliable for consensus spectrum generation, including for data sets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
AB - Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The best-identified spectrum (BEST) and spectrum binning (BIN) methods were found to be the most reliable for consensus spectrum generation, including for data sets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
KW - Algorithms
KW - Cluster Analysis
KW - Consensus
KW - Databases, Protein
KW - Proteomics/methods
KW - Software
KW - Tandem Mass Spectrometry/methods
U2 - 10.1021/acs.jproteome.2c00069
DO - 10.1021/acs.jproteome.2c00069
M3 - Journal article
C2 - 35549218
VL - 21
SP - 1566
EP - 1574
JO - Journal of Proteome Research
JF - Journal of Proteome Research
SN - 1535-3893
IS - 6
ER -
ID: 311609614
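The abstract names several consensus spectrum strategies without describing how they operate. As a rough illustration only, the Python sketch below shows one plausible form of three of them: binning (BIN), selecting the most similar cluster member, and selecting the best-identified member (BEST). Everything here is an assumption, not the benchmark's code: peaks are (m/z, intensity) pairs, the bin width (0.02) and minimum peak support fraction (0.5) are hypothetical defaults, "most similar" is taken to mean the highest summed cosine similarity to the other members, and identification scores are supplied by the caller with higher meaning better. The actual implementations are in the GitHub repository linked in the record.

```python
import math
from collections import defaultdict

# Assumption for this sketch: a spectrum is a list of (mz, intensity) tuples
# and a cluster is a non-empty list of such spectra.


def bin_spectrum(spectrum, bin_width=0.02):
    """Map peaks to integer m/z bins, summing intensities that share a bin."""
    binned = defaultdict(float)
    for mz, intensity in spectrum:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra (dicts bin -> intensity)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(cluster, bin_width=0.02):
    """Return the member with the highest summed cosine similarity to all other members."""
    binned = [bin_spectrum(s, bin_width) for s in cluster]

    def total_similarity(i):
        return sum(cosine(binned[i], binned[j]) for j in range(len(binned)) if j != i)

    return cluster[max(range(len(cluster)), key=total_similarity)]


def bin_consensus(cluster, bin_width=0.02, min_fraction=0.5):
    """Merge a cluster into one spectrum by m/z binning.

    A bin is kept if at least `min_fraction` of the spectra have a peak in it.
    Its m/z is the intensity-weighted mean m/z of the peaks in the bin, and its
    intensity is the summed intensity divided by the cluster size (spectra
    without a peak in that bin therefore contribute zero).
    """
    mz_weighted = defaultdict(float)
    intensity_sum = defaultdict(float)
    support = defaultdict(int)
    for spectrum in cluster:
        seen = set()
        for mz, intensity in spectrum:
            key = int(mz // bin_width)
            mz_weighted[key] += mz * intensity
            intensity_sum[key] += intensity
            seen.add(key)
        for key in seen:  # count each spectrum at most once per bin
            support[key] += 1
    n = len(cluster)
    consensus = []
    for key in sorted(intensity_sum):
        if support[key] / n >= min_fraction and intensity_sum[key] > 0:
            consensus.append((mz_weighted[key] / intensity_sum[key], intensity_sum[key] / n))
    return consensus


def best_identified(cluster, scores):
    """Return the member with the highest identification score (ties keep the first)."""
    return max(zip(cluster, scores), key=lambda pair: pair[1])[0]
```

For example, with a three-spectrum cluster, `bin_consensus(cluster)` keeps only peaks shared by at least half of the members, while `most_similar(cluster)` and `best_identified(cluster, scores)` return one of the original spectra unchanged. The selection methods therefore preserve a real measured spectrum, whereas binning produces a synthetic one whose quality depends on the chosen bin width and support threshold.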