Quick and clean: Cracking sentences encoded in E. coli by LC-MS/MS, de novo sequencing, and dictionary search

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Quick and clean: Cracking sentences encoded in E. coli by LC-MS/MS, de novo sequencing, and dictionary search. / Niu, Lili; Mann, Matthias.

In: EuPA Open Proteomics, Vol. 22-23, 2019, p. 30-35.


Harvard

Niu, L & Mann, M 2019, 'Quick and clean: Cracking sentences encoded in E. coli by LC-MS/MS, de novo sequencing, and dictionary search', EuPA Open Proteomics, vol. 22-23, pp. 30-35. https://doi.org/10.1016/j.euprot.2019.07.010

APA

Niu, L., & Mann, M. (2019). Quick and clean: Cracking sentences encoded in E. coli by LC-MS/MS, de novo sequencing, and dictionary search. EuPA Open Proteomics, 22-23, 30-35. https://doi.org/10.1016/j.euprot.2019.07.010

Vancouver

Niu L, Mann M. Quick and clean: Cracking sentences encoded in E. coli by LC-MS/MS, de novo sequencing, and dictionary search. EuPA Open Proteomics. 2019;22-23:30-35. https://doi.org/10.1016/j.euprot.2019.07.010

Author

Niu, Lili; Mann, Matthias. / Quick and clean: Cracking sentences encoded in E. coli by LC-MS/MS, de novo sequencing, and dictionary search. In: EuPA Open Proteomics. 2019; Vol. 22-23. pp. 30-35.

Bibtex

@article{0475089dd44445a2825f0b8ef8637cff,
title = "Quick and clean: Cracking sentences encoded in {E. coli} by {LC-MS/MS}, de novo sequencing, and dictionary search",
abstract = "In this study, we faced the challenge of deciphering a protein that has been designed and expressed by E. coli in such a way that the amino acid sequence encodes two concatenated English sentences. The letters 'O' and 'U' in the sentence are both replaced by 'K' in the protein. The sequence cannot be found online and carried to-be-discovered modifications. With limited information in hand, to solve the challenge, we developed a workflow consisting of bottom-up proteomics, de novo sequencing and a bioinformatics pipeline for data processing and searching for frequently appearing words. We assembled a complete first question: {"}Have you ever wondered what the most fundamental limitations in life are?{"} and validated the result by sequence database search against a customized FASTA file. We also searched the spectra against an E. coli proteome database and found close to 600 endogenous, co-purified E. coli proteins and contaminants introduced during sample handling, which made the inference of the sentence very challenging. We conclude that E. coli can express English sentences, and that de novo sequencing combined with clever sequence database search strategies is a promising tool for the identification of uncharacterized proteins.",
author = "Lili Niu and Matthias Mann",
year = "2019",
doi = "10.1016/j.euprot.2019.07.010",
language = "English",
volume = "22-23",
pages = "30--35",
journal = "EuPA Open Proteomics",
issn = "2212-9685",
publisher = "Elsevier",
}

RIS

TY - JOUR

T1 - Quick and clean

T2 - Cracking sentences encoded in E. coli by LC-MS/MS, de novo sequencing, and dictionary search

AU - Niu, Lili

AU - Mann, Matthias

PY - 2019

Y1 - 2019

N2 - In this study, we faced the challenge of deciphering a protein that has been designed and expressed by E. coli in such a way that the amino acid sequence encodes two concatenated English sentences. The letters 'O' and 'U' in the sentence are both replaced by 'K' in the protein. The sequence cannot be found online and carried to-be-discovered modifications. With limited information in hand, to solve the challenge, we developed a workflow consisting of bottom-up proteomics, de novo sequencing and a bioinformatics pipeline for data processing and searching for frequently appearing words. We assembled a complete first question: "Have you ever wondered what the most fundamental limitations in life are?" and validated the result by sequence database search against a customized FASTA file. We also searched the spectra against an E. coli proteome database and found close to 600 endogenous, co-purified E. coli proteins and contaminants introduced during sample handling, which made the inference of the sentence very challenging. We conclude that E. coli can express English sentences, and that de novo sequencing combined with clever sequence database search strategies is a promising tool for the identification of uncharacterized proteins.

AB - In this study, we faced the challenge of deciphering a protein that has been designed and expressed by E. coli in such a way that the amino acid sequence encodes two concatenated English sentences. The letters 'O' and 'U' in the sentence are both replaced by 'K' in the protein. The sequence cannot be found online and carried to-be-discovered modifications. With limited information in hand, to solve the challenge, we developed a workflow consisting of bottom-up proteomics, de novo sequencing and a bioinformatics pipeline for data processing and searching for frequently appearing words. We assembled a complete first question: "Have you ever wondered what the most fundamental limitations in life are?" and validated the result by sequence database search against a customized FASTA file. We also searched the spectra against an E. coli proteome database and found close to 600 endogenous, co-purified E. coli proteins and contaminants introduced during sample handling, which made the inference of the sentence very challenging. We conclude that E. coli can express English sentences, and that de novo sequencing combined with clever sequence database search strategies is a promising tool for the identification of uncharacterized proteins.

U2 - 10.1016/j.euprot.2019.07.010

DO - 10.1016/j.euprot.2019.07.010

M3 - Journal article

C2 - 31890553

VL - 22-23

SP - 30

EP - 35

JO - EuPA Open Proteomics

JF - EuPA Open Proteomics

SN - 2212-9685

ER -

ID: 239207726