KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

KinMutRF : a random forest classifier of sequence variants in the human protein kinase superfamily. / Pons, Tirso; Vazquez, Miguel; Matey-Hernandez, María Luisa; Brunak, Søren; Valencia, Alfonso; Izarzugaza, Jose Mg.

In: BMC Genomics, Vol. 17, No. Suppl 2, 396, 23.06.2016.


Harvard

Pons, T, Vazquez, M, Matey-Hernandez, ML, Brunak, S, Valencia, A & Izarzugaza, JM 2016, 'KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily', BMC Genomics, vol. 17, no. Suppl 2, 396. https://doi.org/10.1186/s12864-016-2723-1

APA

Pons, T., Vazquez, M., Matey-Hernandez, M. L., Brunak, S., Valencia, A., & Izarzugaza, J. M. (2016). KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily. BMC Genomics, 17(Suppl 2), 396. https://doi.org/10.1186/s12864-016-2723-1

Vancouver

Pons T, Vazquez M, Matey-Hernandez ML, Brunak S, Valencia A, Izarzugaza JM. KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily. BMC Genomics. 2016 Jun 23;17(Suppl 2):396. https://doi.org/10.1186/s12864-016-2723-1

Author

Pons, Tirso ; Vazquez, Miguel ; Matey-Hernandez, María Luisa ; Brunak, Søren ; Valencia, Alfonso ; Izarzugaza, Jose Mg. / KinMutRF : a random forest classifier of sequence variants in the human protein kinase superfamily. In: BMC Genomics. 2016 ; Vol. 17, No. Suppl 2.

Bibtex

@article{df0fbb34beb34be88f1e69be5649a733,
title = "KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily",
abstract = "BACKGROUND: The association between aberrant signal processing by protein kinases and human diseases such as cancer was established a long time ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease. RESULTS: KinMutRF is a novel random-forest method to automatically identify pathogenic variants in human kinases. Twenty-six decision trees implemented as a random forest weigh a battery of features that characterize the variants: a) at the gene level, including membership in a KinBase group and Gene Ontology terms; b) at the Pfam domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, and functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in the training and testing, Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified. A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under a GPL version 3 license and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2. CONCLUSIONS: KinMutRF is capable of classifying kinase variation with good performance. Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e. SIFT, PolyPhen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most informative in terms of information gain and likely account for the improvement in prediction performance. This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families.",
keywords = "Journal Article",
author = "Tirso Pons and Miguel Vazquez and Matey-Hernandez, {Mar{\'i}a Luisa} and S{\o}ren Brunak and Alfonso Valencia and Izarzugaza, {Jose Mg}",
year = "2016",
month = jun,
day = "23",
doi = "10.1186/s12864-016-2723-1",
language = "English",
volume = "17",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central Ltd.",
number = "Suppl 2",

}

RIS

TY - JOUR

T1 - KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily

AU - Pons, Tirso

AU - Vazquez, Miguel

AU - Matey-Hernandez, María Luisa

AU - Brunak, Søren

AU - Valencia, Alfonso

AU - Izarzugaza, Jose Mg

PY - 2016/6/23

Y1 - 2016/6/23

N2 - BACKGROUND: The association between aberrant signal processing by protein kinases and human diseases such as cancer was established a long time ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease. RESULTS: KinMutRF is a novel random-forest method to automatically identify pathogenic variants in human kinases. Twenty-six decision trees implemented as a random forest weigh a battery of features that characterize the variants: a) at the gene level, including membership in a KinBase group and Gene Ontology terms; b) at the Pfam domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, and functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in the training and testing, Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified. A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under a GPL version 3 license and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2. CONCLUSIONS: KinMutRF is capable of classifying kinase variation with good performance. Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e. SIFT, PolyPhen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most informative in terms of information gain and likely account for the improvement in prediction performance. This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families.

KW - Journal Article

U2 - 10.1186/s12864-016-2723-1

DO - 10.1186/s12864-016-2723-1

M3 - Journal article

C2 - 27357839

VL - 17

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - Suppl 2

M1 - 396

ER -
