Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles

Research output: Contribution to journal, Journal article, Research, peer-review

Standard

Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles. / Pan, Xiaoyong; Jensen, Lars Juhl; Gorodkin, Jan.

In: Bioinformatics, Vol. 35, No. 9, 2019, p. 1494-1502.

Harvard

Pan, X, Jensen, LJ & Gorodkin, J 2019, 'Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles', Bioinformatics, vol. 35, no. 9, pp. 1494-1502. https://doi.org/10.1093/bioinformatics/bty859

APA

Pan, X., Jensen, L. J., & Gorodkin, J. (2019). Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles. Bioinformatics, 35(9), 1494-1502. https://doi.org/10.1093/bioinformatics/bty859

Vancouver

Pan X, Jensen LJ, Gorodkin J. Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles. Bioinformatics. 2019;35(9):1494-1502. https://doi.org/10.1093/bioinformatics/bty859

Author

Pan, Xiaoyong ; Jensen, Lars Juhl ; Gorodkin, Jan. / Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles. In: Bioinformatics. 2019 ; Vol. 35, No. 9. pp. 1494-1502.

Bibtex

@article{812713edfd7d423fb29b6a658533348f,
title = "Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles",
abstract = "Motivation: Long non-coding RNAs (lncRNAs) are important regulators in a wide variety of biological processes, which are linked to many diseases. Compared to protein-coding genes (PCGs), the association between diseases and lncRNAs is still not well studied. Thus, inferring disease-associated lncRNAs on a genome-wide scale has become imperative. Results: In this study, we propose a machine learning-based method, DislncRF, which infers disease-associated lncRNAs on a genome-wide scale based on tissue expression profiles. DislncRF uses random forest models trained on expression profiles of known disease-associated PCGs across human tissues to extract general patterns between expression profiles and diseases. These models are then applied to score associations between lncRNAs and diseases. DislncRF was benchmarked against a gold standard data set and compared to other methods. The results show that DislncRF yields promising performance and outperforms the existing methods. The utility of DislncRF is further substantiated on two diseases in which we find that top scoring candidates are supported by literature or independent data sets. Availability: https://github.com/xypan1232/DislncRF. Supplementary information: Supplementary data are available at Bioinformatics online.",
author = "Xiaoyong Pan and Jensen, {Lars Juhl} and Jan Gorodkin",
year = "2019",
doi = "10.1093/bioinformatics/bty859",
language = "English",
volume = "35",
pages = "1494--1502",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "9",

}

RIS

TY - JOUR

T1 - Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles

AU - Pan, Xiaoyong

AU - Jensen, Lars Juhl

AU - Gorodkin, Jan

PY - 2019

Y1 - 2019

AB - Motivation: Long non-coding RNAs (lncRNAs) are important regulators in a wide variety of biological processes, which are linked to many diseases. Compared to protein-coding genes (PCGs), the association between diseases and lncRNAs is still not well studied. Thus, inferring disease-associated lncRNAs on a genome-wide scale has become imperative. Results: In this study, we propose a machine learning-based method, DislncRF, which infers disease-associated lncRNAs on a genome-wide scale based on tissue expression profiles. DislncRF uses random forest models trained on expression profiles of known disease-associated PCGs across human tissues to extract general patterns between expression profiles and diseases. These models are then applied to score associations between lncRNAs and diseases. DislncRF was benchmarked against a gold standard data set and compared to other methods. The results show that DislncRF yields promising performance and outperforms the existing methods. The utility of DislncRF is further substantiated on two diseases in which we find that top scoring candidates are supported by literature or independent data sets. Availability: https://github.com/xypan1232/DislncRF. Supplementary information: Supplementary data are available at Bioinformatics online.

U2 - 10.1093/bioinformatics/bty859

DO - 10.1093/bioinformatics/bty859

M3 - Journal article

C2 - 30295698

VL - 35

SP - 1494

EP - 1502

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 9

ER -
