Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration

Research output: Contribution to journal, Journal article, Research, peer-review

Standard

Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration. / Grissa, Dhouha; Junge, Alexander; Oprea, Tudor I; Jensen, Lars Juhl.

In: Database : the journal of biological databases and curation, Vol. 2022, No. 2022, 2022.

Harvard

Grissa, D, Junge, A, Oprea, TI & Jensen, LJ 2022, 'Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration', Database : the journal of biological databases and curation, vol. 2022, no. 2022. https://doi.org/10.1093/database/baac019

APA

Grissa, D., Junge, A., Oprea, T. I., & Jensen, L. J. (2022). Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration. Database : the journal of biological databases and curation, 2022(2022). https://doi.org/10.1093/database/baac019

Vancouver

Grissa D, Junge A, Oprea TI, Jensen LJ. Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration. Database : the journal of biological databases and curation. 2022;2022(2022). https://doi.org/10.1093/database/baac019

Author

Grissa, Dhouha; Junge, Alexander; Oprea, Tudor I; Jensen, Lars Juhl. / Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration. In: Database : the journal of biological databases and curation. 2022; Vol. 2022, No. 2022.

Bibtex

@article{06bb2c3667cc4a12811d6b3e5d2c5db2,
title = "Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration",
abstract = "The scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease-gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased by at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease-gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated on a weekly basis and is available via a web interface at https://diseases.jensenlab.org, from where it can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org.",
author = "Dhouha Grissa and Alexander Junge and Oprea, {Tudor I} and Jensen, {Lars Juhl}",
note = "{\textcopyright} The Author(s) 2022. Published by Oxford University Press.",
year = "2022",
doi = "10.1093/database/baac019",
language = "English",
volume = "2022",
journal = "Database : the journal of biological databases and curation",
issn = "1758-0463",
publisher = "Oxford University Press",
number = "2022",

}

RIS

TY - JOUR

T1 - Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration

AU - Grissa, Dhouha

AU - Junge, Alexander

AU - Oprea, Tudor I

AU - Jensen, Lars Juhl

N1 - © The Author(s) 2022. Published by Oxford University Press.

PY - 2022

Y1 - 2022

AB - The scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease-gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased by at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease-gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated on a weekly basis and is available via a web interface at https://diseases.jensenlab.org, from where it can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org.

U2 - 10.1093/database/baac019

DO - 10.1093/database/baac019

M3 - Journal article

C2 - 35348648

VL - 2022

JO - Database : the journal of biological databases and curation

JF - Database : the journal of biological databases and curation

SN - 1758-0463

IS - 2022

ER -
