CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision. / Junge, Alexander; Jensen, Lars Juhl.

In: Bioinformatics, 2019.


Harvard

Junge, A & Jensen, LJ 2019, 'CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision', Bioinformatics. https://doi.org/10.1093/bioinformatics/btz490

APA

Junge, A., & Jensen, L. J. (2019). CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz490

Vancouver

Junge A, Jensen LJ. CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision. Bioinformatics. 2019. https://doi.org/10.1093/bioinformatics/btz490

Author

Junge, Alexander; Jensen, Lars Juhl. / CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision. In: Bioinformatics. 2019.

Bibtex

@article{eb7bccc429744c10897bca2590bbe4b0,
title = "CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision",
abstract = "MOTIVATION: Information extraction by mining the scientific literature is key to uncovering relations between biomedical entities. Most existing approaches based on natural language processing extract relations from single sentence-level co-mentions, ignoring co-occurrence statistics over the whole corpus. Existing approaches counting entity co-occurrences ignore the textual context of each co-occurrence. RESULTS: We propose a novel corpus-wide co-occurrence scoring approach to relation extraction that takes the textual context of each co-mention into account. Our method, called CoCoScore, scores the certainty of stating an association for each sentence that co-mentions two entities. CoCoScore is trained using distant supervision based on a gold-standard set of associations between entities of interest. Instead of requiring a manually annotated training corpus, co-mentions are labeled as positives/negatives according to their presence/absence in the gold standard. We show that CoCoScore outperforms previous approaches in identifying human disease-gene and tissue-gene associations as well as in identifying physical and functional protein-protein associations in different species. CoCoScore is a versatile text mining tool to uncover pairwise associations via co-occurrence mining, within and beyond biomedical applications. AVAILABILITY: CoCoScore is available at: https://github.com/JungeAlexander/cocoscore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.",
author = "Junge, Alexander and Jensen, {Lars Juhl}",
year = "2019",
doi = "10.1093/bioinformatics/btz490",
language = "English",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",

}
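The abstract describes two steps that a small example can make concrete: distant-supervision labeling, where each sentence co-mentioning two entities is labeled positive or negative purely by whether the pair appears in a gold-standard association set, and corpus-wide scoring, where per-sentence scores are combined into one score per entity pair. The sketch below is illustrative only and assumes several things not stated in the record: the names `CoMention`, `canonical_pair`, `distant_labels` and `corpus_scores` are hypothetical and are not the API of the cocoscore package at https://github.com/JungeAlexander/cocoscore; associations are treated as undirected; the sentence scores are taken as given (in the paper they come from a trained model, not shown here); and the plain sum used for aggregation is a stand-in, not the paper's co-occurrence formula.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class CoMention(NamedTuple):
    """One sentence that mentions both entities (hypothetical record type)."""
    entity_a: str
    entity_b: str
    sentence: str


def canonical_pair(a: str, b: str) -> tuple[str, str]:
    """Order a pair so (a, b) and (b, a) map to the same undirected association."""
    return (a, b) if a <= b else (b, a)


def distant_labels(co_mentions: Iterable[CoMention],
                   gold_standard: set[tuple[str, str]]) -> list[int]:
    """Label each co-mention 1 if its entity pair is in the gold standard, else 0.

    No manual annotation is involved: the label depends only on the
    presence or absence of the pair in the gold-standard set.
    """
    gold = {canonical_pair(a, b) for a, b in gold_standard}
    return [int(canonical_pair(m.entity_a, m.entity_b) in gold) for m in co_mentions]


def corpus_scores(co_mentions: Iterable[CoMention],
                  sentence_scores: Iterable[float]) -> dict[tuple[str, str], float]:
    """Combine per-sentence scores into one corpus-wide score per entity pair.

    ASSUMPTION: a plain sum is used purely for illustration; it is not the
    aggregation formula published for CoCoScore.
    """
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for mention, score in zip(co_mentions, sentence_scores):
        totals[canonical_pair(mention.entity_a, mention.entity_b)] += score
    return dict(totals)


if __name__ == "__main__":
    # Toy input with placeholder entity names (hypothetical data).
    mentions = [
        CoMention("GENE_A", "DISEASE_X", "sentence 1"),
        CoMention("GENE_A", "GENE_B", "sentence 2"),
        CoMention("DISEASE_X", "GENE_A", "sentence 3"),
    ]
    print(distant_labels(mentions, {("GENE_A", "DISEASE_X")}))  # [1, 0, 1]
    print(corpus_scores(mentions, [0.5, 0.25, 0.25]))
```

Sorting each pair into a canonical order lets "GENE_A ... DISEASE_X" and "DISEASE_X ... GENE_A" share one label and one score. Note that under distant supervision a negative label only means the pair is absent from the gold standard, so some negatives may be true associations that the gold standard simply lacks.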

RIS

TY  - JOUR
T1  - CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision
AU  - Junge, Alexander
AU  - Jensen, Lars Juhl
PY  - 2019
Y1  - 2019
N2  - MOTIVATION: Information extraction by mining the scientific literature is key to uncovering relations between biomedical entities. Most existing approaches based on natural language processing extract relations from single sentence-level co-mentions, ignoring co-occurrence statistics over the whole corpus. Existing approaches counting entity co-occurrences ignore the textual context of each co-occurrence. RESULTS: We propose a novel corpus-wide co-occurrence scoring approach to relation extraction that takes the textual context of each co-mention into account. Our method, called CoCoScore, scores the certainty of stating an association for each sentence that co-mentions two entities. CoCoScore is trained using distant supervision based on a gold-standard set of associations between entities of interest. Instead of requiring a manually annotated training corpus, co-mentions are labeled as positives/negatives according to their presence/absence in the gold standard. We show that CoCoScore outperforms previous approaches in identifying human disease-gene and tissue-gene associations as well as in identifying physical and functional protein-protein associations in different species. CoCoScore is a versatile text mining tool to uncover pairwise associations via co-occurrence mining, within and beyond biomedical applications. AVAILABILITY: CoCoScore is available at: https://github.com/JungeAlexander/cocoscore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
AB  - MOTIVATION: Information extraction by mining the scientific literature is key to uncovering relations between biomedical entities. Most existing approaches based on natural language processing extract relations from single sentence-level co-mentions, ignoring co-occurrence statistics over the whole corpus. Existing approaches counting entity co-occurrences ignore the textual context of each co-occurrence. RESULTS: We propose a novel corpus-wide co-occurrence scoring approach to relation extraction that takes the textual context of each co-mention into account. Our method, called CoCoScore, scores the certainty of stating an association for each sentence that co-mentions two entities. CoCoScore is trained using distant supervision based on a gold-standard set of associations between entities of interest. Instead of requiring a manually annotated training corpus, co-mentions are labeled as positives/negatives according to their presence/absence in the gold standard. We show that CoCoScore outperforms previous approaches in identifying human disease-gene and tissue-gene associations as well as in identifying physical and functional protein-protein associations in different species. CoCoScore is a versatile text mining tool to uncover pairwise associations via co-occurrence mining, within and beyond biomedical applications. AVAILABILITY: CoCoScore is available at: https://github.com/JungeAlexander/cocoscore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
U2  - 10.1093/bioinformatics/btz490
DO  - 10.1093/bioinformatics/btz490
M3  - Journal article
C2  - 31199464
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
ER  - 

ID: 222693949