A Guide to Dictionary-Based Text Mining

Research output: Chapter in Book/Report/Conference proceeding | Book chapter | Research | peer-review

Standard

A Guide to Dictionary-Based Text Mining. / Cook, Helen V.; Jensen, Lars Juhl.

Bioinformatics and Drug Discovery. ed. / Richard S. Larson; Tudor I. Oprea. Vol. 1939, 3rd ed. Humana Press, 2019. pp. 73-89 (Methods in Molecular Biology).

Harvard

Cook, HV & Jensen, LJ 2019, A Guide to Dictionary-Based Text Mining. in RS Larson & TI Oprea (eds), Bioinformatics and Drug Discovery. 3rd edn, vol. 1939, Humana Press, Methods in Molecular Biology, pp. 73-89. https://doi.org/10.1007/978-1-4939-9089-4_5

APA

Cook, H. V., & Jensen, L. J. (2019). A Guide to Dictionary-Based Text Mining. In R. S. Larson & T. I. Oprea (Eds.), Bioinformatics and Drug Discovery (3rd ed., Vol. 1939, pp. 73-89). Humana Press. Methods in Molecular Biology. https://doi.org/10.1007/978-1-4939-9089-4_5

Vancouver

Cook HV, Jensen LJ. A Guide to Dictionary-Based Text Mining. In Larson RS, Oprea TI, editors. Bioinformatics and Drug Discovery. 3rd ed. Vol. 1939. Humana Press. 2019. p. 73-89. (Methods in Molecular Biology). https://doi.org/10.1007/978-1-4939-9089-4_5

Author

Cook, Helen V.; Jensen, Lars Juhl. / A Guide to Dictionary-Based Text Mining. Bioinformatics and Drug Discovery. editor / Richard S. Larson; Tudor I. Oprea. Vol. 1939, 3rd ed. Humana Press, 2019. pp. 73-89 (Methods in Molecular Biology).

Bibtex

@incollection{908e1ce58ec24448b100deaa69e50bd7,
title = "A Guide to Dictionary-Based Text Mining",
abstract = "PubMed contains more than 27 million documents, and this number is growing at an estimated 4{\%} per year. Even within specialized topics, it is no longer possible for a researcher to read any field in its entirety, and thus nobody has a complete picture of the scientific knowledge in any given field at any time. Text mining provides a means to automatically read this corpus and to extract the relations found therein as structured information. Having data in a structured format is a huge boon for computational efforts to access, cross reference, and mine the data stored therein. This is increasingly useful as biological research is becoming more focused on systems and multi-omics integration. This chapter provides an overview of the steps that are required for text mining: tokenization, named entity recognition, normalization, event extraction, and benchmarking. It discusses a variety of approaches to these tasks and then goes into detail on how to prepare data for use specifically with the JensenLab tagger. This software uses a dictionary-based approach and provides the text mining evidence for STRING and several other databases.",
author = "Cook, {Helen V.} and Jensen, {Lars Juhl}",
year = "2019",
doi = "10.1007/978-1-4939-9089-4_5",
language = "English",
isbn = "978-1-4939-9088-7",
volume = "1939",
series = "Methods in Molecular Biology",
publisher = "Humana Press",
pages = "73--89",
editor = "Larson, {Richard S.} and Oprea, {Tudor I.}",
booktitle = "Bioinformatics and Drug Discovery",
address = "United States",
edition = "3",

}

RIS

TY  - CHAP
T1  - A Guide to Dictionary-Based Text Mining
AU  - Cook, Helen V.
AU  - Jensen, Lars Juhl
PY  - 2019
Y1  - 2019
N2  - PubMed contains more than 27 million documents, and this number is growing at an estimated 4% per year. Even within specialized topics, it is no longer possible for a researcher to read any field in its entirety, and thus nobody has a complete picture of the scientific knowledge in any given field at any time. Text mining provides a means to automatically read this corpus and to extract the relations found therein as structured information. Having data in a structured format is a huge boon for computational efforts to access, cross reference, and mine the data stored therein. This is increasingly useful as biological research is becoming more focused on systems and multi-omics integration. This chapter provides an overview of the steps that are required for text mining: tokenization, named entity recognition, normalization, event extraction, and benchmarking. It discusses a variety of approaches to these tasks and then goes into detail on how to prepare data for use specifically with the JensenLab tagger. This software uses a dictionary-based approach and provides the text mining evidence for STRING and several other databases.
AB  - PubMed contains more than 27 million documents, and this number is growing at an estimated 4% per year. Even within specialized topics, it is no longer possible for a researcher to read any field in its entirety, and thus nobody has a complete picture of the scientific knowledge in any given field at any time. Text mining provides a means to automatically read this corpus and to extract the relations found therein as structured information. Having data in a structured format is a huge boon for computational efforts to access, cross reference, and mine the data stored therein. This is increasingly useful as biological research is becoming more focused on systems and multi-omics integration. This chapter provides an overview of the steps that are required for text mining: tokenization, named entity recognition, normalization, event extraction, and benchmarking. It discusses a variety of approaches to these tasks and then goes into detail on how to prepare data for use specifically with the JensenLab tagger. This software uses a dictionary-based approach and provides the text mining evidence for STRING and several other databases.
U2  - 10.1007/978-1-4939-9089-4_5
DO  - 10.1007/978-1-4939-9089-4_5
M3  - Book chapter
C2  - 30848457
SN  - 978-1-4939-9088-7
VL  - 1939
T3  - Methods in Molecular Biology
SP  - 73
EP  - 89
BT  - Bioinformatics and Drug Discovery
A2  - Larson, Richard S.
A2  - Oprea, Tudor I.
PB  - Humana Press
ER  - 

ID: 223876548