A Guide to Dictionary-Based Text Mining
Research output: Chapter in Book/Report/Conference proceeding › Book chapter › Research › peer-review
Standard
A Guide to Dictionary-Based Text Mining. / Cook, Helen V.; Jensen, Lars Juhl.
Bioinformatics and Drug Discovery. ed. / Richard S. Larson; Tudor I. Oprea. Vol. 1939 3. ed. Humana Press, 2019. p. 73-89 (Methods in Molecular Biology).
RIS
TY - CHAP
T1 - A Guide to Dictionary-Based Text Mining
AU - Cook, Helen V.
AU - Jensen, Lars Juhl
PY - 2019
Y1 - 2019
AB - PubMed contains more than 27 million documents, and this number is growing at an estimated 4% per year. Even within specialized topics, it is no longer possible for a researcher to read any field in its entirety, and thus nobody has a complete picture of the scientific knowledge in any given field at any time. Text mining provides a means to automatically read this corpus and to extract the relations found therein as structured information. Having data in a structured format is a huge boon for computational efforts to access, cross-reference, and mine the data stored therein. This is increasingly useful as biological research is becoming more focused on systems and multi-omics integration. This chapter provides an overview of the steps that are required for text mining: tokenization, named entity recognition, normalization, event extraction, and benchmarking. It discusses a variety of approaches to these tasks and then goes into detail on how to prepare data for use specifically with the JensenLab tagger. This software uses a dictionary-based approach and provides the text mining evidence for STRING and several other databases.
U2 - 10.1007/978-1-4939-9089-4_5
DO - 10.1007/978-1-4939-9089-4_5
M3 - Book chapter
C2 - 30848457
SN - 978-1-4939-9088-7
VL - 1939
T3 - Methods in Molecular Biology
SP - 73
EP - 89
BT - Bioinformatics and Drug Discovery
A2 - Larson, Richard S.
A2 - Oprea, Tudor I.
PB - Humana Press
ER -
ID: 223876548
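The abstract names tokenization, named entity recognition, and normalization as core text-mining steps, and describes the JensenLab tagger as dictionary-based. As a rough illustration only, here is a minimal Python sketch of dictionary-based tagging: a dictionary maps name variants to entity identifiers, the text is tokenized, and the longest matching token sequence at each position is reported together with its identifier. This is not the JensenLab tagger's code or algorithm. The function names (`tokenize`, `build_index`, `tag`), the case-insensitive exact matching, the longest-match-first rule, and the identifiers such as `PROT:0001` are all hypothetical assumptions made for this example.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical sketch of dictionary-based NER + normalization; not the JensenLab tagger.

class Match(NamedTuple):
    start: int      # character offset where the match begins
    end: int        # character offset just past the match
    text: str       # surface string as it appears in the text
    entity_id: str  # normalized identifier taken from the dictionary

# Tokens are runs of word characters or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split text into (token, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]

def build_index(dictionary: Dict[str, str]) -> Tuple[Dict[Tuple[str, ...], str], int]:
    """Turn {name: id} into {lowercased token tuple: id} plus the longest name length in tokens."""
    index: Dict[Tuple[str, ...], str] = {}
    longest = 0
    for name, entity_id in dictionary.items():
        key = tuple(tok.lower() for tok, _, _ in tokenize(name))
        index[key] = entity_id
        longest = max(longest, len(key))
    return index, longest

def tag(text: str, dictionary: Dict[str, str]) -> List[Match]:
    """Scan left to right, preferring the longest dictionary name at each token position."""
    index, longest = build_index(dictionary)
    tokens = tokenize(text)
    matches: List[Match] = []
    i = 0
    while i < len(tokens):
        found = None
        # Try the longest n-gram first so "tumor necrosis factor" wins over "tumor".
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tok.lower() for tok, _, _ in tokens[i:i + n])
            if key in index:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                found = Match(start, end, text[start:end], index[key])
                break
        if found is not None:
            matches.append(found)
            i += n  # skip past the matched tokens
        else:
            i += 1
    return matches

if __name__ == "__main__":
    # Hypothetical dictionary: two synonyms normalize to the same identifier.
    names = {
        "TNF": "PROT:0001",
        "tumor necrosis factor": "PROT:0001",
        "p53": "PROT:0002",
        "tumor": "DIS:0042",
    }
    for m in tag("Tumor necrosis factor (TNF) activates p53 in tumor cells.", names):
        print(m)
```

Run on the example sentence, the sketch reports "Tumor necrosis factor" and "TNF" with the same identifier (normalization of synonyms), "p53", and the standalone "tumor". Matching every name case-insensitively is a simplification; short, ambiguous names and acronyms are a common source of false positives in dictionary-based tagging, so a real dictionary typically needs curation or blocklisting of such names.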