S1000: a better taxonomic name corpus for biomedical information extraction
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
S1000: a better taxonomic name corpus for biomedical information extraction. / Luoma, Jouni; Nastou, Katerina; Ohta, Tomoko; Toivonen, Harttu; Pafilis, Evangelos; Jensen, Lars Juhl; Pyysalo, Sampo.
In: Bioinformatics, Vol. 39, No. 6, btad369, 2023.
RIS
TY - JOUR
T1 - S1000
T2 - a better taxonomic name corpus for biomedical information extraction
AU - Luoma, Jouni
AU - Nastou, Katerina
AU - Ohta, Tomoko
AU - Toivonen, Harttu
AU - Pafilis, Evangelos
AU - Jensen, Lars Juhl
AU - Pyysalo, Sampo
N1 - Publisher Copyright: © 2023 The Author(s).
PY - 2023
Y1 - 2023
N2 - Motivation: The recognition of mentions of species names in text is a critically important task for biomedical text mining. While deep learning-based methods have made great advances in many named entity recognition tasks, results for species name recognition remain poor. We hypothesize that this is primarily due to the lack of appropriate corpora. Results: We introduce the S1000 corpus, a comprehensive manual re-annotation and extension of the S800 corpus. We demonstrate that S1000 makes highly accurate recognition of species names possible (F-score = 93.1%), both for deep learning and dictionary-based methods.
AB - Motivation: The recognition of mentions of species names in text is a critically important task for biomedical text mining. While deep learning-based methods have made great advances in many named entity recognition tasks, results for species name recognition remain poor. We hypothesize that this is primarily due to the lack of appropriate corpora. Results: We introduce the S1000 corpus, a comprehensive manual re-annotation and extension of the S800 corpus. We demonstrate that S1000 makes highly accurate recognition of species names possible (F-score = 93.1%), both for deep learning and dictionary-based methods.
U2 - 10.1093/bioinformatics/btad369
DO - 10.1093/bioinformatics/btad369
M3 - Journal article
C2 - 37289518
AN - SCOPUS:85163845001
VL - 39
JO - Bioinformatics (Online)
JF - Bioinformatics (Online)
SN - 1367-4811
IS - 6
M1 - btad369
ER -
ID: 360982850