S1000: a better taxonomic name corpus for biomedical information extraction

Research output: Contribution to journal, Journal article, Research, peer-review

Standard

S1000: a better taxonomic name corpus for biomedical information extraction. / Luoma, Jouni; Nastou, Katerina; Ohta, Tomoko; Toivonen, Harttu; Pafilis, Evangelos; Jensen, Lars Juhl; Pyysalo, Sampo.

In: Bioinformatics, Vol. 39, No. 6, btad369, 2023.

Harvard

Luoma, J, Nastou, K, Ohta, T, Toivonen, H, Pafilis, E, Jensen, LJ & Pyysalo, S 2023, 'S1000: a better taxonomic name corpus for biomedical information extraction', Bioinformatics, vol. 39, no. 6, btad369. https://doi.org/10.1093/bioinformatics/btad369

APA

Luoma, J., Nastou, K., Ohta, T., Toivonen, H., Pafilis, E., Jensen, L. J., & Pyysalo, S. (2023). S1000: a better taxonomic name corpus for biomedical information extraction. Bioinformatics, 39(6), [btad369]. https://doi.org/10.1093/bioinformatics/btad369

Vancouver

Luoma J, Nastou K, Ohta T, Toivonen H, Pafilis E, Jensen LJ et al. S1000: a better taxonomic name corpus for biomedical information extraction. Bioinformatics. 2023;39(6). btad369. https://doi.org/10.1093/bioinformatics/btad369

Author

Luoma, Jouni; Nastou, Katerina; Ohta, Tomoko; Toivonen, Harttu; Pafilis, Evangelos; Jensen, Lars Juhl; Pyysalo, Sampo. / S1000: a better taxonomic name corpus for biomedical information extraction. In: Bioinformatics. 2023; Vol. 39, No. 6.

Bibtex

@article{ed4bbda4eee64505b2720e7cbf0f4151,
title = "S1000: a better taxonomic name corpus for biomedical information extraction",
abstract = "Motivation: The recognition of mentions of species names in text is a critically important task for biomedical text mining. While deep learning-based methods have made great advances in many named entity recognition tasks, results for species name recognition remain poor. We hypothesize that this is primarily due to the lack of appropriate corpora. Results: We introduce the S1000 corpus, a comprehensive manual re-annotation and extension of the S800 corpus. We demonstrate that S1000 makes highly accurate recognition of species names possible (F-score = 93.1%), both for deep learning and dictionary-based methods.",
author = "Jouni Luoma and Katerina Nastou and Tomoko Ohta and Harttu Toivonen and Evangelos Pafilis and Jensen, {Lars Juhl} and Sampo Pyysalo",
note = "Publisher Copyright: {\textcopyright} 2023 The Author(s).",
year = "2023",
doi = "10.1093/bioinformatics/btad369",
language = "English",
volume = "39",
journal = "Bioinformatics (Online)",
issn = "1367-4811",
publisher = "Oxford University Press",
number = "6",

}

RIS

TY  - JOUR
T1  - S1000
T2  - a better taxonomic name corpus for biomedical information extraction
AU  - Luoma, Jouni
AU  - Nastou, Katerina
AU  - Ohta, Tomoko
AU  - Toivonen, Harttu
AU  - Pafilis, Evangelos
AU  - Jensen, Lars Juhl
AU  - Pyysalo, Sampo
N1  - Publisher Copyright: © 2023 The Author(s).
PY  - 2023
Y1  - 2023
N2  - Motivation: The recognition of mentions of species names in text is a critically important task for biomedical text mining. While deep learning-based methods have made great advances in many named entity recognition tasks, results for species name recognition remain poor. We hypothesize that this is primarily due to the lack of appropriate corpora. Results: We introduce the S1000 corpus, a comprehensive manual re-annotation and extension of the S800 corpus. We demonstrate that S1000 makes highly accurate recognition of species names possible (F-score = 93.1%), both for deep learning and dictionary-based methods.
AB  - Motivation: The recognition of mentions of species names in text is a critically important task for biomedical text mining. While deep learning-based methods have made great advances in many named entity recognition tasks, results for species name recognition remain poor. We hypothesize that this is primarily due to the lack of appropriate corpora. Results: We introduce the S1000 corpus, a comprehensive manual re-annotation and extension of the S800 corpus. We demonstrate that S1000 makes highly accurate recognition of species names possible (F-score = 93.1%), both for deep learning and dictionary-based methods.
U2  - 10.1093/bioinformatics/btad369
DO  - 10.1093/bioinformatics/btad369
M3  - Journal article
C2  - 37289518
AN  - SCOPUS:85163845001
VL  - 39
JO  - Bioinformatics (Online)
JF  - Bioinformatics (Online)
SN  - 1367-4811
IS  - 6
M1  - btad369
ER  -
