One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition
Research output: Contribution to journal › Conference article › Research › peer-review
Standard
One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition. / Jensen, Lars Juhl.
In: CEUR Workshop Proceedings, Vol. 1747, 2016.
RIS
TY - GEN
T1 - One tagger, many uses
T2 - Illustrating the power of ontologies in dictionary-based named entity recognition
AU - Jensen, Lars Juhl
PY - 2016
Y1 - 2016
AB - Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of all of Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.
KW - Dictionaries
KW - Named entity recognition
KW - Software
M3 - Conference article
AN - SCOPUS:85018753947
VL - 1747
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
SN - 1613-0073
ER -
ID: 179393917