Novo Nordisk Foundation
Center for Protein Research

One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition

Research output: Contribution to journal › Conference article › Research › peer-review

Standard

One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition. / Jensen, Lars Juhl.

In: CEUR Workshop Proceedings, Vol. 1747, 2016.

Harvard

Jensen, LJ 2016, 'One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition', CEUR Workshop Proceedings, vol. 1747. <http://ceur-ws.org/Vol-1747/BIT102_ICBO2016.pdf>

APA

Jensen, L. J. (2016). One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition. CEUR Workshop Proceedings, 1747. http://ceur-ws.org/Vol-1747/BIT102_ICBO2016.pdf

Vancouver

Jensen LJ. One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition. CEUR Workshop Proceedings. 2016;1747.

Author

Jensen, Lars Juhl. / One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition. In: CEUR Workshop Proceedings. 2016; Vol. 1747.

Bibtex

@inproceedings{ea8cb3b98e4544c0b71774c9ca1d7170,

title = "One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition",

abstract = "Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour-intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.",

keywords = "Dictionaries, Named entity recognition, Software",

author = "Jensen, {Lars Juhl}",

year = "2016",

language = "English",

volume = "1747",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR Workshop Proceedings",

}

RIS

TY - GEN

T1 - One tagger, many uses

T2 - Illustrating the power of ontologies in dictionary-based named entity recognition

AU - Jensen, Lars Juhl

PY - 2016

Y1 - 2016

N2 - Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour-intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.

AB - Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour-intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.

KW - Dictionaries

KW - Named entity recognition

KW - Software

M3 - Conference article

AN - SCOPUS:85018753947

VL - 1747

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -

ID: 179393917
