One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition
Research output: Contribution to journal › Conference article › peer-review
Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.
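To make the idea of dictionary-based NER concrete, the following is a minimal sketch and not the tagger described in the abstract. It assumes a toy dictionary that maps lowercase names to placeholder identifiers (the `EX:` identifiers, `DICTIONARY`, `tag`, and `precision_recall` are all hypothetical names introduced here), uses a simple longest-match strategy over word tokens, and shows how precision and recall could be computed against a hand-annotated gold standard.

```python
import re

# Hypothetical toy dictionary: lowercase name -> placeholder identifier.
# A real dictionary would be derived from ontologies and be far larger.
DICTIONARY = {
    "breast cancer": "EX:0001",
    "cancer": "EX:0002",
    "tp53": "EX:0003",
}

# Longest name in the dictionary, measured in tokens.
MAX_TOKENS = max(len(name.split()) for name in DICTIONARY)


def tag(text):
    """Return (start, end, identifier) tuples, preferring the longest match."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        hit = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            candidate = " ".join(text[s:e].lower() for s, e in span)
            if candidate in DICTIONARY:
                hit = (span[0][0], span[-1][1], DICTIONARY[candidate])
                i += n  # skip past the matched tokens
                break
        if hit:
            matches.append(hit)
        else:
            i += 1
    return matches


def precision_recall(predicted, gold):
    """Exact-match precision and recall over (start, end, identifier) tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Calling `tag("TP53 mutations are common in breast cancer.")` returns character offsets together with identifiers, which is what lets text-mining results be joined with other evidence keyed on the same ontology terms. Longest-match is one common design choice for resolving overlapping names; other strategies are possible.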
Original language | English |
---|---|
Journal | CEUR Workshop Proceedings |
Volume | 1747 |
Number of pages | 2 |
ISSN | 1613-0073 |
Publication status | Published - 2016 |
- Dictionaries, Named entity recognition, Software
Research areas
Links
- http://ceur-ws.org/Vol-1747/BIT102_ICBO2016.pdf
Final published version
ID: 179393917