One of the challenge tasks in BioCreative V is the automatic extraction of chemical-disease relations (CDRs) from biomedical literature. The CDR task comprises two subtasks. The first subtask involves automatic disease named entity recognition and normalization (DNER) from a set of Medline documents, and can be considered a first step in CDR extraction. The second subtask consists of extracting chemical-induced diseases (CID) and delivering the chemical-disease pairs for each document.
For the DNER subtask, we used our concept recognition tool Peregrine, in combination with several optimization steps. For the CID subtask, we applied the optimized Peregrine system for disease concept recognition; for chemical concept recognition, we used tmChem, a chemical concept recognizer that was provided by the challenge organizers. A relation extraction module was trained on a rich feature set, including features derived from a graph database containing prior knowledge about chemicals and diseases, and linguistic and statistical features derived from the training corpus documents.
The resources are available at:
- DNER: http://biosemantics.org:8080/cdr-api/dner
- CID: http://biosemantics.org:8080/cdr-api/cid
$ http POST http://biosemantics.org:8080/cdr-api/cid run==1 format==bioc < bioc-documents.xml > bioc-documents-annotated.xml
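The command above uses the HTTPie client: `run==1` and `format==bioc` are sent as query-string parameters, and the BioC file is sent as the request body. As a minimal sketch, the same request could be issued from Python using only the standard library. The function names `build_request` and `annotate`, the `Content-Type` header value, and the timeout are assumptions made for illustration and are not documented by the service.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# CID endpoint as listed above.
CID_URL = "http://biosemantics.org:8080/cdr-api/cid"


def build_request(bioc_xml: bytes, url: str = CID_URL, run: int = 1) -> Request:
    """Build (without sending) a POST request mirroring the `http` command.

    `run` and `format` become query-string parameters, as `==` does in HTTPie.
    """
    query = urlencode({"run": run, "format": "bioc"})
    return Request(
        f"{url}?{query}",
        data=bioc_xml,
        method="POST",
        # Assumption: the service accepts an XML content type for BioC input.
        headers={"Content-Type": "application/xml"},
    )


def annotate(in_path: str, out_path: str, timeout: float = 60.0) -> None:
    """Send a BioC XML file to the service and save the annotated response."""
    with open(in_path, "rb") as source:
        request = build_request(source.read())
    with urlopen(request, timeout=timeout) as response, open(out_path, "wb") as out:
        out.write(response.read())


# Example call (performs a network request, so it is left commented out):
# annotate("bioc-documents.xml", "bioc-documents-annotated.xml")
```

Building the request separately from sending it makes it possible to inspect the URL and headers before contacting the server.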
Extraction of chemical-induced diseases using prior knowledge and textual information
Ewoud Pons; Benedikt F.H. Becker; Saber A. Akhondi; Zubair Afzal; Erik M. van Mulligen; Jan A. Kors
doi: 10.1093/database/baw046