Exploring Chemical and Biological Named Entity Recognition in Drug Discovery


Start date: 01-12-2011

Duration: 48 months

Status: In progress

People involved: Saber Akhondi, Jan Kors



The chemical information landscape is changing rapidly, with a yearly increase of more than 1 million new compounds and more than 700,000 chemistry-related publications. Exploring the chemical space covered by relevant journals and patents is a crucial step in early-stage medicinal chemistry projects. Manually extracting chemical and biological entities from unstructured text is a complex and cumbersome task.

This project aims to improve the recognition of chemical and biological entities, and the extraction of relations between them, from patents and scientific literature using text-mining approaches. The second part of the project will apply text-mining technologies to address some of the current challenges in accessing and processing relevant information in drug discovery. The project is sponsored by AstraZeneca.