Vast amounts of biomedical information are available only in textual form, such as in scientific publications, electronic health records, and patents. The sheer volume of these unstructured sources makes it impossible for researchers, physicians, or database curators to keep abreast of all the information being published, or to relate it to existing knowledge. The BioSemantics Group at Erasmus University Medical Center investigates and develops natural language processing tools and techniques that facilitate manual information extraction and that automatically generate new hypotheses and insights.
A central theme in the research of the BioSemantics Group is the recognition of biomedical concepts in unstructured text, using statistical and dictionary-based methods. We develop multilingual resources to recognize concepts in different languages, as well as silver- and gold-standard annotated corpora to test the performance of our systems. Another important research topic is mining relations from unstructured text and feeding the extracted information into knowledge bases; here we explore the use of statistical techniques in combination with prior knowledge. Finally, we investigate the properties and use of knowledge bases to uncover implicit knowledge.
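To make the dictionary-based side of concept recognition concrete, the following is a minimal sketch, not a description of the group's actual systems. It assumes a toy dictionary that maps surface terms to concept identifiers; the terms, the identifiers, and the names `DICTIONARY` and `recognize_concepts` are all hypothetical. The text is matched case-insensitively, and the longest dictionary term is preferred at each position.

```python
import re

# Hypothetical toy dictionary: surface term -> concept identifier.
# Both the terms and the identifiers are made up for illustration.
DICTIONARY = {
    "myocardial infarction": "C0001",
    "heart attack": "C0001",   # synonym mapped to the same concept
    "aspirin": "C0002",
    "infarction": "C0003",
}

# Longest dictionary term, measured in tokens.
MAX_TERM_TOKENS = max(len(term.split()) for term in DICTIONARY)


def recognize_concepts(text):
    """Return (matched text, start, end, concept id) for each dictionary hit."""
    # Lowercased word tokens together with their character offsets.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            candidate = " ".join(tok for tok, _, _ in tokens[i:i + n])
            if candidate in DICTIONARY:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((text[start:end], start, end, DICTIONARY[candidate]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no term starts at this token
    return matches


if __name__ == "__main__":
    for hit in recognize_concepts("Aspirin after myocardial infarction."):
        print(hit)
```

Preferring the longest match means that "myocardial infarction" is reported as a single concept rather than as the shorter term "infarction". A realistic system would additionally need to handle spelling variation, ambiguity between concepts, and much larger terminologies, none of which this sketch attempts.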
Application areas of our research are pharmacoepidemiology (supporting drug and vaccine safety studies by mining information from scientific literature and electronic health records), chemistry (mining of scientific literature and patents for chemical compounds and relationships), and clinical practice (generating prediction rules and coding based on electronic health records).