Contextual Text Mining from the Biomedical Scientific Literature (BioLitContextMining)

Scientific publications are the main medium through which researchers report new findings. The huge and rapidly growing number of published articles in biomedicine has made it particularly difficult for researchers to access and use the knowledge they contain. Developing text mining techniques that automatically extract biologically important information, such as relationships between biomolecules, is therefore not only useful but necessary to facilitate biomedical research and speed up scientific progress. Most prior studies in biomedical text mining tackle only the problem of detecting that a relationship exists between a pair of biomolecules. However, for the extracted information to make sense, a great deal of biological context is required. While some of this context, such as relationship type and directionality, is found in the sentence that reports the relationship, some of it, such as species and experimental method, is likely stated elsewhere in the article.
The objectives of the BioLitContextMining project can be summarized as follows:
(i) Design methods based on natural language processing and machine learning to extract relationships among biomolecules and their local (sentence-level) and non-local (document-level) context information;
(ii) Apply the developed methods on a large scale to the electronically available biomedical scientific literature and build web-based systems that make the extracted information accessible and useful to biomedical scientists;
(iii) Design novel knowledge discovery methods that utilize contextual information.

Interaction Network Ontology (INO) and Ignet

To extract context information about the types of interactions among biomolecules, an Interaction Network Ontology (INO) has been developed that collects, classifies, and assigns semantic relations to more than 800 interaction keywords (e.g. bind, associate, and phosphorylate). These terms are organized in INO in a hierarchical structure and aligned with the Basic Formal Ontology (BFO; http://www.ifomis.org/bfo). An example of an INO term is “increase”, whose parent term in INO is “positive regulation”, which is a child term of “regulation” and “interaction”. INO was developed in the format of the W3C standard Web Ontology Language (OWL2) (http://www.w3.org/TR/owl-guide/). INO is available at http://www.ontobee.org/browser/index.php?o=INO. A method that combines the INO ontology with syntactic analysis of sentences has been designed to enrich the interactions among biomolecules with their corresponding types. The INO ontology and the interaction type detection method have been integrated with the Ignet (Integrated Gene Networks) system (http://ignet.hegroup.org).
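The idea of mapping an interaction keyword to its more general types can be illustrated with a minimal Python sketch. It is not the INO or Ignet implementation: the parent map contains only the “increase” example from the text, read here as a single chain up to “interaction” (an assumption), and the surface-form table and function names are hypothetical. The sketch uses plain keyword lookup and omits the syntactic analysis that the actual method relies on.

```python
import re

# Toy parent map. Only the "increase" example comes from the text above;
# reading it as one chain up to "interaction" is an assumption.
PARENT = {
    "increase": "positive regulation",
    "positive regulation": "regulation",
    "regulation": "interaction",
}

# Hypothetical surface forms mapped to an ontology term.
SURFACE_FORMS = {
    "increase": "increase",
    "increases": "increase",
    "increased": "increase",
}


def ancestors(term):
    """Return the parent terms of `term`, from nearest parent to the root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def find_interaction_terms(sentence):
    """Return ontology terms whose surface forms occur in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [SURFACE_FORMS[t] for t in tokens if t in SURFACE_FORMS]


if __name__ == "__main__":
    for term in find_interaction_terms("IFN-gamma increases expression of STAT1."):
        print(term, "->", ancestors(term))
```

Walking up the hierarchy in this way lets an extracted interaction be reported at any level of granularity, from the specific keyword to the general notion of an interaction.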

INO and Ignet are being developed with Junguk Hur, Arzucan Ozgur, Zuoshuang Xiang, Dragomir Radev, and Yongqun He.

Experimental Method Detection

An important piece of context information for an interaction among biomolecules is the experimental method that scientists used to detect it. Approximate pattern matching approaches based on string matching, ontology-based TF-IDF (Term Frequency – Inverse Document Frequency), and language modeling have been developed to identify the experimental methods reported in scientific publications. The string-matching-based approach has been applied in the web-based system PHISTO (http://www.phisto.org/) to enrich pathogen-human protein-protein interactions with their corresponding experimental methods.
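As a rough illustration of TF-IDF based matching, the following Python sketch ranks candidate experimental methods by the cosine similarity between a passage and a short textual description of each method. It is a sketch under stated assumptions, not the project's method: the method names and descriptions are illustrative placeholders rather than entries taken from an actual ontology, the smoothed IDF formula is one common variant, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build TF-IDF vectors for the documents and return them with the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_methods(passage, method_descriptions):
    """Rank method names by similarity between their description and the passage."""
    names = list(method_descriptions)
    vectors, idf = tfidf_vectors([method_descriptions[n] for n in names])
    # Terms that appear in no method description carry no weight in the query.
    counts = Counter(tokenize(passage))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [(name, cosine(query, vec)) for name, vec in zip(names, vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative descriptions only; not taken from an actual ontology.
    methods = {
        "two hybrid": "yeast two hybrid screen bait prey",
        "coimmunoprecipitation": "coimmunoprecipitation antibody pull down lysate",
    }
    passage = "The interaction was confirmed by a yeast two hybrid screen."
    for name, score in rank_methods(passage, methods):
        print(name, round(score, 3))
```

In practice the descriptions would come from the names, synonyms, and definitions of experimental method terms in an ontology, so that a passage can match a method even when it does not use the method's canonical name.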

Experimental method detection methods are being developed with İlknur Karadeniz. PHISTO is being developed with Saliha Durmuş Tekir, Tunahan Çakır, Emre Ardıç, Ali Semih Sayılırbaş, Gökhan Konuk, Mithat Konuk, Hasret Sarıyer, Azat Uğurlu, İlknur Karadeniz, Arzucan Özgür, F. Erdoğan Sevilgen, and Kutlu Ö. Ülgen.

Enriching Bacteria with Context Information

Bacteria lie at the heart of several infectious diseases. However, this domain has received little attention from the text mining community. New ontology-centered methods based on linguistic analysis of text have been developed for extracting context information about bacteria, in particular the habitats where they live. The Boun team participated in the BioNLP Shared Task on Bacteria Biotopes in 2013 (http://2013.bionlp-st.org/tasks/bacteria-biotopes).

Methods for enriching bacteria with habitat information are being developed with İlknur Karadeniz.

Publications:

Refereed Journal Papers:

[1] Hakime Ozturk, Elif Ozkirimli, Arzucan Ozgur. A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction. BMC Bioinformatics, 17:128, 2016.

[2] Ilknur Karadeniz, Junguk Hur, Yongqun He, Arzucan Ozgur. Literature Mining and Ontology based Analysis of Host-Brucella Gene-Gene Interaction Network. Frontiers in Microbiology, 6:1386, 2015.

[3] Ilknur Karadeniz and Arzucan Ozgur. Detection and Categorization of Bacteria Habitats using Shallow Linguistic Analysis. BMC Bioinformatics, 16 (Suppl 10):S5, 2015.

[4] Saliha Durmus, Tunahan Cakir, Arzucan Ozgur, Reinhard Guthke. A Review on Computational Systems Biology of Pathogen-Host Interactions. Frontiers in Microbiology, 6:235, 2015.

[5] Hakime Ozturk, Elif Ozkirimli, Arzucan Ozgur. Classification of Beta-Lactamases and Penicillin Binding Proteins Using Ligand-Centric Network Models. PLoS ONE, 10(2): e0117874, 2015.

[6] Junguk Hur*, Arzucan Ozgur*, Zuoshuang Xiang, Yongqun He. Development and Application of an Interaction Network Ontology for Literature Mining of Vaccine-associated Gene-Gene Interactions. Journal of Biomedical Semantics, 6:2, 2015. (*Equal contribution)

[7] Saliha Durmuş Tekir, Tunahan Çakır, Emre Ardıç, Ali Semih Sayılırbaş, Gökhan Konuk, Mithat Konuk, Hasret Sarıyer, Azat Uğurlu, İlknur Karadeniz, Arzucan Özgür, F. Erdoğan Sevilgen, and Kutlu Ö. Ülgen. PHISTO: Pathogen-Host Interaction Search Tool. Bioinformatics, Vol. 29 no. 10, pages 1357-1358, 2013.

[8] Junguk Hur*, Arzucan Ozgur*, Zuoshuang Xiang, and Yongqun He. Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining. Journal of Biomedical Semantics, 3:18, 2012. (*Equal contribution)

Refereed Conference and Workshop Papers:

[1] Arzucan Ozgur*, Junguk Hur*, Yongqun He. Extension of the Interaction Network Ontology for literature mining of gene-gene interaction networks from sentences with multiple interaction keywords. Proceedings of the BDM2I Workshop at ISWC, USA, October 11-15, 2015. (*Equal contribution)

[2] Ferhat Aydin, Zehra Melce Husunbeyi, Arzucan Ozgur. Retrieving Passages Describing Experimental Methods using Ontology and Term Relevance based Query Matching. Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, pp. 42-50, Spain, September 9-11, 2015.

[3] İlknur Karadeniz and Arzucan Özgür. Bacteria Biotope Detection, Ontology-based Normalization, and Relation Extraction using Syntactic Rules. Proceedings of BioNLP Shared Task 2013 Workshop, Sofia, Bulgaria, August 2013. Association for Computational Linguistics.

[4] Saliha Durmuş Tekir, Tunahan Çakır, Emre Ardıç, İlknur Karadeniz, Arzucan Özgür, F. Erdoğan Sevilgen, and Kutlu Ö. Ülgen. PHISTO: A New Web Platform for Pathogen-Human Interactions. Computational Methods in Systems Biology, Series: Lecture Notes in Bioinformatics (LNBI), Vol. 8130, pp. 268-269, Springer, 2013.

[5] Junguk Hur*, Arzucan Ozgur*, Zuoshuang Xiang, and Yongqun He. Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining. International Conference on Biomedical Ontology (ICBO) – Vaccine and Drug Ontology in the Study of Mechanism and Effect (VDOSME), Graz, Austria, July 21, 2012. (*Equal contribution) (Extended version published in Journal of Biomedical Semantics)

[6] Junguk Hur*, Arzucan Ozgur*, Zuoshuang Xiang, Dragomir Radev, Eva Feldman, Yongqun He. Ontology-based Enrichment Analysis of Gene-Gene Interaction Terms and Application on Literature-derived IFN-gamma network. International Conference on Intelligent Systems for Molecular Biology (ISMB) – Bio-Ontologies Special Interest Group, July 14, 2012. Long Beach, California (flash update presentation). (*Equal contribution)

Acknowledgements

This work has been supported by Marie Curie FP7-Reintegration-Grants within the 7th European Community Framework Programme.