SIFR project
Semantic Indexing of French Biomedical Data Resources
The SIFR project investigates the scientific and technical challenges of building ontology-based services that leverage biomedical ontologies and terminologies for the indexing, mining and retrieval of French biomedical data.
Scientific Context
The volume of data in biomedicine is constantly increasing. Despite the widespread adoption of English in science, a significant share of these data is in French. Biomedical data integration and semantic interoperability are necessary to enable the new scientific discoveries that could come from merging the different available data. A key to addressing these issues is the use of terminologies and ontologies as a common denominator to structure biomedical data and make them interoperable.
Building ontology-based services to leverage biomedical ontologies and terminologies in indexing, mining and retrieval of French biomedical data
The community has turned toward ontologies to design semantic indexes of data that leverage medical knowledge for better information mining and retrieval. However, although various tools exist for English, considerably fewer ontologies are available in French, and there is a strong lack of related tools and services to exploit them. This gap is at odds with the huge amount of biomedical data produced in French, especially in the clinical world (e.g., electronic health records).
The Semantic Indexing of French Biomedical Data Resources (SIFR) project proposes to investigate the scientific and technical challenges of building ontology-based services that leverage biomedical ontologies and terminologies for the indexing, mining and retrieval of French biomedical data. Our main goal is to enable straightforward use of ontologies, freeing health researchers from knowledge engineering issues so that they can concentrate on the biological and medical challenges.
The SIFR project brings together several young researchers at LIRMM to achieve this objective. Dr. Clement Jonquet, assistant professor at the University of Montpellier since 2010, coordinates the project and capitalizes on strong experience in the field acquired during a 3-year postdoc at Stanford. He is accompanied by two young researchers holding an HDR (habilitation): Dr. Sandra Bringay and Dr. Mathieu Roche, both experts in biomedical data and text mining. In addition, highly qualified and experienced partners are associated with the project: (i) Stanford BMIR, a worldwide leader in providing (English) ontology-based services that help health professionals and researchers use ontologies to design biomedical knowledge-based systems; (ii) the TETIS group, a joint applied research unit (AgroParisTech, Irstea, Cirad) specialized in geographic information, environment and agriculture; and (iii) the Computational Biology Institute (IBC) of Montpellier.
Ontology-based indexing workflow (i.e., the French Annotator) similar to what exists for English resources, but dedicated to and specialized for French
http://bioportal.lirmm.fr/annotator
This service is available within a portal of about 28 French biomedical ontologies and terminologies, which reuses the BioPortal technology developed at Stanford University. The ontologies have been provided by the CISMEF group from Rouen University Hospital, taken from the UMLS, or uploaded directly by users. The SIFR BioPortal was released in June 2015.
Need for automated annotation methods
Researchers have called for automated annotation methods and for leveraging natural language processing tools in the curation process. However, while the issue is currently being addressed for English, French is not in the same situation: there is little readily available (i.e., “off-the-shelf”) technology that allows ontologies to be used uniformly in various annotation and curation pipelines with minimal effort.
Within the project, we work on several research questions spanning semantic indexing, text mining, terminology extraction, ontology enrichment, disambiguation, multilingualism in ontologies and semantic annotation, in order to offer the community services and applications capable of leveraging biomedical ontologies in their data workflows. For instance, to extract specialized terminology from French free text, our approaches rely on new ranking functions that combine statistical and linguistic methods to highlight relevant terms. We then propose a complete methodology to identify polysemic and non-polysemic terms and to choose the appropriate attachment point in an existing ontology. As another example, we are developing a new agent-centered, graph-based knowledge representation approach that merges formal data representations (e.g., from the semantic Web) with informal user contributions (from the social Web) and reveals relevant semantic paths between resources.
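The project's own ranking functions are not detailed on this page. As an illustration only, the sketch below shows the general idea of combining a linguistic filter with a statistical score, using the classic C-value measure as a well-known baseline for multi-word term extraction. It assumes the text has already been tokenized and POS-tagged upstream; the `(token, tag)` input format, the Universal-POS-style tags and the `ALLOWED_PATTERNS` set are hypothetical choices, and the score uses log2(|a| + 1) instead of log2|a| so that one-word terms do not automatically score zero.

```python
import math
from collections import Counter

# Hypothetical linguistic filter: POS-tag sequences accepted as candidate
# French terms (noun, noun + adjective(s), noun + preposition + noun, ...).
ALLOWED_PATTERNS = {
    ("NOUN",),
    ("NOUN", "ADJ"),
    ("NOUN", "ADJ", "ADJ"),
    ("NOUN", "ADP", "NOUN"),
    ("NOUN", "ADP", "NOUN", "ADJ"),
}


def extract_candidates(tagged_sentences, max_len=4):
    """Count every n-gram (n <= max_len) whose tag sequence passes the filter."""
    counts = Counter()
    for sentence in tagged_sentences:
        for i in range(len(sentence)):
            for n in range(1, max_len + 1):
                window = sentence[i:i + n]
                if len(window) < n:  # ran past the end of the sentence
                    break
                tags = tuple(tag for _, tag in window)
                if tags in ALLOWED_PATTERNS:
                    counts[tuple(word.lower() for word, _ in window)] += 1
    return counts


def is_nested(shorter, longer):
    """True if `shorter` is a strict contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return n < len(longer) and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(counts):
    """C-value score: nested terms are penalized by the mean frequency
    of the longer candidates that contain them."""
    scores = {}
    for term, freq in counts.items():
        containers = [other for other in counts if is_nested(term, other)]
        weight = math.log2(len(term) + 1)  # +1 variant: one-word terms are not zeroed
        if containers:
            mean_container_freq = sum(counts[c] for c in containers) / len(containers)
            scores[term] = weight * (freq - mean_container_freq)
        else:
            scores[term] = weight * freq
    return scores


# Toy corpus, already tagged (the tagging step itself is out of scope here).
corpus = [
    [("cancer", "NOUN"), ("du", "ADP"), ("sein", "NOUN")],
    [("cancer", "NOUN"), ("du", "ADP"), ("sein", "NOUN"), ("métastatique", "ADJ")],
    [("Cancer", "NOUN"), ("du", "ADP"), ("sein", "NOUN")],
]
scores = c_value(extract_candidates(corpus))
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

On this toy corpus the nested term "cancer du sein" ranks first, while "sein métastatique", which never occurs outside the longer term, scores zero; this is the behavior C-value is designed to produce.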
We plan to capitalize on the work accomplished in France over the last 16 years; at the same time, SIFR enables the emergence of new research domains and applications at LIRMM and materializes an important international collaboration with Stanford BMIR. SIFR will offer the French biomedical community (e.g., clinicians, health professionals, researchers) highly valuable ontology-based indexing services that will enhance their data production and consumption workflows. In addition, the results of the project are not limited to French (they also cover English and Spanish), and we are transferring our results to the agronomic domain in the context of the new AgroPortal project (http://agroportal.lirmm.fr).
Publications
66 scientific communications
All project communications are uploaded to HAL.
9 international journal articles, 2 national journal articles, 29 international conference or workshop papers (at venues such as ISWC, IDEAS, MIE, KEOD, MEDINFO).
Featured publications
Amina Annane, Vincent Emonet, Faical Azouaou & Clement Jonquet. Multilingual Mapping Reconciliation between English-French Biomedical Ontologies, In 6th International Conference on Web Intelligence, Mining and Semantics, WIMS'16. Nimes, France, June 2016. (13), pp. 12. ACM.
Juan Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche & Maguelonne Teisseire. Automatic Biomedical Term Polysemy Detection, In 10th International Conference on Language Resources and Evaluation, LREC'16. Portoroz, Slovenia, May 2016. pp. 23-28. European Language Resources Association.
Project Outcomes
SIFR Annotator
Design, development and deployment of the French Annotator
A publicly accessible ontology-based annotation tool for processing French biomedical text. Given a piece of text, the service returns the biomedical ontology concepts that are either directly mentioned in the text or obtained by semantic expansion.
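To make "directly mentioned or semantically expanded" concrete, here is a minimal, self-contained sketch of dictionary-based annotation with hierarchical expansion. It is not the SIFR Annotator's implementation and does not call its web service. The toy ontology (`EX:` identifiers, labels and `is_a` links) is hypothetical, direct matches come from a longest-match scan over lowercased, accent-stripped tokens, and expansion simply adds every ancestor of a matched concept.

```python
import re
import unicodedata

# Hypothetical toy ontology: concept id -> labels, and child -> parents (is_a).
LABELS = {
    "EX:1": ["maladie du foie", "hépatopathie"],
    "EX:2": ["hépatite"],
    "EX:3": ["hépatite virale B"],
}
PARENTS = {"EX:3": ["EX:2"], "EX:2": ["EX:1"]}


def tokenize(text):
    """Lowercase, strip accents (so 'hépatite' matches 'hepatite'), split into words."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    plain = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return re.findall(r"\w+", plain)


def ancestors(concept, parents):
    """All transitive is_a ancestors of `concept` (depth-first order)."""
    found, stack = [], list(parents.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(parents.get(current, []))
    return found


def annotate(text, labels, parents, expand=True):
    """Return (concept, normalized mention, 'direct' | 'ancestor') tuples."""
    dictionary = {tuple(tokenize(name)): cid
                  for cid, names in labels.items() for name in names}
    max_len = max(len(key) for key in dictionary)
    tokens = tokenize(text)
    direct, i = [], 0
    while i < len(tokens):
        # Longest match first: try the widest window starting at token i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            cid = dictionary.get(tuple(tokens[i:i + n]))
            if cid is not None:
                direct.append((cid, " ".join(tokens[i:i + n]), "direct"))
                i += n
                break
        else:
            i += 1  # no label starts at this token
    result = list(direct)
    if expand:
        seen = {cid for cid, _, _ in direct}
        for cid, mention, _ in direct:
            for anc in ancestors(cid, parents):
                if anc not in seen:
                    seen.add(anc)
                    result.append((anc, mention, "ancestor"))
    return result


text = "Patient suivi pour une hépatite virale B chronique."
for annotation in annotate(text, LABELS, PARENTS):
    print(annotation)
```

Because the scan prefers the longest label, "hépatite virale B" yields the specific concept EX:3 rather than the shorter label "hépatite"; the broader concepts EX:2 and EX:1 appear only as expanded annotations.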
See details in this publication: 10.1186/s12859-018-2429-2
OTHER RESEARCH
Obtain new research results to exploit and enhance ontology-based indexing services
DEVELOPMENT
24+ repositories on GitHub, 18 contributors
We actively reuse the technology of the US National Center for Biomedical Ontology and develop new features and software for our research.
SIFR BioPortal
An open platform to host French biomedical ontologies and terminologies based on the technology developed by the National Center for Biomedical Ontology
ViewpointS
A formalism for subjective knowledge
We work on semantic indexing and knowledge representation with the goal of capturing formal data and informal contributions into an evolutionary knowledge graph
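To give an intuition of how formal and informal links can live in one graph, here is a hypothetical, simplified sketch inspired by this idea; it is not the ViewpointS formalism itself. Each viewpoint is an assumed `(agent, source, target, kind)` record (the agent is kept only for provenance here), a "perspective" (an assumption of this sketch) assigns a weight to each kind of link, the weights of all viewpoints on the same pair of resources are summed, and the distance between two resources is taken as 1/strength, so that a Dijkstra shortest-path search surfaces the most strongly supported semantic path.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewpoint:
    agent: str   # emitter: an ontology/dataset (formal) or a user (informal)
    source: str  # first resource
    target: str  # second resource
    kind: str    # type of link, e.g. "is_a" (formal) or "tag" (informal)


def build_graph(viewpoints, perspective):
    """Sum perspective weights per resource pair; distance = 1 / strength."""
    strength = defaultdict(float)
    for vp in viewpoints:
        if vp.source != vp.target:  # ignore self-links
            strength[frozenset((vp.source, vp.target))] += perspective.get(vp.kind, 0.0)
    graph = defaultdict(dict)
    for pair, total in strength.items():
        if total > 0:
            a, b = tuple(pair)
            graph[a][b] = graph[b][a] = 1.0 / total
    return graph


def semantic_path(graph, start, goal):
    """Dijkstra shortest path: the most strongly supported chain of links."""
    queue, visited = [(0.0, start, [start])], set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + cost, neighbor, path + [neighbor]))
    return float("inf"), []


viewpoints = [
    Viewpoint("ontology", "hépatite B", "hépatite", "is_a"),
    Viewpoint("ontology", "hépatite", "maladie du foie", "is_a"),
    Viewpoint("user1", "hépatite B", "vaccin", "tag"),
    Viewpoint("user2", "hépatite B", "vaccin", "tag"),
    Viewpoint("user3", "vaccin", "maladie du foie", "tag"),
]
formal = build_graph(viewpoints, {"is_a": 1.0, "tag": 0.5})
social = build_graph(viewpoints, {"is_a": 0.2, "tag": 1.0})
print(semantic_path(formal, "hépatite B", "maladie du foie"))
print(semantic_path(social, "hépatite B", "maladie du foie"))
```

Changing the perspective changes which path is revealed: favoring ontology `is_a` links routes "hépatite B" to "maladie du foie" through "hépatite", while favoring user tags routes it through "vaccin".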
Some figures
66 scientific publications, 3 PhD theses, 1 cotutelle (joint PhD supervision), 2 postdocs, 6 master's interns, 2 years of developer time, 5 co-organized conferences, 1 mobility project
Partner projects
Prizes & distinctions
Team
Over the 6 years of the project (2013-2019), the team included:
Advisors
Stefano A. Cerri
Maguelonne Teisseire
Pascal Poncelet
Zohra Bellahsene
Collaborators
Anne Toulet (LIRMM)
Philippe Lemoisson (TETIS)
Pierre Larmande (IRD / IBC)
Mark Musen (BMIR / NCBO)
John Graybeal (NCBO)
Stefan Darmoni (CISMEF)
Sebastien Harispe (LGI2P)
Students
Mike Tapi-Nzali
Stella Zevio
Soumia Melzi
Kevin Cauchois
Khadidja Bouarech
Solène Eholié
Alexandre Lerbet
Chafik EL Ghandour
Mohamed Serhani
Olivier Duplouy
Pierre Burc
Other helpers
Julien Diener
Sebastien Harispe
Dissemination
Events
Conferences, hackathons, meetups, tutorials, etc.
Presentations
Multiple presentations are available on Slideshare.
Working group activities
Talking about us ...
Open Positions
Master Intern (@LIRMM)
Closed Positions
Postdoc
Research engineer
PhD fellowship
Master Intern
CONTACT
Support & Funding
SIFR was mainly supported by the ANR and H2020.
JCJC program
This project has received funding from the French National Research Agency (ANR) under the Young Researcher program grant ANR-12-JS02-01001.
MSCA program
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 701771 (Marie Skłodowska-Curie Individual Fellowship).