1. Introduction

Project title: European Federation for Cancer Images
Project acronym: EUCAIM
Grant Agreement: 101100633
Call identifier: DIGITAL-2022-CLOUD-AI-02
D5.2. The EUCAIM CDM and Hyper-Ontology for Data Interoperability: initial version
Responsible partner: LIMICS
Author(s): Mirna El Ghosh (LIMICS), Melanie Sambres (LIMICS), Catherine Duclos (LIMICS), Ferdinand Dhombres (LIMICS), Xavier Tannier (LIMICS), Valia Kalokyri (FORTH), Stelios Sfakianakis (FORTH), Manolis Tsiknakis (FORTH), Christel Daniel (APHP-LIMICS)
Contributors: Olga Giraldo (DKFZ), Heimo Muller (BBMRI-ERIC), Laure Fournier (APHP), Aurélien Maire (APHP), Jean Nembo (APHP), Kevin Mondet (APHP), Maciej Bobowicz (GUMed), Jean Charlet (LIMICS), Alexandra Kosvyra (AUTH), Ioanna Chouvarda (AUTH), Antonis Aletras (AUTH), Aikaterini Lazou (AUTH), Vasileia Paschaloudi (AUTH), Teresa Garcia Lezana (CRG-CERCA), Michal Kosno (GUMed), Gianna Tsakou (MAG), Celia Martin Vicario (QUIBIM), Laure Saint-Aubert (BC Platforms), Alejandro Rodríguez Pardavilla (BAHIA), Roberto Romero (CIBER), Jose Tapia (CIBER), Haridimos Kondylakis (FORTH), Maria Christodoulou (HCS), Maria Gonzalez Lopez (SAS), Federica Cruciani (IFOM)
Reviewers: Pedro Mallol (HULAFE); Francesco Cremonesi (INRIA)
Date of delivery: 29/06/2024
Version: Final version
Table of Contents
Table of Figures
List of Tables
1. Introduction
2. Interoperability requirements
3. Data interoperability framework for dataset cataloguing
3.1 EUCAIM DCAT-AP
3.2 FAIR principles compliance
4. Data interoperability framework for federated query
4.1 Why do we need the EUCAIM Hyper-Ontology?
4.2 The EUCAIM Hyper-Ontology
4.3 Data Resources
4.4 Development Process
4.4.1 Requirements Analysis and Specification
4.4.2 Knowledge Acquisition
4.4.3 Design and Conceptualization
4.4.4 Formalization
4.4.5 Evaluation and Validation
4.4.6 Ontology Enrichment and Maintenance
5. Interoperability framework for federated processing
5.1 CDM business requirements
5.2 Data harmonization approaches for the federated processing/analysis
5.2.1 Scenario 1: EUCAIM Hyper-Ontology Based CDM for Analysis
5.2.2 Scenario 2: Integration with OMOP-FHIR for Wider Compatibility
5.2.3 Scenario 3: Simplifying Integration Through ETL process
5.2.4 Scenario 4: EUCAIM hyper-ontology only for federated query purposes, OMOP-CDM for analysis
5.3 The EUCAIM Common Data Model
5.3.1 CDM Selection Rationale
5.3.2 EUCAIM Data Dictionary
6. Integration of CDM and Hyper-Ontology
7. Demonstration scenarios
7.1 Prostate Cancer Use Cases
ProCAncer-I Scenario
INCISIVE Scenario
7.2 Breast Cancer Use Cases
CHAIMELEON Scenario
EuCanImage Scenario
8. Future work and perspective
9. Conclusion
10. Publications
11. ANNEX
Table of Figures
Figure 1: An excerpt of the hyper-ontology (v1.0beta) around representing PSA concepts and their relations
Figure 2: An excerpt of the hyper-ontology (v1.0) around combining atomic concepts (TNM Path M, pM1a) to represent specific concepts (AJCC/UICC 7th pathological M1a Category)
Figure 3: An excerpt of the hyper-ontology (v1.0) around Cancer Patient represented in Protégé.
Figure 4. An illustration of the hyper-ontology iterative development process
Figure 5. An illustration of mappings with biomedical terminologies/ontologies.
Figure 6. An excerpt of the ontological model of mCODE around the Disease characterization represented using OntoUML
Figure 7. An excerpt of the hyper-ontology structure
Figure 8. Part of the hyper-ontology around the concept “Primary malignant neoplasm of prostate” represented using Protégé.
Figure 9. Part of the hyper-ontology around the concept “Malignant neoplasm”, represented using Protégé.
Figure 10. Part of the hyper-ontology around the concept “AJCC/UICC 7th clinical M1a Category” represented using Protégé.
Figure 11. Part of the hyper-ontology around the concept of “Prostate specific antigen measurement” represented using Protégé.
Figure 12. Part of the hyper-ontology around the concept of “Image series” represented using Protégé.
Figure 13. Part of the hyper-ontology around the concept of “MRI of breast for screening for malignant neoplasm” represented using Protégé.
Figure 14. Part of the hyper-ontology around the concept of “Cancer Patient” represented using Protégé.
Figure 15. Part of the hyper-ontology around the concept of “Histological grades” represented using Protégé.
Figure 16. Part of the hyper-ontology around the concept of “International Society of Pathology histologic grade group” represented using Protégé.
Figure 17. Part of the hyper-ontology around the concept of “Grade group 3 (Gleason score 4 + 3 = 7)” represented using Protégé.
Figure 18. Part of the hyper-ontology around representing tumor marker test results (Protégé).
Figure 19. Part of the hyper-ontology around primary and secondary cancer relationship (Protégé).
Figure 20: EUCAIM CDM for analysis & OMOP, FHIR, EUCAIM local data models. For OMOP and FHIR a mediator and mapping component is necessary.
Figure 21: OMOP-FHIR locally adopted standards, EUCAIM-based CDM for analysis with mediator and mapping components necessary for all nodes in the federation.
Figure 22: EUCAIM-based CDM for all nodes participating in the federation. This would require a one-time transformation and no mediator/mapping component is necessary.
Figure 23: OMOP-CDM as the EUCAIM CDM for federated processing and analysis. Hyper-ontology only for federated queries.
Figure 24. An excerpt of the hyper-ontology around “Cancer of prostate” represented in Protégé
Figure 25. A semantic representation and inference of the ProCAncer-I prostate cancer use case (Protégé)
Figure 26: The ProCAncer-I prostate cancer patient journey.
Figure 27: The EUCAIM CDM instantiation with the ProCAncer-I prostate cancer clinical information.
Figure 28: The EUCAIM CDM instantiation with the ProCAncer-I prostate cancer imaging information.
Figure 29: The INCISIVE prostate cancer patient journey.
Figure 30. A semantic representation and inference of the INCISIVE prostate cancer use case (Protégé)
Figure 31: The EUCAIM CDM instantiation with the INCISIVE prostate cancer clinical information.
Figure 32. The CHAIMELEON breast cancer patient journey.
Figure 33. A semantic representation and inference of the CHAIMELEON breast cancer use case (Protégé)
Figure 34. The EUCAIM CDM instantiation with the CHAIMELEON breast cancer clinical information.
Figure 35. The EuCanImage breast cancer patient journey.
Figure 36. A semantic representation and inference of the EuCanImage breast cancer use case (Protégé)
Figure 37. The EUCAIM CDM instantiation with the EuCanImage breast cancer clinical information.
List of Tables
Table 1: General dataset metadata (DCAT-AP specification with stricter semantics in some cases)
Table 2: EUCAIM DCAT-AP domain-specific metadata
Table 3: Some metrics of hyper-ontology version 1.0
Table 4: List of vocabularies supported by the hyper-ontology version 1.0 classified by domain.
Table 5: DICOM tags mapped to the EUCAIM hyper-ontology (version 1.0)
Table 6: DICOM tags whose values are represented in the EUCAIM hyper-ontology (version 1.0)
Table 7: The EUCAIM CDM: Patient group
Table 8: The EUCAIM CDM: Health assessment group
Table 9: The EUCAIM CDM: Disease group
Table 10: The EUCAIM CDM: Cancer treatment group
Table 11: The EUCAIM CDM: Outcome group
Table 12: The EUCAIM CDM: Imaging group
Table 13: Data elements required to describe a primary cancer condition in mCODE
This document presents the key features and contributions of the initial version of the EUCAIM Common Data Model (CDM) and Hyper-ontology.
In this deliverable, we describe the challenges we encountered and our strategy for addressing the heterogeneity in data representation and semantics across different sources of information.
Interoperability in healthcare enables the exchange and use of health information across diverse systems, improving communication and standardizing the sharing of patient data. It comprises technical, syntactic, and semantic components, supported by international standards such as HL7 FHIR and terminologies such as SNOMED CT, which ensure accurate data interpretation and integration.
The first step in conducting research in the health domain, however, is finding and requesting access to datasets that meet criteria derived from the clinical use cases to be addressed. This requires appropriately cataloguing the information held by the various data sources and making these catalogues accessible for browsing and querying. To this end, EUCAIM has extended DCAT-AP, the Data Catalog Vocabulary Application Profile for data portals in Europe, specifically for health imaging datasets: the extension establishes mandatory metadata for medical images in the EUCAIM public catalogue, aligns with the ongoing HealthDCAT-AP specification effort, and uses the EUCAIM Hyper-ontology to define controlled vocabularies for semantic interoperability (section 3).
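To make the idea of mandatory metadata backed by controlled vocabularies more concrete, the following Python sketch checks a catalogue record, written as a JSON-LD-style dictionary, against a list of required fields and a term list. The `dct:` properties are standard Dublin Core terms used by DCAT-AP, but the field selection, the `eucaim:cancerType` property and the vocabulary values are hypothetical placeholders, not the actual EUCAIM DCAT-AP specification given in Tables 1 and 2.

```python
from typing import Any

# HYPOTHETICAL subset of mandatory fields; the real EUCAIM DCAT-AP list differs.
MANDATORY_FIELDS = ["dct:identifier", "dct:title", "dct:description", "eucaim:cancerType"]

# HYPOTHETICAL controlled vocabulary standing in for hyper-ontology terms.
CANCER_TYPE_VOCABULARY = {"Prostate cancer", "Breast cancer", "Lung cancer"}


def validate_dataset_record(record: dict[str, Any]) -> list[str]:
    """Return validation errors for a catalogue record (empty list if it passes)."""
    # Report every mandatory field that is absent or empty.
    errors = [f"missing mandatory field: {field}"
              for field in MANDATORY_FIELDS if not record.get(field)]
    # Check the coded field against the controlled vocabulary.
    cancer_type = record.get("eucaim:cancerType")
    if cancer_type and cancer_type not in CANCER_TYPE_VOCABULARY:
        errors.append(f"value not in controlled vocabulary: {cancer_type}")
    return errors


record = {
    "@type": "dcat:Dataset",
    "dct:identifier": "example-dataset-001",
    "dct:title": "Example prostate MRI dataset",
    "dct:description": "Illustrative record, not a real catalogue entry.",
    "eucaim:cancerType": "Prostate cancer",
}
print(validate_dataset_record(record))  # []
```

Keeping the vocabulary as a separate term set, rather than free text, is what lets a catalogue reject a value such as "Prostate" that a human would understand but a query could not reliably match.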
Public catalogues, however, are typically expected to contain metadata describing only the fundamental, high-level characteristics of the datasets. Accordingly, only the minimum metadata required for cataloguing datasets across cancer types has been included at this level (this metadata set was derived from the analysis and methodology described in D5.1 [1]). Data users who need a finer-grained, subject-level search based on cancer-specific criteria will use the concepts and terms of the EUCAIM hyper-ontology through the EUCAIM federated query user interface. Developed through an iterative and systematic process, the EUCAIM hyper-ontology integrates diverse clinical and imaging knowledge from projects such as CHAIMELEON, ProCAncer-I, EuCanImage, INCISIVE, and PRIMAGE, and addresses the semantic and syntactic disparities among data models and standards (section 4).
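Subject-level search relies on the hyper-ontology's concept hierarchy: a query for a general concept should also match subjects annotated with any of its subconcepts. The sketch below shows that query-expansion step followed by a per-node count, as a federated query might perform it. The toy hierarchy, the concept labels (loosely modeled on concepts named in the figure list), and the node data are assumptions for illustration; the sketch returns only aggregate counts per node and is not the EUCAIM federated query implementation.

```python
from collections import deque

# HYPOTHETICAL toy hierarchy: concept -> direct subconcepts.
SUBCONCEPTS: dict[str, list[str]] = {
    "Malignant neoplasm": ["Malignant neoplasm of prostate", "Malignant neoplasm of breast"],
    "Malignant neoplasm of prostate": ["Primary malignant neoplasm of prostate"],
}


def expand(concept: str) -> set[str]:
    """Return the concept together with all of its descendants (breadth-first)."""
    found = {concept}
    queue = deque([concept])
    while queue:
        for child in SUBCONCEPTS.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def federated_count(concept: str, nodes: dict[str, list[dict[str, str]]]) -> dict[str, int]:
    """Count, per node, the subjects whose diagnosis is the concept or a subconcept."""
    targets = expand(concept)
    return {node: sum(1 for subject in subjects if subject["diagnosis"] in targets)
            for node, subjects in nodes.items()}


# HYPOTHETICAL subject-level data held at two nodes.
nodes = {
    "node_A": [{"diagnosis": "Primary malignant neoplasm of prostate"},
               {"diagnosis": "Malignant neoplasm of breast"}],
    "node_B": [{"diagnosis": "Malignant neoplasm of prostate"}],
}
print(federated_count("Malignant neoplasm of prostate", nodes))  # {'node_A': 1, 'node_B': 1}
```

Without the expansion step, the subject at `node_A` annotated with the more specific primary prostate concept would be missed by a query for prostate cancer in general.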
Within EUCAIM, we examined several scenarios for federated processing/analysis and AI model development. Each scenario presents distinct advantages and challenges for data integration, harmonization, and usability, and together they guided decisions on the structure and format of the CDM (section 5).
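Several of these scenarios hinge on a mediator that maps records from locally adopted models such as OMOP and FHIR into a common representation. A minimal sketch of such a mapping follows, assuming a simplified OMOP `CONDITION_OCCURRENCE` row and a FHIR `Condition` resource held as plain dictionaries; the target field names (`patient_id`, `condition_code`, `onset_date`) are hypothetical and do not reproduce the EUCAIM CDM.

```python
from datetime import date


def from_omop(row: dict) -> dict:
    """Map a simplified OMOP CONDITION_OCCURRENCE row to a HYPOTHETICAL common record."""
    return {
        "patient_id": str(row["person_id"]),
        "condition_code": str(row["condition_concept_id"]),
        "onset_date": row["condition_start_date"],
    }


def from_fhir(resource: dict) -> dict:
    """Map a simplified FHIR Condition resource to the same HYPOTHETICAL common record."""
    return {
        # "Patient/42" -> "42"
        "patient_id": resource["subject"]["reference"].split("/")[-1],
        "condition_code": resource["code"]["coding"][0]["code"],
        # Keep only the date part of onsetDateTime.
        "onset_date": date.fromisoformat(resource["onsetDateTime"][:10]),
    }


# A real mediator would also translate codes between vocabularies
# (e.g. OMOP concept IDs vs SNOMED CT codes); that step is omitted here.
MEDIATORS = {"omop": from_omop, "fhir": from_fhir}


def to_common(source_model: str, record: dict) -> dict:
    """Dispatch a local record to the mapping registered for its source model."""
    return MEDIATORS[source_model](record)
```

The same mapping functions could run continuously behind a mediator or once, as a one-time transformation into the CDM; in this sketch the difference between those scenarios lies in when the mapping runs, not in the mapping itself.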
The hyper-ontology is represented so as to align semantically with the EUCAIM CDM, which is based on the mCODE specification. To illustrate the integration of the EUCAIM CDM and hyper-ontology, section 6 presents an example formalization profile (Primary cancer condition), detailing its data elements and their corresponding value sets.
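A formalization profile binds each data element to a value set. The sketch below captures that idea for three elements modeled on the mCODE primary cancer condition profile (condition code, body site, and histology/morphology/behavior). The value sets are tiny illustrative label lists, whereas real mCODE bindings use coded terminologies such as SNOMED CT; the element selection and values are assumptions, not the content of Table 13.

```python
from dataclasses import dataclass

# ILLUSTRATIVE value sets (labels only); real mCODE bindings use coded terminologies.
VALUE_SETS: dict[str, set[str]] = {
    "code": {"Primary malignant neoplasm of prostate", "Primary malignant neoplasm of breast"},
    "body_site": {"Prostate", "Breast"},
    "histology_morphology_behavior": {"Adenocarcinoma", "Infiltrating duct carcinoma"},
}


@dataclass
class PrimaryCancerCondition:
    """HYPOTHETICAL, reduced primary cancer condition record with three elements."""
    code: str
    body_site: str
    histology_morphology_behavior: str

    def invalid_elements(self) -> list[str]:
        """Return the elements whose value falls outside the bound value set."""
        return [element for element, allowed in VALUE_SETS.items()
                if getattr(self, element) not in allowed]


condition = PrimaryCancerCondition(
    code="Primary malignant neoplasm of prostate",
    body_site="Liver",
    histology_morphology_behavior="Adenocarcinoma",
)
print(condition.invalid_elements())  # ['body_site']
```

Keying the value sets by element name keeps the binding declarative, so extending the profile with another element means adding one entry and one field rather than new validation logic.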
Finally, section 7 presents four proof-of-concept scenarios on prostate and breast cancer, provided by four AI4HI projects (INCISIVE, ProCAncer-I, CHAIMELEON, and EuCanImage). Based on the clinical/biological and imaging information collected and modeled by these projects, the scenarios demonstrate the feasibility of the EUCAIM hyper-ontology and CDM and serve to validate them.
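One kind of inference relevant to such scenarios is deriving a grading concept from recorded findings, for example classifying a Gleason score of 4 + 3 = 7 as ISUP grade group 3. In the hyper-ontology such a classification could be obtained by a reasoner from class definitions; the Python function below is only a procedural stand-in that applies the standard ISUP grade group mapping. As a simplifying assumption, it accepts only Gleason patterns 3 to 5.

```python
def isup_grade_group(primary: int, secondary: int) -> int:
    """Map Gleason primary/secondary patterns to the ISUP grade group (1-5)."""
    # Simplifying assumption: only patterns 3 to 5 are accepted.
    if not (3 <= primary <= 5 and 3 <= secondary <= 5):
        raise ValueError("Gleason patterns must be between 3 and 5 in this sketch")
    score = primary + secondary
    if score == 6:
        return 1
    if score == 7:
        # 3 + 4 and 4 + 3 share a score of 7 but belong to different grade groups.
        return 2 if primary == 3 else 3
    if score == 8:
        return 4
    return 5  # Gleason score 9 or 10


print(isup_grade_group(4, 3))  # 3
```

The 3 + 4 versus 4 + 3 case shows why the ontology represents grade groups as distinct concepts rather than relying on the summed Gleason score alone.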