1. Introduction

Project title: European Federation for Cancer Images

Project acronym: EUCAIM

Grant Agreement: 101100633

Call identifier: DIGITAL-2022-CLOUD-AI-02

D5.2. The EUCAIM CDM and Hyper-Ontology for Data Interoperability: initial version

Responsible partner: LIMICS

Author(s): Mirna El Ghosh (LIMICS), Melanie Sambres (LIMICS), Catherine Duclos (LIMICS), Ferdinand Dhombres (LIMICS), Xavier Tannier (LIMICS), Valia Kalokyri (FORTH), Stelios Sfakianakis (FORTH), Manolis Tsiknanis (FORTH), Christel Daniel (APHP-LIMICS)

Contributors: Olga Giraldo (DKFZ), Heimo Muller (BBMRI-ERIC), Laure Fournier (APHP), Aurélien Maire (APHP), Jean Nembo (APHP), Kevin Mondet (APHP), Maciej Bobowicz (GUMed), Jean Charlet (LIMICS), Alexandra Kosvyra (AUTH), Ioanna Chouvarda (AUTH), Antonis Aletras (AUTH), Aikaterini Lazou (AUTH), Vasileia Paschaloudi (AUTH), Teresa Garcia Lezana (CRG-CERCA), Michal Kosno (GUMed), Gianna Tsakou (MAG), Celia Martin Vicario (QUIBIM), Laure Saint-Aubert (BC Platforms), Alejandro Rodríguez Pardavilla (BAHIA), Roberto Romero (CIBER), Jose Tapia (CIBER), Haridimos Kondylakis (FORTH), Maria Christodoulou (HCS), Maria Gonzalez Lopez (SAS), Federica Cruciani (IFOM)

Reviewers: Pedro Mallol (HULAFE); Francesco Cremonesi (INRIA)

Date of delivery: 29/06/2024

Version: Final version

Table of Contents

Table of Figures

List of Tables

1. Introduction

2. Interoperability requirements

3. Data interoperability framework for dataset cataloguing

3.1 EUCAIM DCAT-AP

3.2 FAIR principles compliance

4. Data interoperability framework for federated query

4.1 Why do we need the EUCAIM Hyper-Ontology?

4.2 The EUCAIM Hyper-Ontology

4.3 Data Resources

4.4 Development Process

4.4.1 Requirements Analysis and Specification

4.4.2 Knowledge Acquisition

4.4.3 Design and Conceptualization

4.4.4 Formalization

4.4.5 Evaluation and Validation

4.4.6 Ontology Enrichment and Maintenance

5. Interoperability framework for federated processing

5.1 CDM business requirements

5.2 Data harmonization approaches for the federated processing/analysis.

5.2.1 Scenario 1: EUCAIM Hyper-Ontology Based CDM for Analysis

5.2.2 Scenario 2: Integration with OMOP-FHIR for Wider Compatibility

5.2.3 Scenario 3: Simplifying Integration Through ETL process

5.2.4 Scenario 4. EUCAIM hyper-ontology only for federated query purposes, OMOP-CDM for analysis

5.3. The EUCAIM Common Data Model

5.3.1. CDM Selection Rationale

5.3.2. EUCAIM Data Dictionary

6. Integration of CDM and Hyper-Ontology

7. Demonstration scenarios

7.1 Prostate Cancer Use Cases

ProCAncer-I Scenario

INCISIVE Scenario

7.2 Breast Cancer Use Cases

CHAIMELEON Scenario

EuCanImage scenario

8. Future work and perspective

9. Conclusion

10. Publications

11. ANNEX

Table of Figures

Figure 1: An excerpt of the hyper-ontology (v1.0beta) around representing PSA concepts and their relations

Figure 2: An excerpt of the hyper-ontology (v1.0) around combining atomic concepts (TNM Path M, pM1a) to represent specific concepts (AJCC/UICC 7th pathological M1a Category)

Figure 3: An excerpt of the hyper-ontology (v1.0) around Cancer Patient represented in Protege.

Figure 4. An illustration of the Hyper-ontology iterative development process

Figure 5. An illustration of mappings with Biomedical terminologies/ontologies.

Figure 6. An excerpt of the ontological model of mCODE around the Disease characterization represented using OntoUML

Figure 7. An excerpt of the hyper-ontology structure

Figure 8. Part of the hyper-ontology around the concept “Primary malignant neoplasm of prostate” represented using Protege.

Figure 9. Part of the hyper-ontology around the concept “Malignant neoplasm”, represented using Protege.

Figure 10. Part of the hyper-ontology around the concept “AJCC/UICC 7th clinical M1a Category” represented using Protege.

Figure 11. Part of the hyper-ontology around the concept of “Prostate specific antigen measurement” represented using Protege.

Figure 12. Part of the hyper-ontology around the concept of “Image series” represented using Protege.

Figure 13. Part of the hyper-ontology around the concept of “MRI of breast for screening for malignant neoplasm” represented using Protege.

Figure 14. Part of the hyper-ontology around the concept of “Cancer Patient” represented using Protege.

Figure 15. Part of the hyper-ontology around the concept of “Histological grades” represented using Protege.

Figure 16. Part of the hyper-ontology around the concept of “International Society of Pathology histologic grade group” represented using Protege.

Figure 17. Part of the hyper-ontology around the concept of “Grade group 3 (Gleason score 4 + 3 = 7)” represented using Protege

Figure 18. Part of the hyper-ontology around representing tumor marker test results (Protege).

Figure 19. Part of the hyper-ontology around primary and secondary cancer relationship (Protege).

Figure 20: EUCAIM CDM for analysis & OMOP, FHIR, EUCAIM local data models. For OMOP and FHIR a mediator and mapping component is necessary.

Figure 21: OMOP-FHIR local adopted standards – EUCAIM based CDM for analysis with mediator and mapping components necessary for all nodes in the federation.

Figure 22: EUCAIM based CDM for all nodes participating in the federation. This would require a one-time transformation and no mediator/mapping component is necessary.

Figure 23: OMOP-CDM as the EUCAIM CDM for federated processing and analysis. Hyper-ontology only for federated queries.

Figure 24. An excerpt of the hyper-ontology around “Cancer of prostate” represented in Protege

Figure 25. A semantic representation and inference of the ProCAncer-I prostate cancer use case (Protege)

Figure 26: The ProCAncer-I prostate cancer patient journey.

Figure 27: The EUCAIM CDM instantiation with the ProCAncer-I prostate cancer clinical information.

Figure 28: The EUCAIM CDM instantiation with the ProCAncer-I prostate cancer imaging information.

Figure 29: The INCISIVE prostate cancer patient journey.

Figure 30. A semantic representation and inference of the INCISIVE prostate cancer use case (Protege)

Figure 31: The EUCAIM CDM instantiation with the INCISIVE prostate cancer clinical information.

Figure 32. The CHAIMELEON breast cancer patient journey.

Figure 33. A semantic representation and inference of the CHAIMELEON breast cancer use case (Protege)

Figure 34. The EUCAIM CDM instantiation with the CHAIMELEON breast cancer clinical information.

Figure 35. The EuCanImage breast cancer patient journey.

Figure 36. A semantic representation and inference of the EuCanImage breast cancer use case (Protege)

Figure 37. The EUCAIM CDM instantiation with the EuCanImage breast cancer clinical information

List of Tables

Table 1: General dataset metadata (DCAT-AP specification with stricter semantics in some cases)

Table 2: EUCAIM DCAT-AP domain-specific metadata

Table 3: Some metrics of hyper-ontology version 1.0

Table 4: List of vocabularies supported by the hyper-ontology version 1.0 classified by domain.

Table 5: DICOM tags mapped to the EUCAIM hyper-ontology (version 1.0)

Table 6: DICOM tags whose values are represented in the EUCAIM hyper-ontology (version 1.0)

Table 7: The EUCAIM CDM: Patient group

Table 8: The EUCAIM CDM: Health assessment group

Table 9: The EUCAIM CDM: Disease group

Table 10: The EUCAIM CDM: Cancer treatment group

Table 11: The EUCAIM CDM: Outcome group

Table 12: The EUCAIM CDM: Imaging group

Table 13: Data elements required to describe a primary cancer condition in mCODE

This document offers a detailed overview of the key features and contributions of the initial version of the EUCAIM Common Data Model (CDM) and Hyper-ontology.

This deliverable describes the challenges we encountered and our strategy for addressing the heterogeneity in data representation and semantics across sources of information.

Interoperability in healthcare facilitates the exchange and use of health information across diverse systems, improving communication and standardizing the sharing of patient data. It has technical, syntactic, and semantic components, supported by international standards such as HL7 FHIR and terminologies such as SNOMED CT, which ensure accurate data interpretation and integration.

The first step in conducting health research, however, is finding and requesting access to datasets that meet the criteria of the clinical use cases to be addressed. This requires appropriately cataloguing the information held by the various data sources and making these catalogues accessible for browsing and querying. EUCAIM has extended DCAT-AP, the Data Catalog Vocabulary Application Profile for data portals in Europe, specifically for health imaging datasets. The extension establishes mandatory metadata for medical images in the EUCAIM public catalogue, aligns with the ongoing work on the HealthDCAT-AP specification, and uses the EUCAIM Hyper-ontology specification to define controlled vocabularies for semantic interoperability (section 3).
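As a rough illustration of what a catalogue entry and a mandatory-metadata check might look like, the following minimal sketch builds a JSON-LD style record using the standard DCAT and Dublin Core Terms namespaces. The record content, the `MANDATORY` tuple, and the `missing_mandatory` helper are assumptions made for illustration only; the actual EUCAIM DCAT-AP properties are those specified in section 3.

```python
import json

# Hypothetical mandatory properties for illustration; the real EUCAIM
# DCAT-AP profile (section 3) defines its own mandatory metadata.
MANDATORY = ("dct:title", "dct:description")


def missing_mandatory(record, mandatory=MANDATORY):
    """Return the mandatory properties that are absent or empty in a record."""
    return [prop for prop in mandatory if not record.get(prop)]


# Illustrative dataset record; not a real EUCAIM catalogue entry.
record = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Example prostate MRI collection",
    "dct:description": "Illustrative record used to sketch catalogue metadata.",
    "dcat:keyword": ["prostate cancer", "MRI"],
}

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
    print("Missing mandatory properties:", missing_mandatory(record))
```

A record lacking a description would be reported by `missing_mandatory`, which is the kind of check a catalogue could run before accepting an entry.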

Public catalogues are typically expected to hold metadata describing the fundamental, high-level characteristics of datasets. Accordingly, only the minimum metadata required for cataloguing datasets across cancer types has been included at this level (this set of metadata was derived from the analysis and methodology outlined in D5.1[1]). For data users seeking a finer-grained, subject-level search based on cancer-specific criteria, the concepts and terms of the EUCAIM hyper-ontology are to be used through the EUCAIM federated query user interface. The EUCAIM hyper-ontology, developed through an iterative and systematic process, integrates diverse clinical and imaging knowledge from projects such as CHAIMELEON, ProCAncer-I, EuCanImage, INCISIVE, and PRIMAGE, and addresses the semantic and syntactic disparities among diverse data models and standards (section 4).
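To make the idea of a concept-based, subject-level search more concrete, the sketch below filters subject records by ontology subsumption: a query for a general concept also matches subjects annotated with a more specific one. This is a generic technique shown under stated assumptions, not the EUCAIM federated query implementation. The `IS_A` hierarchy, the subject records, and both function names are hypothetical; only the concept labels are borrowed from the figure captions of this deliverable.

```python
# Toy is-a hierarchy (child -> parent). Illustrative only: it is not the
# hyper-ontology, and the parent links are assumptions.
IS_A = {
    "Primary malignant neoplasm of prostate": "Malignant neoplasm",
    "Malignant neoplasm": "Neoplasm",
}


def is_subsumed_by(concept, ancestor, is_a=IS_A):
    """True if `concept` equals `ancestor` or reaches it via is-a links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = is_a.get(concept)
    return False


# Hypothetical subject-level records held at one node of the federation.
subjects = [
    {"id": "S1", "diagnosis": "Primary malignant neoplasm of prostate"},
    {"id": "S2", "diagnosis": "Neoplasm"},
]


def select_subjects(records, query_concept):
    """Return the ids of subjects whose diagnosis falls under the query concept."""
    return [r["id"] for r in records if is_subsumed_by(r["diagnosis"], query_concept)]


if __name__ == "__main__":
    print(select_subjects(subjects, "Malignant neoplasm"))
```

Here subject S1 matches the query for "Malignant neoplasm" because its diagnosis is a more specific concept, while S2 does not because "Neoplasm" is more general than the query.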

Within EUCAIM, we examined several scenarios for federated processing/analysis and AI model development tasks to guide decisions on the structure and format of the CDM. Each scenario presents distinct advantages and challenges regarding data integration, harmonization, and usability (section 5).

The hyper-ontology is semantically represented to ensure alignment with the EUCAIM CDM based on the mCODE specification. To illustrate the integration of the EUCAIM CDM and the hyper-ontology, section 6 presents an example formalization of a profile (Primary cancer condition), detailing its data elements and their corresponding value sets.

Finally, section 7 presents four proof-of-concept scenarios on prostate and breast cancer, provided by four AI4HI projects: INCISIVE, ProCAncer-I, CHAIMELEON, and EuCanImage. These scenarios demonstrate the feasibility of the EUCAIM hyper-ontology and CDM and validate them against the clinical/biological and imaging information collected and modeled by the four projects.
