6. Integration of CDM and Hyper-Ontology
The hyper-ontology is developed using a hybrid approach that combines top-down and bottom-up strategies. The bottom-up strategy draws on the clinical and imaging knowledge provided by the AI4HI projects, while the top-down strategy grounds the hyper-ontology in the mCODE conceptual model. To this end, the mCODE profiles and data elements are analyzed and semantically represented in the ontological model using a high-level conceptual modeling language, OntoUML. By applying this strategy, the hyper-ontology ensures seamless integration with the EUCAIM CDM, which is based on the mCODE specifications. The mCODE specifications are syntactic representations of entities, their key elements, and the associated value sets. Thus, there is a need for an ontological analysis that helps to unpack the ontological content of the oncology domain based on the generic mCODE specifications.
In the following, we give an example of an ontological analysis and formalization of the Primary Cancer Condition profile[22] and the associated elements. Table 13 presents basic data elements required for describing a primary cancer condition: Code, HistologyMorphologyBehavior, BodySite, and Stage. The value sets of these elements are specified in mCODE, such as Malignant tumor of prostate (ICD10:C61) and Malignant Neoplasm (SNOMED:1240414004) value sets for the Code and HistologyMorphologyBehavior data elements.
Section 4.4.3 (Core Layer) outlines the ontological analysis of Primary Cancer Condition and the associated semantic relations (see Figure 6). For instance, the data element HistologyMorphologyBehavior is explicitly represented in the hyper-ontology using the semantic relation “Has associated morphology”, and BodySite is represented using the “Has finding site” association. These relations link the cancer condition to the morphology/histology (e.g., Malignant Neoplasm (SNOMED:1240414004)) and to the affected body structure (e.g., Prostate (SNOMED:41216001)), respectively. The formalization of this profile using OWL is illustrated in Figure 24.
Table 13: Data elements required to describe a primary cancer condition in mCODE
Data Element | Example of Value Set (Standard concepts)
Code | Malignant tumor of prostate (ICD10:C61)
HistologyMorphologyBehavior | Malignant Neoplasm (SNOMED:1240414004)
BodySite | Prostate (SNOMED:41216001)
Stage | TNM staging classifications (SNOMED:258234001)
Figure 24. An excerpt of the hyper-ontology around “Cancer of prostate” represented in Protege
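As a rough illustration of the mapping described above, the following Python sketch expresses the four data elements of Table 13 as subject-relation-value triples. The relation names Has_associated_morphology and Has_finding_site follow the text; Has_code, Has_stage, and the individual name prostate_cancer_condition_1 are hypothetical placeholders introduced for this example, not identifiers taken from the hyper-ontology.

```python
# Illustrative only: mCODE Primary Cancer Condition elements as triples.
# Codes come from Table 13; relation names marked "hypothetical" are
# assumptions made for this sketch.

ELEMENT_TO_RELATION = {
    "Code": "Has_code",  # hypothetical
    "HistologyMorphologyBehavior": "Has_associated_morphology",
    "BodySite": "Has_finding_site",
    "Stage": "Has_stage",  # hypothetical
}


def condition_to_triples(subject, elements):
    """Turn {data element: coded value} into (subject, relation, value) triples."""
    triples = []
    for element, value in elements.items():
        relation = ELEMENT_TO_RELATION[element]
        triples.append((subject, relation, value))
    return triples


example_condition = {
    "Code": "ICD10:C61",
    "HistologyMorphologyBehavior": "SNOMED:1240414004",
    "BodySite": "SNOMED:41216001",
    "Stage": "SNOMED:258234001",
}

# "prostate_cancer_condition_1" is a made-up individual name.
triples = condition_to_triples("prostate_cancer_condition_1", example_condition)
for triple in triples:
    print(triple)
```

Keeping the element-to-relation mapping in a single table makes the terminology binding explicit and easy to review against the mCODE profile.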
As part of our upcoming activities, we will explicitly state each attribute and its associated value set as defined in the EUCAIM CDM, ensuring precise terminology binding using the semantics and terminologies from the EUCAIM hyper-ontology. For example, while SNOMED is the standard terminology for coding conditions in OMOP, the oncology domain uses different reference terminologies: ICD-10 for clinical diagnosis of cancers and ICD-O for histological diagnosis, with ICD-O-3 being the global standard for cancer registries. Given that various terminologies have been used across the underlying repositories to represent conditions, the integration of the EUCAIM hyper-ontology with the CDM will specify the terminologies to be used for specific properties. Additionally, there are multiple ways to represent properties such as tumor marker test results (as discussed in Section 4.6): either as a finding (e.g., triple negative) or as an observation with an attribute-value pair (e.g., ER negative, PR negative, HER2 negative). The hyper-ontology will clarify the representation and usage of these properties when populating the CDM. These topics will be discussed with experts in the WP5 CDM and hyper-ontology working group and incorporated into the next version of the EUCAIM CDM and hyper-ontology.
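To make the two representation options for tumor marker results concrete, the Python sketch below records the same information once as a single finding and once as three attribute-value observations, and shows how the second form could be summarized into the first. The field names (kind, attribute, value) and the helper summarize_as_finding are assumptions made for illustration; they are not part of the EUCAIM CDM or the hyper-ontology.

```python
# Illustrative only: two ways of recording the same tumor marker information.
# Field names ("kind", "attribute", "value") are assumptions for this sketch.

# (a) as a single finding
as_finding = {"kind": "finding", "value": "triple negative"}

# (b) as observations, each with an attribute and a value
as_observations = [
    {"kind": "observation", "attribute": "ER", "value": "negative"},
    {"kind": "observation", "attribute": "PR", "value": "negative"},
    {"kind": "observation", "attribute": "HER2", "value": "negative"},
]


def summarize_as_finding(observations):
    """Return a 'triple negative' finding if ER, PR and HER2 are all negative, else None."""
    status = {o["attribute"]: o["value"] for o in observations}
    markers = ("ER", "PR", "HER2")
    if all(status.get(marker) == "negative" for marker in markers):
        return {"kind": "finding", "value": "triple negative"}
    return None


print(summarize_as_finding(as_observations))
```

The observation form keeps each marker separately queryable, while the finding form is more compact; which one is bound to a given CDM attribute is precisely the kind of decision the integration work is meant to settle.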
7. Demonstration scenarios
To evaluate the efficacy of the EUCAIM CDM and Hyper-ontology, four proof-of-concept scenarios are provided for mapping and structuring clinical and imaging metadata related to prostate and breast cancer. This information is provided by four AI4HI projects: ProCAncer-I and INCISIVE for the prostate cancer scenarios, which adopt the OMOP-CDM and FHIR standards respectively, and CHAIMELEON and EuCanImage for the breast cancer scenarios, which adopt an OMOP-like CDM and FHIR standards respectively. Two demonstration strategies are applied per cancer type: 1) semantic-based and 2) syntactic-based. The semantic-based strategy aims to demonstrate the hyper-ontology's completeness in representing knowledge from real-world scenarios by populating the ontology's semantic content (concepts and relations) with individuals extracted from the provided use cases. The syntactic-based strategy aims to ascertain the usability of the hyper-ontology in instantiating the EUCAIM-CDM.
7.1 Prostate Cancer Use Cases
ProCAncer-I Scenario
The following is a real-world case of a patient registered in the ProCAncer-I platform:
Patient’s journey
The patient is a 59-year-old male with a PSA value of 7.16 ng/mL and a free PSA of 5.04 ng/mL. The patient had a positive digital rectal examination and was referred by the urologist for a multiparametric MRI. The MRI, performed 22 days after the PSA lab test, was deemed positive and revealed a PI-RADS 5 lesion with a maximum diameter of 10 mm in the right peripheral zone (basal posterolateral), with a clinical stage of cT2b, cN0. The patient underwent a fusion biopsy, which revealed a cT2 cancer stage. Because of the positive findings, the patient was referred for a prostatectomy. The results of the prostatectomy confirmed the positive findings, revealing a lesion with a Gleason score of 4+3, an overall volume of 0.7 cc, and a maximum diameter of 17 mm, with stage pT3b, pN0, and intraductal carcinoma. After 6 months, MRI and PET examinations were performed, which identified a liver metastasis with reported stage cNX, cM1c.
Hyper-Ontology Population
Figure 25. A semantic representation and inference of the ProCAncer-I prostate cancer use case (Protege)
The real-world prostate cancer scenario provided by ProCAncer-I is used to (manually) extract a set of instances (individuals) and associate them with the hyper-ontology classes/concepts. Semantic relationships among the individuals are asserted according to the presented use case and the object properties specified in the hyper-ontology. Figure 25 depicts the population results. In this scenario, a diagnosis has been performed on a patient, including imaging (e.g., multiparametric MRI and fusion biopsy) and clinical/surgical procedures (e.g., digital rectal examination and prostatectomy). Different imaging and pathologic results have been interpreted based on the performed procedures, such as imaging assessment observations (e.g., PI-RADS 5), histological grading (e.g., 4+3 Gleason score), clinical staging (e.g., cT2b, cN0), and pathologic staging (e.g., pT2, pN0). The tumor's maximum dimension and volume have been considered throughout the diagnosis. In addition, a PSA lab test was performed on the patient.
By assigning the various information to their semantic reference, the hyper-ontology is populated with real-world details with which the logic reasoner (Pellet in this example) has deduced the complete diagnosis, including the imaging and clinical results.
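The population step can be pictured with a small Python sketch: a few individuals from the ProCAncer-I scenario are stored as triples, and a hand-written rule gathers every interpretation result reachable from the patient. This only imitates one kind of inference and is not how Pellet or any OWL reasoner operates internally. The relation names Is_Subject_For, HasUndergone, Has_imaging_interpretation_result, and Has_pathologic_interpretation_result are borrowed from the queries in Annex 1; Has_diagnosis_result is a hypothetical name introduced for this example.

```python
# Illustrative only: a handful of individuals from the ProCAncer-I scenario
# and a hand-written rule that gathers every result reachable from the patient.
# This imitates a single inference; it is not how an OWL reasoner works.

facts = {
    ("ProCAncer-I_patient", "Is_Subject_For", "multiparametric_MRI"),
    ("ProCAncer-I_patient", "HasUndergone", "prostatectomy"),
    ("multiparametric_MRI", "Has_imaging_interpretation_result", "PI-RADS_5"),
    ("multiparametric_MRI", "Has_imaging_interpretation_result", "cT2b"),
    ("prostatectomy", "Has_pathologic_interpretation_result", "pT3b"),
    ("prostatectomy", "Has_pathologic_interpretation_result", "4+3_Gleason_score"),
}

PATIENT_TO_PROCEDURE = {"Is_Subject_For", "HasUndergone"}
PROCEDURE_TO_RESULT = {
    "Has_imaging_interpretation_result",
    "Has_pathologic_interpretation_result",
}


def infer_diagnosis(facts, patient):
    """Derive (patient, 'Has_diagnosis_result', result) for each result of each procedure."""
    procedures = {o for s, p, o in facts if s == patient and p in PATIENT_TO_PROCEDURE}
    return {
        (patient, "Has_diagnosis_result", o)  # hypothetical relation name
        for s, p, o in facts
        if s in procedures and p in PROCEDURE_TO_RESULT
    }


inferred = infer_diagnosis(facts, "ProCAncer-I_patient")
for triple in sorted(inferred):
    print(triple)
```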
Model Instantiation
The EUCAIM CDM instantiation of the clinical and imaging related information is provided below along with a graphical representation of the events and timepoints in the patient’s journey.
Figure 26: The ProCAncer-I prostate cancer patient journey.
Figure 27: The EUCAIM CDM instantiation with the ProCAncer-I prostate cancer clinical information.
Figure 28: The EUCAIM CDM instantiation with the ProCAncer-I prostate cancer imaging information.
INCISIVE Scenario
The following is a real-world case of a patient registered in the INCISIVE platform:
Patient’s journey
The patient is a 74-year-old white male with a history of Dyslipidemia, who initially presented with painful ejaculation. An MRI scan revealed a tumor with a PIRADS score of 4. His PSA level was measured at 5.6, and staging was determined as T1, N0, M0. One month later, a targeted biopsy was performed, resulting in a Gleason score of 6 and ISUP grade 5. Two months post-diagnosis, the patient underwent a radical prostatectomy. Follow-up screenings began one month after surgery, showing a complete response with a PSA level of 0.04. Subsequent PSA tests were conducted 2, 5, 9, and 12 months after surgery, with values of 0.07, 0.04, 0.04, and 0.04 respectively.
Figure 29: The INCISIVE prostate cancer patient journey.
Hyper-Ontology Population
As in the ProCAncer-I scenario, we assigned the individuals extracted from the INCISIVE use case to the hyper-ontology concepts and relations. Figure 30 depicts the population results. In this scenario, a diagnosis has been performed on a patient with a history of dyslipidemia, including imaging (e.g., MRI scan and biopsy) and clinical/surgical procedures (e.g., radical prostatectomy). Different results have been interpreted based on the performed procedures, such as imaging assessment observations (e.g., PI-RADS 4), histological grading (e.g., Gleason score 6, ISUP grade 5), and cancer staging (e.g., T1, N0, M0). PSA lab tests were also performed throughout the diagnostic process. By running the Pellet reasoner, the complete cancer patient diagnosis, including the imaging and clinical interpretation results, is deduced (see Figure 30).
Figure 30. A semantic representation and inference of the INCISIVE prostate cancer use case (Protege)
Model Instantiation
Figure 31: The EUCAIM CDM instantiation with the INCISIVE prostate cancer clinical information.
7.2 Breast Cancer Use Cases
CHAIMELEON Scenario
The following is a real-world case of a patient registered in the CHAIMELEON platform:
Patient’s journey
A mammography and an ultrasound were performed on a 59-year-old female patient, born in February 1957, who had detected a lump in her breast. The ultrasound indicated suspicious cancer (BI-RADS 5). For that reason, a fine needle aspiration biopsy of the breast was performed 6 weeks later, confirming the suspicion and diagnosing her with Ductal Carcinoma grade II, cT2N0, ER positive, PR positive, HER2 negative, and Ki67 at 12%. A thorax, abdomen, and pelvis CT scan 2 weeks after the biopsy showed no evidence of metastatic disease, confirming the clinical stage of the patient as cT2N0M0 (stage IIA).
The patient received neoadjuvant therapy, starting one month after the CT scan. A radical mastectomy was performed six months after the chemotherapy, and no tumor was found (pT0N0). Another thorax, abdomen, and pelvis CT scan was performed 3 weeks after surgery, showing no evidence of metastatic disease (M0). Six weeks after surgery, the patient began hypofractionated stereotactic radiotherapy and has achieved a complete response.
Figure 32. The CHAIMELEON breast cancer patient journey.
Hyper-Ontology Population
As for the prostate cancer use cases, we populate the hyper-ontology with real-world breast cancer individuals (Figure 33). In this scenario, different procedures, including mammography, ultrasound, and fine needle aspiration biopsy of the breast, have been performed on a female patient. Different pathologic and imaging results have been interpreted based on the performed procedures, such as tumor diagnosis (Ductal Carcinoma grade II), imaging assessment observations (e.g., BI-RADS 5), clinical staging (e.g., cT2, cN0), and tumor marker test results (e.g., ER positive, PR positive). In addition, a radiotherapy procedure (hypofractionated stereotactic radiotherapy) has been performed, with a complete response as the associated result. The complete diagnosis, including the imaging and pathologic interpretation results, has been inferred and generated by the logic reasoner as depicted in Figure 33.
Figure 33. A semantic representation and inference of the CHAIMELEON breast cancer use case (Protege)
Model Instantiation
Figure 34. The EUCAIM CDM instantiation with the CHAIMELEON breast cancer clinical information.
EuCanImage scenario
The following is a real-world case of a patient registered in the EuCanImage platform:
Patient’s journey
The patient is a 50-year-old postmenopausal female who has never breastfed and has never been pregnant. There is a history of breast cancer in a second-degree relative. There is no family history of ovarian cancer. The patient has never used hormone replacement therapy or hormonal contraception. Based on a mammography followed by a needle biopsy one month later, the patient was diagnosed with triple-negative cancer of the right breast, Grade I LCIS histological type, and a clinical stage of cT1N1M0. In addition, the following tumor characteristics were assessed: ER 0%, PR 0%, HER2 IHC negative, and Ki67 0%. Neoadjuvant chemotherapy (NAC) with Doxorubicin was started 1 month after the results of the pathological report from the biopsy, lasting about 6 months. After NAC treatment, the patient underwent breast surgery, where the pathology report revealed ypT0N0M0.
Figure 35. The EuCanImage breast cancer patient journey.
Hyper-Ontology Population
Based on the EuCanImage breast cancer scenario, we have populated the hyper-ontology concepts and relations with real-world instances (Figure 36). In this scenario, procedures, such as mammography, needle biopsy, and breast surgery, have been performed on a postmenopausal female patient who has never used hormone therapies. Different interpretation results have been identified, such as tumor diagnosis (triple-negative cancer Grade I LCIS), clinical staging (e.g., cT1, cN1, ypT0), and tumor marker test results (e.g., ER 0%, PR 0%, HER2 negative). The complete diagnosis, including interpretation results, has been inferred and generated by the logic reasoner as depicted in Figure 36.
Figure 36. A semantic representation and inference of the EuCanImage breast cancer use case (Protege)
Model Instantiation
Figure 37. The EUCAIM CDM instantiation with the EuCanImage breast cancer clinical information.
From the populating and instantiating validation tasks, we assume that the hyper-ontology has successfully represented domain-specific knowledge in oncology acquired from real-world prostate and breast cancer scenarios, and fulfilled the requirement of seamless integration with EUCAIM-CDM for the instantiation process.
8. Future work and perspective
In further works, we are interested in extending the hyper-ontology cancer types to include new types with the support of clinical experts. The extension process will consider the new use cases expected to be provided by the hospitals or laboratories that will join the EUCAIM community. Besides, the imaging and clinical metadata required for federated querying will be specified explicitly in the hyper-ontology model to permit seamless integration with heterogeneous local datasets and efficient access to these data.
However, one of the main challenges we need to address is the sustainability and evolution of the hyper-ontology facing the continuous syntactic and semantic updates of standard terminologies/ontologies and data models or standards (OMOP/FHIR), especially after the project completion.
The long-term sustainability of the Common Data Model (CDM) and Hyper-Ontology is critical to the success of EUCAIM. We recognize that interoperability is not just a technical challenge but also an organizational one that requires ongoing commitment. To this end, we plan to explore various strategies, including:
Establishing a clear data governance framework to oversee the evolution of the hyper-ontology, ensuring that changes are managed in a controlled manner.
Developing a clear roadmap for the development of the CDM and the hyper-ontology, including regular updates, and adaptation to new technologies and standards. Currently, as we develop the hyper-ontology, we are creating distinct versions, each with unique identifiers and appropriate metadata and documentation in order to track its evolution. All versions are released periodically and published on Zenodo.
Encouraging contributions and feedback from the wider community to ensure that the CDM and hyper-ontology remain comprehensive, up-to-date, and reflective of the needs of all stakeholders. Towards this end, we have also committed to submitting research papers to workshops and conferences outlining our approach.
Identifying the resources, both financial and human, required to support the ongoing maintenance and development of the hyper-ontology and the CDM. This might include seeking funding, establishing partnerships, or generating revenue through specific services. This strategy will be further explored in collaboration with WP8.
Finally, we plan to assess the impact that becoming compliant with the CDM and the Hyper-Ontology has on new data holders providing data or joining EUCAIM. This impact assessment will be two-fold: a) identify the challenges to be faced in complying with the CDM and hyper-ontology, and b) identify the benefits that result from successfully complying with such a framework.
Regarding the challenges, we plan to identify and assess the effort required by new data holders to manage and structure their data according to the hyper-ontology and CDM specifications. This might entail training sessions with clinical and technical staff, therefore assessing the total time and the resources required for new data holders to achieve compliance, and possibly identifying any obstacles they might face in the process.
Regarding the benefits, data holders complying with the CDM may achieve increased interoperability, as they will gain the ability to share and integrate data with other entities within the EUCAIM network. Compliance will also help ensure that their data meets high standards of quality, aligning with international best practices and standards, and enhancing the credibility of their data contributions.
However, to do such an impact assessment, we need an evaluation process that could include:
Conducting surveys and interviews with potential new data holders to understand their current data management practices, capabilities, and readiness for compliance with the CDM and Hyper-Ontology.
Documenting the specific changes and adaptations new data holders would need to make.
Identifying common issues and problems through the EUCAIM helpdesk and the support groups, which can help us gather feedback on the compliance process and thus make necessary adjustments to our approach if necessary, specifically in cases where data holders consistently experience certain issues.
Splitting the onboarding process into stages/tiers, which we have already defined, so that the required effort is distributed across multiple stages until the final adoption of the EUCAIM CDM, in order to both minimize and track possible issues.
To achieve this, we will collaborate closely with the respective WP2 and WP4 teams.
9. Conclusion
This deliverable presents the initial version of the EUCAIM CDM and hyper-ontology for data interoperability. In relation to the first deliverable (D5.1), this document provides a well-established analysis of the strategy developed to achieve the initial goals of the EUCAIM CDM and hyper-ontology. Publications submitted and accepted during the hyper-ontology development support the work accomplished.
Regarding the hyper-ontology development process, we encountered challenges in the knowledge acquisition phase (Section 4.3) when collecting the standard clinical/biological and imaging data/metadata provided by the AI4HI projects. For the clinical knowledge, some data/metadata were customized depending on the projects’ resources, or standard codes/vocabularies were lacking, which required an effort to associate this information with standard ontological/terminological resources. For imaging knowledge, the provided data/metadata consisted mainly of DICOM tags and names used for image querying or segmentation, which is insufficient for a semantic representation of imaging knowledge in the hyper-ontology. The proposed approach (Section 4.4) has helped to overcome these challenges. First, the ORSD document was produced, which helped to organize all the collected data and metadata and classify them by cancer type and project, facilitating the detection of inconsistencies and missing information. Second, the grounding of the hyper-ontology in mCODE has supported covering the essentials of the oncology domain, mainly for clinical aspects. For the imaging model, we relied on FHIR specifications around Imaging study and Series, and their relationships with Modality, Laterality, and other imaging aspects. Although the bottom-up strategy, which relies on the projects’ clinical and imaging knowledge, is crucial for developing the hyper-ontology as a domain and application-oriented ontology, the top-down strategy has maintained the ontological model by grounding the hyper-ontology in the oncology domain. Also, the intervention of experts in revising and enriching the semantic content of the hyper-ontology has enhanced the generic content and expanded it by including clinically verified semantic patterns. Finally, the hyper-ontology is validated by:
1- efficiently and explicitly representing the provided use cases by populating the hyper-ontology semantic content, including the concepts and relations, based on the individuals (instances) harvested from these use cases (Section 7);
2- instantiating the EUCAIM-CDM to represent real-world use cases around prostate and breast cancers using the hyper-ontology concepts (Section 7);
3- applying SPARQL queries to request cancer patient information, such as lab tests, procedures, imaging and clinical results (Annex 1).
Interestingly, EUCAIM's hyper-ontology, a FAIR-compliant ontology model that effectively reflects oncology’s real-world entities, has supported a seamless integration with the EUCAIM CDM, a significant fulfillment for maintaining semantic interoperability in the context of the EUCAIM project.
10. Publications
El Ghosh, M., Kalokyri, V., Sambres, M., Vaterkowski, M., Duclos, C., Tannier, X., Tsakou, G., Tsiknakis, M., Daniel, C., and Dhombres, F. (2024). Towards semantic interoperability among heterogeneous cancer image data models using a layered modular hyper-ontology. In FOIS 2024.
El Ghosh, M., Kalokyri, V., Sambres, M., Vaterkowski, M., Duclos, C., Tannier, X., Tsakou, G., Tsiknakis, M., Daniel, C., and Dhombres, F. (2024). From syntactic to semantic interoperability using a hyper-ontology in the oncology domain. In MIE 2024.
El Ghosh, M., Daniel, C., Duclos, C., Kalokyri, V., Charlet, J., Sambres, M., Tsakou, G., Tsiknakis, M., and Dhombres, F. (2024). Grounding a hyper-ontology on mCODE ontological conceptual model and foundational ontologies for semantic interoperability in the oncology domain. In FOAM@FOIS 2024.
11. ANNEX
Annex 1: SPARQL Queries
Based on the information acquired from the prostate cancer use cases (Section 7), SPARQL queries are applied to query the hyper-ontology for diagnosis details. The following examples retrieve the cancer patients (COM1001051) who:
had a PSA (CLIN1000227) lab test, returning the PSA levels (Query1);
underwent a prostatectomy (CLIN1000248), returning the associated pathological interpretation results (Query2);
were subject to imaging procedures, returning the associated imaging interpretation results (Query3).
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ho: <https://cancerimage.eu/ontology/EUCAIM#>
Query1: SELECT ?p ?r WHERE {
?p rdf:type ho:COM1001051 .
?p ho:Is_Subject_For ?a .
?a rdf:type ho:CLIN1000227 .
?a ho:Has_Value ?r . }
For Query1, the patients of both the ProCAncer-I (uc1) and INCISIVE (uc2) use cases had a PSA lab test. Thus, by executing Query1, we obtain the following results:
ProCAncer-I_patient : PSA level = 7.16
INCISIVE_patient : PSA level = 0.04
INCISIVE_patient : PSA level = 0.07
INCISIVE_patient : PSA level = 5.6
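For readers less familiar with SPARQL, the Python sketch below evaluates the same four triple patterns as Query1 with nested loops over a small in-memory list of triples. It is only meant to show how the patterns join on the shared variables; it is not a SPARQL engine. The class identifiers and PSA values come from the text, whereas the activity names (psa_uc1, psa_uc2_a, psa_uc2_b) are made up and only a subset of the PSA measurements is included.

```python
# Illustrative only: Query1's four triple patterns evaluated by nested loops
# over a small list of triples. Individual names such as "psa_uc2_a" are made up.

PATIENT, PSA_TEST = "COM1001051", "CLIN1000227"

triples = [
    ("ProCAncer-I_patient", "rdf:type", PATIENT),
    ("INCISIVE_patient", "rdf:type", PATIENT),
    ("psa_uc1", "rdf:type", PSA_TEST),
    ("psa_uc2_a", "rdf:type", PSA_TEST),
    ("psa_uc2_b", "rdf:type", PSA_TEST),
    ("ProCAncer-I_patient", "Is_Subject_For", "psa_uc1"),
    ("INCISIVE_patient", "Is_Subject_For", "psa_uc2_a"),
    ("INCISIVE_patient", "Is_Subject_For", "psa_uc2_b"),
    ("psa_uc1", "Has_Value", 7.16),
    ("psa_uc2_a", "Has_Value", 5.6),
    ("psa_uc2_b", "Has_Value", 0.07),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def query1():
    """Return (patient, PSA value) pairs, mirroring SELECT ?p ?r in Query1."""
    rows = []
    patients = [s for s, p, o in triples if p == "rdf:type" and o == PATIENT]
    for patient in patients:
        for activity in objects(patient, "Is_Subject_For"):
            if PSA_TEST in objects(activity, "rdf:type"):
                for value in objects(activity, "Has_Value"):
                    rows.append((patient, value))
    return rows


for patient, value in query1():
    print(f"{patient} : PSA level = {value}")
```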
Query2: SELECT ?p ?a ?r WHERE {
?p rdf:type ho:COM1001051 .
?p ho:HasUndergone ?a .
?a rdf:type ho:CLIN1000248 .
?a ho:Has_pathologic_interpretation_result ?r . }
For Query2, only the ProCAncer-I patient (uc1) underwent a prostatectomy, which has several pathologic interpretation results. Thus, the response to this query is as follows:
ProCAncer-I_patient : prostatectomy -> Result: intraductal_carcinoma
ProCAncer-I_patient : prostatectomy -> Result: pN0
ProCAncer-I_patient : prostatectomy -> Result: pT3b
ProCAncer-I_patient : prostatectomy -> Result: 4+3_Gleason_score
Query3: SELECT ?p ?a ?r WHERE {
?p rdf:type ho:COM1001051 .
?p ho:Is_Subject_For ?a .
?a ho:Has_imaging_interpretation_result ?r . }
For Query3, the ProCAncer-I patient (uc1) was subject to a multiparametric MRI and a fusion biopsy with the following interpretation results: PI-RADS 5, cT2b, cN0, and pT2. Meanwhile, the INCISIVE patient (uc2) was subject to an MRI scan and a biopsy with the following results: PI-RADS score 4, Gleason score 6, and ISUP grade 5. By executing Query3, we obtain the following results:
INCISIVE_patient : MRI_scan -> Result: PIRADS_score_of_4
INCISIVE_patient : biopsy -> Result: Gleason_score_of_6
INCISIVE_patient : biopsy -> Result: ISUP_grade_5
ProCAncer-I_patient : fusion_biopsy -> Result: pT2
ProCAncer-I_patient : multiparametric_MRI -> Result: PI-RADS_5
ProCAncer-I_patient : multiparametric_MRI -> Result: cN0
ProCAncer-I_patient : multiparametric_MRI -> Result: cT2b
EUCAIM D5.1. Early release of the Data Federation Framework, 2023 https://cancerimage.eu/wp-content/uploads/2023/10/D5.1_Early-release-of-the-Data-Federation-Framework_vf.pdf
EUCAIM D5.1. Early release of the Data Federation Framework, 2023 https://cancerimage.eu/wp-content/uploads/2023/10/D5.1_Early-release-of-the-Data-Federation-Framework_vf.pdf
https://healthdcat-ap.github.io/
https://www.fairdatapoint.org/
Adopt OMOP conceptual model (terminologies)
https://catalogue.eucaim.cancerimage.eu/
https://github.com/OHDSI/Athena
https://data.bioportal.lirmm.fr/documentation
https://documentation.uts.nlm.nih.gov/rest/home.html
https://build.fhir.org/ig/HL7/fhir-mCODE-ig/
https://nemo.inf.ufes.br/en/projetos/ufo/
https://ontouml.org/
https://www.w3.org/OWL/
https://protege.stanford.edu/
https://github.com/stardog-union/pellet
https://tehdas.eu/app/uploads/2023/09/tehdas-recommendations-on-a-data-quality-framework.pdf
https://build.fhir.org/ig/HL7/fhir-mCODE-ig/
Guérin, J., Laizet, Y., Le Texier, V., Chanas, L., Rance, B., Koeppel, F., Lion, F., Gourgou, S., Martin, A. L., Tejeda, M., Toulmonde, M., Cox, S., Hess, E., Rousseau-Tsangaris, M., Jouhet, V., & Saintigny, P. (2021). OSIRIS: A Minimum Data Set for Data Sharing and Interoperability in Oncology. JCO Clinical Cancer Informatics, 5, 256–265. https://doi.org/10.1200/CCI.20.00094
https://www.acr.org/Clinical-Resources/Reporting-and-Data-Systems
Kalokyri, V., et al. (2023). MI-Common Data Model: Extending Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM) for Registering Medical Imaging Metadata and Subsequent Curation Processes. JCO Clinical Cancer Informatics, 7, e2300101. DOI:10.1200/CCI.23.00101
https://build.fhir.org/ig/HL7/fhir-mCODE-ig/StructureDefinition-mcode-primary-cancer-condition.html