4. Data interoperability framework for federated query
Once dataset cataloguing is complete, which involves publishing only aggregated metadata for the datasets, the next interoperability tier is the provision of federated query support.
To enable federated queries, data holders should implement a semantic interoperability layer across their datasets. This includes a) developing a mapping component between their local data structure and the EUCAIM hyper-ontology, and b) installing a mediator service accessible from the central services of EUCAIM.
4.1 Why do we need the EUCAIM Hyper-Ontology?
To enable federated querying across established repositories, such as the AI4HI repositories that adopt different data models/standards, the integration of a semantic interoperability layer is required. In this section, we will explore, by using examples, all the challenges that have been identified in querying the OMOP-CDM and FHIR-based AI4HI repositories. These challenges will serve as requirements for the development and application of the hyper-ontology within the context of EUCAIM.
Starting with the example of the PSA (Prostate Specific Antigen), a tumor marker for prostate cancer, its representation varies across repositories; it is represented via the SNOMED-CT standard (Prostate specific antigen measurement (4272032)) or the LOINC standard (Prostate specific Ag [Mass/volume] in Serum or Plasma (LP18192-2)). This variability in the representation across different standards poses a challenge when a user wishes to execute a federated query to “find datasets with ‘PSA’ levels over 20”, raising questions about which standard concept to use for querying, which repository uses which one of the two concepts for PSA and whether these two standard concepts are semantically equivalent or not. Addressing these questions is crucial for enabling different repositories, utilizing different standards, to accurately respond to a query regarding PSA levels.
The EUCAIM hyper-ontology should be designed to specify the relationship between such concepts (see Figure 1 for an example). Despite potentially numerous similar standard concepts for PSA, users will be able to select a specific concept, like the LOINC one, for query execution. In this case, the local mediator or service should be able to understand these concept relationships, accurately map them through the hyper-ontology specification, and return query results.
However, only specifying concept relationships in the hyper-ontology isn't sufficient for queries concerning quantitative variables. These queries must specify not only the variable of interest but also its measurement unit, since repositories might encode the same concept in different units. For instance, two repositories could use the same LOINC concept for PSA, but report values in ng/mL and nmol/L, respectively. Thus, local nodes must convert measurements to match the requested unit, necessitating that the hyper-ontology includes a "units of measure" vocabulary, such as UCUM, and possibly a default or preferred unit of measure.
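The two steps described above (resolving equivalent concepts, then normalizing units) can be sketched as follows. This is a minimal illustration, not the EUCAIM mediator: the names `EQUIVALENT`, `TO_NG_PER_ML`, and `matches` are hypothetical, the equivalence table is assumed rather than taken from the hyper-ontology, and only mass-concentration units are covered (converting to nmol/L would additionally need the analyte's molar mass, which is not given here).

```python
# Hypothetical equivalence groups that a hyper-ontology could expose.
# The identifiers are the ones quoted in the text above.
PSA_GROUP = {"LOINC:LP18192-2", "SNOMEDCT:4272032"}
EQUIVALENT = {concept: PSA_GROUP for concept in PSA_GROUP}

# Factors converting a UCUM mass-concentration unit to ng/mL.
# 1 ug/L = 1 ng/mL; 1 ng/dL = 0.01 ng/mL.
TO_NG_PER_ML = {"ng/mL": 1.0, "ug/L": 1.0, "ng/dL": 0.01}


def to_ng_per_ml(value: float, unit: str) -> float:
    """Convert a value to ng/mL; raises KeyError for unsupported units."""
    return value * TO_NG_PER_ML[unit]


def matches(record: dict, query_concept: str, threshold: float, unit: str) -> bool:
    """True if a local record satisfies 'query_concept > threshold unit'."""
    accepted = EQUIVALENT.get(query_concept, {query_concept})
    if record["concept"] not in accepted:
        return False
    return to_ng_per_ml(record["value"], record["unit"]) > to_ng_per_ml(threshold, unit)


# Toy local records; values are invented for illustration.
records = [
    {"concept": "SNOMEDCT:4272032", "value": 25.0, "unit": "ng/mL"},
    {"concept": "LOINC:LP18192-2", "value": 1500.0, "unit": "ng/dL"},  # 15 ng/mL
    {"concept": "LOINC:LP18192-2", "value": 30.0, "unit": "ug/L"},
]
hits = [r for r in records if matches(r, "LOINC:LP18192-2", 20.0, "ng/mL")]
print(len(hits))
```

Here the user queries with the LOINC concept, yet the record coded with the other PSA concept is still found, and the record stored in ng/dL is compared only after conversion.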

Figure 1: An excerpt of the hyper-ontology (v1.0beta) around representing PSA concepts and their relations
Therefore, the hyper-ontology:
should contain a formal representation of medical concepts/terms, and their relationships within the oncology domain.
should provide a comprehensive vocabulary/terminology that covers the source data.
Beyond semantic interoperability, addressing the syntactic heterogeneity of data models and standards is also crucial for enabling querying. For instance, OMOP-CDM organizes concepts into various domains (e.g., Condition, Measurement, Procedure, etc.), while the FHIR standard categorizes similar concepts within a set of resources (e.g., Observation, Condition, Medication, etc.). Specifically, the PSA concept is represented as a concept within the Measurement domain in OMOP-CDM and as a concept in the Observation resource type in FHIR. Therefore, the hyper-ontology should also specify the corresponding "class" or "entity" of a concept to facilitate accurate querying. It is also important to recognize that different PSA concepts may correspond to different classes/entities based on their semantics. For example, PSA might refer to a laboratory test with a numerical value (e.g., PSA=20 ng/ml), classified as a "Measurement" in OMOP-CDM or an "Observation" in FHIR. Alternatively, PSA can indicate a "Procedure", denoting whether a patient has undergone a PSA test/procedure (PSA=yes/no), or it can represent a "Condition", reflecting an abnormal PSA level (PSA=normal/abnormal), which implies an elevated PSA without specifying the exact value (see Figure 1). This multifaceted nature of concepts necessitates a comprehensive approach in the hyper-ontology to ensure queries can be accurately executed across different data standards and models.
Therefore, the hyper-ontology:
should link the concepts from clinical standard terminologies to the corresponding CDM classes of OMOP and FHIR (similar to how OMOP vocabularies specify the Domain of a concept) (see Figure 1 for an example).
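As a rough illustration of this requirement, the sketch below attaches an OMOP domain and a FHIR resource type to each of the three PSA readings discussed above. The concept labels, the `ClassBinding` type, and the `target` function are hypothetical names chosen for this example; the actual hyper-ontology expresses these links as relations and annotations rather than as a Python table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassBinding:
    """Class/entity a concept belongs to in each data model."""
    omop_domain: str
    fhir_resource: str


# Illustrative labels for the three PSA readings described in the text.
BINDINGS = {
    "PSA level (numeric value)": ClassBinding("Measurement", "Observation"),
    "PSA test performed (yes/no)": ClassBinding("Procedure", "Procedure"),
    "PSA abnormal (normal/abnormal)": ClassBinding("Condition", "Condition"),
}


def target(concept: str, model: str) -> str:
    """Return the class a mediator should query for this concept."""
    binding = BINDINGS[concept]
    if model == "OMOP":
        return binding.omop_domain
    if model == "FHIR":
        return binding.fhir_resource
    raise ValueError(f"unknown data model: {model}")


print(target("PSA level (numeric value)", "OMOP"))
print(target("PSA level (numeric value)", "FHIR"))
```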
Nonetheless, syntactic heterogeneity remains problematic, even in the same data model. While both OMOP-CDM and FHIR are able to represent a wide range of clinical information, they both allow storing the same piece of information in different ways. To avoid this inconsistency, we need a common way of representation of such concepts. For example, the metastasis cancer staging values of M1 from the “TNM” (Tumor-Node-Metastasis) category could be represented in two different ways: a) as a concept “AJCC/UICC 7th pathological M1a Category”, which is a Cancer Modifier concept of the “Measurement” domain in the OMOP-CDM, or b) as a NAACCR concept “TNM Path M” of the “Measurement” domain with value “pM1a” of the “Meas Value” domain. Therefore, there is the need to decide how to represent the information: by either its complex form or by using a combination of atomic concepts. This binding information should be attached to the hyper-ontology concepts, linking to its corresponding CDM attribute, by including annotations in the hyper-ontology (see Figure 2 for an example).
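The choice between the complex form and the atomic combination can be illustrated with a small normalization step. The sketch assumes a hypothetical binding table (`COMPOSITE_TO_ATOMIC`) holding the example from the paragraph above; the function name `normalize` and the record layout are likewise assumptions for illustration.

```python
# Hypothetical binding: a complex concept and its atomic (concept, value) form.
COMPOSITE_TO_ATOMIC = {
    "AJCC/UICC 7th pathological M1a Category": ("TNM Path M", "pM1a"),
}
ATOMIC_TO_COMPOSITE = {pair: label for label, pair in COMPOSITE_TO_ATOMIC.items()}


def normalize(measurement: dict) -> str:
    """Return the complex concept label, whichever representation is stored."""
    concept = measurement["concept"]
    if concept in COMPOSITE_TO_ATOMIC:
        return concept
    # Atomic form: look up the (concept, value) pair; KeyError if unknown.
    return ATOMIC_TO_COMPOSITE[(concept, measurement.get("value"))]


complex_form = normalize({"concept": "AJCC/UICC 7th pathological M1a Category"})
atomic_form = normalize({"concept": "TNM Path M", "value": "pM1a"})
print(complex_form == atomic_form)
```

Both records resolve to the same label, so a query written against either form can be answered by a node that stores the other.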
Finally, for being able to formulate queries spanning multiple associated classes in the field of oncology (e.g. retrieve number of patients within a dataset that have had prostatectomy), we need a common meta-model that bridges classes/entities between OMOP-CDM and FHIR.
Therefore, the hyper-ontology:
should abstract concepts over both data models, and act as a common meta-model so that queries can be formulated.

Figure 2: An excerpt of the hyper-ontology (v1.0) around combining atomic concepts (TNM Path M, pM1a) to represent specific concepts (AJCC/UICC 7th pathological M1a Category)
As an example, the hyper-ontology could define two classes, “Cancer Patient” and “Surgical Procedure”, and a relationship “hasUndergone” linking the two classes (Figure 3). Through the federated query service, a user could formulate a query based on the hyper-ontology that targets the “Cancer Patient” class and, via the “hasUndergone” relationship, asks for the number of patients that have had a “Surgical Procedure”, more specifically a “Prostatectomy”. In this scenario, the local mediator service in each local node should translate the hyper-ontology-based query into a query specific to the local database schema (e.g., an SQL query for an OMOP-CDM relational database), based on the semantics and mappings defined in the hyper-ontology. The hyper-ontology should therefore define how “Cancer Patient” maps to the corresponding classes in both OMOP-CDM (e.g., ‘Person’) and FHIR (‘Patient’), and how a “Prostatectomy” maps to a procedure record in the two local schemas.

Figure 3: An excerpt of the hyper-ontology (v1.0) around Cancer Patient represented in Protege.
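The translation step performed by a local mediator in the “Cancer Patient hasUndergone Prostatectomy” scenario can be sketched as follows for an OMOP-like relational node. This is an assumption-laden illustration: `CLASS_TO_TABLE`, `CONCEPT_TO_ID`, and `translate` are hypothetical names, the concept identifier 1001 is a placeholder rather than a real OMOP concept_id, and the in-memory SQLite database only mimics the `person` and `procedure_occurrence` tables of OMOP-CDM.

```python
import sqlite3

# Hypothetical mapping from hyper-ontology terms to an OMOP-like schema.
CLASS_TO_TABLE = {
    "Cancer Patient": "person",
    "Surgical Procedure": "procedure_occurrence",
}
CONCEPT_TO_ID = {"Prostatectomy": 1001}  # placeholder, not a real concept_id


def translate(subject: str, relation: str, target_class: str, concept: str):
    """Build a parameterised SQL count for '<subject> <relation> <concept>'."""
    if relation != "hasUndergone":
        raise ValueError(f"unsupported relation: {relation}")
    subj, tgt = CLASS_TO_TABLE[subject], CLASS_TO_TABLE[target_class]
    sql = (
        f"SELECT COUNT(DISTINCT s.person_id) FROM {subj} AS s "
        f"JOIN {tgt} AS t ON t.person_id = s.person_id "
        "WHERE t.procedure_concept_id = ?"
    )
    return sql, (CONCEPT_TO_ID[concept],)


# Toy local node: three patients, one of whom has two prostatectomy records.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE procedure_occurrence (
    procedure_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    procedure_concept_id INTEGER);
INSERT INTO person VALUES (1), (2), (3);
INSERT INTO procedure_occurrence VALUES (10, 1, 1001), (11, 1, 1001), (12, 2, 2002);
""")
sql, params = translate("Cancer Patient", "hasUndergone", "Surgical Procedure", "Prostatectomy")
count = conn.execute(sql, params).fetchone()[0]
print(count)
```

Only the aggregated count leaves the node, which is in line with the federated search of aggregated data described in this document. A FHIR node would need a different translation target (for example, a search on Procedure resources), which is not shown here.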
4.2 The EUCAIM Hyper-Ontology
The EUCAIM hyper-ontology is a common semantic meta-model that aims to support and maintain semantic interoperability among heterogeneous cancer image data models/standards. The hyper-ontology model defines a structured and controlled vocabulary permitting disparate and heterogeneous data models/standards to easily and unambiguously communicate and integrate. Using the hyper-ontology, the real-world meaning of essential medical and imaging data/metadata is preserved and exchanged in a standardized, consistent, and meaningful way. Therefore, the main challenge of the hyper-ontology is to facilitate integration and interoperability among data stored and modeled using diverse heterogeneous clinical and imaging data models and associated terminologies. EUCAIM’s hyper-ontology is not only a domain ontology that reflects the essentials of the oncology domain for the clinical and imaging contexts but also an application ontology that permits the exploration of data collections, federated querying and processing, and image annotation/segmentation.
4.3 Data Resources
The main data resources for building the hyper-ontology are the clinical and imaging data/metadata provided by the AI4HI projects CHAIMELEON, ProCAncer-I, EuCanImage, INCISIVE, and PRIMAGE. While the clinical knowledge is provided as standard concepts from various terminologies/ontologies (e.g., SNOMED, LOINC, NAACCR, UCUM, etc.) following the OMOP/FHIR data models/standards, the imaging knowledge is provided either as DICOM tags/values or as standardized concepts from RadLex. The clinical knowledge is collected as use cases (12 in total) organized per cancer type, including information regarding cancer types and subtypes, therapeutic/surgical procedures, cancer staging/grading systems and values, affected body parts, lab tests, etc. Meanwhile, information about image studies, segmentation, or querying is collected from the imaging knowledge. In the following, we outline the diversity in how data are represented using different models/standards and specify the minimum common data required across the different projects.
Clinical and biological knowledge: ProCAncer-I and CHAIMELEON have adopted OMOP as a CDM, and INCISIVE and EuCanImage have adopted FHIR as a data standard. This diversity has affected not only the terminologies/ontologies used to represent clinical data/metadata, but also the syntactic assignment of concepts to OMOP/FHIR entities/classes. We differentiate between projects that adopt 1) the same data models or 2) different data models/standards.
ProCAncer-I and CHAIMELEON have adopted OMOP and OMOP-Like[5] as CDM, respectively, but represent clinical concepts using different terminologies/ontologies (semantic level) and assign different classes/entities to these concepts (syntactic level). As an example, the metastasis cancer staging values of M1 from the TNM (Tumor-node-metastasis) category are represented in two different ways in these projects: 1) AJCC/UICC 7th pathological M1a Category, which is a Cancer Modifier concept of the Measurement OMOP domain; 2) TNM Path M, a NAACCR concept of the Measurement OMOP domain with value pM1a of the Meas Value domain.
ProCAncer-I and INCISIVE have adopted OMOP and FHIR, respectively. ProCAncer-I represents the PSA (prostate-specific antigen measurement) lab test using SNOMED and assigned it to the domain Measurement. Meanwhile, in INCISIVE, PSA is represented using LOINC and assigned to the resource Observation.
Imaging knowledge: diverse types of imaging data/metadata are provided.
For instance, in ProCAncer-I, imaging metadata attributes are defined for querying DICOM_SEG (e.g., study_uid (0020,000D), slice_thickness (0018,0050)). In addition, values are given for imaging attributes, such as segment label (e.g., PZ, TZ, CZ, SV) and method (e.g., Manual, Semiautomatic, Automatic). Standard imaging concepts are also defined in ProCAncer-I, such as Laterality (RadLex, RID5821), Anatomic Region (RadLex, RID13390), and Patient Position (RadLex, RID10420). In CHAIMELEON, by contrast, DICOM tags are given (e.g., SeriesDescription (0008,103E), BodyPartExamined (0018,0015), etc.), while the INCISIVE project provides image annotation labels (e.g., Suspicious, Problematic, Malignant, Benign, lymph node, etc.).
Given the diversity and disparity of clinical and imaging knowledge on the semantic and syntactic levels, a common semantic meta-model is required to integrate and generalize the different terminologies/ontologies and the associated OMOP/FHIR domains/resources, permitting seamless communication and information exchange among the heterogeneous cancer image data models.
4.4 Development Process
We propose an iterative and systematic approach to the hyper-ontology development process that is formally and semantically well-founded. The proposed approach simplifies the hyper-ontology construction in the face of the complexity and heterogeneity of the application domain and the diversity and disparity of the provided clinical and imaging knowledge. Six main phases are defined in this approach (see Figure 4).

Figure 4. An illustration of the Hyper-ontology iterative development process
4.4.1 Requirements Analysis and Specification
After a set of meetings with users and experts from the EUCAIM community, we define the following elements:
Purpose: To support semantic interoperability by integrating heterogeneous cancer image data models in a common semantic meta-model, which provides the ontology-based, standard, and structured vocabulary of the oncology domain and the associated semantic relations. In addition, to ensure seamless integration with the EUCAIM-CDM, permitting consistent mapping with local nodes and thereby federated querying of data collections.
Scope: To cover the basics of the oncology domain based on the clinical and imaging knowledge provided by the AI4HI projects, including the following cancer types: prostate, breast, rectum, lung, colon, colorectal, and liver.
Intended uses and users: To explore data collections through the Public Catalogue[6], Federated Querying (federated search of aggregated data in the collections), and semantic annotation/segmentation of cancer images.
The hyper-ontology's main users are data users/researchers, i.e., persons or entities that want to explore the Public Catalogue, eventually request access to data, and process it using the tools available on the platform or their own AI tools.
Example of a Data User-Researcher with an experimental lab profile: A Data User-Researcher is leading a project related to prostate cancer. One of the objectives is to allocate treatment based on the analysis of baseline Magnetic Resonance (MR) images at the time of diagnosis. The research team will incorporate AI tools and experience in interpreting the results obtained and applying them in a clinical setting for routine clinical practice.
Example of a Data User-Researcher with a Data Scientist profile: A Data Scientist is developing an AI tool to analyze health images and related clinical and molecular data on the most prevalent cancers in Europe. They have an initial model they want to improve with new data. They seek quality and labeled data and do not accept unstructured data or data without a logical folder structure.
Requirements: Two main types of requirements are defined:
Non-Functional Requirements (NFRs):
NFR1: To support the English language.
NFR2: To comply with the FAIR principles.
NFR3: To align with the General Data Protection Regulation (GDPR).
NFR4: The terminology in the hyper-ontology must be taken from validated biomedical ontologies and standardized terminologies.
NFR5: The ontology model should be extensible to handle the periodical updates of semantic standards and to include future ontological aspects and cancer types.
Functional Requirements (FRs): These are stated as competency questions (CQs) based on the clinical and imaging knowledge provided by the AI4HI projects. We give some examples of FRs and their corresponding CQs/Answers in the following:
FR1: To define the basic cancer types.
CQ1: What are the leading cancer types?
Prostate cancer, Colon cancer, Breast cancer, Rectal cancer, Lung cancer, Neuroblastoma, Diffuse intrinsic pontine glioma, Colorectal Cancer, Primary malignant neoplasm of liver, Malignant neoplasm of colon and/or rectum, Primary malignant neoplasm of breast.
FR2: To define the specific cancer types.
CQ1: Are there any specific types of breast cancer?
Primary malignant neoplasm of female breast (SNOMEDCT, 363346000), Primary malignant neoplasm of breast with axillary lymph node invasion (disorder) (SNOMEDCT, 1082901000112103)
CQ2: Are there any specific types of prostate cancer?
Benign prostatic hyperplasia (SNOMEDCT, 266569009), Hormone refractory prostate cancer (SNOMEDCT, 427492003), Hormone sensitive prostate cancer (SNOMEDCT, 722103009)
CQ3: Are there any specific types of liver cancer?
Liver cell carcinoma (disorder) (SNOMEDCT, 109841003), Secondary malignant neoplasm of liver (SNOMEDCT, 94381002)
FR3: To define the main tumor staging methods and values.
CQ1: What tumor staging methods are specified for breast cancer?
Edition of American Joint Commission on Cancer, Cancer Staging Manual used for TNM staging (observable entity) (SNOMEDCT, 443941007)
CQ2: What tumor staging (categorical) values are specified for breast cancer?
American Joint Committee on Cancer clinical T category allowable value (qualifier value) (SNOMEDCT, 1222585009), American Joint Committee on Cancer clinical N category allowable value (qualifier value) (SNOMEDCT, 1222588006), Tumor histopathological grade status values (tumor staging) (SNOMEDCT, 258244004)
FR4: To define the histology types of cancers.
CQ1: Are there any histology types specified for prostate cancer?
Acinar cell carcinoma of prostate gland (ICD-O-3), Intraductal carcinoma, noninfiltrating, NOS, of prostate gland (ICD-O-3), Infiltrating duct carcinoma, NOS, of prostate gland (ICD-O-3), Transitional cell carcinoma, NOS, of prostate gland (ICD-O-3), Adenosquamous carcinoma of prostate gland (ICD-O-3)
FR5: To define the necessary lab tests for cancer types.
CQ1: Are there any lab tests specified for prostate cancer?
Free prostate specific antigen level (SNOMED), Total PSA level (SNOMED), Free:total PSA ratio (SNOMED), Prostate specific antigen normal (SNOMED)
These requirements, specified as CQs/Answers, are documented in the Ontology Requirements and Specifications Document (ORSD). This document has helped to simplify the Hyper-ontology development process by clarifying the intended content, on which the ontology granularity level depends. In addition, ORSD permits tracking the inconsistencies or lack of information the local nodes provide. The ORSD v1 is available at the following link: https://doi.org/10.5281/zenodo.11109765
Additional functional requirements are defined by EUCAIM experts to help overcome the heterogeneity and disparity of clinical data. We give two examples as follows:
FR6: To ensure the correspondence of concepts to their domains/resources in the OMOP and FHIR CDMs. For instance, Primary malignant neoplasm of breast corresponds to Condition, and Chemotherapy to Procedure, both as FHIR resourceType and as OMOP domain.
FR7: To represent specific concepts by combining atomic-related concepts. For instance, the cancer staging metastasis value of M1 from TNM (Tumor-node-metastasis) category could be represented in two different ways: 1) AJCC/UICC 7th clinical M1a Category, which is a concept of the Measurement OMOP domain; 2) TNM Clin M, a concept of the Measurement domain with value cM1a of the Meas Value domain.
4.4.2 Knowledge Acquisition
This phase aims to align, or map, the mandatory clinical and imaging knowledge collected from the AI4HI projects and represented in the ORSD document with standard FAIR-compliant terminological and ontological resources. The preferences in terminologies/ontologies (e.g., RadLex[12] for imaging data/metadata) are decided with the help of EUCAIM experts.
Two main types of mappings are performed in this phase:
Hierarchical-based: to build the hierarchy of the hyper-ontology, we rely on the is-a relations extracted from the standard terminologies/ontologies. The extraction process is based on the labels/concepts provided by the AI4HI projects.
Label-based: to enrich the hyper-ontology with codes from standard terminologies/ontologies, alternative labels, and definitions, an exact match similarity approach is applied, considering the clinical and imaging labels of the provided concepts.
The mappings are performed automatically using the following resources, which combine many health and biomedical vocabularies and standards to enable interoperability between computer systems: OHDSI Athena[7], BioPortal RESTful API[8], and UMLS REST API[9]. Figure 5 depicts examples of mappings.

Figure 5. An illustration of mappings with Biomedical terminologies/ontologies.
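The label-based, exact-match step can be sketched as below. The document states that the real mappings are obtained from OHDSI Athena, the BioPortal RESTful API, and the UMLS REST API; this sketch does not call those services and instead uses a small in-memory index (`TERMINOLOGY`) built from labels and codes quoted elsewhere in this document. The function names and the normalization rule (case-folding and whitespace collapsing) are assumptions.

```python
def normalize(label: str) -> str:
    """Lower-case and collapse whitespace so trivial variants still match."""
    return " ".join(label.lower().split())


# Tiny stand-in for a terminology index (vocabulary, code, preferred label).
TERMINOLOGY = [
    ("SNOMEDCT", "93974005", "Primary malignant neoplasm of prostate"),
    ("RADLEX", "RID5821", "Laterality"),
    ("RADLEX", "RID10420", "Patient Position"),
]
INDEX = {normalize(label): (vocab, code) for vocab, code, label in TERMINOLOGY}


def map_labels(labels):
    """Return (mapped, unmapped) for a list of project-provided labels."""
    mapped, unmapped = {}, []
    for label in labels:
        hit = INDEX.get(normalize(label))
        if hit is not None:
            mapped[label] = hit
        else:
            unmapped.append(label)
    return mapped, unmapped


mapped, unmapped = map_labels(["patient  position", "Laterality", "PSA density"])
print(mapped)
print(unmapped)
```

Labels left in `unmapped` are the kind of gap that would need expert review or a different vocabulary.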
4.4.3 Design and Conceptualization
Faced with the complexity and diversity of clinical/imaging-provided knowledge and the associated mappings with various terminologies and ontologies, we propose dividing the hyper-ontology structure into layers and modules to simplify the building and extension processes. Therefore, four different layers are specified from bottom to top (see Figure 7):
Domain-Specific Layer (DSL): reflects the granularity level of the hyper-ontology since it includes the domain-specific concepts provided by the OMOP/FHIR projects.
Domain Layer (DL): includes the concepts obtained from the is-a mappings to build the hyper-ontology hierarchy. DL and DSL are maintained using a bottom-up strategy relying on the knowledge provided by the AI4HI network.
Core Layer (CL): defines the core oncology concepts. CL is maintained by considering the conceptual model of mCODE[10]. An ontological analysis is conducted based on well-known foundational ontologies, such as the Unified Foundational Ontology (UFO)[11], to develop a well-founded ontological model of mCODE. The mCODE core model explicitly defines the real-world entities of the oncology domain and their semantic relations. This approach has helped to clarify or overcome the ambiguity and heterogeneity of how well-known terminologies/ontologies defined essential clinical concepts, such as Disease and Morphology. Figure 6 depicts an excerpt of the mCODE core ontological model represented using OntoUML around the Disease characterization. OntoUML[12] is an Ontology-Driven Conceptual Modeling language where the modeling primitives reflect UFO's ontological distinctions and axiomatization.

Figure 6. An excerpt of the ontological model of mCODE around the Disease characterization represented using OntoUML
Upper Layer (UL): This layer is located at the most abstract level and defines the generic concepts of the biomedical domain, such as Disease, Laboratory, Surgical Procedure, Imaging Procedure, etc. UL and CL are developed using a top-down strategy using OntoUML.
Besides, the hyper-ontology content is divided into three generic modules: Clinical, Imaging, and Common (see Figure 7).
Clinical and Biological module: includes the pathological, diagnostic, medical, and biological data/metadata provided by the AI4HI network.
Imaging module: includes the modalities, imaging procedures, and attributes such as laterality, orientation, and position. It also defines imaging assessment, such as the PI-RADS and BI-RADS categories.
Common module: specifies mainly the qualifier values required for cancer staging/grading (e.g., pT1, pM2, cM3, Low histologic grade, etc.) or image annotation/segmentation (e.g., benign, malignant, automatic, manual, etc.). Units of measure, such as millimeter, percent, and cubic centimeter, are also defined. In addition, Cancer Patient and the associated demographic metadata (e.g., age at diagnosis, gender, and sex assigned at birth) are included in this module.

Figure 7. An excerpt of the hyper-ontology structure
4.4.4 Formalization
The hyper-ontology model, developed using an iterative approach, is a FAIR-compliant ontology model formalized as an OWL[13] (Web Ontology Language) file. Two beta versions (v0.1 and v0.2) of the hyper-ontology have been delivered and shared on Zenodo (https://doi.org/10.5281/zenodo.11109765). Table 3 presents some metrics of the latest hyper-ontology version, v1.0 (available on Zenodo at https://doi.org/10.5281/zenodo.12583826), including the source and mapping metrics. In Table 4, we outline the main terminologies considered in the hyper-ontology. Parts of the formal hyper-ontology model, represented using Protege[14], are depicted and introduced in the following.
Table 3: Some metrics of hyper-ontology version 1.0
| Metric | Count |
| --- | --- |
| Classes | 2029 |
| SubClassOf | 5395 |
| Object properties | 74 |
| Equivalence | 63 |
| Synonyms | 2215 |
| Cancer Types/Subtypes | 148 |
| Histology/Morphology | 105 |
| Image Modalities types/subtypes | 35 |
| Mapping to OMOP | 1755 |
| Mapping to FHIR | 353 |
| Mapping to DICOM | 6 |
| SNOMEDCT | 1431 |
| ICDO3 | 68 |
| ICD10 | 14 |
| ICD10PCS | 9 |
| NCIT | 352 |
| NAACCR | 54 |
| LOINC | 149 |
| UCUM | 25 |
| RADLEX | 185 |
| DICOM | 6 |
| CPT4 | 9 |
| Birnlex | 5 |
| Cancer Modifier | 158 |
| UMLS | 1304 |
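Counts such as those in Table 3 could in principle be derived from the OWL file itself. The sketch below is one possible approach, not the method used to produce Table 3: it assumes the ontology is serialized as RDF/XML with named classes and object properties declared as top-level elements, and it runs on a tiny inline sample (the `example.org` IRIs are invented) rather than on the published hyper-ontology. Ontology editors and dedicated OWL libraries may count differently, for example by including anonymous classes or inferred axioms.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Tiny inline RDF/XML ontology used only for illustration.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/Measurement"/>
  <owl:Class rdf:about="http://example.org/TumorMarkerMeasurement">
    <rdfs:subClassOf rdf:resource="http://example.org/Measurement"/>
  </owl:Class>
  <owl:ObjectProperty rdf:about="http://example.org/hasUndergone"/>
</rdf:RDF>"""


def metrics(rdf_xml: str) -> dict:
    """Count top-level classes, all subClassOf axioms, and object properties."""
    root = ET.fromstring(rdf_xml)
    return {
        "Classes": len(root.findall(f"{{{OWL}}}Class")),
        "SubClassOf": len(root.findall(f".//{{{RDFS}}}subClassOf")),
        "Object properties": len(root.findall(f"{{{OWL}}}ObjectProperty")),
    }


result = metrics(SAMPLE)
print(result)
```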
Table 4: List of vocabularies supported by the hyper-ontology version 1.0 classified by domain.
| Domain | Terminology |
| --- | --- |
| Cancer Types/Subtypes | SNOMEDCT, ICDO3, ICD10, NCIT |
| Morphology/Histology | ICDO3, SNOMEDCT |
| Body Structure/Topography | SNOMEDCT, ICDO3, NCIT, RADLEX |
| Clinical Findings | SNOMEDCT, NCIT |
| Family History | SNOMEDCT |
| Staging/Grading (e.g., TNM staging, Gleason grading) | SNOMEDCT, Cancer Modifier, NAACCR, NCIT |
| Tumor Marker Test (e.g., PSA, ER, PR) | LOINC, SNOMEDCT, NCIT |
| Procedures (surgical, therapeutic, etc.) | SNOMEDCT, NCIT, CPT4, ICD10PCS |
| Medication | RxNorm, SNOMEDCT, ATC, NCIT |
| Patient Demographics (e.g., gender, sex, age at diagnosis) | GENDER, SNOMEDCT, LOINC |
| Absence/Presence Findings (e.g., negative, positive, absent, none) | SNOMEDCT, LOINC |
| Unit of Measure | UCUM, NCIT, SNOMEDCT |
| Time Pattern/Time Point (e.g., start time, follow-up) | SNOMEDCT, LOINC, NCIT |
| Image Modalities (e.g., MRI, CT) | RADLEX, SNOMEDCT, NCIT |
| Image Procedures (e.g., MRI of prostate) | SNOMEDCT, RADLEX, NCIT, ICD10PCS |
| Manufacturer (e.g., GE, Philips) | BIRNLEX |
| Image Assessment (PI-RADS, BI-RADS) | RADLEX, SNOMEDCT |
4.4.4.1 Clinical and Biological Module
Cancer Condition: Figure 8 depicts the Primary malignant neoplasm of prostate (SNOMEDCT:93974005), a cancer condition with associated morphology, the Malignant neoplasm (SNOMEDCT), and location, the Prostate (SNOMEDCT). The alignment with OMOP is maintained using the semantic relation “Has correspondence” and semantic annotation “OMOP_Domain_ID”.

Figure 8. Part of the hyper-ontology around the concept “Primary malignant neoplasm of prostate” represented using Protege.
Morphology: Figure 9 depicts part of the hyper-ontology around the concept of Malignant neoplasm (SNOMEDCT:1240414004), a morphologic abnormality that inheres in “Malignant neoplastic disease” (SNOMEDCT:363346000).

Figure 9. Part of the hyper-ontology around the concept “Malignant neoplasm”, represented using Protege.
Cancer Staging: Figure 10 illustrates part of the hyper-ontology around the “AJCC/UICC 7th clinical M1a Category” (Cancer Modifier:c-7th_AJCC/UICC-M1a) concept. This concept is represented using other atomic concepts, “TNM Clin M” (NAACCR:960) and “cM1a” (NAACCR:960@c1A), to solve the disparity problem of representing TNM staging (see FR7, section 4.4.1).

Figure 10. Part of the hyper-ontology around the concept “AJCC/UICC 7th clinical M1a Category” represented using Protege.
Tumor Marker Test: Figure 11 depicts part of the hyper-ontology around the concept “Prostate specific antigen measurement” (PSA). This concept is defined as Measurement in OMOP and Observation in FHIR. In the hyper-ontology, this heterogeneity is handled semantically by classifying PSA concepts as Tumor marker measurement, which is a specialization of Measurement of substance (see FR6, section 4.4.1). Meanwhile, the syntactic heterogeneity is handled by aligning PSA to the corresponding OMOP domain and FHIR resource. PSA is semantically associated with measurement units (nanogram per milliliter, nanogram per deciliter, etc.), abnormality values (Normal, Abnormal), and cancer condition (Cancer of prostate).

Figure 11. Part of the hyper-ontology around the concept of “Prostate specific antigen measurement” represented using Protege.
4.4.4.2 Imaging Module
Image Series: Figure 12 depicts part of the hyper-ontology around “Image Series” (NCIT:C69225), which is part of “Image Study” (NCIT:C63859). It is associated with the following elements: Body structure (SNOMEDCT:123037004), Imaging modality (RADLEX:RID10311), Patient position (RADLEX:RID10420), and Laterality (RADLEX:RID5821). These concepts are also mapped to DICOM by including the corresponding DICOM tags. For instance, Patient position and Laterality are mapped to the DICOM tags (0018,5100) and (0020,0060), respectively.

Figure 12. Part of the hyper-ontology around the concept of “Image series” represented using Protege.
Image Modality: Figure 13 illustrates “MRI of breast for screening for malignant neoplasm”, a specific concept of “Imaging Modality” (SNOMEDCT:360037004) with a direct procedure site “Breast” (SNOMEDCT:76752008). Also, the Imaging Modality general category is aligned to the DICOM tag (0008,0060), permitting a syntactic integration with DICOM.
Figure 13. Part of the hyper-ontology around the concept of “MRI of breast for screening for malignant neoplasm” represented using Protege.
The hyper-ontology supports the image annotation/segmentation task by including the (standard) specific concepts required as labels/values to annotate cancer images, permitting a syntactic integration with DICOM SEG. For instance, the image modality label “MRI” or “MR” (Magnetic Resonance Imaging) and the laterality values “Left”/“Right” are defined as specific concepts of Imaging Modality and Laterality, respectively. On the other hand, some DICOM tags, such as segment label (0062,0005) and segment algorithm type (0062,0008), which are provided as imaging metadata, are not defined in standard terminologies/ontologies. Therefore, they are not explicitly specified in the hyper-ontology. However, their associated values, which are required for annotation/segmentation tasks, are included in the hyper-ontology. For instance, the segment labels PZ (peripheral zone of prostate) (RADLEX:RID347) and TZ (transitional zone of prostate) (RADLEX:RID351) are provided by ProCAncer-I as imaging metadata for DICOM SEG querying (see ORSD). They are included in the Body Structure category, specifically in the Region of prostate (SNOMEDCT:314399000). Similarly, the values associated with segment methods, Automatic, Semi-automatic, and Manual, are defined as modifiers in the Common Module. Table 5 presents the DICOM attributes mapped to the hyper-ontology imaging module, and Table 6 presents those that are not aligned but whose values are specified.
Table 5. DICOM tags mapped to the EUCAIM hyper-ontology (version 1.0)
| DICOM name | DICOM ID | Vocabulary source ID | EUCAIM Concept ID |
| --- | --- | --- | --- |
| Patient Position | (0018,5100) | RADLEX:RID10420 | IMG1016605 |
| Body Part Examined | (0018,0015) | SNOMEDCT:52530000 | BP1000024 |
| Manufacturer | (0008,0070) | NCIT:C25392 | IMG1000010 |
| Modality | (0008,0060) | SNOMEDCT:360037004 | IMG1000009 |
| Laterality | (0020,0060) | RADLEX:RID5821 | IMG1016305 |
| Patient Orientation | (0020,0020) | RADLEX:RID10461 | IMG1016610 |
| Slice thickness | (0018,0050) | RADLEX:RID28669 | IMG1016306 |
| Echo time | (0018,0081) | RADLEX:RID12463 | IMG1016641 |
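The rows of Table 5 lend themselves to a simple lookup from a DICOM tag to the corresponding EUCAIM concept, which a mediator could use when a query mentions an imaging attribute. The sketch below copies the Table 5 rows verbatim; the dictionary layout and the helper names `parse_tag` and `eucaim_concept` are assumptions made for illustration.

```python
# Rows copied from Table 5:
# (group, element) -> (DICOM name, vocabulary source ID, EUCAIM concept ID)
DICOM_TO_EUCAIM = {
    (0x0018, 0x5100): ("Patient Position", "RADLEX:RID10420", "IMG1016605"),
    (0x0018, 0x0015): ("Body Part Examined", "SNOMEDCT:52530000", "BP1000024"),
    (0x0008, 0x0070): ("Manufacturer", "NCIT:C25392", "IMG1000010"),
    (0x0008, 0x0060): ("Modality", "SNOMEDCT:360037004", "IMG1000009"),
    (0x0020, 0x0060): ("Laterality", "RADLEX:RID5821", "IMG1016305"),
    (0x0020, 0x0020): ("Patient Orientation", "RADLEX:RID10461", "IMG1016610"),
    (0x0018, 0x0050): ("Slice thickness", "RADLEX:RID28669", "IMG1016306"),
    (0x0018, 0x0081): ("Echo time", "RADLEX:RID12463", "IMG1016641"),
}


def parse_tag(text: str) -> tuple:
    """Parse a tag written as '(0018,5100)' into a (group, element) pair."""
    group, element = text.strip("() ").split(",")
    return int(group, 16), int(element, 16)


def eucaim_concept(tag_text: str):
    """Return the EUCAIM concept ID for a DICOM tag, or None if unmapped."""
    row = DICOM_TO_EUCAIM.get(parse_tag(tag_text))
    return row[2] if row else None


print(eucaim_concept("(0018,5100)"))
print(eucaim_concept("(0062,0005)"))
```

The second lookup returns `None` because segment label (0062,0005) is not itself mapped to a hyper-ontology concept; only its values are.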
Table 6. DICOM tags whose values are represented in the EUCAIM hyper-ontology (version 1.0)

| DICOM name | DICOM ID | Examples of Values | Vocabulary source ID | EUCAIM Concept ID |
|---|---|---|---|---|
| Segment label | (0062,0005) | TZ (Transition Zone of prostate), CZ (Central Zone of prostate), PZ (Peripheral Zone of Prostate) | RADLEX:RID351, RADLEX:RID348, RADLEX:RID347 | BP1000100, BP1000168, BP1000006 |
| Segment method/algorithm type | (0062,0008) | Automatic, Semi-automatic, Manual | SNOMEDCT:8359006, NCIT:C172484, SNOMEDCT:87982008 | COM1000008, COM1000005, COM1000003 |
| Segmentation Type | (0062,0001) | Binary | NCIT:C45969 | COM1000023 |
| Image Type | (0008,0008) | Primary, Axial | SNOMEDCT:63161005, SNOMEDCT:24422004 | COM1000017, COM1000018 |
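The two tables can be read as lookup structures: Table 5 resolves a DICOM tag to a concept, while Table 6 resolves a (tag, value) pair. The following is a minimal sketch of such a lookup, not the EUCAIM mediator implementation. The function name `resolve`, the dictionary layout, the case-insensitive matching, and the positional pairing of values with identifiers within each Table 6 row are assumptions made for illustration; only a subset of Table 5 is included.

```python
# Tag-level mapping (subset of Table 5):
# DICOM tag -> (vocabulary source ID, EUCAIM concept ID).
TAG_TO_CONCEPT = {
    "(0018,5100)": ("RADLEX:RID10420", "IMG1016605"),     # Patient Position
    "(0018,0015)": ("SNOMEDCT:52530000", "BP1000024"),    # Body Part Examined
    "(0008,0060)": ("SNOMEDCT:360037004", "IMG1000009"),  # Modality
    "(0020,0060)": ("RADLEX:RID5821", "IMG1016305"),      # Laterality
}

# Value-level mapping (Table 6). Pairing each value with the identifier in the
# same position of its row is an assumption of this sketch.
_VALUE_ROWS = [
    ("(0062,0005)", "TZ", "RADLEX:RID351", "BP1000100"),
    ("(0062,0005)", "CZ", "RADLEX:RID348", "BP1000168"),
    ("(0062,0005)", "PZ", "RADLEX:RID347", "BP1000006"),
    ("(0062,0008)", "Automatic", "SNOMEDCT:8359006", "COM1000008"),
    ("(0062,0008)", "Semi-automatic", "NCIT:C172484", "COM1000005"),
    ("(0062,0008)", "Manual", "SNOMEDCT:87982008", "COM1000003"),
    ("(0062,0001)", "Binary", "NCIT:C45969", "COM1000023"),
    ("(0008,0008)", "Primary", "SNOMEDCT:63161005", "COM1000017"),
    ("(0008,0008)", "Axial", "SNOMEDCT:24422004", "COM1000018"),
]
VALUE_TO_CONCEPT = {
    (tag, value.lower()): (source_id, concept_id)
    for tag, value, source_id, concept_id in _VALUE_ROWS
}


def resolve(tag, value=None):
    """Return the EUCAIM concept ID for a tag, or for a (tag, value) pair.

    Returns None when the tag or value is not covered by the tables.
    """
    if value is None:
        entry = TAG_TO_CONCEPT.get(tag)
    else:
        entry = VALUE_TO_CONCEPT.get((tag, value.strip().lower()))
    return entry[1] if entry else None


if __name__ == "__main__":
    print(resolve("(0008,0060)"))        # concept for the Modality tag
    print(resolve("(0062,0005)", "PZ"))  # concept for the segment label value PZ
```

Keeping the two mappings separate mirrors the distinction drawn above: tags such as segment label have no concept of their own, so `resolve("(0062,0005)")` returns `None`, while their values do resolve.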
4.4.4.3 Common Module
Cancer Patient: Figure 14 illustrates the “Cancer Patient” concept (NCIT:19700) and the associated semantic relations. Cancer patients are diagnosed with “Malignant neoplastic disease” and have undergone some “Surgical procedure”. “Gender” and “Sex assigned at birth” are associated with cancer patients as basic data elements following the mCODE conceptual model.
Figure 14. Part of the hyper-ontology around the concept of “Cancer Patient” represented using Protege.
Histologic Grades: Figures 15 and 16 depict the concepts “Histological grades” (SNOMEDCT:370114008) and “International Society of Pathology histologic grade group” (ISUP) (SNOMEDCT:1515521000004104). The histological grades are represented in the Common Module of the hyper-ontology, specifically in the Disease Grade Qualifier category, for two main reasons: 1) their specification as “Qualifier Value” in OMOP (Concept_Class_ID) and 2) their classification in SNOMEDCT as Qualifier value (SNOMEDCT:362981000). From a clinical expert's perspective, however, histological grades belong to the Clinical and Biological Module, which includes Gleason findings such as Gleason grade finding for prostatic cancer (SNOMEDCT:385377005), whose Concept_Class_ID in OMOP is “Clinical Finding”. Both perspectives can be semantically handled and reconciled in the hyper-ontology using owl:equivalentProperty. For instance, Grade group 3 (Gleason score 4 + 3 = 7) (qualifier value) (SNOMEDCT:1279716004) is equivalent to the union of the following Gleason findings: 'Gleason Primary Pattern Grade 4' and 'Gleason Secondary Pattern Grade 3' (see Figure 17). Therefore, Grade group 3 (Gleason score 4 + 3 = 7) (qualifier value), which semantically belongs to the Common Module, will be automatically classified by the HermiT reasoner as subClassOf Gleason Primary Pattern Grade 4 and Gleason Secondary Pattern Grade 3 in the Clinical Module.

Figure 15. Part of the hyper-ontology around the concept of “Histological grades” represented using Protege.

Figure 16. Part of the hyper-ontology around the concept of “International Society of Pathology histologic grade group” represented using Protege.
Figure 17. Part of the hyper-ontology around the concept of “Grade group 3 (Gleason score 4 + 3 = 7)” represented using Protege.
4.4.5 Evaluation and Validation
The hyper-ontology is validated as an RDF/OWL formal ontology, and its consistency is verified using Pellet[15], an OWL2 inference engine. To review the medical and imaging content of the hyper-ontology, workshops are organized with clinical/pathologic and radiologic experts from EUCAIM’s community, based on the requirements formulated as Competency Questions (CQs) in the ORSD. A term verification process has also been performed with the help of EUCAIM (WP5) experts to verify that all terms and associated vocabularies provided by the projects are properly covered in the ORSD and the hyper-ontology. In addition, meetings with a group of ontology experts are scheduled to review the semantic content of the hyper-ontology, mainly the semantic patterns applied to define specific concepts and the coherence of the hierarchy and modules. Moreover, the hyper-ontology will be evaluated according to its performance in data collection exploration through the Public Catalog, federated querying, and cancer image segmentation/annotation tasks. For the validation process, we considered real-world use cases around prostate and breast cancers collected from the AI4HI projects (see Section 7, Demonstration Scenarios). Two main validation tasks are applied to verify the pertinence of the hyper-ontology in representing the acquired use cases: 1) we demonstrate the hyper-ontology's completeness in representing knowledge from real-world scenarios; and 2) we show the usability of the hyper-ontology for the instantiation of the EUCAIM-CDM based on the provided use cases.
In addition, we verify the hyper-ontology's correctness in answering SPARQL queries (see Annex 1) based on the scenarios provided in Section 7.
4.4.6 Ontology Enrichment and Maintenance
Enrichment of the hyper-ontology is continuous throughout the iterative development process. We enrich the hyper-ontology model by considering experts' feedback on each delivered version, as well as any additional requirements and specifications defined by the EUCAIM community, mainly regarding the federated querying or image annotation/segmentation tasks. Moreover, meetings with clinical experts have helped to enrich the medical-oriented semantic content of the hyper-ontology by maintaining the semantic patterns connecting various concepts. For instance, in the hyper-ontology, the results of tumor marker tests are represented in two different ways: 1) conditions (e.g., Oestrogen receptor positive tumour (SNOMEDCT:416053008), Progesterone receptor negative tumour (SNOMEDCT:441118006)) and 2) observations (e.g., Estrogen receptor Ag [Presence] in Breast cancer specimen by Immune stain (LOINC:85337-4), Progesterone receptor Ag [Presence] in Breast cancer specimen by Immune stain (LOINC:85339-0)) associated with qualifier values (e.g., Positive (SNOMEDCT:10828004), Negative (SNOMEDCT:260385009)) that indicate the positive or negative detection of tumor markers. Since clinical experts consider that both representations reflect similar contexts, we can semantically associate them using an equivalence property (owl:equivalentProperty) (see Figure 18 for an example).

Figure 18. Part of the hyper-ontology around representing tumor marker test results (Protege).
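For querying purposes, the equivalence between the two representations means that a record stored as a condition and a record stored as an observation with a qualifier value should match the same query. The sketch below illustrates this with a plain Python lookup rather than an OWL reasoner. The record layout (dictionaries with `condition`, `observation`, and `qualifier` keys) and the function names are hypothetical; the codes are those quoted above.

```python
POSITIVE = "SNOMEDCT:10828004"
NEGATIVE = "SNOMEDCT:260385009"

# Condition code -> equivalent (observation code, qualifier value) pair.
CONDITION_AS_OBSERVATION = {
    # Oestrogen receptor positive tumour
    "SNOMEDCT:416053008": ("LOINC:85337-4", POSITIVE),
    # Progesterone receptor negative tumour
    "SNOMEDCT:441118006": ("LOINC:85339-0", NEGATIVE),
}


def canonical(record):
    """Normalise a record to an (observation, qualifier) pair.

    Returns None for a condition code that has no known equivalent.
    """
    if "condition" in record:
        return CONDITION_AS_OBSERVATION.get(record["condition"])
    return (record["observation"], record["qualifier"])


def same_finding(a, b):
    """True when two records describe the same tumor marker result."""
    ca, cb = canonical(a), canonical(b)
    return ca is not None and ca == cb


if __name__ == "__main__":
    as_condition = {"condition": "SNOMEDCT:416053008"}
    as_observation = {"observation": "LOINC:85337-4", "qualifier": POSITIVE}
    print(same_finding(as_condition, as_observation))
```

In the hyper-ontology itself this association is declared once as an axiom and applied by the reasoner; the table-driven version here only mimics the outcome for the two examples given.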
Another example concerns the existence of secondary and primary cancers. From a clinical perspective, the term secondary cancer may refer either to a metastasis from a primary cancer or to a second cancer unrelated to the original cancer. In both cases, the existence of a secondary cancer condition is related to an existing primary (or original) condition. Accordingly, a semantic relationship (Has Associated Primary Condition) is defined to link the secondary cancer to the primary one (see Figure 19). This semantic pattern makes it possible to logically deduce the existence of a primary cancer condition for a cancer patient who suffers from a clinically identified secondary cancer condition (see the example of the prostate cancer use case - ProCAncer-I, Section 7). The relationship does not apply in the opposite direction: a primary cancer condition does not necessarily entail a secondary cancer condition.
Figure 19. Part of the hyper-ontology around primary and secondary cancer relationship (Protege).
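The one-directional nature of this pattern can be illustrated with a small sketch. It does not use the hyper-ontology or a reasoner; conditions are modelled as hypothetical `(label, kind)` tuples, and the labels in the usage example are invented for illustration.

```python
def primary_exists(conditions):
    """True if a primary cancer condition is recorded or can be deduced.

    Each condition is a (label, kind) tuple with kind "primary" or "secondary".
    Has Associated Primary Condition: every secondary condition implies that
    some primary condition exists, even if it is not recorded.
    """
    recorded = any(kind == "primary" for _, kind in conditions)
    deduced = any(kind == "secondary" for _, kind in conditions)
    return recorded or deduced


def secondary_exists(conditions):
    """True only if a secondary cancer condition is explicitly recorded.

    Nothing is deduced from a primary condition, since the relationship
    does not hold in the opposite direction.
    """
    return any(kind == "secondary" for _, kind in conditions)


if __name__ == "__main__":
    # Hypothetical patient records.
    only_secondary = [("bone metastasis", "secondary")]
    only_primary = [("prostate cancer", "primary")]
    print(primary_exists(only_secondary))  # a primary condition is deduced
    print(secondary_exists(only_primary))  # no secondary condition is deduced
```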
Regarding the continuous updates and changes of the hyper-ontology content, there is a need to address the expansion of the semantic content while ensuring that consistency is maintained. Regular evaluation and validation processes on the syntactic and semantic levels (see Section 7) are required to assess the impact of evolution on the consistency and correctness of the hyper-ontology.