5. Interoperability framework for federated processing
To enable federated processing, data holders should implement both a semantic and a syntactic interoperability layer across their datasets. Semantic interoperability concerns how data meaning is kept consistent across datasets (this layer should also be implemented in Tier 2). Syntactic interoperability concerns how data is structurally persisted within a database.
Syntactic interoperability is essential at this tier so that any tool or AI/ML model processing the data knows the format and structure of the local dataset. These aspects are not addressed by the conceptual specifications (entities, relationships, terminologies) of the hyper-ontology.
5.1 CDM business requirements
Prior to selecting a CDM, we conducted an initial analysis of the main requirements, expectations, and constraints from various stakeholders. Our approach involved engaging with representatives from the AI4HI projects and requesting specific information, as follows:
The specific cancer types that each project focused on.
The clinical questions/use cases addressed by each project.
The clinical and imaging data used to answer these questions, including mandatory and optional information.
The format of the raw data available and whether standardized terminologies were used for different data types, along with the versions of these terminologies.
The anonymization techniques/profiles employed by each project to ensure compliance with GDPR and national data privacy laws.
Details about the modalities of the radiological images collected and the imaging metadata associated with them or extracted from them, if applicable.
Information regarding the format of segmentation masks, if they exist.
The chosen common data model, and whether it covers all data types with a straightforward mapping from the raw data.
This information was collected and documented in the ORSD document described in the previous section, and the outcome of the analysis was outlined in D5.1 (Section 3). The analysis shows that many challenges remain to be addressed. The AI4HI projects deal with different cancer types, with only three of the five projects overlapping in cancer type (breast and prostate cancer). They address different use cases and therefore rely on different clinical and imaging data to support them. They also use different terminologies, different anonymization profiles, and different formats for the segmentations. Although all of them use standardized data models (OMOP-CDM or FHIR resources), these models also differ. Most importantly, some of the AI4HI projects are nearing completion and have no plans to transform their datasets to a specific standard, since each has selected and adopted the data model that serves its own needs. In addition to the AI4HI projects, we need to consider constraints that may arise from new data holders willing to join the EUCAIM federation. These may have either standardized data models or entirely ad-hoc models, and may also differ in their capabilities in terms of technical facilities and resources in general.
Following the collection of information from the AI4HI projects, several group meetings were conducted with different domain experts within the consortium, including AI experts, data holders, software engineers, and legal teams, to define the data model business requirements for the project. The most critical requirements are presented below:
EUCAIM should support as many input formats as possible for raw clinical and imaging data, which may or may not comply with interoperability standards.
The data model should be terminology-agnostic, accommodating different terminologies seamlessly.
The effort required from clinical data managers to prepare data for federated processing and analysis through the platform should be minimized.
The data model must fully comply with GDPR and national privacy laws.
The data model should comprehensively represent all target data types at their intended level of detail, including clinical, demographic, radiomic, and laboratory data.
The data model should be extensible to allow for additional/new data to be represented.
The data model must provide an interface for accessing and querying data for the purpose of training federated AI models.
Data transformations from the raw source to the AI training dataset should be as straightforward as possible.
The data model should be structured in a way (usually in a tabular format) that simplifies the retrieval of records in the training dataset, regardless of the training plan of an AI algorithm.
Within EUCAIM, two potential frameworks for data harmonization and standardization are being explored, as mentioned in the TEHDAS recommendations on a Data Quality Framework document[16]. One approach involves transforming all datasets held by a data holder to comply with a specific internationally adopted standard (e.g., OMOP-CDM). The other approach entails preparing the dataset for delivery based on a specific data schema that includes the necessary harmonization rules, controlled vocabularies, and standards.
In the first approach, harmonization is driven by a standard design. The result is a dataset that is comprehensible to the community, can be used for federated analysis, and supports interoperability with other research infrastructures and networks (e.g., OHDSI, DARWIN EU, EHDEN). However, this method requires significant upfront effort (although only once per dataset). The data also become accessible only after all data sources have been extracted, semantically mapped, and transformed to the standard data model. This ties the specification of research questions to the semantic constraints of the standard model specification.
In the second approach, harmonization is driven by the materialization of specific information in a bespoke data model, where each transformation is limited to specific entities and variables of interest. This, however, limits the reuse of the data in other contexts and introduces an additional data model for specific purposes. It is important to note that preparing datasets for secondary use should not be limited to mapping concepts. It also requires developing data models that provide a logical harmonized schema, integrating different health data sources among data holders.
In the context of EUCAIM, we explored different approaches for Tier 3 (federated processing/analysis and AI model development), based on the two aforementioned harmonization frameworks. Tier 3 is the highest level of interoperability to be achieved in EUCAIM. These approaches, which guided many decisions regarding the CDM (e.g., structure, format), are analyzed in the following section.
5.2 Data harmonization approaches for federated processing/analysis
5.2.1 Scenario 1: EUCAIM Hyper-Ontology Based CDM for Analysis
The architecture for this scenario is shown in Figure 20. This scenario outlines two distinct pathways for integrating data. The first is for AI4HI repositories and other established repositories that adopt standards (OMOP, FHIR). The second is for new data holders with ad-hoc models.
Established repositories (e.g. AI4HI projects): implement a mediator/data access service that dynamically transforms and structures data according to the hyper-ontology and CDM specification.
Other data holders (e.g. hospitals): undergo an Extract Transform Load (ETL) process, directly converting their local data into an EUCAIM hyper-ontology based CDM.

Figure 20: EUCAIM CDM for analysis & OMOP, FHIR, EUCAIM local data models. For OMOP and FHIR a mediator and mapping component is necessary.
In this scenario, researchers access a Data Access Service to request the specific information needed to create their model’s input dataset (cohort) in tabular form (e.g., CSV). Established repositories (e.g., AI4HI repositories) use a mediator service and a mapping component to translate queries based on hyper-ontology concepts (e.g., age at diagnosis, modality) into the local CDM query language and the local CDM concepts. This is essentially the same mapping component as in the Tier 2 mediator. The difference is that here the mediator does not return aggregated information but specific hyper-ontology-based attributes (e.g., age at diagnosis, modality, PSA). The requested information can then be stored in a tabular file (e.g., CSV, Parquet), together with the corresponding images, under a POSIX path that the federated processing service can access. For new data holders, an ETL process aligns datasets directly with the EUCAIM hyper-ontology-based CDM specification.
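The following is a minimal Python sketch, not a EUCAIM implementation, of what the mapping component of a mediator could do for a FHIR-based repository. It assumes a hypothetical mapping table (`MAPPING`) from hyper-ontology attribute names (`age_at_diagnosis`, `modality`, `psa`) to extractor functions over FHIR R4-style resources (Condition.onsetAge, ImagingStudy.modality, Observation.valueQuantity). The attribute names, the function names, and the use of LOINC 2857-1 to recognize PSA are illustrative assumptions. Only the requested attributes are materialized, as one flat row per patient, and then written as CSV.

```python
import csv
import io

PSA_LOINC = "2857-1"  # LOINC: Prostate specific Ag [Mass/volume] in Serum or Plasma

def _age_at_diagnosis(res):
    return res.get("onsetAge", {}).get("value")

def _modality(res):
    codes = [c["code"] for c in res.get("modality", []) if "code" in c]
    return ";".join(codes) or None

def _psa(res):
    codings = res.get("code", {}).get("coding", [])
    if any(c.get("code") == PSA_LOINC for c in codings):
        return res.get("valueQuantity", {}).get("value")
    return None

# Hypothetical mapping: hyper-ontology attribute -> (FHIR resource type, extractor).
MAPPING = {
    "age_at_diagnosis": ("Condition", _age_at_diagnosis),
    "modality": ("ImagingStudy", _modality),
    "psa": ("Observation", _psa),
}

def mediate(resources, attributes):
    """Materialize only the requested attributes as one flat row per patient."""
    rows = {}
    for res in resources:
        if res["resourceType"] == "Patient":
            rows.setdefault(res["id"], {"patient_id": res["id"]})
            continue
        # "Patient/p1" -> "p1"
        patient_id = res.get("subject", {}).get("reference", "").split("/")[-1]
        row = rows.setdefault(patient_id, {"patient_id": patient_id})
        for attr in attributes:
            resource_type, extract = MAPPING[attr]
            if res["resourceType"] == resource_type:
                value = extract(res)
                if value is not None:
                    row[attr] = value  # later resources overwrite earlier ones
    return [rows[pid] for pid in sorted(rows)]

def rows_to_csv(rows, attributes):
    """Write the flat rows as CSV; attributes missing for a patient stay empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["patient_id", *attributes],
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    resources = [
        {"resourceType": "Patient", "id": "p1"},
        {"resourceType": "Condition", "subject": {"reference": "Patient/p1"},
         "onsetAge": {"value": 64, "unit": "a"}},
        {"resourceType": "ImagingStudy", "subject": {"reference": "Patient/p1"},
         "modality": [{"code": "MR"}]},
        {"resourceType": "Observation", "subject": {"reference": "Patient/p1"},
         "code": {"coding": [{"system": "http://loinc.org", "code": PSA_LOINC}]},
         "valueQuantity": {"value": 7.2, "unit": "ng/mL"}},
    ]
    attrs = ["age_at_diagnosis", "modality", "psa"]
    print(rows_to_csv(mediate(resources, attrs), attrs))
```

In a real mediator, the mapping would be derived from the hyper-ontology and the local CDM specification. Repeated observations would also need an explicit selection rule; this sketch simply keeps the last value it encounters.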
The advantages of this approach are:
The researchers are able to slice and dice the information available according to the needs of their analysis/use case and the inputs of their respective models in an easy and user-friendly way through the data access service.
Federated Learning scenarios are easier for researchers, since they can specify what type of data (and format) they want to be available on each federated node.
Eliminates the need for AI4HI repositories to transform their data through an ETL process. Instead, they create a mapping component that transforms only the requested information, on the fly and on demand.
Streamlines data transformation for new data holders through an ETL process, without implementing any mediator/mapping component.
The disadvantages of this approach are:
A model registry or a UI is required so that researchers can specify the “granularity” of the input their models/tools expect (e.g., which variables).
A data access service is needed to accept specifications of the needed dataset and create (materialize) dynamic cohorts based on these, which increases complexity.
The mediator component's on-the-fly data transformation (materialization) is technically challenging.
Adopts a bespoke data model for new providers (based on the hyper-ontology), limiting its utility outside EUCAIM.
5.2.2 Scenario 2: Integration with OMOP-FHIR for Wider Compatibility
In this scenario, new data holders can opt to convert their data to either OMOP-CDM or FHIR-based standards. This facilitates integration with EUCAIM in the same way as for the AI4HI projects, and it enhances data utility beyond the EUCAIM ecosystem. Accordingly:
Figure 21: OMOP-FHIR local adopted standards– EUCAIM based CDM for analysis with mediator and mapping components necessary for all nodes in the federation.
Established (AI4HI) repositories and data holders compliant with the OMOP/FHIR standards use a mediator service, as in Scenario 1. EUCAIM will need to provide OMOP/FHIR mediator components to the new data holders in customized versions, since even repositories using the same CDM differ in how information is structured, as described in Section 4.
Data holders not compliant with the OMOP/FHIR standards undergo an ETL process to comply with either the OMOP or the FHIR standard.
Figure 21 shows the architectural design of this approach. Compared to Scenario 1, the advantage of this approach is that new data holders align with well-established, generic standard data models, which enhances interoperability and impact beyond EUCAIM. The disadvantage is that a mediator service and a mapping component must be implemented in this case as well, so that all OMOP- and FHIR-based repositories are harmonized for data analysis. This brings all the drawbacks of the mediator service described in Scenario 1.
5.2.3 Scenario 3: Simplifying Integration Through ETL process
This approach requires all participating repositories to undergo a one-time ETL process that conforms their data to the EUCAIM hyper-ontology-based CDM, thereby reducing the technical complexity associated with mediator services. In this case, all federated nodes can use the same (simpler) Data Access Service implementation, which exports data from the CDM into a common format. Figure 22 shows the architectural design of this approach.
Figure 22: EUCAIM based CDM for all nodes participating in the federation. This would require a one-time transformation and no mediator/mapping component is necessary.
5.2.4 Scenario 4: EUCAIM hyper-ontology only for federated query purposes, OMOP-CDM for analysis
In this scenario, the EUCAIM hyper-ontology applies only to Tier 2 (federated querying) and is not used for federated processing. The architectural design of this approach is outlined in Figure 23.
All participating repositories should conform to the OMOP-CDM standard data model and go through an ETL process. Repositories already using OMOP-CDM are exempt, although some adaptation will still be needed to address the specific issues described in Section 4.1. The federated processing service could then directly access, for example, an SQLite[17] file containing the whole OMOP-CDM relational schema. It could perform any desired query and transform the results into any tabular format, either for input to the AI model or for analysis.
Figure 23: OMOP-CDM as the EUCAIM CDM for federated processing and analysis. Hyper-ontology only for federated queries.
Omitting a data access service and instead providing the whole dataset for researchers to slice and dice could also be applied to the previous scenarios, regardless of the CDM chosen for analysis. However, this scenario has the disadvantage that all nodes must both go through an ETL process and maintain a mediator for Tier 2, since Tier 2 conforms to the hyper-ontology concepts and terms (bridging the gaps between the OMOP and FHIR standards). This approach could also be used with a FHIR-based standard. However, as described and analyzed in D5.1, OMOP-CDM is more appropriate as a CDM for analysis and AI-related operations. A further drawback is that researchers are given an SQLite file or relational database, which requires knowledge of both OMOP-CDM and SQL. They do not receive a tabular format, which AI experts are typically accustomed to and which can be formed dynamically for their purposes. In this case, an additional access service could be added on top of the OMOP-CDM databases to provide more user-friendly access to the underlying data.
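To make this trade-off concrete, the Python sketch below shows what directly querying an OMOP-CDM SQLite file and exporting a tabular cohort could look like, using only the standard library (`sqlite3`, `csv`). It is illustrative only and rests on several assumptions. It uses a drastically reduced subset of the OMOP-CDM `person` and `condition_occurrence` tables, and an in-memory database stands in for a node's SQLite file. The condition concept id (`TARGET_CONDITION_CONCEPT_ID`) is a placeholder that would have to be replaced with the appropriate standard concept from the OMOP vocabulary. The helper names `export_cohort` and `build_demo_db` are hypothetical.

```python
import csv
import io
import sqlite3

# Tiny subset of the OMOP-CDM person and condition_occurrence tables;
# the real tables have many more columns.
SCHEMA = """
CREATE TABLE person (
    person_id         INTEGER PRIMARY KEY,
    gender_concept_id INTEGER NOT NULL,
    year_of_birth     INTEGER NOT NULL
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER NOT NULL REFERENCES person (person_id),
    condition_concept_id    INTEGER NOT NULL,
    condition_start_date    TEXT NOT NULL
);
"""

# Placeholder: replace with the standard concept id of the target cancer
# taken from the OMOP vocabulary.
TARGET_CONDITION_CONCEPT_ID = 1000001

# Age at diagnosis = year of the earliest matching condition minus year of birth.
COHORT_SQL = """
SELECT p.person_id         AS person_id,
       p.gender_concept_id AS gender_concept_id,
       MIN(CAST(strftime('%Y', c.condition_start_date) AS INTEGER))
           - p.year_of_birth AS age_at_diagnosis
FROM person AS p
JOIN condition_occurrence AS c ON c.person_id = p.person_id
WHERE c.condition_concept_id = ?
GROUP BY p.person_id, p.gender_concept_id, p.year_of_birth
ORDER BY p.person_id
"""

def export_cohort(conn, concept_id):
    """Run the cohort query and return the result set as CSV text."""
    cur = conn.execute(COHORT_SQL, (concept_id,))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur.fetchall())
    return buf.getvalue()

def build_demo_db():
    """In-memory stand-in for a node's OMOP-CDM SQLite file."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # 8507 is the OMOP standard concept for male gender.
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                     [(1, 8507, 1958), (2, 8507, 1961)])
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
                     [(10, 1, TARGET_CONDITION_CONCEPT_ID, "2021-03-15"),
                      (11, 2, 999, "2020-01-01")])
    return conn

if __name__ == "__main__":
    print(export_cohort(build_demo_db(), TARGET_CONDITION_CONCEPT_ID))
```

Even this small query requires familiarity with the OMOP tables and SQL, which is the usability concern raised above.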
5.3 The EUCAIM Common Data Model
5.3.1 CDM Selection Rationale
Based on the aforementioned analysis and the requirements from various stakeholders, i.e., AI experts, data model experts and AI4HI project representatives, Scenario 1 and Scenario 3 were deemed the most appropriate for supporting all the necessary processes for querying and transforming information required by the AI model algorithms and frameworks. Consequently, the EUCAIM CDM for analysis and federated processing/learning will be based on the hyper-ontology specification, which underpins the EUCAIM logical data model.
It is important to note that EUCAIM will not mandate either Scenario 1 (a mediator implementation) or Scenario 3 (a one-time ETL process). However, the EUCAIM partners agreed that a one-time transformation to the EUCAIM CDM is more straightforward and easier to implement, so it is the recommended approach.
As initially described in Section 4.4.3, the mCODE conceptual model was identified as the most appropriate basis for grounding the hyper-ontology in the oncology domain. In particular, the core layer of the hyper-ontology model is built by ontologically analyzing the basic mCODE specifications and representing them explicitly and semantically. The rationale behind this decision is manifold.
Although the OMOP-CDM and FHIR standards are widely used for standardizing and exchanging healthcare data, they have limitations for AI-related tasks, especially those requiring tabular data for model training and analysis. OMOP-CDM excels at transforming and standardizing data from diverse healthcare sources into a common format, which benefits interoperability and large-scale observational studies. However, its generic, observation-based nature makes it ill-suited, and far from straightforward, for AI experts querying oncology-related information. For example, in its oncology extension, most cancer modifiers, as defined in the OMOP-CDM specification, are represented as “Measurements”. This limits the semantics of cancer stages, cancer grades, extensions, invasions, etc. Similarly, the basic FHIR (Fast Healthcare Interoperability Resources) specification is designed to facilitate real-time data exchange between healthcare systems, with a primary focus on ensuring that different systems can communicate effectively. However, FHIR’s hierarchical and often complex data structures are not inherently suited to the tabular data formats required by many AI algorithms and frameworks. For reference, all tools currently available in EUCAIM, which are described and analyzed in detail in D5.4, require clinical and imaging metadata in a tabular format.
For these reasons, EUCAIM explored the two most prominent data models in oncology, both of which are event-based: mCODE (Minimal Common Oncology Data Elements)[18] and OSIRIS[19] (Interoperability and data sharing of clinical and biological data in oncology). mCODE, introduced by ASCO and a group of collaborators, provides a standardized set of essential oncology data elements that ensures interoperability and data consistency, which is critical for building reliable AI models. Although mCODE is based on FHIR, it narrows the scope to oncology-specific data elements, making it easier to extract and query relevant information for cancer research and AI applications. OSIRIS, developed by INCa, offers a minimum data set for sharing clinico-biological data in oncology. Its relational model is easy to represent and manipulate as tabular data, which is ideal for AI model training and allows efficient querying, aggregation, and analysis of large datasets.
All options considered, the EUCAIM CDM will build on the conceptual model of the mCODE specification and on the OSIRIS data framework. It will leverage the strengths of each, while accounting for the specific constraints arising from the secondary use of data and from the AI4HI projects. For example, both models contain mandatory attributes that the information available from the AI4HI projects cannot supply. One reason is GDPR and the anonymization strategies each project follows to reduce the risk of patient re-identification; another is that the clinical information collected by the projects accompanies the imaging data. In particular, none of the date-related attributes in the OSIRIS and mCODE specifications are part of the information collected from the AI4HI projects, because the clinical information has been anonymized. Instead, the CDM includes relative time relations anchored to events such as diagnosis or treatment (e.g., events that happened X months after baseline/diagnosis/treatment).
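The relative-time convention can be sketched in Python as follows. The sketch assumes the data holder still has access to absolute dates before anonymization, and it is hypothetical. Each event is expressed as whole completed months relative to a baseline (here the diagnosis date), the date itself is dropped, and the unit is recorded as the UCUM code `mo`. The field names `performed` and `performed_unit` loosely mirror the Performed and Performed Unit Concept elements of the Tumor Marker Test entity in the data dictionary. The exact encoding of the unit concept, and the choice of truncating to whole months, are assumptions.

```python
from datetime import date

def completed_months(start: date, end: date) -> int:
    """Whole months from start to end; negative if end precedes start."""
    if end < start:
        return -completed_months(end, start)
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months

def to_relative(events, baseline):
    """Replace each event's absolute 'date' with months elapsed since baseline."""
    relative = []
    for event in events:
        row = {k: v for k, v in event.items() if k != "date"}
        row["performed"] = completed_months(baseline, event["date"])
        row["performed_unit"] = "mo"  # assumed UCUM code for month
        relative.append(row)
    return relative

if __name__ == "__main__":
    diagnosis = date(2021, 3, 15)  # baseline (time 0)
    events = [
        {"event": "PSA test", "date": date(2021, 3, 20)},
        {"event": "follow-up MRI", "date": date(2021, 9, 10)},
        {"event": "prior imaging", "date": date(2021, 1, 10)},
    ]
    for row in to_relative(events, diagnosis):
        print(row)
```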
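A pivot table of this kind can be sketched with the Python standard library as shown below. Several details are illustrative assumptions, not the OSIRIS or EUCAIM specification: the long-format input of (patient identifier, variable, value) triples, the variable names, and the last-value-wins rule for repeated measurements. The point is that a researcher selects the variables of interest and receives one row per patient, which can be filtered into a cohort and written to CSV.

```python
import csv
import io

def pivot(observations, variables):
    """Turn (patient_id, variable, value) triples into one dict per patient.

    Every patient appears exactly once; only the selected variables become
    columns. If a variable occurs more than once for a patient, the last
    value wins (a simplification; real data need an explicit rule).
    """
    table = {}
    for patient_id, variable, value in observations:
        row = table.setdefault(patient_id, {"patient_id": patient_id})
        if variable in variables:
            row[variable] = value
    return [table[pid] for pid in sorted(table)]

def write_pivot_csv(rows, variables):
    """Serialize pivoted rows to CSV; missing values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["patient_id", *variables],
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    observations = [
        ("p1", "birth_sex", "male"), ("p1", "psa", 7.2),
        ("p1", "gleason_grade_group", 2),
        ("p2", "birth_sex", "male"), ("p2", "psa", 4.1),
        ("p3", "birth_sex", "female"),
    ]
    selected = ["birth_sex", "psa"]
    rows = pivot(observations, selected)
    # Cohort selection on the pivoted table, e.g. patients with PSA above 4.0
    cohort = [r for r in rows if r.get("psa", 0) > 4.0]
    print(write_pivot_csv(cohort, selected))
```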
A first version of the EUCAIM Data Dictionary is described in the following section. A more detailed version is also available at: EUCAIM_CDM_mCODE_based_v1.0.xlsx
5.3.2 EUCAIM Data Dictionary
The EUCAIM CDM classifies all the clinical patient data into 6 different domains according to the mCODE specification:
5.3.2.1 Patient
The patient information group captures general information about the patient, including demographics and the patient's managing organization.
Table 7: The EUCAIM CDM: Patient group
Group
Entity
Data Element
Definition
EUCAIM Required
Occurrences Allowed
Data Type
Patient
Patient
Identifier
Anonymized patient identifier which is unique within the context of the system.
Required
1..1
string
Patient
Gender
Administrative Gender - the gender that the patient is considered to have for administration and record keeping purposes.
Optional
0..1
CodeableConcept
Patient
Ethnicity
Concepts classifying the person into a named category of humans sharing common history, traits, geographical origin or nationality.
Optional
0..1
CodeableConcept
Patient
Race
Concepts classifying the person into groups based on their physical appearance
Optional
0..1
CodeableConcept
Patient
Birth Year
The year of birth for the individual.
Optional (required if diagnosis age is not available)
0..1
Integer (>1900, <current year)
Patient
Managing Organization
Organization that is the custodian of the patient record. Need to know who recognizes this patient record, manages and updates it.
Required
0..1
Organization
Patient
Care Provider
Patient's primary care provider organization.
Optional
0..1
Organization
Patient
Birth Sex
A code classifying the person's sex assigned at birth.
Required
1..1
CodeableConcept
Cancer Patient
Deceased
Indicates if the individual is deceased or not.
Optional
0..1
boolean
Cancer Patient
Cause of death
Main cause of death of the patient
Optional (conditional on deceased)
0..1
CodeableConcept
Cancer Patient
Date of last contact
Date of last contact if not deceased, or date of death if deceased.
Optional (conditional on deceased)
0..1
Date
Organization
Identifier
Identifies this organization across multiple systems
Optional
1..1
String
Organization
Name
Name used for the organization
Optional
1..1
String
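The conditional rules in this table lend themselves to automated checks during ETL. The Python function below is a hypothetical sketch, not part of the EUCAIM specification, and the snake_case field names are assumptions. It encodes only a few rules read from the table:
- the required elements Identifier, Birth Sex and Managing Organization;
- the Birth Year range;
- Birth Year being required when the age at diagnosis is not available;
- Cause of death being accepted only for a patient marked as deceased (an interpretation of "conditional on deceased").

```python
from datetime import date

# Hypothetical snake_case names for the required elements of the Patient group.
REQUIRED_FIELDS = ("identifier", "birth_sex", "managing_organization")

def validate_patient(record, diagnosis_age_available, current_year=None):
    """Return a list of rule violations for one Patient record (empty if valid)."""
    if current_year is None:
        current_year = date.today().year
    errors = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"missing required field: {field}")
    birth_year = record.get("birth_year")
    if birth_year is None:
        # Birth Year is optional, but required if diagnosis age is not available.
        if not diagnosis_age_available:
            errors.append("birth_year is required when diagnosis age is not available")
    elif not (isinstance(birth_year, int) and 1900 < birth_year < current_year):
        errors.append(f"birth_year out of range: {birth_year!r}")
    # Interpretation of "conditional on deceased": cause of death only for deceased patients.
    if record.get("cause_of_death") is not None and record.get("deceased") is not True:
        errors.append("cause_of_death is only allowed when deceased is true")
    return errors

if __name__ == "__main__":
    print(validate_patient({"identifier": "p1", "birth_sex": "male",
                            "managing_organization": "org-1", "birth_year": 1958},
                           diagnosis_age_available=False))
```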
5.3.2.2 Health Assessment
The health assessment group contains information related to the patient’s general health before and after treatment. This includes Comorbidities, Laboratory Tests, Performance Assessments (ECOG), Vital Signs, Family Member History, and Patient History of Metastatic Cancer.
Table 8: The EUCAIM CDM: Health assessment group
Group
Entity
Data Element Name
Definition
EUCAIM Required
Occurrences Allowed
Data Type
Health Assessment
Family Member History
Subject
The patient that the family history is about
Required
1..1
Reference: Patient
Family Member History
Relationship
Relationship to the subject
Required
1..1
CodeableConcept
Family Member History
Condition Code
Condition that the related person had
Required
1..1
CodeableConcept
Family Member History
Onset Age
When condition first manifested on the relative.
Optional
0..1
Age
History of Metastatic Cancer
Code
Type of observation
Optional
0..1
CodeableConcept
History of Metastatic Cancer
Value
The information determined as a result of making the observation, if the information has a simple value.
Optional
0..1
boolean
Comorbidities
Focus
Comorbid conditions are typically defined with respect to a specific 'index' condition. For example, comorbid condition categories would be those specified by CDC, namely obesity, renal disease, respiratory disease, etc.
Optional
0..*
Reference: PrimaryCancerCondition
Comorbidities
Comorbid Condition Present
A comorbid condition that is known to be present.
Required (conditional)
0..*
CodeableConcept
Comorbidities
Comorbid Condition Absent
A condition that is NOT present, related to the patient.
Required (conditional)
0..*
CodeableConcept
Comorbidities
Code
Describes what was observed. Sometimes this is called the observation "name".
Required
1..1
CodeableConcept
Comorbidities
Subject
The patient whose comorbidities are recorded.
Optional
0..1
Reference: CancerPatient
ECOG Performance Status
Category
A code that classifies the general type of observation being made.
Required
1..1
CodeableConcept
ECOG Performance Status
Code
The name of the non-imaging or non-laboratory test performed on a patient. A LOINC code SHALL be used if the concept is present in LOINC.
Required
1..1
CodeableConcept
ECOG Performance Status
Subject
Patient whose performance status is recorded.
Required
1..1
Reference: CancerPatient
ECOG Performance Status
Value
The information determined as a result of making the observation, if the information has a simple value.
Optional
0..1
integer
ECOG Performance Status
Interpretation
A categorical assessment of an observation value. For example, high, low, normal.
Optional
0..*
CodeableConcept
5.3.2.3 Disease
The disease group includes information specific to the tumor markers, the cancer diagnosis, the histological classification, grade, morphology, and behavior of tumors, the staging of cancer, as well as any cancer risk assessment metrics.
Table 9: The EUCAIM CDM: Disease group
Group
Entity
Data Element Name
Definition
EUCAIM Required
Occurrences Allowed
Data Type
Disease
Tumor Marker Test
Related Condition
Associates the tumor marker test with a condition, if one exists. Condition can be given by a reference or a code. In the case of a screening test such as prostate-specific antigen (PSA), there may be no existing condition to reference.
Optional
0..*
Reference: PrimaryCancerCondition
Tumor Marker Test
Code
The tumor marker test that was performed. A LOINC concept shall be used if the concept is present.
Required
1..1
CodeableConcept
Tumor Marker Test
Subject
Patient whose test result is recorded.
Required
1..1
Reference: CancerPatient
Tumor Marker Test
Value As Concept
The Laboratory result value if it is a coded value. The value CodeableConcept.code shall be selected from SNOMED CT.
Required (conditional)
1..1
CodeableConcept
Tumor Marker Test
Value As Number
The Laboratory result value, if numeric.
Required (conditional)
1..1
Float
Tumor Marker Test
Value Unit Concept
If the value is numeric, valueQuantity.code SHALL be selected from UCUM (http://unitsofmeasure.org). A FHIR UCUM Codes value set that defines all UCUM codes (http://hl7.org/fhir/STU3/valueset-ucum-units.html) is available in the FHIR specification.
Required (conditional)
1..1
CodeableConcept
Tumor Marker Test
Performed
The elapsed time from the baseline (time 0).
Optional
1..1
Integer
Tumor Marker Test
Performed Unit Concept
The unit concept of the time interval
Optional
1..1
Integer
Primary Cancer Condition
Age of diagnosis/condition
The patient's age at which the existence of the Condition was first asserted or acknowledged.
Required
1..1
Age
Primary Cancer Condition
Subject
Indicates the patient or group who the condition record is associated with.
Required
1..1
Reference: CancerPatient
Primary Cancer Condition
Code
Identification of the condition, problem or diagnosis.
Required
1..1
CodeableConcept
Primary Cancer Condition
Histology Morphology Behavior
A codeable concept describing the morphologic and behavioral characteristics of the cancer. It takes values from: http://hl7.org/fhir/us/mcode/ValueSet/mcode-histology-morphology-behavior-vs
Required
1..1
CodeableConcept
Primary Cancer Condition
Body Site
The anatomical location where this condition manifests itself.
Required
1..*
CodeableConcept
Primary Cancer Condition
Body Site > Location Qualifier
General location qualifier (excluding laterality) for this bodySite
Optional
0..*
CodeableConcept
Primary Cancer Condition
Body Site > Laterality Qualifier
Laterality qualifier for this bodySite
Optional
0..1
CodeableConcept
Primary Cancer Condition
Onset Age
Estimated or actual age the condition began, in the opinion of the clinician.
Optional
0..1
Age
Primary Cancer Condition
Abatement Age
The age or estimated age at which the condition resolved or went into remission. This is called "abatement" because of the many overloaded connotations associated with "remission" or "resolution"; conditions are never really resolved, but they can abate.
Optional
0..1
Age
Secondary Cancer Condition
Histology Morphology Behavior
Describes the morphologic and behavioral characteristics of the cancer.
Optional
1..1
CodeableConcept
Secondary Cancer Condition
Related Primary Cancer Condition
A reference to the primary cancer condition that provides context for this resource.
Optional
1..1
Reference: Primary Cancer Condition
Secondary Cancer Condition
Code
Identification of the condition, problem or diagnosis.
Required
1..1
CodeableConcept
Secondary Cancer Condition
Body Site
The anatomical location where this condition manifests itself.
Optional
0..*
CodeableConcept
Secondary Cancer Condition
Body Site > Location Qualifier
General location qualifier (excluding laterality) for this bodySite
Optional
0..*
CodeableConcept
Secondary Cancer Condition
Body Site > Laterality Qualifier
Laterality qualifier for this bodySite
Optional
0..1
CodeableConcept
Secondary Cancer Condition
Subject
Indicates the patient or group who the condition record is associated with.
Required
1..1
Reference: CancerPatient
Secondary Cancer Condition
Condition appearance
The amount of time elapsed after the primary cancer condition at which the existence of this Condition was first asserted or acknowledged.
Required (conditional on Onset Age)
1..1
Integer
Secondary Cancer Condition
Appearance Unit Concept
The unit of time for the time elapsed after the primary cancer condition
Required (conditional on condition appearance)
1..1
Integer
Secondary Cancer Condition
Onset Age
Estimated or actual age the condition began, in the opinion of the clinician.
Required (conditional on condition appearance)
1..1
Age
Secondary Cancer Condition
Abatement Age
The age or estimated age at which the condition resolved or went into remission. This is called "abatement" because of the many overloaded connotations associated with "remission" or "resolution"; conditions are never really resolved, but they can abate.
Optional
1..1
Age
Cancer Stage
Code
The kind of stage reported, e.g., a pathologic TNM stage, a Lugano lymphoma stage, or a Rai stage for leukemia. This element identifies the type of value that is reported in Observation.value and is necessary for the correct interpretation of that value.
The distinction between Observation.code and Observation.method is important. Observation.code identifies the kind of stage being reported while Observation.method represents the staging system used to determine the code. Observation.code may imply the staging system. For example, the SNOMED CT 103420007 says the reported value is a modified Dukes stage, implying the Modified Dukes staging system (SNOMED CT 385359000) was used to determine the stage. When the staging system is implied by Observation.code, Observation.method is not required. However, when Observation.code does not imply a staging system (for example, if the code is SNOMED CT 385388004 Lymphoma stage), then the staging system must be specified in Observation.method.
The value (Observation.valueCodeableConcept) may also imply certain things about the kind of stage being reported. For example, the value cN0 implies the value is a clinical stage. However, even if the value is partly or wholly self-identifying, it is not a reliable indicator of the type of stage being reported or the method of staging. Therefore, Observation.code must in all cases be reported.
Required
1..1
CodeableConcept
Cancer Stage
Method
The staging system or protocol used to determine the stage, stage group, or category of the cancer based on its extent. When the staging system is implied by Observation.code, Observation.method is not required. However, when Observation.code does not imply a staging system (for example, if the code is SNOMED CT 385388004 Lymphoma stage), then the staging system must be specified in Observation.method.
Optional
0..1
CodeableConcept
Cancer Stage
Value
The stage, stage group, category, or classification resulting from the staging evaluation.
Required
1..1
CodeableConcept
Cancer Stage
Subject
The patient associated with staging assessment.
Required
1..1
Reference: CancerPatient
Cancer Stage
Related Procedure
The procedure from which the cancer stage was determined. It can be an imaging examination (e.g. MRI), a biopsy, or a surgery.
Required
1..*
Reference: Procedure
Cancer Stage
Focus
Staging is associated with a particular cancer condition. Observation.focus is used to point back to that condition.
Optional
0..*
Reference: CancerCondition
Histologic Grade
Related Condition
Associates the histologic grade test with a condition, if one exists. Condition can be given by a reference.
Optional
0..*
Reference: Condition
Histologic Grade
Category
A code that classifies the general type of observation being made.
Required
1..1
CodeableConcept
Histologic Grade
Subject
Patient whose test result is recorded.
Required
1..1
Reference: CancerPatient
Histologic Grade
ValueAsConcept
The laboratory result value, if coded. The CodeableConcept.code should be selected from SNOMED CT, if the concept exists.
Required
1..1
CodeableConcept
Histologic Grade
ValueAsNumber
The laboratory result value, if numeric. The Quantity.code shall be selected from UCUM (http://unitsofmeasure.org).
Required
1..1
Quantity
Histologic Grade
Method
Indicates the mechanism used to perform the observation.
Optional
0..1
CodeableConcept
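The conditional rule stated for the Cancer Stage rows (Observation.method may be omitted only when Observation.code already implies a staging system) lends itself to an automated conformance check. The following Python sketch assumes records held as plain dictionaries with hypothetical keys (`code`, `method`, `value`, `subject`, `related_procedures`) and a hypothetical, hand-maintained lookup of stage-type codes that imply a staging system. Only the modified Dukes example quoted in the table is included, so this is an illustration rather than a complete validator.

```python
from typing import Dict, List

# Hypothetical lookup: SNOMED CT stage-type codes (Observation.code) that imply a
# staging system (Observation.method). Only the example quoted in the CDM table is
# included; a real check would need a curated, complete mapping.
IMPLIED_STAGING_SYSTEM: Dict[str, str] = {
    "103420007": "385359000",  # modified Dukes stage -> Modified Dukes staging system
}


def validate_cancer_stage(stage: dict) -> List[str]:
    """Return the rule violations found in one Cancer Stage record (empty list if none)."""
    errors = []
    # Elements marked Required with cardinality 1..1 in the table.
    for element in ("code", "value", "subject"):
        if not stage.get(element):
            errors.append(f"missing required element: {element}")
    # Related Procedure is Required with cardinality 1..*.
    if not stage.get("related_procedures"):
        errors.append("missing required element: related_procedures")
    # Method is only needed when the code does not already imply a staging system.
    code = stage.get("code")
    if code and code not in IMPLIED_STAGING_SYSTEM and not stage.get("method"):
        errors.append(f"code {code} does not imply a staging system; method is required")
    return errors
```

For instance, a record coded with 385388004 (Lymphoma stage) and no method yields one violation, while the same record with a method set passes.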
5.3.2.4 Cancer Treatments
The cancer treatment group includes treatment techniques used to treat cancer patients, categorized as: medications, surgery, and radiotherapy.
Table 10: The EUCAIM CDM: Cancer treatment group
Group
Entity
Data Element Name
Definition
EUCAIM Required
Occurrences Allowed
Data Type
Treatment
Cancer-Related Surgical Procedure
Code
The specific procedure that is performed.
Required
1..1
CodeableConcept
Cancer-Related Surgical Procedure
Subject
The patient on whom the procedure was performed.
Required
1..1
Reference: Patient
Cancer-Related Surgical Procedure
Performed
Period of time elapsed after baseline
Optional
0..1
Integer
Cancer-Related Surgical Procedure
Performed Unit Concept
Cancer-Related Surgical Procedure
Body Site
Detailed and structured anatomical location information. Multiple locations are allowed - e.g. multiple punch biopsies of a lesion.
Optional
0..*
CodeableConcept
Cancer-Related Surgical Procedure
Body Site > Location Qualifier
General location qualifier (excluding laterality) for this bodySite
Optional
0..*
CodeableConcept
Cancer-Related Surgical Procedure
Body Site > Laterality Qualifier
Laterality qualifier for this bodySite
Optional
0..*
CodeableConcept
Cancer-Related Surgical Procedure
Response
Response evaluation to an oncology treatment from RECIST terminology.
Optional
0..1
CodeableConcept
Cancer-Related Medication Administration
Code
Code that identifies this medication
Required
1..1
CodeableConcept
Cancer-Related Medication Administration
Subject
The patient receiving the medication.
Required
1..1
Reference: CancerPatient
Cancer-Related Medication Administration
Effective
An interval of time during which the administration took place.
Optional
0..1
Period
Cancer-Related Medication Administration
Effective Unit Concept
An interval of time during which the administration took place.
Optional
0..1
Period
Cancer-Related Medication Administration
Administered
The time elapsed
Optional
0..1
CodeableConcept
Cancer-Related Medication Administration
Administered Unit Concept
Period of time elapsed unit concept.
Optional
0..1
CodeableConcept
Cancer-Related Medication Administration
Response
Response evaluation to an oncology treatment from RECIST terminology.
Optional
0..1
CodeableConcept
Radiotherapy Course Summary
Modality
The modality of the external beam or brachytherapy radiation procedure.
Required
1..1
CodeableConcept
Radiotherapy Course Summary
Technique
The technique(s) of the external beam or brachytherapy radiation procedure.
Optional
0..*
CodeableConcept
Radiotherapy Course Summary
Actual Number of Sessions
The number of sessions in a course of radiotherapy.
Optional
0..1
unsignedInt
Radiotherapy Course Summary
Dose Delivered to Volume
Dose delivered to a given radiotherapy volume.
Optional
0..*
Radiotherapy Dose Delivered To Volume Extension
Radiotherapy Course Summary
Dose Delivered to Volume > Volume
A BodyStructure resource representing the volume in the body where radiation was delivered, for example, Chest Wall Lymph Nodes.
Optional
0..1
Reference: RadiotherapyVolume
Radiotherapy Course Summary
Dose Delivered to Volume > Total Dose Delivered
The total amount of physical radiation delivered to this volume within the scope of this dose delivery, i.e., dose delivered from the Procedure in which this extension is used.
Optional
0..1
Quantity
Radiotherapy Course Summary
Dose Delivered to Volume > Fractions Delivered
The number of fractions delivered to this volume.
Optional
0..1
unsignedInt
Radiotherapy Course Summary
Code
The specific procedure that is performed. Use text if the exact nature of the procedure cannot be coded (e.g. "Laparoscopic Appendectomy").
Required
1..1
CodeableConcept
Radiotherapy Course Summary
Subject
The patient on whom the procedure was performed.
Required
1..1
Reference: CancerPatient
Radiotherapy Course Summary
Performed
Period of time elapsed in months after primary cancer diagnosis
Optional
0..1
Period
Radiotherapy Course Summary
Body Site
Coded body structure(s) treated in this course of radiotherapy. These codes represent general locations. For additional detail, refer to the BodyStructures references in the doseDeliveredToVolume extension.
Optional
0..*
CodeableConcept
Radiotherapy Course Summary
Response
Response evaluation to an oncology treatment from RECIST terminology.
Optional
0..1
CodeableConcept
Radiotherapy Volume
Identifier
Unique identifier to reliably identify the same target volume in different requests and procedures, for example, the Conceptual Volume UID used in DICOM.
Optional
0..*
Identifier
Radiotherapy Volume
Morphology
The kind of structure being represented by the body structure at `BodyStructure.location`. This can define both normal and abnormal morphologies.
Optional
0..1
CodeableConcept
Radiotherapy Volume
Location
The location and locationQualifier codes specify a TG263 body structure comprising the irradiated volume.
Optional
0..1
CodeableConcept
Radiotherapy Volume
Location Qualifier
Qualifiers that together with the associated location code specify the TG263 body structure comprising the irradiated volume.
Optional
0..*
CodeableConcept
Radiotherapy Volume
Description
A text description of the radiotherapy volume, which SHOULD contain any additional information above and beyond the location and locationQualifier that describe the volume.
Optional
0..*
string
Radiotherapy Volume
Patient
The patient for which a radiotherapy procedure is planned or performed.
Required
1..1
Reference: CancerPatient
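Several treatment elements (e.g. Performed for surgical procedures and radiotherapy courses) record time as an interval elapsed after a baseline, such as the primary cancer diagnosis, rather than as a calendar date. A minimal Python sketch of deriving such a value from two dates follows. It assumes that the unit is whole calendar months and that the unit concept is carried as the UCUM code `mo`; the function names are hypothetical and not part of the CDM.

```python
from datetime import date


def elapsed_months(baseline: date, event: date) -> int:
    """Whole calendar months elapsed from the baseline date to the event date."""
    if event < baseline:
        raise ValueError("event precedes baseline")
    months = (event.year - baseline.year) * 12 + (event.month - baseline.month)
    # The last month is incomplete if its day-of-month has not been reached yet.
    if event.day < baseline.day:
        months -= 1
    return months


def to_elapsed_time(baseline: date, event: date) -> dict:
    """Hypothetical value/unit pair for an elapsed-time element (unit assumed: UCUM 'mo')."""
    return {"value": elapsed_months(baseline, event), "unit": "mo"}
```

Counting only completed months keeps the value monotonic and avoids fractional offsets; a data holder preferring days or weeks would change the unit concept accordingly.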
5.3.2.5 Outcome
The outcome group covers the tumor, its size, and the cancer disease status, i.e. whether the disease is stable, worsening (progressing), or improving (responding), based on different kinds of evidence (imaging data, tumor markers, etc.).
Table 11: The EUCAIM CDM: Outcome group
Group
Entity
Data Element Name
Definition
EUCAIM Required
Occurrences Allowed
Data Type
Outcome
Tumor
Body Structure Identifier
Stable identifier(s) for this specific tumor. The identifiers MUST be unique within the context of the referenced `CancerPatient`. This id is used to track the tumor over time, through the related procedures.
Required
1..*
Identifier
Tumor
Related Condition
Associates this tumor with a cancer condition. This could be a causal association (e.g., this is believed to be the primary tumor causing the cancer) or a different type of relationship (e.g., this tumor is a metastasis).
Optional
0..1
CodeableConcept or Reference: Condition
Tumor
Related Procedure
Associates this tumor with a related procedure. For example, it associates a tumor with an MR examination procedure.
Required (conditional on Condition)
1..1
Reference: Procedure
Tumor
Risk Assessment
Associates this tumor with a risk assessment report. If the tumor is identified in an imaging report, this can be used to store RADS-related information.
Optional
0..1
Reference: RiskAssessment
Tumor
Morphology
The kind of structure being represented by the body structure at `BodyStructure.location`. This can define both normal and abnormal morphologies.
Optional
0..*
CodeableConcept
Tumor
Location
The anatomical location or region of the specimen, lesion, or body structure.
Required
1..*
CodeableConcept
Tumor
Location Qualifier
Qualifier to refine the anatomical location. These include qualifiers for laterality, relative location, directionality, number, and plane.
Optional
0..*
CodeableConcept
Tumor
Patient
The patient associated with this tumor.
Required
1..1
Reference: CancerPatient
Tumor Size
Code
Describes what was observed. Sometimes this is called the observation "name".
Required
1..1
CodeableConcept
Tumor Size
Subject
The patient whose tumor was measured. SHALL be a `Patient` resource conforming to `CancerPatient`.
Required
1..1
Reference: CancerPatient
Tumor Size
Focus
Reference to a BodyStructure resource conforming to Tumor.
Optional
0..1
Reference: Tumor
Tumor Size
Volume
The volume of the lesion
Optional
0..1
Quantity
Tumor Size
Method
Method for measuring the size or the volume of the tumor
Optional
0..1
CodeableConcept
Tumor Size
Tumor Longest Dimension
The longest tumor dimension in cm or mm.
Required
1..1
Quantity
Tumor Size
Tumor Longest Dimension > Code
Describes what was observed. Sometimes this is called the observation "code".
Optional
0..1
CodeableConcept
Tumor Size
Tumor Longest Dimension > Value
The information determined as a result of making the observation, if the information has a simple value.
Optional
0..1
Quantity
Tumor Size
Tumor Other Dimension
The second or third tumor dimension in cm or mm.
Optional
0..2
Quantity
Tumor Size
Tumor Other Dimension > Code
Describes what was observed. Sometimes this is called the observation "code".
Required
1..1
CodeableConcept
Tumor Size
Tumor Other Dimension > Value
The information determined as a result of making the observation, if the information has a simple value.
Optional
0..1
Quantity
Cancer Disease Status
Evidence Type
Categorization of the kind of evidence contributing to a clinical judgment on cancer disease progression.
Optional
0..*
CodeableConcept
Cancer Disease Status
Code
Describes what was observed. Sometimes this is called the observation "name".
Required
1..*
CodeableConcept
Cancer Disease Status
Subject
Patient whose disease status is recorded.
Required
1..*
Reference: CancerPatient
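The Tumor Size rows allow one longest dimension (1..1) and up to two other dimensions (0..2), each a Quantity in cm or mm. A Python sketch of normalizing raw measurements into that shape follows, assuming Quantities are dictionaries with a `value` and a UCUM `code` restricted to `mm` and `cm`; the helper names are hypothetical.

```python
from typing import Dict, List

# Conversion factors to millimetres for the UCUM codes accepted in this sketch.
TO_MM: Dict[str, float] = {"mm": 1.0, "cm": 10.0}


def to_mm(quantity: dict) -> float:
    """Convert a {'value', 'code'} quantity to millimetres."""
    unit = quantity["code"]
    if unit not in TO_MM:
        raise ValueError(f"unsupported unit: {unit}")
    return quantity["value"] * TO_MM[unit]


def split_dimensions(quantities: List[dict]) -> dict:
    """Order 1 to 3 measured dimensions into the longest one plus up to two others, in mm."""
    if not 1 <= len(quantities) <= 3:
        raise ValueError("expected between 1 and 3 dimensions")
    sizes = sorted((to_mm(q) for q in quantities), reverse=True)
    return {
        "longest_dimension": {"value": sizes[0], "code": "mm"},
        "other_dimensions": [{"value": s, "code": "mm"} for s in sizes[1:]],
    }
```

Converting before sorting matters when units are mixed: 2.5 cm must rank above 15 mm, which a comparison of raw values would get wrong.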
5.3.2.6 Imaging
As the focus of the EUCAIM project is the federation of cancer imaging datasets, it is imperative that important imaging metadata are standardized to allow the unambiguous representation of the stored information and to support federated queries. Although the DICOM standard for collecting, storing, and transferring medical imaging data can be used to access critical image acquisition parameters (such as acquisition method, field of view, and slice thickness) for cohort discovery and quality checking, it lacks essential information needed to query relevant images efficiently, because certain information is not standardized in the DICOM metadata. For instance, the classification of a series as a T2-weighted axial series is typically recorded in the "Series Description" (0008,103E) DICOM tag, which is free text and highly variable across clinical institutions.
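To illustrate why free-text Series Description values are a weak basis for querying, the Python sketch below applies a naive token-based classifier of the kind a data holder might otherwise need to write. The token lists and output labels are hypothetical and deliberately incomplete; the point is that such rules are institution-specific and brittle, which is what standardized imaging metadata in the CDM is meant to avoid.

```python
import re
from typing import Optional, Tuple

# Hypothetical token rules, checked in order (ADC before DWI, since ADC maps are
# often labelled with diffusion keywords too). Real descriptions vary far more.
WEIGHTING_TOKENS = [
    ("ADC", {"adc"}),
    ("DWI", {"dwi", "diff", "diffusion"}),
    ("T2W", {"t2", "t2w"}),
    ("T1W", {"t1", "t1w"}),
]
PLANE_TOKENS = {
    "axial": {"ax", "axial", "tra", "transverse"},
    "sagittal": {"sag", "sagittal"},
    "coronal": {"cor", "coronal"},
}


def classify_series_description(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Guess (weighting, plane) from a Series Description (0008,103E) string."""
    # Split on anything that is not a lower-case letter or digit.
    tokens = set(re.split(r"[^a-z0-9]+", text.lower()))
    weighting = next((label for label, keys in WEIGHTING_TOKENS if tokens & keys), None)
    plane = next((label for label, keys in PLANE_TOKENS.items() if tokens & keys), None)
    return weighting, plane
```

Here `t2_tse_tra` and `AX T2 FSE` both resolve to a T2-weighted axial series, but a description such as `T2 propeller` carries no plane at all, so the plane would have to be derived from other tags.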
The EUCAIM Imaging component consists of important metadata extracted from DICOM header tags, which are standardized to allow efficient querying and analysis. Although mCODE does not explicitly represent imaging-related procedures and their corresponding metadata, the EUCAIM CDM builds upon the FHIR ImagingStudy resource (including its series element), the MI-CDM extension of the OMOP-CDM[21], the ProCAncer-I imaging extension, and the OSIRIS imaging component. The following list presents a first version of the imaging-related entities and their associated information:
Image Study: Representation of the content produced in a DICOM imaging study. A study comprises a set of series, each of which includes a set of Service-Object Pair Instances (SOP Instances - images or other data) acquired or produced in a common context. A series is of only one modality (e.g. X-ray, CT, MR, ultrasound), but a study may have multiple series of different modalities.
Image Series: Representation of the content produced in a DICOM imaging series, capturing important metadata across all image modalities. Some of the most important parameters include the modality, the body region, the patient position, the patient orientation, the laterality, etc.
Image Modality: Representation of the distinct modality-related acquisition parameters, in order to enable tailored queries for each modality (e.g. echo time or magnetic field strength for the MR modality). Note that the Image Modality entity is modeled so that any modality-related acquisition parameter can be stored without changing or adding attributes in the model. Nevertheless, important acquisition parameters of the two most important modalities (MR, CT), as defined in OSIRIS and also included in the MR imaging metadata collected by ProCAncer-I, are:
MR image: sequence name, magnetic field strength, MR acquisition type, repetition time, echo time, imaging frequency, flip angle, inversion time, receive coil name, diffusion b-value (for DWI).
CT image: kVp, X-ray tube current, exposure time, spiral pitch factor, filter type, convolution kernel.
Image Annotation: Representation of the most important metadata concerning imaging annotation processes.
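The Image Modality entity stores each acquisition parameter as its own row (an entity-attribute-value layout), which is what lets new parameters be added without changing the model. A Python sketch of such rows and a simple query follows. The parameter names stand in for concept codes and are hypothetical, and the rule that a row carries exactly one of a numeric or a coded value is an assumption made for this example; the unit requirement for numeric values follows the Image Modality rows of the imaging group table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set


@dataclass(frozen=True)
class AcquisitionParameter:
    """One Image Modality row: a single acquisition parameter of one series."""
    series_id: str
    param_code: str                         # hypothetical stand-in for a concept code
    value_as_number: Optional[float] = None
    value_as_concept: Optional[str] = None
    unit: Optional[str] = None              # UCUM unit code for numeric values

    def __post_init__(self) -> None:
        # Assumption for this sketch: a row carries either a number or a concept.
        if (self.value_as_number is None) == (self.value_as_concept is None):
            raise ValueError("set exactly one of value_as_number / value_as_concept")
        if self.value_as_number is not None and self.unit is None:
            raise ValueError("a numeric value requires a unit")


def series_matching(rows: Iterable[AcquisitionParameter], param_code: str,
                    predicate: Callable[[float], bool]) -> Set[str]:
    """Series ids having a numeric value for param_code that satisfies predicate."""
    return {
        r.series_id
        for r in rows
        if r.param_code == param_code
        and r.value_as_number is not None
        and predicate(r.value_as_number)
    }
```

Adding a further parameter, for example the CT spiral pitch factor, is just another row with a different `param_code`; no schema change is needed, at the cost of queries having to filter on the parameter code.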
Table 12: The EUCAIM CDM: Imaging group
Group
Entity
Data Element Name
Definition
EUCAIM Required
Occurrences Allowed
Data Type
Mapping: DICOM Tag Mapping
Imaging
Image Study
Identifier
The logical id of the resource, as used in the URL for the resource. Once assigned, this value never changes.
Required
1..1
id
Image Study
Subject
The patient of the imaging study.
Required
1..1
Reference(Patient)
(0010/*)
Image Study
Study UID
Identifiers for the imaging study, such as the DICOM Study Instance UID.
Required
1..1
String
StudyInstanceUID (0020,000D) | study ID (0020,0010)
Image Study
AcquisitionDate
The date the study acquisition was obtained.
Optional
0..1
dateTime
(0008,0020)+(0008,0030)
Image Study
Part Of
A larger event of which this particular ImagingStudy is a component or step. For example, an ImagingStudy as part of a procedure.
Optional
0..*
Reference(Procedure)
Image Study
Access URI
The accessURI of the study, either on a DICOM web server (e.g. via the WADO-RS DICOMweb REST-API) or on a local machine via the path name to the folder containing the study.
Optional
0..*
String
Image Study
Number Of Series
Number of series in the study. The value given may be larger than the number of series elements this resource contains due to resource availability, security, or other factors. This element should be present if any series elements are present.
Optional
0..1
unsignedInt
(0020,1206)
Image Study
Number Of Instances
Number of SOP Instances in the study. The value given may be larger than the number of instance elements this resource contains due to resource availability, security, or other factors. This element should be present if any instance elements are present.
Optional
0..1
unsignedInt
(0020,1208)
Image Study
Manufacturer Name
Name of the manufacturing company of the imaging equipment.
Required
1..1
CodeableConcept
(0008,0070)
Image Study
Manufacturer Model Name
The model name assigned by the manufacturer to the imaging equipment.
Optional
0..1
String
(0008,1090)
Image Series
Study identifier
The study to which the series belongs.
Required
1..1
Reference(Image Study)
Image Series
Identifier
Unique id for the element within a resource (for internal references). This may be any string value that does not contain spaces.
Required
1..1
string
Image Series
Series UID
The DICOM Series Instance UID for the series.
Required
1..1
String
(0020,000E)
Image Series
Number
The numeric identifier of this series in the study.
Optional
0..1
unsignedInt
(0020,0011)
Image Series
Modality
The distinct modality for this series. This may include both acquisition and non-acquisition modalities.
Required
1..1
CodeableConcept
(0008,0060)
Image Series
Description
A description of the series.
Optional
0..1
string
(0008,103E)
Image Series
Number Of Instances
Number of SOP Instances in the series. The value given may be larger than the number of instance elements this resource contains due to resource availability, security, or other factors. This element should be present if any instance elements are present.
Optional
0..1
unsignedInt
(0020,1209)
Image Series
Access URI
The accessURI of the series, either on a DICOM web server (e.g. via the WADO-RS DICOMweb REST-API) or on a local machine via the path name to the folder containing the series instances.
Optional
0..*
String
Image Series
Body Site
The anatomic structures examined. See DICOM Part 16 Annex L (http://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_L.html) for DICOM to SNOMED-CT mappings. The bodySite may indicate the laterality of body part imaged; if so, it shall be consistent with any content of ImageSeries.laterality.
Required
1..1
CodeableConcept
(0018,0015)
Image Series
Laterality
The laterality of the (possibly paired) anatomic structures examined. E.g., the left knee, both lungs, or unpaired abdomen. If present, shall be consistent with any laterality information indicated in ImageSeries.bodySite.
Optional
0..1
CodeableConcept
(0020,0060)
Image Series
Specimen
The specimen imaged, e.g., for whole slide imaging of a biopsy.
Optional
0..*
Reference(Specimen)
(0040,0551) + (0040,0562)
Image Series
Acquisition Date
The date the series acquisition was obtained.
Optional
0..1
dateTime
(0008,0021) + (0008,0031)
Image Modality
Identifier
Unique id for the element within a resource (for internal references). This may be any string value that does not contain spaces.
Required
1..1
string
Image Modality
Series identifier
Reference to the series id for which important acquisition parameters are being stored.
Required
1..1
Reference(Image Series)
Image Modality
AcquisitionParameter Code
The concept code of the acquisition parameters relevant to the modality of the series. (e.g. slice thickness for MR modality)
Required
1..1
CodeableConcept
Image Modality
AcquisitionParameter Value As Concept
The concept code of the value of the acquisition parameter (e.g. "Spin echo" value of the "MR echo type" concept)
Optional (conditional on AcquisitionParameter Code)
0..1
CodeableConcept
Image Modality
AcquisitionParameter Value As Number
The numerical value of the modality acquisition concept (e.g. 0 for the gantry tilt angle in case of a CT)
Optional (conditional on AcquisitionParameter Code)
0..1
Float
Image Modality
AcquisitionParameter Value Unit Concept
The unit of measure of a numeric value, coded using UCUM (http://unitsofmeasure.org).
Required (conditional on AcquisitionParameter Value As Number)
0..1
CodeableConcept
Image Annotation
Id
A unique identifier for the annotation.
Required
1..1
string
Image Annotation
series.id
The unique identifier for the imaging series being annotated.
Required
1..1
Reference(Image Series)
Image Annotation
study.id
The unique identifier for the imaging study that contains the series that is being annotated.
Required
1..1
Reference(Image Study)
Image Annotation
derived.series.id
The unique identifier for the annotated derived imaging series.
Required
1..1
Reference(Image Series)
Image Annotation
performed
The date and time the annotation was made.
Optional
0..1
dateTime
Image Annotation
status
The current status of the annotation, such as final or pending.
Optional
0..1
CodeableConcept
Image Annotation
anatomic location
The anatomic location being annotated (e.g. peripheral zone of the prostate gland)
Optional
0..1
CodeableConcept
Image Annotation
observation
The imaging observation that is reported. (e.g. lesion of the prostate)
Optional
0..1
CodeableConcept
Image Annotation
type
The annotation type (e.g. bounding box, contouring, etc.)
Optional
0..1
CodeableConcept
Image Annotation
method
The method used to create the annotation, such as manual, semi-automatic, or automatic.
Optional
0..1
CodeableConcept
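Several mappings in the imaging group table combine a DICOM date element and a time element into one value, e.g. the Image Study AcquisitionDate from Study Date (0008,0020) plus Study Time (0008,0030). A Python sketch of that combination follows, assuming the standard DA format (YYYYMMDD) and TM format (HH, HHMM or HHMMSS, optionally followed by up to six fractional-second digits). The function name is hypothetical, the result is a naive local time because timezone handling is omitted, and legacy formats with separators are not supported.

```python
from datetime import datetime


def dicom_datetime(da: str, tm: str = "") -> str:
    """Combine DICOM DA (YYYYMMDD) and optional TM (HH[MM[SS[.F...]]]) values into ISO 8601."""
    da, tm = da.strip(), tm.strip()
    if not tm:
        # Date only, e.g. when the time element is absent or empty.
        return datetime.strptime(da, "%Y%m%d").date().isoformat()
    hhmmss, _, fraction = tm.partition(".")
    if len(hhmmss) not in (2, 4, 6) or not hhmmss.isdigit():
        raise ValueError(f"malformed DICOM TM value: {tm!r}")
    # TM may be truncated to HH or HHMM; pad the missing minutes/seconds with zeros.
    hhmmss = hhmmss.ljust(6, "0")
    micro = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    combined = datetime.strptime(da + hhmmss, "%Y%m%d%H%M%S")
    return combined.replace(microsecond=micro).isoformat()
```

The same helper would serve the Image Series Acquisition Date mapping, (0008,0021) plus (0008,0031).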