5. Interoperability framework for federated processing

To enable federated processing, data holders should implement a semantic and syntactic interoperability layer across their datasets. Semantic interoperability concerns how the meaning of data is kept consistent across datasets (this layer should also be implemented in Tier 2), whereas syntactic interoperability concerns how data is structurally persisted within a database.

Syntactic interoperability at this tier is important so that any tool or AI/ML model processing the data is aware of the format and structure of the local dataset. These aspects are not addressed by the conceptual specifications (entities, relationships, terminologies) of the hyper-ontology.

5.1 CDM business requirements

Prior to selecting a CDM, we conducted an initial analysis of the main requirements, expectations, and constraints from various stakeholders. Our approach involved engaging with representatives from the AI4HI projects and requesting specific information, as follows:

  • The specific cancer types that each project focused on.

  • The clinical questions/use cases addressed by each project.

  • The clinical and imaging data used to answer these questions, including mandatory and optional information.

  • The format of the raw data available and whether standardized terminologies were used for different data types, along with the versions of these terminologies.

  • The anonymization techniques/profiles employed by each project to ensure compliance with GDPR and national data privacy laws.

  • Details about the modalities of radiological images collected and the imaging metadata associated with them, or extracted, if applicable.

  • Information regarding the format of segmentation masks, if they exist.

  • The chosen common data model and whether it covers all data types, with a straightforward mapping from the raw data.

This information was collected and documented in the ORSD document described in the previous section, and the outcome of the analysis is outlined in D5.1 (section 3). The analysis revealed many challenges. The AI4HI projects address different cancer types, and only three of the five projects share common cancer types, i.e. breast and prostate cancer. They also differ in their use cases, and therefore in the clinical and imaging data supporting those use cases, as well as in their terminologies, anonymization profiles, and segmentation formats. Although all of them use standardized data models (OMOP-CDM or FHIR resources), these models differ as well. Most importantly, as some of the AI4HI projects are nearing completion, they have no plans to transform their datasets to a specific standard, since each has already selected and adopted the data model that serves its own needs. In addition to the AI4HI projects, we need to consider constraints that may arise from new data holders willing to join the EUCAIM federation, who may have either standardized or entirely ad-hoc data models and may differ in their technical facilities and resources in general.

Following the collection of information from the AI4HI projects, several group meetings were conducted with different domain experts within the consortium, including AI experts, data holders, software engineers, and legal teams, to define the data model business requirements for the project. The most critical requirements are presented below:

  • EUCAIM should support as many input formats as possible for raw clinical and imaging data, which may or may not comply with interoperability standards.

  • The data model should be terminology-agnostic, accommodating different terminologies seamlessly.

  • The effort required from clinical data managers to prepare data for federated processing and analysis through the platform should be minimized.

  • The data model must fully comply with GDPR and national privacy laws.

  • The data model should comprehensively represent all target data types at their intended level of detail, including clinical, demographic, radiomic, and laboratory data.

  • The data model should be extensible to allow for additional/new data to be represented.

  • The data model must provide an interface for accessing and querying data for the purpose of training federated AI models.

  • Data transformations from the raw source to the AI training dataset should be as straightforward as possible.

  • The data model should be structured in a way (usually in a tabular format) that simplifies the retrieval of records in the training dataset, regardless of the training plan of an AI algorithm.

Within EUCAIM, two potential frameworks for data harmonization and standardization are being explored, as mentioned in the TEHDAS recommendations on a Data Quality Framework document[16]. One approach involves transforming all datasets held by a data holder to comply with a specific internationally adopted standard (e.g., OMOP-CDM). The other approach entails preparing the dataset for delivery based on a specific data schema that includes the necessary harmonization rules, controlled vocabularies, and standards.

In the first approach, harmonization is driven by a standard design, resulting in a dataset that is comprehensible to the community and can be used for federated analysis and to support interoperability with other research infrastructures and networks (e.g., OHDSI, Darwin EU, EHDEN). However, this method requires significant upfront effort (although only done once per dataset) and is only accessible after extracting, semantically mapping, and transforming all data sources to the standard data model. This ties the research question specification to the semantic constraints of the standard model specification.

In the second approach, harmonization is driven by the materialization of specific information in a bespoke data model, where each transformation is limited to specific entities and variables of interest. This, however, limits the reuse of the data in other contexts and introduces an additional data model for specific purposes. It is important to note that preparing datasets for secondary use should not be limited to mapping concepts. It also requires developing data models that provide a logical harmonized schema, integrating different health data sources among data holders.

In the context of EUCAIM, we explored different approaches for Tier 3 (federated processing/analysis and AI model development), which is the maximum level of interoperability to be achieved in EUCAIM, based on the two aforementioned harmonization frameworks. These approaches are analyzed in the following section and guided many decisions regarding the CDM (e.g. structure, format).

5.2 Data harmonization approaches for the federated processing/analysis

5.2.1 Scenario 1: EUCAIM Hyper-Ontology Based CDM for Analysis

The architecture for this scenario is shown in Figure 20. The scenario outlines two distinct pathways for integrating data: one for AI4HI repositories or other established repositories that have adopted standards (OMOP, FHIR), and one for new data holders with ad-hoc models.

  1. Established repositories (e.g. AI4HI projects): implement a mediator/data access service that dynamically transforms and structures data according to the hyper-ontology and CDM specification.

  2. Other data holders (e.g. hospitals): undergo an Extract, Transform, Load (ETL) process, directly converting their local data into the EUCAIM hyper-ontology based CDM.

Figure 20: EUCAIM CDM for analysis & OMOP, FHIR, EUCAIM local data models. For OMOP and FHIR a mediator and mapping component is necessary.

In this scenario, researchers use a Data Access Service to request the specific information needed to create their model’s input dataset (cohort) in tabular form (e.g. csv). Established repositories (e.g. AI4HI repositories) use a mediator service and a mapping component to translate queries expressed in hyper-ontology concepts (e.g., age at diagnosis, modality) into the local CDM query language and local CDM concepts. This is essentially the same mapping component/service used by the Tier 2 mediator, except that here the mediator returns specific hyper-ontology based attributes (e.g. age at diagnosis, modality, PSA) rather than aggregated information. The requested information can subsequently be stored in a tabular file (e.g. csv, parquet), along with the corresponding images, in a POSIX path that the federated processing service is able to access. For new data holders, an ETL process aligns datasets directly with the EUCAIM hyper-ontology based CDM specification.
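The mediator's translation step can be illustrated with a minimal Python sketch. It assumes a repository whose local records are simple dictionaries. The local field names (`dx_age`, `img_modality`, `psa_ng_ml`), the mapping table, and the function `materialize_cohort` are hypothetical and only show the idea of renaming requested hyper-ontology attributes to local fields and exporting the result as csv. A real mediator would also have to translate queries into the local CDM query language.

```python
import csv
import io

# Hypothetical mapping from hyper-ontology attributes to local field names.
# The local names are illustrative and not taken from any real repository.
LOCAL_FIELD_MAP = {
    "age_at_diagnosis": "dx_age",
    "modality": "img_modality",
    "psa": "psa_ng_ml",
}


def materialize_cohort(requested, local_rows, field_map=LOCAL_FIELD_MAP):
    """Return csv text with one column per requested hyper-ontology attribute."""
    unknown = [name for name in requested if name not in field_map]
    if unknown:
        raise KeyError(f"no local mapping for: {unknown}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=requested, lineterminator="\n")
    writer.writeheader()
    for row in local_rows:
        # Values missing in the local record are exported as empty cells.
        writer.writerow({name: row.get(field_map[name], "") for name in requested})
    return buffer.getvalue()


# Illustrative local records (made-up values).
local_rows = [
    {"dx_age": 61, "img_modality": "MR", "psa_ng_ml": 7.2},
    {"dx_age": 58, "img_modality": "MR"},
]
cohort_csv = materialize_cohort(["age_at_diagnosis", "modality", "psa"], local_rows)
print(cohort_csv)
```

Only the requested attributes are exported, which mirrors the on-demand nature of the mediator described above.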

The advantages of this approach are:

  • The researchers are able to slice and dice the information available according to the needs of their analysis/use case and the inputs of their respective models in an easy and user-friendly way through the data access service.

  • Federated Learning scenarios are easier for the researchers, since they can specify what type of data (and in which format) they want to be available on each federated node.

  • Eliminates the need for AI4HI repositories to go through an ETL process for transforming their data, but rather create a mapping component that transforms only the requested information on the fly and on demand.

  • Streamlines data transformation for new data holders through an ETL process, without implementing any mediator/mapping component.

The disadvantages of this approach are:

  • A model registry or a UI is required so that researchers are able to specify the “granularity” of the input their models/tools expect (e.g. which variables).

  • A data access service is needed to accept specifications of the needed dataset and create (materialize) dynamic cohorts based on these, which increases complexity.

  • The mediator component's on-the-fly data transformation (materialization) is technically challenging.

  • Adopts a bespoke data model for new providers (based on the hyper-ontology), limiting its utility outside EUCAIM.

5.2.2 Scenario 2: Integration with OMOP-FHIR for Wider Compatibility

In this scenario, new data holders can opt to convert their data into either OMOP-CDM or FHIR based standards. This facilitates integration with EUCAIM in a similar way to the AI4HI projects and enhances data utility beyond the EUCAIM ecosystem. Therefore:

  1. Established (AI4HI) repositories and data holders compliant with OMOP/FHIR standards use a mediator service as in Scenario 1. EUCAIM will need to provide OMOP/FHIR mediator components to the new data holders, i.e. customized versions of them, since even the same CDM differs in the way information is structured, as described in section 4.

  2. Data holders not compliant with OMOP/FHIR standards undergo an ETL process to comply with either OMOP or FHIR standards.

Figure 21 shows the architectural design of this approach. The advantage of this approach compared to Scenario 1 is that new data holders align with well-established, generic standard data models, enhancing interoperability and impact beyond EUCAIM. However, the disadvantage is that a mediator service and a mapping component must be implemented in this case as well, so that all OMOP and FHIR based repositories are harmonized for data analysis, with all the disadvantages this mediator service entails, as described in Scenario 1.

Figure 21: OMOP-FHIR local adopted standards – EUCAIM based CDM for analysis with mediator and mapping components necessary for all nodes in the federation.

5.2.3 Scenario 3: Simplifying Integration Through an ETL Process

This approach mandates all participating repositories to undergo a one-time ETL process, conforming to the EUCAIM hyper-ontology based CDM, thereby reducing technical complexities associated with mediator services. In this case all federated nodes can use the same (simpler) Data Access Service implementation that exports data from the CDM into a common format. Figure 22 shows the architectural design of this approach.

Figure 22: EUCAIM based CDM for all nodes participating in the federation. This would require a one-time transformation and no mediator/mapping component is necessary.

5.2.4 Scenario 4: EUCAIM hyper-ontology only for federated query purposes, OMOP-CDM for analysis

In this scenario, the EUCAIM hyper-ontology is only applicable for Tier 2 for the federated query purposes and is not used for federated processing. The architectural design of this approach is outlined in Figure 23.

All participating repositories should conform to the OMOP-CDM standard data model and go through an ETL process (apart from those already using OMOP-CDM, although some adaptation will be needed to address the specific issues described in section 4.1). The federated processing service could then directly access, for example, an SQLite[17] file containing the whole OMOP-CDM relational schema, perform any desired query, and transform the result into any tabular format for input to the AI model or for analysis.

Figure 23: OMOP-CDM as the EUCAIM CDM for federated processing and analysis. Hyper-ontology only for federated queries.

The approach of not having a data access service, but rather providing the whole dataset for researchers to slice and dice, could also be applied to the previous scenarios, regardless of the chosen CDM for analysis. However, the disadvantage of this approach is that all nodes need both to go through an ETL process and to have a mediator for Tier 2, since Tier 2 conforms to the hyper-ontology concepts and terms (for bridging the gaps between OMOP and FHIR standards). This approach could also be used with a FHIR-based standard; however, as described and analyzed in D5.1, OMOP-CDM is more appropriate as a CDM for analysis and AI related operations. Another drawback is that researchers are given an SQLite file/relational database to work with, which requires knowledge of both OMOP-CDM and the SQL query language, rather than a tabular format that AI experts are usually accustomed to and that can be dynamically formed for their purposes. In this case, another access service could be added on top of the OMOP-CDM databases for more user-friendly access to the underlying data.
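To make the SQL knowledge required in this scenario concrete, the following Python sketch queries a tiny in-memory SQLite database and flattens the result into tabular rows. The `person` and `condition_occurrence` tables and their columns follow OMOP-CDM naming, but the schema is reduced to a few columns and the inserted values are made up. The condition concept ids are placeholders and the derived `age_at_diagnosis` column is an illustrative assumption, not part of OMOP-CDM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
-- Made-up rows; 111 and 222 are placeholder condition concept ids.
INSERT INTO person VALUES (1, 8507, 1960), (2, 8532, 1955);
INSERT INTO condition_occurrence VALUES
    (10, 1, 111, '2021-03-01'),
    (11, 2, 222, '2019-07-15');
""")

# Join the two tables and derive an approximate age at diagnosis from the years.
QUERY = """
SELECT p.person_id AS person_id,
       p.gender_concept_id AS gender_concept_id,
       c.condition_concept_id AS condition_concept_id,
       CAST(substr(c.condition_start_date, 1, 4) AS INTEGER) - p.year_of_birth
           AS age_at_diagnosis
FROM person AS p
JOIN condition_occurrence AS c ON c.person_id = p.person_id
ORDER BY p.person_id
"""
cursor = conn.execute(QUERY)
columns = [description[0] for description in cursor.description]
rows = [dict(zip(columns, record)) for record in cursor.fetchall()]
for row in rows:
    print(row)
```

Even this small example requires knowing which OMOP-CDM tables hold which facts and how to join them, which is the burden an additional access service would take off AI experts.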

5.3. The EUCAIM Common Data Model

5.3.1. CDM Selection Rationale

Based on the aforementioned analysis and the requirements from various stakeholders, i.e., AI experts, data model experts and AI4HI project representatives, Scenario 1 and Scenario 3 were deemed the most appropriate for supporting all the necessary processes for querying and transforming information required by the AI model algorithms and frameworks. Consequently, the EUCAIM CDM for analysis and federated processing/learning will be based on the hyper-ontology specification, which underpins the EUCAIM logical data model.

It is important to note that EUCAIM will not mandate the adoption of either Scenario 1 or Scenario 3, which involve a mediator implementation and a one-time ETL process, respectively. However, the EUCAIM partners agreed that a one-time transformation to the EUCAIM CDM is more straightforward and easier to implement; this will therefore be the recommended approach.

As we initially described in Section 4.4.3, the mCODE conceptual model was identified as the most appropriate basis for grounding the hyper-ontology in the oncology domain, especially to build the core layer of the hyper-ontology model by ontologically analyzing and explicitly and semantically representing the mCODE basic specifications. The rationale behind this decision is multifold.

Although the OMOP-CDM and FHIR standards are widely used for standardizing and exchanging healthcare data, they have limitations for AI-related tasks, especially those requiring tabular data for model training and analysis. OMOP-CDM excels at transforming and standardizing data from diverse healthcare sources into a common format, which benefits interoperability and large-scale observational studies. However, because it is a generic, observation-based model, querying oncology related information is not straightforward for AI experts. For example, in its oncology extension most cancer modifiers, as defined in the OMOP-CDM specification, are represented as “Measurements”, which limits the semantics of cancer stages, cancer grades, extensions, invasions, etc. Similarly, the basic FHIR (Fast Healthcare Interoperability Resources) specification is designed to facilitate real-time data exchange between healthcare systems, with its primary focus on ensuring that different systems can communicate effectively. FHIR’s hierarchical and often complex data structures are not inherently suited to the tabular data formats required by many AI algorithms and frameworks. For reference, all tools currently available in EUCAIM, which are described and analyzed in D5.4, require clinical and imaging metadata in a tabular format.

For these reasons, EUCAIM explored the two most prominent data models in oncology, mCODE (Minimal Common Oncology Data Elements)[18] and OSIRIS[19] (Interoperability and data sharing of clinical and biological data in oncology), both of which are event-based models. mCODE, introduced by ASCO and a group of collaborators, provides a standardized set of essential oncology data elements, ensuring interoperability and data consistency, which is critical for building reliable AI models. Although mCODE is based on FHIR, it narrows the scope to oncology-specific data elements, making it easier to extract and query relevant information for cancer research and AI applications. OSIRIS, developed by INCa, offers a minimum data set for the sharing of clinico-biological data in oncology. Its relational model makes it easier to represent and manipulate as tabular data, which is ideal for AI model training. This structure allows for efficient querying, aggregation, and analysis of large datasets.

All options considered, the EUCAIM CDM will build upon the conceptual model of the mCODE specification and the OSIRIS data framework, leveraging the strengths of each while accounting for the specific constraints imposed by the secondary use of data and by the AI4HI projects. For example, both models contain mandatory attributes that cannot be populated from the information available in the AI4HI projects. This is due to GDPR and the anonymization strategies each project follows to reduce the risk of patient re-identification, and to the fact that the clinical information collected by the projects accompanies the imaging data. For instance, none of the date related attributes included in the OSIRIS and mCODE specifications are part of the information collected from the AI4HI projects, because the clinical information has been anonymized. Instead, relative time relations based on events such as diagnosis or treatment (e.g., events that happened X months after baseline/diagnosis/treatment) are included.
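The replacement of absolute dates with relative time relations can be sketched as follows. This is a minimal illustration, assuming that a data holder still has the original dates locally and exports only offsets from a baseline event such as diagnosis. The function names, the choice of whole months as the unit, and the example dates are assumptions made for this sketch.

```python
from datetime import date


def months_since(baseline: date, event: date) -> int:
    """Whole months elapsed from baseline to event, rounded down."""
    months = (event.year - baseline.year) * 12 + (event.month - baseline.month)
    if event.day < baseline.day:
        # The last month is not yet complete.
        months -= 1
    return months


def to_relative(baseline: date, events):
    """Replace each (name, date) pair with an offset from the baseline."""
    return [
        {"event": name, "months_from_baseline": months_since(baseline, when)}
        for name, when in events
    ]


# Made-up dates; only the offsets would leave the data holder.
diagnosis = date(2021, 3, 15)
relative_events = to_relative(
    diagnosis,
    [("psa_test", date(2021, 3, 15)), ("follow_up_mri", date(2021, 9, 20))],
)
print(relative_events)
```

The exported records carry no calendar dates, only the elapsed time and its implied unit, which matches the "Performed" and "Performed Unit Concept" style of attributes used in the data dictionary.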

In summary, in the context of EUCAIM, mCODE will be the base conceptual model for representing the various cancer types, cancer stages, performance status metrics and scales, as well as assessments (e.g. radiological assessments such as the ACR Reporting and Data Systems (RADS)[20]). It is also generally more advantageous because it is built on the FHIR standard, which can be exploited, if necessary, in other contexts for data exchange. In addition, the relational nature of OSIRIS and its approach of creating pivot tables (.csv files) for AI related processes support efficient data selection for preprocessing, feature extraction, and model training, ultimately enhancing the development of AI applications in oncology. EUCAIM will therefore follow the same approach and use pivot tables to help AI experts select specific cohorts as input to their models.
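As an illustration of the pivot-table idea, the sketch below turns event-style records (one row per patient and attribute) into one wide row per patient. The record layout, the attribute names, and the `pivot` function are assumptions made for this example and do not reproduce the actual OSIRIS or EUCAIM pivot files.

```python
from collections import defaultdict


def pivot(records, index="patient_id", column="attribute", value="value"):
    """Turn long (index, column, value) records into one wide row per index."""
    wide = defaultdict(dict)
    for record in records:
        wide[record[index]][record[column]] = record[value]
    columns = sorted({record[column] for record in records})
    return [
        # Attributes not recorded for a patient are filled with None.
        {index: key, **{name: wide[key].get(name) for name in columns}}
        for key in sorted(wide)
    ]


# Made-up long-format records.
long_records = [
    {"patient_id": "P1", "attribute": "age_at_diagnosis", "value": 61},
    {"patient_id": "P1", "attribute": "psa", "value": 7.2},
    {"patient_id": "P2", "attribute": "age_at_diagnosis", "value": 58},
]
wide_rows = pivot(long_records)
print(wide_rows)
```

Each wide row can be written directly to a csv file, giving one training record per patient.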

A first version of the EUCAIM Data Dictionary is described in the following section. A more detailed version is also available at: EUCAIM_CDM_mCODE_based_v1.0.xlsx

5.3.2. EUCAIM Data Dictionary

The EUCAIM CDM classifies all the clinical patient data into 6 different domains according to the mCODE specification:

5.3.2.1 Patient

The patient information group contains general information about the patient, including demographics and the patient's managing organization.

Table 7: The EUCAIM CDM: Patient group

| Group | Entity | Data Element | Definition | EUCAIM Required | Occurrences Allowed | Data Type |
| --- | --- | --- | --- | --- | --- | --- |
| Patient | Patient | Identifier | Anonymized patient identifier which is unique within the context of the system. | Required | 1..1 | string |
| Patient | Patient | Gender | Administrative Gender - the gender that the patient is considered to have for administration and record keeping purposes. | Optional | 0..1 | CodeableConcept |
| Patient | Patient | Ethnicity | Concepts classifying the person into a named category of humans sharing common history, traits, geographical origin or nationality. | Optional | 0..1 | CodeableConcept |
| Patient | Patient | Race | Concepts classifying the person into groups based on their physical appearance | Optional | 0..1 | CodeableConcept |
| Patient | Patient | Birth Year | The year of birth for the individual. | Optional (required if diagnosis age is not available) | 0..1 | Integer (>1900, <current year) |
| Patient | Patient | Managing Organization | Organization that is the custodian of the patient record. Need to know who recognizes this patient record, manages and updates it. | Required | 0..1 | Organization |
| Patient | Patient | Care Provider | Patient's primary care provider organization. | Optional | 0..1 | Organization |
| Patient | Patient | Birth Sex | A code classifying the person's sex assigned at birth. | Required | 1..1 | CodeableConcept |
| Patient | Cancer Patient | Deceased | Indicates if the individual is deceased or not. | Optional | 0..1 | boolean |
| Patient | Cancer Patient | Cause of death | Main cause of death of the patient | Optional (conditional on deceased) | 0..1 | CodeableConcept |
| Patient | Cancer Patient | Date of last contact | Date of last contact if not deceased, or date of death if deceased. | Optional (conditional on deceased) | 0..1 | Date |
| Patient | Organization | Identifier | Identifies this organization across multiple systems | Optional | 1..1 | String |
| Patient | Organization | Name | Name used for the organization | Optional | 1..1 | String |
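A few of the constraints in the Patient group can be expressed as a small validation routine. The sketch below is only an illustration under simplifying assumptions: coded elements are represented as plain strings instead of CodeableConcept structures, only four data elements are modelled, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    identifier: str                   # Required, 1..1
    birth_sex: str                    # Required, 1..1 (coded value shown as text)
    birth_year: Optional[int] = None  # Optional unless diagnosis age is unavailable
    gender: Optional[str] = None      # Optional, 0..1


def validate_patient(patient, diagnosis_age_available=True, today=None):
    """Return a list of violations of the Patient group rules modelled here."""
    today = today or date.today()
    errors = []
    if not patient.identifier:
        errors.append("Identifier is required")
    if not patient.birth_sex:
        errors.append("Birth Sex is required")
    if patient.birth_year is None:
        if not diagnosis_age_available:
            errors.append("Birth Year is required when diagnosis age is not available")
    elif not (1900 < patient.birth_year < today.year):
        errors.append("Birth Year must be > 1900 and < current year")
    return errors


print(validate_patient(Patient("P1", "female", 1960)))
print(validate_patient(Patient("P2", "male"), diagnosis_age_available=False))
```

Returning a list of messages instead of raising on the first problem lets a data holder see all violations of a record at once during data preparation.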

5.3.2.2 Health Assessment

The health assessment group contains information related to the patient’s general health before and after treatment. This includes Comorbidities, Laboratory Tests, Performance Assessments (ECOG), Vital Signs, Family Member History, and Patient History of Metastatic Cancer.

Table 8: The EUCAIM CDM: Health assessment group

| Group | Entity | Data Element Name | Definition | EUCAIM Required | Occurrences Allowed | Data Type |
| --- | --- | --- | --- | --- | --- | --- |
| Health Assessment | Family Member History | Subject | The patient that the family history is about | Required | 1..1 | Reference: Patient |
| Health Assessment | Family Member History | Relationship | Relationship to the subject | Required | 1..1 | CodeableConcept |
| Health Assessment | Family Member History | Condition Code | Condition that the related person had | Required | 1..1 | CodeableConcept |
| Health Assessment | Family Member History | Onset Age | When condition first manifested on the relative. | Optional | 0..1 | Age |
| Health Assessment | History of Metastatic Cancer | Code | Type of observation | Optional | 0..1 | CodeableConcept |
| Health Assessment | History of Metastatic Cancer | Value | The information determined as a result of making the observation, if the information has a simple value. | Optional | 0..1 | boolean |
| Health Assessment | Comorbidities | Focus | Comorbid conditions are typically defined with respect to a specific 'index' condition. For example, comorbid condition categories would be those specified by CDC, namely obesity, renal disease, respiratory disease, etc. | Optional | 0..* | Reference: PrimaryCancerCondition |
| Health Assessment | Comorbidities | Comorbid Condition Present | A comorbid condition that is known to be present | Required (conditional) | 0..* | CodeableConcept |
| Health Assessment | Comorbidities | Comorbid Condition Absent | A condition that is NOT present, related to the patient. | Required (conditional) | 0..* | CodeableConcept |
| Health Assessment | Comorbidities | Code | Describes what was observed. Sometimes this is called the observation "name". | Required | 1..1 | CodeableConcept |
| Health Assessment | Comorbidities | Subject | The patient whose comorbidities are recorded. | Optional | 0..1 | Reference: CancerPatient |
| Health Assessment | ECOG Performance Status | Category | A code that classifies the general type of observation being made. | Required | 1..1 | CodeableConcept |
| Health Assessment | ECOG Performance Status | Code | The name of the non-imaging or non-laboratory test performed on a patient. A LOINC code **SHALL** be used if the concept is present in LOINC. | Required | 1..1 | CodeableConcept |
| Health Assessment | ECOG Performance Status | Subject | Patient whose performance status is recorded. | Required | 1..1 | Reference: CancerPatient |
| Health Assessment | ECOG Performance Status | Value | The information determined as a result of making the observation, if the information has a simple value. | Optional | 0..1 | integer |
| Health Assessment | ECOG Performance Status | Interpretation | A categorical assessment of an observation value. For example, high, low, normal. | Optional | 0..* | CodeableConcept |
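The "Occurrences Allowed" column uses a lower..upper cardinality notation, where `*` means unbounded. A minimal sketch of how such a specification could be parsed and checked is shown below. The function names are hypothetical, and the sketch assumes every specification has the form `N..M` or `N..*`.

```python
def parse_occurrences(spec):
    """Parse a cardinality such as '0..1' or '0..*' into (lower, upper)."""
    lower, upper = spec.split("..")
    # An upper bound of '*' means unbounded and is represented as None.
    return int(lower), (None if upper == "*" else int(upper))


def check_occurrences(spec, count):
    """Return True if `count` values satisfy the cardinality `spec`."""
    lower, upper = parse_occurrences(spec)
    return count >= lower and (upper is None or count <= upper)


print(parse_occurrences("0..*"))
print(check_occurrences("1..1", 0))
print(check_occurrences("0..*", 5))
```

Such a check complements the required/optional flag: an element marked as required with cardinality 1..1 must appear exactly once, while 0..* allows any number of repetitions.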

5.3.2.3 Disease

The disease group includes information specific to the tumor markers, the cancer diagnosis, the histological classification, grade, morphology, and behavior of tumors, the staging of cancer, as well as any cancer risk assessment metrics.

Table 9 9: The EUCAIM CDM: Disease group

Group

Entity

Data Element Name

Definition

EUCAIM Required

Occurrences Allowed

Data Type

Disease

Tumor Marker Test

Related Condition

Associates the tumor marker test with a condition, if one exists. Condition can be given by a reference or a code. In the case of a screening test such as prostate-specific antigen (PSA), there may be no existing condition to reference.

Optional

0..*

Reference(PrimaryCancerCondition)

Tumor Marker Test

Code

The tumor marker test that was performed. A LOINC concept shall be used if the concept is present.

Required

1..1

CodeableConcept

Tumor Marker Test

Subject

Patient whose test result is recorded.

Required

1..1

Reference: CancerPatient

Tumor Marker Test

Value As Concept

The Laboratory result value if it is a coded value. The value CodeableConcept.code shall be selected from SNOMED CT.

Required (conditional)

1..1

CodeableConcept,

Tumor Marker Test

Value As Number

The Laboratory result value, if numeric.

Required (conditional)

1..1

Float

Tumor Marker Test

Value Unit Concept

If a numeric value, valueQuantity.code **SHALL** be selected from [UCUM](http://unitsofmeasure.org). A FHIR [UCUM Codes value set](http://hl7.org/fhir/STU3/valueset-ucum-units.html) that defines all UCUM codes is in the FHIR specification.

Required (conditional)

1..1

CodeableConcept

Tumor Marker Test

Performed

The elapsed time from the baseline (time 0).

Optional

1..1

Integer

Tumor Marker Test

Performed Unit Concept

The unit concept of the time interval

Optional

1..1

Integer

Primary Cancer Condition

Age of diagnosis/condition

The patient age on which the existence of the Condition was first asserted or acknowledged.

Required

1..1

Age

Primary Cancer Condition

Subject

Indicates the patient or group who the condition record is associated with.

Required

1..1

Reference: CancerPatient

Primary Cancer Condition

Code

Identification of the condition, problem or diagnosis.

Required

1..1

CodeableConcept

Primary Cancer Condition

Histology Morphology Behavior

A codeable concept describing the morphologic and behavioral characteristics of the cancer.(It takes values from: http://hl7.org/fhir/us/mcode/ValueSet/mcode-histology-morphology-behavior-vs)

Required

1..1

CodeableConcept

Primary Cancer Condition

Body Site

The anatomical location where this condition manifests itself.

Required

1..*

CodeableConcept

Primary Cancer Condition

Body Site > Location Qualifier

General location qualifier (excluding laterality) for this bodySite

Optional

0..*

CodeableConcept

Primary Cancer Condition

Body Site > Laterality Qualifier

Laterality qualifier for this bodySite

Optional

0..1

CodeableConcept

Primary Cancer Condition

Onset Age

Estimated or actual age the condition began, in the opinion of the clinician.

Optional

0..1

Age

Primary Cancer Condition

Abatement Age

The date or estimated date that the condition resolved or went into remission. This is called "abatement" because of the many overloaded connotations associated with "remission" or "resolution" - Conditions are never really resolved, but they can abate.

Optional

0..1

Age

Secondary Cancer Condition

Histology Morphology Behavior

Describes the morphologic and behavioral characteristics of the cancer.

Optional

1..1

CodeableConcept

Secondary Cancer Condition

Related Primary Cancer Condition

A reference to the primary cancer condition that provides context for this resource.

Optional

1..1

Reference: Primary Cancer Condition

Secondary Cancer Condition

Code

Identification of the condition, problem or diagnosis.

Required

1..1

CodeableConcept

Secondary Cancer Condition

Body Site

The anatomical location where this condition manifests itself.

Optional

0..*

CodeableConcept

Secondary Cancer Condition

Body Site > Location Qualifier

General location qualifier (excluding laterality) for this bodySite

Optional

0..*

CodeableConcept

Secondary Cancer Condition

Body Site > Laterality Qualifier

Laterality qualifier for this bodySite

Optional

0..1

CodeableConcept

Secondary Cancer Condition

Subject

Indicates the patient or group who the condition record is associated with.

Required

1..1

Reference: CancerPatient

Secondary Cancer Condition

Condition appearance

The number of time elapsed after the primary cancer condition on which the existence of this Condition was first asserted or acknowledged.

Required (conditional on Onset Age)

1..1

Integer

Secondary Cancer Condition

Appearance Unit Concept

The unit of time for the time elapsed after the primary cancer condition

Required (conditional condition appearance)

1..1

Integer

Secondary Cancer Condition

Onset Age

Estimated or actual age the condition began, in the opinion of the clinician.

Required (conditional on condition appearance)

1..1

Age

Secondary Cancer Condition

Abatement Age

The date or estimated date that the condition resolved or went into remission. This is called "abatement" because of the many overloaded connotations associated with "remission" or "resolution" - Conditions are never really resolved, but they can abate.

Optional

1..1

Age

Cancer Stage

Code

The kind of stage reported, e.g., a pathologic TNM stage, a Lugano lymphoma stage, or a Rai stage for leukemia. This element identifies the type of value that is reported in Observation.value and is necessary for the correct interpretation of that value.

The distinction between Observation.code and Observation.method is important. Observation.code identifies the kind of stage being reported while Observation.method represents the staging system used to determine the code. Observation.code may imply the staging system. For example, the SNOMED CT 103420007 says the reported value is a modified Dukes stage, implying the Modified Dukes staging system (SNOMED CT 385359000) was used to determine the stage. When the staging system is implied by Observation.code, Observation.method is not required. However, when Observation.code does not imply a staging system (for example, if the code is SNOMED CT 385388004 Lymphoma stage), then the staging system must be specified in Observation.method.

The value (Observation.valueCodeableConcept) may also imply certain things about the kind of stage being reported. For example, the value cN0 implies the value is a clinical stage. However, even if the value is partly or wholly self-identifying, it is not a reliable indicator of the type of stage being reported or the method of staging. Therefore, Observation.code must in all cases be reported.

Required

1..1

CodeableConcept

Cancer Stage

Method

The staging system or protocol used to determine the stage, stage group, or category of the cancer based on its extent. When the staging system is implied by Observation.code, Observation.method is not required. However, when Observation.code does not imply a staging system (for example, if the code is SNOMED CT 385388004 Lymphoma stage), then the staging system must be specified in Observation.method.

Optional

0..1

CodeableConcept

Cancer Stage

Value

The stage, stage group, category, or classification resulting from the staging evaluation.

Required

1..1

CodeableConcept

Cancer Stage

Subject

The patient associated with the staging assessment.

Required

1..1

Reference: CancerPatient

Cancer Stage

Related Procedure

The procedure from which the cancer stage was determined. It can be an imaging examination (e.g., MRI), a biopsy, or surgery.

Required

1..*

Reference: Procedure

Cancer Stage

Focus

Staging is associated with a particular cancer condition. Observation.focus is used to point back to that condition.

Optional

0..*

Reference: CancerCondition

Histologic Grade

Related Condition

Associates the histologic grade test with a condition, if one exists. Condition can be given by a reference.

Optional

0..*

Reference: Condition

Histologic Grade

Category

A code that classifies the general type of observation being made.

Required

1..1

CodeableConcept

Histologic Grade

Subject

Patient whose test result is recorded.

Required

1..1

Reference: CancerPatient

Histologic Grade

ValueAsConcept

The laboratory result value. If a coded value, valueCodeableConcept.code should be selected from SNOMED CT, if the concept exists.

Required

1..1

CodeableConcept

Histologic Grade

ValueAsNumber

The laboratory result value. If a numeric value, valueQuantity.code shall be selected from [UCUM](http://unitsofmeasure.org).

Required

1..1

Quantity

Histologic Grade

Method

Indicates the mechanism used to perform the observation.

Optional

0..1

CodeableConcept
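The Cancer Stage rule stated above (Observation.method may be omitted only when Observation.code already implies the staging system) can be illustrated with a minimal Python sketch. The `CancerStage` class, the `validate_stage` function, and the `CODES_IMPLYING_METHOD` lookup are hypothetical names introduced here for illustration, and only a subset of the Cancer Stage elements is modeled; the two SNOMED CT codes are the ones quoted in the definitions.

```python
from dataclasses import dataclass
from typing import List, Optional

# SNOMED CT codes quoted in the Cancer Stage definitions above.
MODIFIED_DUKES_STAGE = "103420007"  # implies the Modified Dukes staging system
LYMPHOMA_STAGE = "385388004"        # does not imply a staging system

# Hypothetical lookup of stage-type codes that imply their staging system.
# In practice this would probably come from a terminology service.
CODES_IMPLYING_METHOD = {MODIFIED_DUKES_STAGE}


@dataclass
class CancerStage:
    code: Optional[str]           # Observation.code, required (1..1)
    value: Optional[str]          # Observation.valueCodeableConcept, required (1..1)
    subject: Optional[str]        # reference to a CancerPatient, required (1..1)
    method: Optional[str] = None  # Observation.method, optional (0..1)


def validate_stage(stage: CancerStage) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in ("code", "value", "subject"):
        if not getattr(stage, name):
            problems.append(f"{name} is required")
    # The method may be omitted only when the code implies the staging system.
    if stage.code and stage.code not in CODES_IMPLYING_METHOD and not stage.method:
        problems.append("method is required because code does not imply a staging system")
    return problems
```

For example, a record coded as a modified Dukes stage passes without a method, whereas a record coded as a lymphoma stage is flagged until a method is supplied.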

5.3.2.4 Cancer Treatments

The cancer treatment group includes treatment techniques used to treat cancer patients, categorized as: medications, surgery, and radiotherapy.

Table 10: The EUCAIM CDM: Cancer treatment group

| Group | Entity | Data Element Name | Definition | EUCAIM Required | Occurrences Allowed | Data Type |
|---|---|---|---|---|---|---|
| Treatment | Cancer-Related Surgical Procedure | Code | The specific procedure that is performed. | Required | 1..1 | CodeableConcept |
|  | Cancer-Related Surgical Procedure | Subject | The patient on whom the procedure was performed. | Required | 1..1 | Reference: Patient |
|  | Cancer-Related Surgical Procedure | Performed | Period of time elapsed after baseline | Optional | 0..1 | Integer |
|  | Cancer-Related Surgical Procedure | Performed Unit Concept |  |  |  |  |
|  | Cancer-Related Surgical Procedure | Body Site | Detailed and structured anatomical location information. Multiple locations are allowed - e.g. multiple punch biopsies of a lesion. | Optional | 0..* | CodeableConcept |
|  | Cancer-Related Surgical Procedure | Body Site > Location Qualifier | General location qualifier (excluding laterality) for this bodySite | Optional | 0..* | CodeableConcept |
|  | Cancer-Related Surgical Procedure | Body Site > Laterality Qualifier | Laterality qualifier for this bodySite | Optional | 0..* | CodeableConcept |
|  | Cancer-Related Surgical Procedure | Response | Response evaluation to an oncology treatment from RECIST terminology. | Optional | 0..1 | CodeableConcept |
|  | Cancer-Related Medication Administration | Code | Code that identifies this medication | Required | 1..1 | CodeableConcept |
|  | Cancer-Related Medication Administration | Subject | The patient receiving the medication. | Required | 1..1 | Reference: CancerPatient |
|  | Cancer-Related Medication Administration | Effective | An interval of time during which the administration took place. | Optional | 0..1 | Period |
|  | Cancer-Related Medication Administration | Effective Unit Concept | An interval of time during which the administration took place. | Optional | 0..1 | Period |
|  | Cancer-Related Medication Administration | Administered | The time elapsed | Optional | 0..1 | CodeableConcept |
|  | Cancer-Related Medication Administration | Administered Unit Concept | Period of time elapsed unit concept. | Optional | 0..1 | CodeableConcept |
|  | Cancer-Related Medication Administration | Response | Response evaluation to an oncology treatment from RECIST terminology. | Optional | 0..1 | CodeableConcept |
|  | Radiotherapy Course Summary | Modality | Capturing a modality of external beam or brachytherapy radiation procedures. | Required | 1..1 | CodeableConcept |
|  | Radiotherapy Course Summary | Technique | Capturing a technique of external beam or brachytherapy radiation procedures. | Optional | 0..* | CodeableConcept |
|  | Radiotherapy Course Summary | Actual Number of Sessions | The number of sessions in a course of radiotherapy. | Optional | 0..1 | unsignedInt |
|  | Radiotherapy Course Summary | Dose Delivered to Volume | Dose delivered to a given radiotherapy volume. | Optional | 0..* | Radiotherapy Dose Delivered To Volume Extension |
|  | Radiotherapy Course Summary | Dose Delivered to Volume > Volume | A BodyStructure resource representing volume in the body where radiation was delivered, for example, Chest Wall Lymph Nodes. | Optional | 0..1 | Reference: RadiotherapyVolume |
|  | Radiotherapy Course Summary | Dose Delivered to Volume > Total Dose Delivered | The total amount of physical radiation delivered to this volume within the scope of this dose delivery, i.e., dose delivered from the Procedure in which this extension is used. | Optional | 0..1 | Quantity |
|  | Radiotherapy Course Summary | Dose Delivered to Volume > Fractions Delivered | The number of fractions delivered to this volume. | Optional | 0..1 | unsignedInt |
|  | Radiotherapy Course Summary | Code | The specific procedure that is performed. Use text if the exact nature of the procedure cannot be coded (e.g. "Laparoscopic Appendectomy"). | Required | 1..1 | CodeableConcept |
|  | Radiotherapy Course Summary | Subject | The patient on whom the procedure was performed. | Required | 1..1 | Reference: CancerPatient |
|  | Radiotherapy Course Summary | Performed | Period of time elapsed in months after primary cancer diagnosis | Optional | 0..1 | Period |
|  | Radiotherapy Course Summary | Body Site | Coded body structure(s) treated in this course of radiotherapy. These codes represent general locations. For additional detail, refer to the BodyStructures references in the doseDeliveredToVolume extension. | Optional | 0..* | CodeableConcept |
|  | Radiotherapy Course Summary | Response | Response evaluation to an oncology treatment from RECIST terminology. | Optional | 0..1 | CodeableConcept |
|  | Radiotherapy Volume | Identifier | Unique identifier to reliably identify the same target volume in different requests and procedures, for example, the Conceptual Volume UID used in DICOM. | Optional | 0..* | Identifier |
|  | Radiotherapy Volume | Morphology | The kind of structure being represented by the body structure at `BodyStructure.location`. This can define both normal and abnormal morphologies. | Optional | 0..1 | CodeableConcept |
|  | Radiotherapy Volume | Location | The location and locationQualifier codes specify a TG263 body structure comprising the irradiated volume. | Optional | 0..1 | CodeableConcept |
|  | Radiotherapy Volume | Location Qualifier | Qualifiers that together with the associated location code specify the TG263 body structure comprising the irradiated volume. | Optional | 0..* | CodeableConcept |
|  | Radiotherapy Volume | Description | A text description of the radiotherapy volume, which SHOULD contain any additional information above and beyond the location and locationQualifier that describe the volume. | Optional | 0..* | string |
|  | Radiotherapy Volume | Patient | The patient for which a radiotherapy procedure is planned or performed. | Required | 1..1 | Reference: CancerPatient |
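The "Occurrences Allowed" column of the table above lends itself to automated checking. The following minimal Python sketch parses a cardinality string such as `0..*` and checks a record against it. The function names, the lower-case element keys, and the dictionary-of-lists record layout are assumptions made for illustration; the cardinalities are copied from a few Radiotherapy Course Summary rows.

```python
from typing import Dict, List, Optional, Tuple

# Occurrences copied from a few Radiotherapy Course Summary rows.
RADIOTHERAPY_COURSE_SUMMARY = {
    "modality": "1..1",
    "technique": "0..*",
    "actual_number_of_sessions": "0..1",
    "subject": "1..1",
    "body_site": "0..*",
}


def parse_cardinality(text: str) -> Tuple[int, Optional[int]]:
    """Turn 'min..max' into (min, max); max is None when unbounded ('*')."""
    low, high = text.split("..")
    return int(low), None if high == "*" else int(high)


def check_occurrences(record: Dict[str, list], spec: Dict[str, str]) -> List[str]:
    """Compare the number of values per element with its allowed occurrences."""
    problems = []
    for element, cardinality in spec.items():
        low, high = parse_cardinality(cardinality)
        count = len(record.get(element, []))
        if count < low:
            problems.append(f"{element}: expected at least {low}, found {count}")
        if high is not None and count > high:
            problems.append(f"{element}: expected at most {high}, found {count}")
    return problems
```

A record is represented here as a mapping from element name to a list of values, so that a missing element and an empty list are treated alike.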

5.3.2.5 Outcome

The outcome group involves the cancer disease status, e.g., whether it is stable, worsening (progressing), or improving (responding) based on different kinds of evidence (imaging data, tumor markers etc.).

Table 11: The EUCAIM CDM: Outcome group

| Group | Entity | Data Element Name | Definition | EUCAIM Required | Occurrences Allowed | Data Type |
|---|---|---|---|---|---|---|
| Outcome | Tumor | Body Structure Identifier | Stable identifier(s) for this specific tumor. The identifiers MUST be unique within the context of the referenced `CancerPatient`. This id is used to track the tumor over time, through the related procedures. | Required | 1..* | Identifier |
|  | Tumor | Related Condition | Associates this tumor with a cancer condition. This could be a causal association (e.g., this is believed to be the primary tumor causing the cancer) or a different type of relationship (e.g., this tumor is a metastasis) | Optional | 0..1 | CodeableConcept or Reference: Condition |
|  | Tumor | Related Procedure | Associates this tumor with a related procedure. For example it associates a tumor with an MR examination procedure. | Required (conditional on Condition) | 1..1 | Reference: Procedure |
|  | Tumor | Risk Assessment | Associates this tumor with a risk assessment report. In case the tumor is identified in an imaging report, this could be used for storing RADS related information. | Optional | 0..1 | Reference: RiskAssessment |
|  | Tumor | Morphology | The kind of structure being represented by the body structure at `BodyStructure.location`. This can define both normal and abnormal morphologies. | Optional | 0..* | CodeableConcept |
|  | Tumor | Location | The anatomical location or region of the specimen, lesion, or body structure. | Required | 1..* | CodeableConcept |
|  | Tumor | Location Qualifier | Qualifier to refine the anatomical location. These include qualifiers for laterality, relative location, directionality, number, and plane. | Optional | 0..* | CodeableConcept |
|  | Tumor | Patient | The patient associated with this tumor. | Required | 1..1 | Reference: CancerPatient |
|  | Tumor Size | Code | Describes what was observed. Sometimes this is called the observation "name". | Required | 1..1 | CodeableConcept |
|  | Tumor Size | Subject | The patient whose tumor was measured. SHALL be a `Patient` resource conforming to `CancerPatient`. | Required | 1..1 | Reference: CancerPatient |
|  | Tumor Size | Focus | Reference to a BodyStructure resource conforming to Tumor. | Optional | 0..1 | Reference: Tumor |
|  | Tumor Size | Volume | The volume of the lesion | Optional | 0..1 | Quantity |
|  | Tumor Size | Method | Method for measuring the size or the volume of the tumor | Optional | 0..1 | CodeableConcept |
|  | Tumor Size | Tumor Longest Dimension | The longest tumor dimension in cm or mm. | Required | 1..1 | Quantity |
|  | Tumor Size | Tumor Longest Dimension > Code | Describes what was observed. Sometimes this is called the observation "code". | Optional | 0..1 | CodeableConcept |
|  | Tumor Size | Tumor Longest Dimension > Value | The information determined as a result of making the observation, if the information has a simple value. | Optional | 0..1 | Quantity |
|  | Tumor Size | Tumor Other Dimension | The second or third tumor dimension in cm or mm. | Optional | 0..2 | Quantity |
|  | Tumor Size | Tumor Other Dimension > Code | Describes what was observed. Sometimes this is called the observation "code". | Required | 1..1 | CodeableConcept |
|  | Tumor Size | Tumor Other Dimension > Value | The information determined as a result of making the observation, if the information has a simple value. | Optional | 0..1 | Quantity |
|  | Cancer Disease Status | Evidence Type | Categorization of the kind of evidence contributing to a clinical judgment on cancer disease progression. | Optional | 0..* | CodeableConcept |
|  | Cancer Disease Status | Code | Describes what was observed. Sometimes this is called the observation "name". | Required | 1..* | CodeableConcept |
|  | Cancer Disease Status | Subject | Patient whose disease status is recorded. | Required | 1..* | Reference: CancerPatient |
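Because the Tumor Size dimensions may be reported in either cm or mm, records from different data holders need to be brought to a common unit before they can be compared. The sketch below normalizes dimensions to millimetres and checks two constraints taken from the table above: at most two other dimensions (0..2), and no other dimension larger than the one declared as longest. The `(value, unit)` tuple is a simplified stand-in for a FHIR Quantity, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (value, unit): a simplified stand-in for a FHIR Quantity.
Quantity = Tuple[float, str]

# Conversion factors to millimetres for the two units named in the table.
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0}


def to_mm(quantity: Quantity) -> float:
    """Convert a dimension expressed in mm or cm to millimetres."""
    value, unit = quantity
    if unit not in UNIT_TO_MM:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * UNIT_TO_MM[unit]


def check_tumor_size(longest: Quantity, others: Sequence[Quantity] = ()) -> List[str]:
    """Return a list of problems found in one Tumor Size record."""
    problems = []
    if len(others) > 2:
        problems.append("at most two other dimensions are allowed (0..2)")
    longest_mm = to_mm(longest)
    for other in others:
        if to_mm(other) > longest_mm:
            problems.append(
                f"other dimension {other} exceeds the longest dimension {longest}"
            )
    return problems
```

The comparison against the longest dimension is an inferred consistency check based on the element's definition, not a rule stated explicitly in the table.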

5.3.2.6 Imaging

Because the EUCAIM project focuses on the federation of cancer imaging datasets, important imaging metadata must be standardized so that the stored information is represented unambiguously and federated queries are supported. The DICOM standard for collecting, storing, and transferring medical imaging data gives access to critical image acquisition parameters (such as acquisition method, field of view, and slice thickness) for cohort discovery and quality checking, but it lacks essential information needed to query relevant images efficiently, because certain information is not standardized in the DICOM metadata. For instance, the classification of a series as a T2-weighted axial series is typically recorded in the "Series Description" (0008,103E) DICOM tag, which is free text and highly variable across clinical institutions.

The EUCAIM Imaging component comprises important metadata extracted from DICOM header tags and standardized to allow efficient querying and analysis. Because mCODE does not explicitly represent imaging-related procedures and their corresponding metadata, the EUCAIM CDM builds instead upon the FHIR resources ImagingStudy and ImagingSeries, the MI-CDM extension of the OMOP-CDM[21], the ProCAncer-I imaging extension, and the OSIRIS imaging component. The following list presents a first version of the imaging-related entities and their associated information:

  • Image Study: Representation of the content produced in a DICOM imaging study. A study comprises a set of series, each of which includes a set of Service-Object Pair Instances (SOP Instances - images or other data) acquired or produced in a common context. A series is of only one modality (e.g. X-ray, CT, MR, ultrasound), but a study may have multiple series of different modalities.

  • Image Series: Representation of the content produced in a DICOM imaging series, capturing important metadata common to all image modalities. The most important parameters include the modality, the body region, the patient position, the patient orientation, and the laterality.

  • Image Modality: Representation of the distinct modality-related acquisition parameters, enabling tailored queries for each modality (e.g. echo time or magnetic field strength for the MR modality). The image modality entity is deliberately modeled so that any modality-related acquisition parameter can be stored without changing or adding attributes in the model. Some important acquisition parameters of the two most important modalities (MR, CT), as defined in OSIRIS and also included in the MR imaging metadata collected by ProCAncer-I, are:

    • MR image: sequence name, magnetic field strength, MR acquisition type, repetition time, echo time, imaging frequency, flip angle, inversion time, receive coil name, diffusion b-value (for DWI).

    • CT image: kVp, X-ray tube current, exposure time, spiral pitch factor, filter type, convolution kernel.

  • Image Annotation: Representation of the most important metadata concerning imaging annotation processes.
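The generic design of the Image Modality entity described above, where each acquisition parameter is stored as a row (parameter code plus a concept or numeric value) rather than as a dedicated attribute, can be sketched in Python as follows. The class and function names are hypothetical, plain-text labels stand in for concept codes, and the sample values are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class AcquisitionParameter:
    series_id: str                          # reference to an Image Series
    code: str                               # plain label standing in for a concept code
    value_as_number: Optional[float] = None
    value_as_concept: Optional[str] = None
    unit: Optional[str] = None              # unit of a numeric value, e.g. a UCUM code


def series_with_min_value(
    rows: Iterable[AcquisitionParameter], code: str, minimum: float
) -> Set[str]:
    """Return ids of series whose numeric parameter `code` is at least `minimum`."""
    return {
        row.series_id
        for row in rows
        if row.code == code
        and row.value_as_number is not None
        and row.value_as_number >= minimum
    }


# Invented sample rows: MR and CT parameters share one structure.
ROWS = [
    AcquisitionParameter("series-1", "magnetic field strength", 3.0, unit="T"),
    AcquisitionParameter("series-1", "echo time", 90.0, unit="ms"),
    AcquisitionParameter("series-2", "magnetic field strength", 1.5, unit="T"),
    AcquisitionParameter("series-3", "convolution kernel", value_as_concept="B30f"),
]
```

With this layout, supporting a new modality parameter only requires new rows; for instance, `series_with_min_value(ROWS, "magnetic field strength", 3.0)` selects the series acquired at 3 T or above without any change to the structure.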

Table 12: The EUCAIM CDM: Imaging group

| Group | Entity | Data Element Name | Definition | EUCAIM Required | Occurrences Allowed | Data Type | DICOM Tag Mapping |
|---|---|---|---|---|---|---|---|
| Imaging | Image Study | Identifier | The logical id of the resource, as used in the URL for the resource. Once assigned, this value never changes. | Required | 1..1 | id |  |
|  | Image Study | Subject | The patient of the imaging study. | Required | 1..1 | Reference(Patient) | (0010/*) |
|  | Image Study | Study UID | Identifiers for the ImagingStudy, i.e. as DICOM Study Instance UID. | Required | 1..1 | String | StudyInstanceUID (0020,000D) or Study ID (0020,0010) |
|  | Image Study | Acquisition Date | The date the study acquisition was obtained. | Optional | 0..1 | dateTime | (0008,0020) + (0008,0030) |
|  | Image Study | Part Of | A larger event of which this particular ImagingStudy is a component or step. For example, an ImagingStudy as part of a procedure. | Optional | 0..* | Reference(Procedure) |  |
|  | Image Study | Access URI | The accessURI of the study, either on a DICOM web server (e.g. via the WADO-RS DICOMweb REST-API) or on a local machine via the path name to the folder containing the study. | Optional | 0..* | String |  |
|  | Image Study | Number Of Series | Number of Series in the Study. This value given may be larger than the number of series elements this Resource contains due to resource availability, security, or other factors. This element should be present if any series elements are present. | Optional | 0..1 | unsignedInt | (0020,1206) |
|  | Image Study | Number Of Instances | Number of SOP Instances in Study. This value given may be larger than the number of instance elements this resource contains due to resource availability, security, or other factors. This element should be present if any instance elements are present. | Optional | 0..1 | unsignedInt | (0020,1208) |
|  | Image Study | Manufacturer Name | Name of the manufacturing company of the imaging equipment. | Required | 1..1 | CodeableConcept | (0008,0070) |
|  | Image Study | Manufacturer Model Name | Name of the model of the manufacturing company of the imaging equipment. | Optional | 0..1 | String | (0008,1090) |
|  | Image Series | Study identifier | The study to which the series belongs. | Required | 1..1 | Reference(Image Study) |  |
|  | Image Series | Identifier | Unique id for the element within a resource (for internal references). This may be any string value that does not contain spaces. | Required | 1..1 | string |  |
|  | Image Series | Series UID | The DICOM Series Instance UID for the series. | Required | 1..1 | String | (0020,000E) |
|  | Image Series | Number | The numeric identifier of this series in the study. | Optional | 0..1 | unsignedInt | (0020,0011) |
|  | Image Series | Modality | The distinct modality for this series. This may include both acquisition and non-acquisition modalities. | Required | 1..1 | CodeableConcept | (0008,0060) |
|  | Image Series | Description | A description of the series. | Optional | 0..1 | string | (0008,103E) |
|  | Image Series | Number Of Instances | Number of SOP Instances in the Series. The value given may be larger than the number of instance elements this resource contains due to resource availability, security, or other factors. This element should be present if any instance elements are present. | Optional | 0..1 | unsignedInt | (0020,1209) |
|  | Image Series | Access URI | The accessURI of the series, either on a DICOM web server (e.g. via the WADO-RS DICOMweb REST-API) or on a local machine via the path name to the folder containing the series instances. | Optional | 0..* | String |  |
|  | Image Series | Body Site | The anatomic structures examined. See DICOM Part 16 Annex L (http://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_L.html) for DICOM to SNOMED-CT mappings. The bodySite may indicate the laterality of body part imaged; if so, it shall be consistent with any content of ImageSeries.laterality. | Required | 1..1 | CodeableConcept | (0018,0015) |
|  | Image Series | Laterality | The laterality of the (possibly paired) anatomic structures examined. E.g., the left knee, both lungs, or unpaired abdomen. If present, shall be consistent with any laterality information indicated in ImageSeries.bodySite. | Optional | 0..1 | CodeableConcept | (0020,0060) |
|  | Image Series | Specimen | The specimen imaged, e.g., for whole slide imaging of a biopsy. | Optional | 0..* | Reference(Specimen) | (0040,0551) + (0040,0562) |
|  | Image Series | Acquisition Date | The date the series acquisition was obtained. | Optional | 0..1 | date | (0008,0021) + (0008,0031) |
|  | Image Modality | Identifier | Unique id for the element within a resource (for internal references). This may be any string value that does not contain spaces. | Required | 1..1 | string |  |
|  | Image Modality | Series identifier | Reference to the series id for which important acquisition parameters are being stored. | Required | 1..1 | Reference(Image Series) |  |
|  | Image Modality | AcquisitionParameter Code | The concept code of the acquisition parameters relevant to the modality of the series (e.g. slice thickness for MR modality). | Required | 1..1 | CodeableConcept |  |
|  | Image Modality | AcquisitionParameter Value As Concept | The concept code of the value of the acquisition parameter (e.g. "Spin echo" value of the "MR echo type" concept) | Optional (conditional on ParamCode) | 0..1 | CodeableConcept |  |
|  | Image Modality | AcquisitionParameter Value As Number | The numerical value of the modality acquisition concept (e.g. 0 for the gantry tilt angle in case of a CT) | Optional (conditional on ParamCode) | 0..1 | Float |  |
|  | Image Modality | AcquisitionParameter Value Unit Concept | If a numeric value, the units of measure concept code should be used (http://unitsofmeasure.org). | Required (conditional on Acquisition Parameter Value as Number) | 0..1 | CodeableConcept |  |
|  | Image Annotation | Id | A unique identifier for the annotation. | Required | 1..1 | string |  |
|  | Image Annotation | series.id | The unique identifier for the imaging series being annotated. | Required | 1..1 | Reference(Image Series) |  |
|  | Image Annotation | study.id | The unique identifier for the imaging study that contains the series that is being annotated. | Required | 1..1 | Reference(Image Study) |  |
|  | Image Annotation | derived.series.id | The unique identifier for the annotated derived imaging series. | Required | 1..1 | Reference(Image Series) |  |
|  | Image Annotation | performed | The date and time the annotation was made. | Optional | 0..1 | datetime |  |
|  | Image Annotation | status | The current status of the annotation, such as final or pending. | Optional | 0..1 | CodeableConcept |  |
|  | Image Annotation | anatomic location | The anatomic location being annotated (e.g. peripheral zone of the prostate gland) | Optional | 0..1 | CodeableConcept |  |
|  | Image Annotation | observation | The imaging observation that is reported (e.g. lesion of the prostate). | Optional | 0..1 | CodeableConcept |  |
|  | Image Annotation | type | The annotation type (e.g. bounding box, contouring, etc.) | Optional | 0..1 | CodeableConcept |  |
|  | Image Annotation | method | The method used to create the annotation, such as manual, automatic, or semiautomatic. | Optional | 0..1 | CodeableConcept |  |
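The DICOM tag mapping of the Image Series entity can be illustrated with a short Python sketch. It assumes the DICOM header has already been read into a plain dictionary keyed by (group, element) tag pairs with string values, so no particular DICOM library is implied. The field names and function names are hypothetical; the tag numbers are those listed in the table, and the series date (0008,0021) and time (0008,0031) are combined into a single timestamp.

```python
from datetime import datetime
from typing import Dict, Tuple

Tag = Tuple[int, int]

# Image Series elements and their DICOM tags, as listed in the table above.
SERIES_TAGS: Dict[str, Tag] = {
    "series_uid": (0x0020, 0x000E),
    "number": (0x0020, 0x0011),
    "modality": (0x0008, 0x0060),
    "description": (0x0008, 0x103E),
    "body_site": (0x0018, 0x0015),
    "laterality": (0x0020, 0x0060),
}
SERIES_DATE: Tag = (0x0008, 0x0021)
SERIES_TIME: Tag = (0x0008, 0x0031)


def combine_date_time(date: str, time: str = "") -> datetime:
    """Combine a DICOM DA value (YYYYMMDD) with a TM value (HHMMSS[.FFFFFF]).

    Fractional seconds are dropped and missing time components default to zero.
    """
    whole_seconds = time.split(".")[0].ljust(6, "0")
    return datetime.strptime(date + whole_seconds, "%Y%m%d%H%M%S")


def extract_series(header: Dict[Tag, str]) -> Dict[str, object]:
    """Build a flat Image Series record from a header dictionary."""
    record: Dict[str, object] = {
        name: header.get(tag) for name, tag in SERIES_TAGS.items()
    }
    date = header.get(SERIES_DATE)
    record["acquisition_date"] = (
        combine_date_time(date, header.get(SERIES_TIME, "")) if date else None
    )
    return record
```

Elements that are absent from the header are returned as `None`, which leaves the decision on how to treat missing required elements to a later validation step. Raw tag values such as the modality would still need to be mapped to concept codes to satisfy the CodeableConcept data types.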
