Glossary

2. Glossary

A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z


A

Acceptance process

The Management Board makes a decision of acceptance or rejection within a period of 60 days, supported by the internal governance bodies on ethics and legal compliance and taking into consideration the indications from the Access Committee and Steering Committee.

Access Committee

EUCAIM Governing Body that controls access to the Atlas of Cancer Images. It reviews the evaluation reports to provide a final decision on the acceptance or rejection of data and/or tools (submitted by data holders) and of R&D requests (submitted by Data Users). The body ensures responsible and secure access to the infrastructure's data and services, promoting valuable research while upholding ethical and privacy standards.

Access Negotiator

The Access Negotiator, or Negotiator, is a specialised tool integrated into the EUCAIM Dashboard and designed to facilitate the exchange of documents and information between Data Users and the Access Committee. On the one hand, the Negotiator allows users to submit requests for data or software to one or several holders selected in a previous discovery step in the EUCAIM catalogue. On the other hand, it also allows users to build new research projects by facilitating negotiation with a specific EUCAIM network of contacts according to their objectives and needs. In both cases, the negotiation mechanism allows the Access Committee and, ultimately, the Data or Software Holder itself, where appropriate, to (a) obtain more information from the requester to better understand the reason for the request and the data requested in this broadcast mode, (b) enter a negotiation process with the requester, or (c) withdraw from a request that it believes it cannot fulfil.

Administrative Project Coordinator (AdmCo)

The administrative project coordinator is responsible for mediation between the project consortium and the funding authority, the European Commission (EC). Acting as the main point of contact with the EC, the AdmCo is responsible for the overall administrative and financial management of the EUCAIM project. The administrative coordinator is also tasked with the technical review of deliverables and milestones and with financial reporting. Finally, it is currently envisioned that the AdmCo will oversee all managerial aspects of the Central Hub Office, with the overall purpose of supporting the implementation of the activities planned in the periodic strategic plan for the maintenance of the infrastructure.

Advisory Boards

The Advisory Boards are groups of external experts established during the course of the project to advise the Management Board on technical, ethical and related legal issues, as well as on exploitation and regulatory matters. These boards involve participants who are not members of the consortium, in order to provide a fresh-eyed, unbiased view on the decision-making of the other boards. Even after the project concludes, the Advisory Boards are envisioned to continue providing external, unbiased advice on decision-making regarding the day-to-day operations of the infrastructure, at both the technical and legal level.

Affiliated Entity

An organization that, without being a main partner of the consortium, maintains a formal relationship with one of the participating entities. These entities can contribute to the project by providing data, infrastructure, technical expertise, or support in the implementation of tools and platforms for analysis. [1]

Aggregated data

Aggregated data is pooled data: statistical data about several individuals that have been combined to show general trends or values within the data [2].
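As a minimal sketch of the idea (the records and field names below are invented for illustration), individual-level rows are pooled so that only group-level values remain:

```python
from statistics import mean

# Hypothetical individual-level records -- illustrative values only.
records = [
    {"age": 54, "tumour_size_mm": 12},
    {"age": 61, "tumour_size_mm": 20},
    {"age": 47, "tumour_size_mm": 9},
]

def aggregate(rows, field):
    """Pool one field across all rows, keeping only the count and the mean."""
    values = [r[field] for r in rows]
    return {"n": len(values), "mean": mean(values)}

summary = aggregate(records, "age")  # individual ages are no longer visible
print(summary["n"], summary["mean"])
```

Only the aggregate (count and mean) is exposed; no single individual's value can be read back from the result.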

AI Impact Assessment

A structured process used to evaluate the potential risks, benefits, and societal effects of artificial intelligence systems before and after deployment. It considers ethical, legal, social, and economic impacts, particularly on fundamental rights, safety, and compliance with EU values. The AI Impact Assessment aims to ensure trustworthy and responsible development of AI, especially in high-risk applications, as outlined in the proposed European AI legal framework. [3]

Analysis Platform

Component within the federated processing infrastructure for executing tasks related to data analysis, including AI training and inference, while upholding data privacy and regulatory compliance. It provides a user interface, comprising both a dashboard (FP Dashboard) and API, where users can initiate experiments, monitor processes, and retrieve results. It includes frameworks responsible for orchestrating, managing, distributing (if applicable) and preparing the compute processing environment for the packed software.

Annotation

The process of marking, labelling, or adding metadata to medical images to make it more informative for a specific task. Segmentations are a highly valuable type of annotation in medical imaging that consist of delineating specific areas within an image (e.g., identifying tumours or organs in medical images), typically with pixel-level accuracy, assigning class labels to different segments. This helps in training models that perform tasks like object detection, image segmentation, or classification.

Annotation Hackathons

Workshops organised to collect necessary metadata from available tools following the recommendations and standards of the ELIXIR infrastructure. These events focus on enhancing the description and registration of software tools and modules to be utilised in the EUCAIM infrastructure.

Anonymisation

The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject: the irreversible removal of personally identifiable information (i.e., all directly and indirectly identifying information), so that identification of the data subjects is no longer possible [6]. The methods used to anonymise the data depend on the context and the technology used (such as DICOM tag removal and facial erasing); this process must take into account the recommendations of the Data Protection Authorities of each EU Member State. Anonymised data can be used in research and in the development of artificial intelligence tools without compromising patient confidentiality, facilitating collaboration and information exchange between different institutions.
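As a toy illustration of one such method, tag removal (this is a simplified stand-in, not the EUCAIM DICOM Anonymizer Tool; a real anonymisation profile covers far more attributes than the handful listed here):

```python
# Simplified sketch: drop direct identifiers from a DICOM-like header.
# Real anonymisation follows profile-based rules and Data Protection
# Authority guidance; the tag set below is illustrative, not complete.
IDENTIFYING_TAGS = {
    "PatientName", "PatientID", "PatientBirthDate",
    "InstitutionName", "ReferringPhysicianName",
}

def strip_identifiers(header: dict) -> dict:
    """Return a copy of the header without the direct identifiers."""
    return {k: v for k, v in header.items() if k not in IDENTIFYING_TAGS}

header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "Modality": "MR",
    "StudyDate": "20240101",
}
clean = strip_identifiers(header)
print(sorted(clean))  # ['Modality', 'StudyDate']
```

Note that removing direct identifiers alone does not guarantee anonymity; indirect identifiers and pixel data (e.g., facial features) must also be addressed, as the definition above states.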

Artificial Intelligence Act

Proposed legislation by the European Union (EU) aimed at regulating Artificial Intelligence (AI) technologies within the EU. The act seeks to ensure ethical, transparent, and accountable AI practices while fostering innovation and competitiveness [4].

Atlas of Cancer Images

Data and service environment of the federation for aiding cancer research (during the project as well as after its conclusion for future utilisation). The Atlas of Cancer Images includes de-identified images from the Central Hub as well as from federated data nodes, plus the data from the European research repositories.

Authentication and Authorisation Infrastructure (AAI)

AAI refers to a set of services and procedures that enable the identification of a user and the validation of their credentials (authentication), and the definition and enforcement of the user's permissions (authorisation), in order to access restricted information or services. EUCAIM services use the Life Science Login as their AAI.

↑ Back to glossary index

B

Benchmarking

Benchmarking ensures that the platform's software tools, AI models, and infrastructure meet high standards of performance, scalability, and usability. It consists of scientific and technical benchmarking processes.

  • Scientific benchmarking evaluates AI models, data preprocessing tools, analytical software, and datasets to ensure validity, reliability, and applicability in clinical and research contexts. AI models are tested for accuracy and robustness, preprocessing tools for data compatibility and scalability, and datasets for biases and representativeness.

  • Technical benchmarking focuses on the integration, robustness, and scalability of platform components. It includes compatibility testing for central services and federated nodes, AI model deployment efficiency across various hardware configurations, and scalability testing to ensure resilience under high data loads.

By applying benchmarking, EUCAIM optimizes performance, ensures interoperability, and enhances reliability across its federated ecosystem.

Beneficiary

An organization that is part of the project's official consortium and receives funding from the European Union to carry out specific activities assigned within the EUCAIM initiative.

Biometric data

Personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or fingerprint data [7] [8].

Budget for Open Call

Available funds allocated for new beneficiaries. These beneficiaries will receive funding under the same co-funding conditions as consortium partners (i.e. 50% of the budget; a total budget of €3,600,000 has been included with the COO EIBIR, and the maximum amount per grant is €200,000).

↑ Back to glossary index

C

Calibration

In prediction models, calibration refers to the concordance between predicted and observed probabilities.
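Concretely, calibration is often assessed by binning predictions and comparing each bin's mean predicted probability with the observed event rate. A rough sketch with made-up numbers:

```python
def calibration_bins(probs, outcomes, n_bins=2):
    """Group (predicted probability, observed outcome) pairs into equal-width
    probability bins; return (mean predicted, observed rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    out = []
    for pairs in bins:
        if pairs:
            out.append((sum(p for p, _ in pairs) / len(pairs),
                        sum(y for _, y in pairs) / len(pairs)))
    return out

# Made-up predictions: for a well-calibrated model the two columns agree.
probs    = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
outcomes = [0,   0,   1,   1,   0,   1]
for mean_pred, observed in calibration_bins(probs, outcomes):
    print(round(mean_pred, 2), round(observed, 2))
```

Perfect calibration means the observed rate equals the mean predicted probability in every bin; systematic gaps indicate over- or under-confident predictions.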

Central Core Infrastructure and Services

The Central Core Infrastructure of EUCAIM is the foundational cloud-based infrastructure that supports the deployment, orchestration, and performance of the federation. It consists of hardware resources, cloud management (OpenStack), and container orchestration (Kubernetes) to provide high-performance computing, storage, and networking capabilities.

This infrastructure is divided into production and development environments, ensuring scalability, security, and continuous updates. The core system includes persistent storage, GPU acceleration, and automated service deployment, enabling a reliable foundation for EUCAIM's operations. Additionally, external services such as Life Sciences Authentication and Authorization Infrastructure (LS-AAI) provide secure access management.

At its core, the Central Core Infrastructure ensures that the platform has the necessary computing power, networking, and deployment mechanisms to support federated services and data processing across the EUCAIM network.

Central Dashboard

Website intended for Data Users who want to use EUCAIM data for analysis in the context of research and innovation. From the dashboard, Data Users can view the metadata of datasets in a public catalogue, register on the platform, search the metadata (e.g., by disease, imaging modality or age group), request access to data where needed, apply processing tools to the data, obtain analysis results, and inform the providers of interesting results for their consideration. In addition, the dashboard also guides data holders to the documentation page and a request form.

Central Hub

Infrastructure comprising the Reference Nodes, Central Dashboard, and the services and SW provided by the EUCAIM platform.

Central Hub Office (CHO)

The Central Hub Office is responsible for all functions necessary in accordance with the infrastructure's statutes, the needs of its ordinary functioning, and compliance with the legal requirements for an entity of its nature. The CHO will comprise experts in cloud infrastructure maintenance, technical support, legal matters, fundraising and project management, IPR, dissemination and promotional actions, as well as administrative and financial management.

Central Hub Operational Framework (CHOF)

The set of guidelines, processes, and organizational structures that govern the operation and management of the Central Hub [9]:

  • It serves as a centralized point for coordination, data management, and resource distribution within the EUCAIM initiative.

  • It defines how technological infrastructures and platforms, patient data collection, and collaborative workflows among consortium partners will be managed.

  • It outlines the responsibilities and protocols to ensure that the project's objectives are met efficiently and in compliance with ethical and legal standards.

Central Validation Services (CVS)

Centralized services responsible for the evaluation, verification, and validation of the tools, platforms, and artificial intelligence models developed within the Project [9]:

  • They ensure that medical image analysis technologies meet quality, accuracy, and reliability standards before being implemented or used in clinical settings.

  • They conduct thorough testing using reference datasets and collaborate closely with consortium partners to ensure that the solutions are validated in ethical, technical, and regulatory terms.

Chief Security Officer (CSO)

Person responsible for a company's physical and digital security. The CSO provides executive leadership and oversees the identification, assessment and prioritisation of risks, directing all efforts concerned with the security of the organisation. The CSO works to stay ahead of security issues (e.g., security breaches), solve problems and ensure the organisation runs smoothly. Additionally, CSO expertise is required to implement safeguards and reporting risk management mechanisms for regulation compliance [10].

Clinical Data

Clinical data encompasses a diverse set of information integral to the EUCAIM infrastructure, extending beyond imaging data to include a range of critical details relevant to medical research and healthcare. This category involves comprehensive clinical information that accompanies the images, providing contextual insights into patients' health conditions. It covers various aspects, such as mutation status, results from biological samples, quality of life assessments, quality of care metrics, and health-related costs.

Clinical Question

A clinical question is a specific, structured query that arises in the context of healthcare or clinical practice, typically aimed at improving patient care. It often seeks to address uncertainties about diagnosis, treatment, prognosis, or prevention of diseases or health conditions. Clinical questions are essential for guiding research, decision-making, and evidence-based practice.

Cloud

Network of computing facilities providing remote data storage and processing services through the internet [11].

Cloud computing

Paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with administration on-demand [12].

Code system mapping

If the data store uses a code system other than EUCAIM's Hyper-Ontology (LOINC, SNOMED, ICD10…), code mapping needs to be implemented to translate the codes in the search query into the codes used by the data store. Code system mapping can be part of the query mapping component or a separate component.
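A minimal sketch of such a mapping layer. The code pairings below are invented for illustration only; real crosswalks between terminologies such as SNOMED, LOINC and ICD10 come from curated terminology resources, not hand-written tables:

```python
# Illustrative crosswalk from query-side codes (Hyper-Ontology terminologies)
# to codes a hypothetical local data store understands. Pairings are made up.
QUERY_TO_STORE = {
    "SNOMED:363358000": "LOCAL:0042",
    "ICD10:C50": "LOCAL:0107",
}

def map_query_codes(codes):
    """Translate query codes into store codes, reporting any unmapped ones."""
    mapped = [QUERY_TO_STORE[c] for c in codes if c in QUERY_TO_STORE]
    unmapped = [c for c in codes if c not in QUERY_TO_STORE]
    return mapped, unmapped

mapped, unmapped = map_query_codes(["ICD10:C50", "ICD10:C61"])
print(mapped)    # ['LOCAL:0107']
print(unmapped)  # ['ICD10:C61']
```

Reporting unmapped codes matters in a federation: a node that silently drops untranslatable terms would return misleadingly narrow query results.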

Collaboration Agreement (ColA)

Document that expresses the willingness of the Parties to collaborate by establishing an overarching framework to facilitate interaction and exchange of information between them.

Collection

An aggregation of one or more datasets of medical images registered in the catalogue.

Collection Explorer

Please refer to User's Library.

Compliance unit

Team responsible for ensuring that all activities and processes of the project comply with applicable legal, ethical, and regulatory standards. This includes compliance with data protection legislation, as well as other regulations related to medical research, patient privacy, and the use of technologies such as artificial intelligence.

Confidentiality Agreement (CA)

A Confidentiality Agreement is a legal contract between two or more parties that outlines the information or data considered confidential and restricts the sharing of this information with others. It ensures that sensitive or proprietary information remains protected and is only shared under agreed-upon conditions.

Consent

An individual's agreement, e.g. to participate in research, to undergo a healthcare procedure, or to the processing of personal data.

Within the context of personal data, the General Data Protection Regulation (GDPR) defines consent as: "Any freely given, specific, informed and unambiguous indication of the data subject's wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her" [13].

Consortium Agreement (ConA)

The Consortium Agreement specifies the rights and obligations of the project partners and establishes the relations between the partners themselves.

Customer Relationship Management (CRM)

The strategies and tools used to manage and optimize interactions with key stakeholders.

↑ Back to glossary index

D

Data

Data can be defined as the recorded factual material that is commonly accepted in the scientific community as the information required to support research findings [14] [15]. It refers to any digital representation of acts, facts or information, and any compilation of such acts, facts or information, including in the form of sound, visual or audiovisual recordings [6]. There are four major categories of data according to where the data comes from: observational, experimental, simulated and derived [16]. Data is information available for processing. Specifically, the types of data that the EUCAIM infrastructure is interested in collecting are imaging data (radiological and nuclear medicine cancer images of any modality, segmentation masks with their annotations, and histopathological images) and other clinical data (clinical information accompanying the images, mutation status, biological sample results, quality of life, quality of care and health costs).

Data access

The processing of data by a Data User, which was provided by a data holder, in accordance with specific technical, legal, or organisational requirements, without necessarily implying the transmission or downloading of such data (see Personal data) [6]. Three data access conditions are offered in EUCAIM: authorisation to download the datasets; authorisation to access, view and process them in-situ; or authorisation to remotely process the datasets from a federated data node without the ability to access and visualise data, even remotely.

Data access right

The ability, right or permission to act on data in a defined location [17]. Data access in EUCAIM will be limited to authorised individuals or organisations based on specific permissions or roles, and always upon request.

Data Act

A forthcoming regulatory proposal within the European Union aiming to establish uniform guidelines for accessing product or associated service data by end-users of interconnected products or services. This regulation encompasses crucial provisions delineating the prerequisites for data space interoperability (Article 28) and mandates governing the implementation of data sharing agreements through smart contracts (Article 30), thus facilitating seamless data exchange and fostering digital innovation across the EU [18].

Data altruism

Voluntary sharing of data on the basis of the consent by data subjects to process personal data pertaining to them, or permissions of other data holders to allow the use of their non-personal data without seeking a reward, for purposes of general interest, such as scientific research purposes or improving public services [6].

Data Annotation

A process within the realm of data science and machine learning, involving the labelling or tagging of data points with informative metadata to enhance their interpretability and utility for computational algorithms. This method facilitates the training and optimization of machine learning models by providing context and structure to raw data, enabling more accurate analysis and predictive capabilities.

Data API

The Data API provides secure, programmatic access to datasets within EUCAIM, allowing researchers and developers to interact with metadata, retrieve imaging and clinical data, and integrate external applications with the federated infrastructure. This API enables seamless querying and data retrieval while ensuring compliance with interoperability standards.

It supports data materialization, ensuring that requested datasets are properly formatted and prepared for analysis. Additionally, it facilitates access management, allowing authorized users to retrieve and utilize datasets for AI model training and clinical research. The Data API plays a crucial role in enabling automated and scalable access to federated datasets, ensuring efficient interaction between research tools and the EUCAIM platform.
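As a hedged sketch of what programmatic dataset discovery through such an API could look like: the base URL, endpoint path and parameter names below are assumptions for illustration, not EUCAIM's published interface.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical API root -- a placeholder, not a real EUCAIM endpoint.
BASE = "https://api.example-federation.eu/"

def build_dataset_query(disease=None, modality=None, page=1):
    """Assemble a metadata-search URL for a (hypothetical) datasets endpoint."""
    params = {"page": page}
    if disease:
        params["disease"] = disease
    if modality:
        params["modality"] = modality
    return urljoin(BASE, "v1/datasets") + "?" + urlencode(params)

url = build_dataset_query(disease="breast-cancer", modality="MR")
print(url)
```

A client would then issue an authenticated GET against such a URL (e.g., with an access token obtained via the Life Science Login) and page through the returned metadata records.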

Data Cataloguing

Process of organizing, describing, and structuring the dataset of oncological medical images to facilitate access, search, and interoperability. This process involves the creation of standardized metadata that allow classifying the information according to previously established criteria, as well as the origin of the data and compliance with privacy and ethical regulations. [5]

Data Centric Health Research Computational Infrastructure

Infrastructure that provides data as a service. This infrastructure includes services, such as data visualisation, hosting and processing of data. In particular, it can process health-related sensitive data. Technological infrastructures for data analysis, exploitation and/or processing.

Data collection

This term will not be used in the EUCAIM framework. Please refer to Dataset.

Data Concerning Health

Personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status.

Data Curator

A person who is responsible for the quality and FAIRness of the health-related data, and for making sure the value of the data is discoverable and accessible. This role also considers the possibility of enriching data while increasing its quality. Importantly, data curators may also act as processors, i.e. be responsible for the data at hand.

Data discoverability

The ability or a mechanism to browse and locate available data relevant to a specific user's purpose (e.g., research project) in a non-targeted search. Data is more discoverable if the datasets have a metadata catalogue, and the metadata catalogue is publicly accessible. Discoverability is related to findability from the FAIR principles.

Data Federation and Interoperability Framework (DFIF)

A data federation and interoperability framework extends the concept of data federation by emphasising not only the integration of disparate data sources but also the ability to interact and exchange information between them effectively. This includes ensuring that data can be exchanged and understood across various platforms, which involves semantic, technical, legal, and organisational interoperability.

Data Federation Framework (DFF)

A data federation framework is a data integration approach that allows organizations to access and query data from multiple disparate sources as if they were a single unified database. This framework does not require the physical movement of data; instead, it creates a virtual layer that abstracts the underlying complexities of varying data sources, such as proprietary query languages or varying schemas, enabling users to interact with the data more easily.

Data governance

Assembly of policies and processes, coordination aspects, data usage and accessibility principles and data management procedures for a certain health data infrastructure to ensure legal compliance, consistency and good data quality throughout the different stages of the data life cycle.

Data Governance Act

European regulation crafted to establish a structured framework fostering European data spaces and trust among stakeholders within the data market (DGA). Enacted in June 2022, its provisions came into effect in September 2023, heralding a new era of data governance and collaboration in the European landscape [19].

Data harmonisation

The process of removing systematic differences between images acquired on different scanners (i.e., inter-scanner variability) via statistical methods. Such techniques enable multi-centre datasets and derive greater statistical power from results than when centres work independently. Given the high economic cost of imaging, multi-centre collaboration is the most feasible way to acquire large imaging datasets [20].
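A deliberately simplified illustration of the idea, per-scanner z-scoring of one image feature (real harmonisation methods such as ComBat model site effects far more carefully, and the feature values here are invented):

```python
from statistics import mean, pstdev

# Hypothetical values of the same image feature measured on two scanners;
# scanner_B shows a systematic offset and scale difference.
features = {
    "scanner_A": [10.0, 12.0, 14.0],
    "scanner_B": [100.0, 120.0, 140.0],
}

def zscore_per_scanner(groups):
    """Standardise each scanner's values to mean 0 and unit variance,
    removing inter-scanner shifts while preserving within-scanner ordering."""
    out = {}
    for scanner, vals in groups.items():
        m, s = mean(vals), pstdev(vals)
        out[scanner] = [(v - m) / s for v in vals]
    return out

harmonised = zscore_per_scanner(features)
print(harmonised["scanner_A"])  # ≈ [-1.2247, 0.0, 1.2247]
```

After standardisation the two scanners' values coincide, so the pooled feature can be analysed jointly; that is the essence of what harmonisation buys a multi-centre study.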

Data Harvest Model (DHM)

A structured and systematic approach to the collection, integration, and storage of data related to the project. This model includes both clinical patient data and radiological image data, as well as other relevant information for medical image analysis. It is designed to ensure that data is efficiently obtained, securely stored, and properly used for the development and validation of artificial intelligence technologies. Additionally, the model promotes interoperability between different systems and platforms, ensuring that data can be shared and analyzed collaboratively among consortium partners, while always respecting ethical and privacy regulations. [21]

Data Holder (DH)

A Data Holder refers to any natural or legal person, including entities, bodies, and research organisations in the health or care sectors, as well as European Union institutions, bodies, offices, and agencies, who has the right, obligation, or capability to make certain data available for research purposes. This may include registering, providing, restricting access to, or exchanging the data. Examples of Data Holders include data repositories, regional biobanks, clinical centres, cancer screening programs, public entities, pharmaceutical companies, data altruism initiatives, and publication repositories. These infrastructures may host one or more datasets for discovery and retrieval, and the exposure and access to data in the Dashboard will be provided at the dataset level.

Data Ingestion

The process of importing and managing multi-omic and medical imaging data for clinical trials and research projects. This can be done manually through a visual interface, where users upload DICOM or ZIP files, define project details, patient information, and timepoints. Before any data leaves the user's browser, it is automatically anonymized, and real-time progress updates are provided.

This process primarily follows a batch ingestion approach, where data is uploaded and processed in discrete sets. However, it also integrates real-time ingestion through interoperability services, such as a DICOM node that enables direct communication using the DIMSE protocol and a DICOMWeb server that supports HTTP-based interactions via STOW-RS, QIDO-RS, and WADO-RS operations. Additionally, APIs allow verified users to ingest and manage non-DICOM data programmatically, including creating and editing eCRFs or handling subjects and datasets without relying on a user interface.
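As an illustration of the HTTP-based interactions mentioned above, the sketch below composes DICOMweb request URLs for the three standard operations. The base URL, patient identifiers, and filters are hypothetical, not EUCAIM's actual endpoints; the sketch only builds the URLs and does not perform network I/O.

```python
# Hypothetical sketch of composing DICOMweb requests; the base URL and
# study attributes are illustrative, not EUCAIM's actual endpoints.
from urllib.parse import urlencode

BASE = "https://node.example.org/dicomweb"  # hypothetical DICOMweb root

def qido_studies_url(base: str, **filters: str) -> str:
    """QIDO-RS: search for studies matching the given attribute filters."""
    return f"{base}/studies?{urlencode(filters)}"

def wado_study_url(base: str, study_uid: str) -> str:
    """WADO-RS: retrieve all instances of one study by its Study Instance UID."""
    return f"{base}/studies/{study_uid}"

def stow_url(base: str) -> str:
    """STOW-RS: new instances are POSTed (multipart/related) to this endpoint."""
    return f"{base}/studies"

url = qido_studies_url(BASE, PatientID="ANON-0001", Modality="MR")
```

In a real client, the QIDO-RS URL would be issued as a GET and the STOW-RS URL as a multipart POST; only the resource paths are standardised here.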

Data intermediation service

Service aimed at fostering commercial engagements for the purpose of facilitating data sharing among an indeterminate cohort of data subjects, data holders, and Data Users. This facilitation is achieved through various technical, legal, or alternative means, with a particular emphasis on upholding the rights of data subjects concerning personal data. This definition excludes the following categories:

  • Services engaging in the aggregation, enrichment, or transformation of data from data holders to augment its value significantly, subsequently licensing the resultant data for use by Data Users without forging direct commercial relationships between data holders and users.

  • Services primarily focused on intermediating copyright-protected content.

  • Services exclusively employed by a single data holder to facilitate data usage or utilised by multiple legal entities within a confined consortium, such as supplier-customer relationships or contractual collaborations, especially those aimed at sustaining the functionalities of interconnected Internet of Things (IoT) devices.

  • Data sharing services extended by public sector entities lacking the intent to establish commercial ties (DGA, Article 2(10)) [22].

Data mapping

The process of matching fields from multiple datasets into a centralised database. It is required to transfer, ingest, process, and manage data [25].
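A minimal sketch of this field-matching step, under the assumption that each source system is described by a rename table into a shared central schema; the source names and field names are invented for illustration.

```python
# Field-level data mapping: records from two hypothetical source systems
# are renamed into one central schema before ingestion.
FIELD_MAPS = {
    "hospital_a": {"pat_id": "patient_id", "dob": "birth_date"},
    "hospital_b": {"PatientID": "patient_id", "BirthDate": "birth_date"},
}

def map_record(source: str, record: dict) -> dict:
    """Rename source-specific fields to the central schema's field names."""
    mapping = FIELD_MAPS[source]
    return {mapping.get(k, k): v for k, v in record.items()}

central = map_record("hospital_b", {"PatientID": "P-7", "BirthDate": "1970-01-01"})
```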

Data Materialisation Service

Services which ensure that datasets requested for federated analysis are properly retrieved, validated, and made available for computational workflows. These services operate within Federated Data Nodes, which store and manage datasets while providing computational resources. The process is facilitated by the Data Materializer Tool, which interacts with each Federated Data Node's Data Materialisation Service to execute dataset retrieval securely. The materialisation process follows predefined configurations, ensuring controlled access, authentication, and compliance within the EUCAIM infrastructure.

When a researcher requests access to specific data, it needs to be extracted, formatted, and prepared for analysis. The Data Materialisation Service handles this process by creating structured datasets that are ready to be used in federated AI training or large-scale clinical studies. It ensures that the data is consistent, properly formatted, and compliant with EUCAIM's interoperability standards.

Data Materializer Tool (DMT)

Software in the federated analysis architecture that stages the data requested by researchers into a predefined local storage space on the Federated Data Node. It is launched as a preliminary step once the FP Daemon at the FDN receives a command to run a piece of software (or a workflow). The DMT validates the dataset ids it receives, filtering out those that do not belong to the node on which it runs.
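The dataset-id validation step can be sketched as below: a materialiser running on one node keeps only the ids it actually hosts and ignores the rest. The node name, dataset ids, and function names are hypothetical stand-ins.

```python
# Sketch of the DMT's validation step: filter requested dataset ids down
# to those hosted on this Federated Data Node. All ids are illustrative.
LOCAL_DATASETS = {"ds-001", "ds-007"}   # datasets hosted on this node

def filter_for_node(requested_ids: list) -> list:
    """Return only the requested dataset ids that this node actually hosts."""
    return [ds for ds in requested_ids if ds in LOCAL_DATASETS]

to_materialise = filter_for_node(["ds-001", "ds-042", "ds-007"])
```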

Data Maturity Model Adoption (DMM)

A structured framework that measures and improves the quality, integrity, and usability of data throughout the project [21]:

  • It allows for evaluating the data maturity level of clinical and radiological data collected in EUCAIM, from collection to integration and analysis.

  • It ensures that the data used to train artificial intelligence models are accurate, complete, and well-managed.

  • It also facilitates the identification of areas for improvement in data management processes and ensures that the data meet the necessary standards for research and clinical implementation, while respecting ethical and privacy regulations.

Data Population Monitoring Team (DPMT)

The objective of this interdisciplinary Team is to define a set of KPIs related to data holders, data, users, software and user training. These KPIs will be monitored and accessible through the Data Population Monitoring Dashboard (DPMD).

Data Protection Impact Assessment (DPIA)

The DPIA process aims at providing assurance that controllers adequately address privacy and data protection risks of 'risky' processing operations. By providing a structured way of thinking about the risks to data subjects and how to mitigate them, DPIAs help organisations to comply with the requirement of 'data protection by design' where it is needed the most, i.e. for 'risky' processing operations.

Data Protection Task Force

Body that plays the role of the Data Protection Officer (DPO) during both the project execution and beyond. It will monitor internal compliance, inform, and advise on data protection obligations, provide advice regarding Data Protection Impact Assessments and act as a contact point for all the partners and data subjects (the results of this task being documented in D3.6 - Data Management Plan). During the project execution phase, the main representatives of this task force will involve the DPOs of each consortium partner. Upon project end, the members of this board may need to be re-elected.

Data Push Model (DPM)

An approach in which data is actively sent from the participating institutions to a central repository or analysis platform. Instead of data being passively requested, the data push model implies that the parties involved in the project continuously and proactively provide their data. This model is key to integrating large volumes of clinical and radiological data, allowing the data to be centralized and accessible to consortium partners who need it for the development and validation of artificial intelligence technologies in medical image analysis. Furthermore, it ensures greater efficiency in data collection, facilitating its analysis and contributing to the progress of the project [21].

Data quality

The degree to which a set of inherent characteristics of data fulfils requirements [23].

Notes: The requirements are defined by the purpose of the processing and hence data quality can be viewed in other words also as a "fitness for purpose". The purpose can be any use of the data, including primary use or secondary use.

For the purpose of data protection, data quality refers to a set of principles laid down in Article 5 of the GDPR and Article 4 of Regulation (EU) 2018/1725, namely [24]:

  • Lawfulness, fairness and transparency

  • Purpose limitation

  • Data minimization

  • Accuracy

  • Storage limitation

  • Integrity and confidentiality

Data quarantine

Data Quarantine is the temporary isolation of ingested datasets for quality checks before they are officially registered or made accessible. In EUCAIM, datasets in reference nodes undergo a revision phase to verify integrity, compliance with Tier levels, and adherence to the Common Data Model (CDM) before being published in the Public Catalogue. Imaging datasets in DICOM format can be released while associated clinical data remains in quarantine if it requires further validation. This process ensures data accuracy, consistency, and compliance, following standard data quality workflows where data is inspected in a temporary location before reaching its final destination.

Data recipient

An individual or entity, whether legal or natural, engaged in activities pertinent to their trade, business, craft, or profession, distinct from the user of a product or associated service, to whom the custodian of data furnishes information. This includes third parties to whom data is disclosed upon request by the user to the data holder, or in compliance with obligations delineated by Union law or national legislation implementing Union directives.

Data sharing

Provision of data by a data holder to a Data User for the purpose of joint or individual use of the shared data, based on conditions of use, directly or through an intermediary [6].

Data Sharing Agreement (DSA)

Agreement between two or more parties that outlines which data will be shared and how the data can be used.

Data sovereignty

The principle that data stored outside an organisation's host country remain subject to the laws of the country where the data are stored [26].

Data space

A distributed system delineated by a governance framework, facilitating secure and reliable data transactions among participants, with a focus on upholding trust and data sovereignty. A data space typically consists of one or more infrastructures and supports various use cases.

Data steward

A person in an administrative role who does not typically use the data themselves. Data Stewards create guidelines to make data FAIR and advise on how to apply them. They may or may not have direct responsibility for the data at hand (i.e., act as processors).

Data subject

As defined in the GDPR, in the case of data processing, a data subject is a person who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person [27].

Data transaction

The outcome of an interaction between two participants, aimed at sharing, accessing, exchanging, or processing data.

Data Transfer Agreement (DTA)

Agreement established between organisations that governs the transfer of one or more data sets from the owner/provider to a third party.

Data User (DU)

A Data User refers to any natural or legal person who wants to make use of the data that is made accessible through the EUCAIM infrastructure for research, development and innovation purposes.

Data User-Innovator

Data Users, as for example data scientists, who are dedicated to developing or enhancing AI algorithms. Innovators typically request data to develop and validate machine learning models, AI algorithms, or predictive models. Their emphasis is on deriving actionable insights from data that can improve business decisions, optimise processes, or create new products or services. Data scientists often possess specialised skills in data mining and analysis, allowing them to clean, process, and analyse data to uncover meaningful insights.

Data User-Researcher

Data Users, as for example clinicians, who typically request data to conduct studies, research, or analysis with the intention of generating new knowledge in the field of medicine and publishing the findings. Their focus is on discovering new insights, patterns, or trends in the data, and their work often contributes to academic or scientific research. Data Users may require specialised access to certain datasets or tools that enable complex data analysis.

Data Warehouse

A healthcare Data Warehouse is a repository that stores large volumes of structured and unstructured data from multiple sources, such as electronic health records (EHRs), laboratory systems, radiology, patient management systems, and administrative databases. It is designed to be highly scalable, allowing healthcare organizations to integrate and analyze information from different systems, facilitating both clinical decision-making and operational management. It must be built with high standards of security and privacy to ensure the proper and safe handling of sensitive information within the healthcare ecosystem.

Dataset

Dataset refers to a specific set of imaging and accompanying clinical information, published by a single data holder and created for a particular purpose or study. A dataset is described by a set of common metadata elements related to the imaging and clinical information, dataset creation, access rights and terms of use. Data Users will be able to request access at the Dataset-level.

DCAT-AP

The DCAT Application Profile for data portals in Europe (DCAT-AP) is a specification designed to enhance the description and interoperability of public sector datasets across Europe. Based on the Data Catalogue Vocabulary (DCAT) developed by W3C, DCAT-AP provides a standardized framework for metadata that facilitates the exchange of dataset descriptions among various data portals. Its primary aim is to improve the discoverability and accessibility of datasets, enabling cross-border and cross-sector searches.

De facto anonymisation

Operations by which personal identifiers are removed and further techniques to reduce personal reference (e.g. randomisation or generalisation) are applied so that re-identification with reasonable effort, in accordance with the current state of the art, is no longer possible. In terms of GDPR compliance, this concept requires that data be kept in a closed, secure environment that excludes any external attack. An "attacker" here is a third party (i.e. neither the data holder nor the Data User) who accesses the original data sets accidentally or intentionally.

De-identification

General term for any process of removing the association between a set of identifying data and the data subject (22). De-identification refers to the removal of identifiers (e.g. name, address, National Registration Identity Card number) that directly identify an individual. De-identification is sometimes mistakenly equated to anonymisation, however it is only the first step of anonymisation. A de-identified dataset may easily be re-identified when combined with data that is publicly or easily accessible [28].

Demonstration Experiments

Computational experiments performed with selected platforms to showcase their capabilities. In the context of the EUCAIM project, these experiments demonstrate the functionality of federated learning platforms and distributed analysis tools in solving specific data analysis challenges. The outcomes contribute to understanding technical issues in a real distributed scenario.

Digital Chair Event

Virtual meeting organized to foster the exchange of knowledge, experiences, and advances in the project.

Distributed Data Processing

Also referred to in this project as federated data processing, it consists of the orchestration framework in which the Tier 3 nodes run applications on their data in a coordinated way.

↑ Back to glossary index

E

ELIXIR Tools Platform

Centralised resource for accessing and discovering tools in the life sciences, promoting collaboration and efficiency in research. It includes recommendations for software registration in bio.tools ELIXIR registry, packaging with Biocontainers, and participation in services like OpenEBench for software quality monitoring.

Ethical and Legal Board (ELB)

Body in charge of ensuring that no EU rule is violated, while ensuring that the research conducted is up to the accepted EU standards. In this context, the term "Ethics" refers to questions of legal and regulatory compliance that constitute a part of the governance process. In EU-funded projects, ethics is deemed a transversal issue and Ethics Advisory Board a key oversight mechanism to ensure understanding of the Ethics Appraisal Procedure, proper implementation of the Ethics Requirements, addressing specific issues such as Privacy and Data Protection Impact Assessments or Artificial Intelligence and ensuring ethics compliance in general. The ELB will act as a contact point for guidance on ethical issues that may arise during project execution and beyond project end, working in close connection with any party saddled with ethics-related responsibilities. During the project execution, the ELB will be chaired by the WP3 leaders and composed of legal experts in the participating entities. Beyond project end, the members of this board may be reselected based on availability.

ETL process

ETL stands for Extract, Transform, Load, and is a data integration framework used to prepare data for analysis. It consists of three main phases:

  • Extract: Data is collected from various sources such as databases, APIs, and files.

  • Transform: The extracted data is cleaned, standardized, and formatted to meet business requirements, ensuring accuracy and consistency.

  • Load: The transformed data is then loaded into a target system, such as a data warehouse or database, making it ready for analysis and reporting.
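The three phases above can be sketched end-to-end on toy in-memory data; the source rows, cleaning rules, and warehouse stand-in are all illustrative.

```python
# Minimal sketch of the three ETL phases on toy in-memory data.

RAW_SOURCE = [                         # Extract: rows pulled from a source system
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": "41"},
]

def transform(rows):
    """Transform: trim whitespace and cast types for consistency."""
    return [{"name": r["name"].strip(), "age": int(r["age"])} for r in rows]

warehouse = []                         # Load target: stands in for a warehouse table

def load(rows, target):
    """Load: write the cleaned rows into the target store."""
    target.extend(rows)

load(transform(RAW_SOURCE), warehouse)
```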

EUCAIM Benchmarking platform

The EUCAIM Benchmarking Platform is a system for evaluating AI models, data preprocessing tools, analytical software, and datasets within the European Cancer Imaging Initiative (EUCAIM). It ensures performance, reliability, and scalability through scientific and technical benchmarking, integrating with OpenEBench and Grand Challenge for standardized evaluation. The platform enables automated performance assessment, comparison of AI models across datasets, and validation of interoperability within the EUCAIM infrastructure. It also supports continuous benchmarking events, addressing challenges like dataset accessibility and automation to enhance the quality and reproducibility of AI-driven cancer imaging solutions.

EUCAIM CDM

The EUCAIM Common Data Model (CDM) is a standardized data schema that provides a consistent and homogeneous framework for representing data related to cancer imaging, facilitating the integration and sharing of datasets across various research and clinical environments. It primarily focuses on structuring and organizing the EUCAIM-related data entities, attributes, and their relationships, rather than capturing the semantics or rules governing those relationships, which are defined in the EUCAIM hyper-ontology.

EUCAIM DICOM Anonymizer Tool

A specialized tool designed to de-identify imaging data by removing or modifying sensitive DICOM tags according to a predefined de-identification profile. The tool processes folders containing multiple patients or cases, ensuring compliance with privacy regulations while maintaining the integrity of the imaging data. It plays a critical role in data anonymization before datasets are registered in the EUCAIM Public Catalogue, facilitating secure data sharing in medical imaging research.

EUCAIM Federation / European Federation of Cancer Images

The entity as a whole, which encompasses both the central and federated components (the central repository functions as another node within the federation). The term "federation" encompasses the overall scope of EUCAIM, involving the governing bodies and orchestration of all nodes, whether central or federated. It constitutes the collective framework for coordination and governance.

EUCAIM Hyper-ontology

The EUCAIM hyper-ontology is a common semantic meta-model that aims to support and maintain semantic interoperability among heterogeneous cancer image data models/standards. It defines the concepts, categories, and relationships in the oncology domain, enabling semantic understanding and reasoning about the data. It emphasizes the meaning and context of data rather than just its structure.

EUCAIM Infrastructure

The collective technical foundation supporting the EUCAIM Federation. It includes both central and federated components, forming the backbone that enables data distribution, access, and associated services.

EUCAIM Platform

The overarching framework that combines the distributed data throughout the federation, including both central and federated components, with the services facilitating their use. The platform serves as the integrated infrastructure, providing access to images and associated services within the EUCAIM Federation.

European Data Innovation Board

A distinguished expert group convened pursuant to the mandates outlined in the Data Governance Act (DGA), entrusted with advising the European Commission on the dissemination of exemplary methodologies. Its focal areas encompass data intermediation, data altruism, and the judicious utilisation of public data not amenable to open data practices. Additionally, the EDIB is tasked with orchestrating the harmonisation of cross-sectoral interoperability standards, thus presenting proposals for harmonised guidelines governing European data spaces, as stipulated in Article 30 of the DGA. Further, the EDIB is slated to acquire expanded competencies under the auspices of the Data Act [29].

European Digital Infrastructure Consortium (EDIC)

The Digital Decade policy programme 2030 establishes a new legal framework for multi-country projects, the European Digital Infrastructure Consortium. It is a new instrument to help Member States speed up and simplify the setup and implementation of multi-country projects. A minimum of three Member States who want to use a European Digital Infrastructure Consortium to set up a multi-country project will submit an application to the Commission. Following the examination of Member States' application, the Commission will, if it concludes that all requirements provided for in the decision are satisfied, adopt a decision establishing the European Digital Infrastructure Consortium. Each consortium will have its own legal personality, governing body, statutes, and seat in a participating Member State [30].

European Health Data Space (EHDS)

The European Health Data Space (EHDS) is a proposed framework under European Union legislation that aims to facilitate the safe sharing and use of health data across member states. Its goal is to enable better healthcare delivery, foster medical research and innovation, and create a secure infrastructure for data exchange in compliance with EU data protection laws, such as GDPR.

Evangelisation

The process of promoting, disseminating, and generating acceptance among stakeholders about the technologies and solutions developed within the project. This includes raising awareness, education, and the promotion of innovations related to medical image analysis and the use of artificial intelligence, with the aim of ensuring their adoption and proper implementation in clinical and scientific environments.

External Open Call

An invitation extended to organizations, researchers, and institutions outside the existing consortium to contribute to the EUCAIM project. The call focuses on expanding the federation with new data providers, AI model developers, and clinical partners, enhancing the diversity and representativeness of the datasets.

Applicants are evaluated by an Access Committee, which assesses proposals based on scientific, technical, ethical, and legal compliance. Selected participants will integrate their datasets, tools, or AI models into the EUCAIM infrastructure, supporting cancer imaging research and AI development. While the call does not provide direct funding, it offers an opportunity for new collaborators to engage with EUCAIM's federated ecosystem, benefiting from access to its infrastructure and expertise.

External partners

In the context of an EU project, an external partner typically refers to an organisation or entity that is not part of the original consortium but is engaged in the project in some way (e.g. via an Open Call). External partners may include organisations, institutions, companies, or individuals who collaborate with consortium members to contribute to and broaden the project's objectives, outcomes, or activities, in some cases joining the consortium and thereby becoming internal partners.

External use case

Any use case conducted by external partners (see also definition for "use case").

↑ Back to glossary index

F

FAIR Data Point (FDP)

A FAIR Data Point is a system designed to store and publish metadata about datasets, ensuring they are Findable, Accessible, Interoperable, and Reusable (FAIR) on the web without requiring APIs. It standardizes metadata for Findability and Reusability while providing a uniform, open method for Accessing data. Although it enhances metadata Interoperability, the responsibility for data interoperability remains with the data provider. FDP empowers users to share their data online, similar to how early web servers enabled widespread text publishing.

FAIR Principles

Principles to define the Findability, Accessibility, Interoperability, and Reuse of resources for humans and computers at the source. For example, the principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data [31].

  • Findable: Data and supplementary materials have sufficiently rich metadata and a unique and persistent identifier.

  • Accessible: Metadata and data are understandable to humans and machines. Data is deposited in a trusted repository.

  • Interoperable: Metadata uses a formal, accessible, shared, and broadly applicable language for knowledge representation.

  • Reusable: Data and collections have a clear usage licence and provide accurate information on provenance [32].

Federated Core services

The Federated Core Services are the functional components that enable interoperability between independent data providers while maintaining a unified user experience. These services ensure that data remains distributed across different nodes but can still be searched, accessed, and managed in a standardized manner.

Key Federated Core Services include Authentication and Authorization Infrastructure (AAI) for secure user authentication, Public Catalogue to store metadata and enable dataset discovery, Federated Search to query datasets across multiple federated nodes, Negotiator to handle access requests and manage compliance, Monitoring to track system performance, and Helpdesk for user support.

Unlike the Central Core Infrastructure, which provides the technical backbone, the Federated Core Services focus on how users interact with the data, ensuring that datasets across multiple nodes are findable, accessible, and governed under a federated framework. These services enable seamless collaboration between data providers and researchers while respecting data privacy, ownership, and access control policies.

Federated Catalogue

Metadata catalogue that stores the clinical and imaging metadata within the different federated data nodes of the Atlas of Cancer Images, as a federated search endpoint compliant with the EUCAIM federated query requirements.

Federated data analysis

Federated data analysis describes an analysis that is performed on multiple (often geographically) separated datasets. During this analysis, the data is not exchanged and can stay, for example, behind a given institution's firewall. Only the interim results of a local analysis are exchanged between the data-hosting sites [33]. The aggregated non-identifiable results from each local analysis are pooled and returned to the Data User.
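For one simple statistic, the pattern described above can be sketched as follows: each site shares only an aggregate (a sum and a count), never its raw values, and the requester pools the aggregates into a global mean. The site names and measurements are invented.

```python
# Sketch of federated analysis: only non-identifiable aggregates leave
# each site; the requester pools them into a global result.
site_data = {
    "site_a": [62.0, 71.5, 58.0],
    "site_b": [64.5, 69.0],
}

def local_aggregate(values):
    """Computed behind each site's firewall; only this summary is exchanged."""
    return {"sum": sum(values), "count": len(values)}

def pooled_mean(aggregates):
    """Pool the interim results into one global statistic."""
    total = sum(a["sum"] for a in aggregates)
    n = sum(a["count"] for a in aggregates)
    return total / n

global_mean = pooled_mean([local_aggregate(v) for v in site_data.values()])
```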

Federated learning

This is a specific case of federated data analysis, for machine learning purposes. It is a learning technique that allows users to collectively reap the benefits of shared models trained from rich datasets. The learning task is conducted across multiple separate sites coordinated centrally. Each site has a local training dataset which is never shared. Instead, each site computes an update to the current global model maintained centrally, and only this updated model is communicated [34].
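A toy sketch of one such round, in the style of federated averaging: each site updates the global model on its private data, and only the updated weight (with its sample count) is communicated. The "model" here is deliberately trivial, a single scalar fitted toward each site's local mean; all numbers are illustrative.

```python
# Toy federated-averaging round for a single scalar model weight.
sites = {
    "site_a": {"data": [2.0, 4.0], "n": 2},
    "site_b": {"data": [8.0], "n": 1},
}

def local_update(global_w, data, lr=0.5):
    """One gradient step of w toward the local mean (squared-error loss)."""
    grad = global_w - (sum(data) / len(data))
    return global_w - lr * grad

def fed_avg(global_w):
    """Weighted average of the sites' updated weights; data never leaves a site."""
    total = sum(s["n"] for s in sites.values())
    return sum(local_update(global_w, s["data"]) * s["n"]
               for s in sites.values()) / total

new_w = fed_avg(0.0)
```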

Federated Learning Platform (FLP)

A technological infrastructure that allows artificial intelligence models to be trained in a distributed manner without the need to centralize the data. Instead of moving the data to a central server, federated learning enables the data to remain locally within the participating institutions, and the models are trained collaboratively using that data in a secure way while respecting privacy [35].

Federated Node

A Local Node, i.e. infrastructure deployed by a data holder, that meets at least the Tier 2 requirements for technical data compliance. This means that federated nodes support the EUCAIM federated query capabilities (Tier 2) and possibly federated processing capabilities (Tier 3).

Federated Processing (FP) Infrastructure

Technical framework designed to facilitate federated analysis, which involves processing data without centralising it. In the context of the EUCAIM project, FP Infrastructure enables the execution of tasks, including AI training and inference, while keeping data decentralised at their original sites, adhering to specific regulatory frameworks.

Federated Processing Daemon (FP Daemon)

Daemon that operates at each data node within the federated processing infrastructure. It connects to the Message Broker to obtain assigned tasks and initiates the execution of the software required for federated processing. The daemon interacts with the local execution infrastructure, ensuring the proper execution of tasks on each data node.
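Schematically, the daemon's task loop can be sketched as below. The in-memory queue stands in for the real broker protocol, and the task fields and function names are hypothetical.

```python
# Schematic sketch of the daemon loop: pull an assigned task from a broker
# queue and launch the corresponding software locally. Queue stands in for
# the real Message Broker; task fields are invented.
import queue

broker = queue.Queue()                       # stand-in for the Message Broker
broker.put({"task_id": "t-1", "image": "analysis:1.0", "dataset": "ds-001"})

executed = []

def run_software(task):
    """Stand-in for launching the requested container/workflow locally."""
    executed.append(task["task_id"])

def poll_once():
    """One iteration of the daemon loop: fetch a task, execute it if present."""
    try:
        task = broker.get_nowait()
    except queue.Empty:
        return False
    run_software(task)
    return True

poll_once()
```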

Federated Processing Orchestrator (FP Orchestrator)

The Federated Processing Orchestrator is responsible for coordinating and managing distributed data processing tasks across multiple nodes within EUCAIM's federated infrastructure. It acts as a central mechanism that ensures computational workloads are efficiently scheduled, distributed, and executed while maintaining security and compliance with data protection regulations.

This component connects to the message broker to receive assigned tasks and manages the execution of federated AI model training and data analysis. By optimizing resource allocation and supporting large-scale, privacy-preserving computations, the FP Orchestrator enables researchers to perform complex analyses across multiple institutions without the need for data centralization.

Federated Processing Services

Please refer to Distributed Data Processing

Federated Query

A search method executed within the framework of a federated database system that enables simultaneous querying of multiple data sources across multiple systems, presenting unified results. The federated search translates the query or queries into different formats understandable by each data source, harmonising data models, schemas, and query languages to ensure accurate and comprehensive data retrieval. As a result, users obtain relevant information, such as the number of subjects that meet specific criteria, through a unified result set.

There are two areas of consideration in federated search:

  • On the central core services, which consist of the front-end, its back-end, the federated query brokering system, and the certificate storage.

  • On the provider's side, the query dispatcher, the store, and the data holder's customised components to translate the queries (and the codes, if code systems other than the EUCAIM CDM are used) into the locally supported format.
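The translation step can be sketched as below: one abstract query is rewritten into each provider's locally supported format, and the per-node counts are merged into a unified result. The dialects, query shapes, and counts are invented for illustration.

```python
# Sketch of federated-query translation and result pooling.
QUERY = {"diagnosis": "C50", "sex": "F"}     # abstract EUCAIM-style query

def translate(query, dialect):
    """Rewrite the abstract query for one node's local query language."""
    if dialect == "sql":
        where = " AND ".join(f"{k}='{v}'" for k, v in query.items())
        return f"SELECT COUNT(*) FROM subjects WHERE {where}"
    if dialect == "rest":
        return "/subjects/count?" + "&".join(f"{k}={v}" for k, v in query.items())
    raise ValueError(dialect)

# Each node evaluates its translated query locally; the counts are faked here.
node_counts = {"node_a": 12, "node_b": 30}
unified_total = sum(node_counts.values())
```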

Filing system

Any structured set of personal and non-personal data which are accessible according to specific criteria, whether centralised, decentralised or dispersed on a functional or geographical basis [36].

Financial Statements (FS)

The documents that reflect the economic and financial situation of the project, detailing the income, expenses, assets, liabilities, and cash flow related to the consortium's activities.

↑ Back to glossary index

G

Genetic Data

Personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question [37].

Governing Body

The party that encompasses the board of EUCAIM and can decide the approval, comment, or refusal of an application of data access, and data or tool provisioning, supported by legal, ethical, and technical boards.

↑ Back to glossary index

H

Health data

Personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status [38].

Health Data Access Bodies (HDABs)

Entities or institutions responsible for managing and regulating access to health data, including oncological medical images, within the framework of the project. These bodies are tasked with ensuring that access to health data is carried out ethically and in compliance with data protection regulations. Additionally, they oversee that the data is used solely for the purposes established in the project, ensuring patient privacy and informed consent, and facilitating the secure exchange of data among stakeholders within EUCAIM. [9]

Health information

All organised and contextualised data on population health and health service activities and performance, individual or aggregated, that improves health promotion, prevention, care, cure and policy-making [39].

Health Information System (HIS)

A health information system is the total of resources, stakeholders, activities and outputs enabling evidence-informed health policy-making. The health information system manages all types of health data, from EHRs to imaging data and population health data. HIS activities include data collection, interpretation (analysis and synthesis), health reporting, and knowledge translation, i.e. stimulating and enhancing the uptake of health information into policy and practice. Health information system governance relates to the mechanisms and processes to coordinate and steer all elements of a health information system [40].

Helpdesk

A service that provides support to users and allows incidents on the platform to be reported in the form of tickets.

Hospital

Within the EUCAIM project framework, affiliated hospitals constitute a distinct subset of data holders. In this case, hospitals will not expose their data warehouses, which may already exist or may have been created for EUCAIM, to the federation. Instead, they will be approached individually each time there is a new research project on specific clinical cases. If they choose to participate in the project, hospitals will prepare the necessary anonymised datasets within their data warehouses, and these datasets will be shared with the federation through a federated data node or by uploading them to the Central Storage. Therefore, hospitals will only expose metadata for specific datasets from projects in which they have chosen to participate, upon request.

↑ Back to glossary index

I

Imaging folder structure

Standardised organisation system used to structure and store DICOM imaging data within EUCAIM. It follows the hierarchy patient/study/series/images. The annotations are located at series level.
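
The hierarchy can be sketched as simple path construction; the folder and file names below are illustrative assumptions, not an official EUCAIM layout.

```python
from pathlib import Path

def image_path(root, patient, study, series, image):
    """Build the patient/study/series/images hierarchy used for DICOM data."""
    return Path(root) / patient / study / series / image

def series_dir(root, patient, study, series):
    # Annotations are stored at this level, alongside the series' images.
    return Path(root) / patient / study / series

p = image_path("archive", "PAT001", "ST01", "SE01", "IM0001.dcm")
print(p.as_posix())  # archive/PAT001/ST01/SE01/IM0001.dcm
print(series_dir("archive", "PAT001", "ST01", "SE01").as_posix())
```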

Imaging study

Defined as the utilisation of a variety of imaging techniques to acquire visual representations used as tools for the screening, detection and monitoring of cancer.

Industry Advisory Board

Group of experts and industry representatives who provide guidance and strategic advice on the commercial, technological, and innovation aspects within the project.

Internal Open Call

An invitation exclusively extended to existing consortium partners to support the expansion of real-world use cases, focusing on new data incorporation and the development of AI algorithms. The internal call was launched during the first project period and, following an assessment of the initial applications, will remain open until the project concludes.

All EUCAIM partners (Beneficiaries, Associated Partners, Affiliated Entities), whether as data holders or those interested in utilising the infrastructure for the development, training, validation, or benchmarking of AI models, are encouraged to apply. More broadly, this call is directed at all project partners interested in using the EUCAIM infrastructure to share their data or address clinical questions requiring extensive datasets and possibly technical, ethical, or legal support. Use cases from industrial partners are particularly encouraged.

The primary aim of this call is to gather all necessary information to ensure the smooth incorporation of data and implementation of use cases. The prioritisation of use cases will be based on their relevance and maturity level. No additional funding will be allocated to applicants.

Internal use case

Any use case conducted by internal partners (see also definition for "use case").

↑ Back to glossary index

L

Learning Management System

A Learning Management System (LMS) is a software application designed to facilitate the administration, documentation, tracking, reporting, and delivery of educational courses and training programs.

Legal entity

A company or organisation that has legal rights and responsibilities. From a legal point of view, it has its own personality and full capacity to fulfil its purposes. In legal relations, it is the holder of rights and obligations. It can be created directly by the law or in accordance with the provisions of the law.

Licensing Agreement

In Europe, the licence is generally considered as a contract between a Licensor (the author of the software) and a Licensee (the user of the software, who can then use it according to the licence terms). Note that if the Licensee does not agree to the licence terms, he/she normally does not have the right to use, copy, change or distribute the software. If the Licensee does this without agreeing to the licence terms, he/she is violating copyright law.

Life Science Login (LS-AAI)

The Life Science Login (https://lifescience-ri.eu/ls-login/) is the Authentication and Authorization Infrastructure (AAI) used by the EUCAIM services to manage access to services and data. The Life Science Login enables researchers to use their home organisation credentials or community or other identities (e.g. Google, LinkedIn, LS ID) to sign in and access the data and services they need. It also allows service providers (both in academia and industry) to control and manage the access rights of their users and create different access levels for research groups or international projects.

Limited governance framework

A decision-making and oversight structure within the project that sets specific restrictions on the scope and authority of the involved parties. This governance framework aims to ensure that the control and direction of the project remain within defined limits, assigning clear responsibilities to different groups or entities while limiting the degree of authority to avoid conflicts of interest or decisions that could compromise the project's integrity. [21]

Local Data Manager

An authorised technical expert, or a team of experts, at the Data Holder's site responsible for installing, configuring, operating, and maintaining the local services that support the federation. This also includes managing the ingestion of data into the Central Storage when applicable. The Local Data Manager will receive support from the EUCAIM Helpdesk and all relevant teams.

Local Node

A local node represents the infrastructure set up by a Data Holder that is considered compliant with the EUCAIM infrastructure. A local node can be Tier 1, 2 or 3. A local node is responsible for storing the data holder's datasets locally (Tier 1) and may additionally support federated search (Tier 2) and federated processing capabilities (Tier 3).

Local Services

Services run on the local node to achieve interoperability with the central hub.

↑ Back to glossary index

M

Machine learning

A subset of AI techniques based on the use of statistical and mathematical modelling techniques to identify patterns in data. Such learned patterns are then applied to perform or guide certain tasks and make predictions [41].

Management Board (MB)

The operational body responsible for the monitoring of the technical progress of the project, quality assurance, and the ad-hoc coordination of scientific and technological activities. It comprises the Administrative Project Coordinator, the Scientific Coordinator (SCo) (chair), and all Work Package leaders (WPLs).

Upon project end, the MB is also envisioned to be in charge of any decision making regarding any technical implementations and quality control of all operations regarding the day-to-day functioning of the infrastructure, including the coordination of scientific activities around it.

Mapping Component

A component translating the abstract search tree (AST) containing the query parameters into a query adapted to the data store's data model, such as Structured Query Language (SQL) for sites providing OHDSI OMOP-CDM compliant data, or Clinical Quality Language (CQL) for sites providing FHIR compliant data. The Mapping Component is developed by the Provider and runs together with the Query Dispatcher and the data store. The Query Dispatcher sends the AST to the Mapping Component and receives the search results from it. If the translation into the query language of the data store is already done by the Query Dispatcher, no Mapping Component is needed, and the Query Dispatcher queries the data store directly.
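
A minimal sketch of such a translation is shown below, assuming an illustrative dictionary-based AST and OMOP-style column names; neither is taken from the actual EUCAIM implementation.

```python
# Walk a small abstract search tree (AST) and emit an SQL WHERE clause.
# Inner nodes combine criteria with AND/OR; leaves hold one criterion.
def ast_to_sql(node):
    op = node["op"]
    if op in ("AND", "OR"):
        parts = [ast_to_sql(child) for child in node["children"]]
        return "(" + f" {op} ".join(parts) + ")"
    # Leaf node, e.g. {"op": ">=", "field": "age", "value": 50}.
    value = node["value"]
    rendered = f"'{value}'" if isinstance(value, str) else str(value)
    return f"{node['field']} {op} {rendered}"

ast = {
    "op": "AND",
    "children": [
        {"op": "=", "field": "condition_code", "value": "C50"},
        {"op": ">=", "field": "age", "value": 50},
    ],
}
print(ast_to_sql(ast))  # (condition_code = 'C50' AND age >= 50)
```

A CQL-emitting Mapping Component for FHIR-compliant sites would walk the same tree but render each node in CQL syntax instead.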

Marketplace

Centralised platform within the EUCAIM federation that facilitates the exchange and distribution of processing tools, services and applications developed by Software Providers. It serves as a repository where Software Providers can contribute their tools for federated processing or data preprocessing purposes to be used by Data Users.

Message Broker

A software intermediary that enables seamless communication between various components within the federated processing system. This broker functions as a central hub, overseeing the exchange, routing, and delivery of messages. Specifically, it plays a key role in coordinating tasks related to federated analysis, ensuring the secure and efficient flow of information between the Analysis Platform and distributed nodes.
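
The routing role can be sketched with Python's standard-library `queue` standing in for a real broker such as RabbitMQ or Kafka; the task names and two-node setup are illustrative assumptions.

```python
import queue
import threading

broker = queue.Queue()  # stands in for the message broker

def node_worker(node_id, results):
    # Each distributed node consumes tasks from the broker and acknowledges
    # them; a None task is the shutdown signal.
    while True:
        task = broker.get()
        if task is None:
            broker.task_done()
            return
        results.append((node_id, task))
        broker.task_done()

results = []
workers = [threading.Thread(target=node_worker, args=(i, results)) for i in range(2)]
for w in workers:
    w.start()

# The orchestrator publishes federated-analysis tasks to the broker.
for task in ["train-round-1", "train-round-2", "aggregate"]:
    broker.put(task)
broker.join()            # wait until every task has been acknowledged

for _ in workers:        # signal shutdown
    broker.put(None)
for w in workers:
    w.join()
print(sorted(t for _, t in results))  # ['aggregate', 'train-round-1', 'train-round-2']
```

Decoupling producers and consumers this way is what lets the broker route and retry tasks without the Analysis Platform and the nodes knowing about each other directly.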

Metadata

A set of data that defines and describes a resource (e.g., data, dataset, sample...) so that it can be understood, discovered and reused. There are different levels of metadata. Since metadata can be used to describe different aspects of data, metadata properties can be grouped in terms of quality, availability, provenance, and processing, among others. Metadata catalogues can then be used to describe the available datasets. Metadata is important to make data understandable and can help increase the findability, accessibility, interoperability and reusability of the data. Metadata can be collected or compiled in repositories to improve the level of compliance of the datasets with the FAIR principles.

Metadata harvesting

Automated collection of metadata descriptions from different sources to create useful aggregations of metadata and related services [42].

Milestones

Key events or important stages within the project.

Monitoring services

The EUCAIM monitoring service tracks the status of its components by making periodic requests to associated web services. It uses the ELK stack (Elasticsearch, Logstash, and Kibana) deployed in a Kubernetes cluster. Elasticsearch stores metrics, Kibana provides data visualization, and Logstash processes alerts and sends notifications. Heartbeat checks service availability, while Elastic Agent and kube-state-metrics collect cluster state data. The system is deployed via YAML manifests and operates through automated interactions, ensuring real-time monitoring and alerts.

↑ Back to glossary index

N

Non-personal data

All data other than personal data. Note that non-personal data could be inextricably linked with personal data or be used in order to obtain inferences of persons' qualities; in such case, GDPR and national data protection laws must apply [6].

↑ Back to glossary index

O

Ontology Requirements Specification Document (ORSD)

The ORSD is a structured document outlining the requirements that the ontology should fulfil (e.g., reasons to build the ontology, target group, intended uses) and the ontology requirements themselves (e.g., groups of competency questions), possibly reached through a consensus process.

Open Call

The open call aims to extend real-world use cases by welcoming new beneficiaries into the consortium. The objectives include: (i) onboarding new data holders to expand geographic scope, data types, or cancer research targets, and (ii) facilitating the adoption of trustworthy AI algorithms developed using the project's data repository. The open call will adhere to specific guidelines on openness and publication, and will be launched at the start of the second project period, in compliance with the terms and conditions stated by the European Commission (see Budget for Open Call).

The following use cases are accepted for this call:

  • Incorporation of large amounts of cancer imaging data into the Cancer Image Europe platform for further re-use, which can be leveraged to address relevant clinical questions (either into the central repository or through the federation of local repositories)

  • Development, training, benchmarking and/or validation of AI algorithms

  • Clinical use cases addressing a specific question (e.g. identification of imaging biomarkers)

Open data

Data that is freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. Open licence is a licence agreement which contains provisions that allow other individuals to reuse another creator's work, giving them four major freedoms. Without a special licence, these uses are normally prohibited by copyright law or commercial licence. Most free licences are worldwide, royalty-free, non-exclusive, and perpetual (see copyright durations). Free licences are often the basis of crowdsourcing and crowdfunding projects [43].

Open science

The movement to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of an inquiring society, amateur or professional. Open science is transparent and accessible knowledge that is shared and developed through collaborative networks. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practise open-notebook science, and generally making it easier to publish and communicate scientific knowledge [44].

OpenEbench

OpenEBench is the ELIXIR gateway to benchmarking communities, software monitoring, and quality metrics for life sciences tools and workflows, developed by the Barcelona Supercomputing Center (BSC). It supports scientific benchmarking, performance evaluation, and FAIR software monitoring, integrating with the European Open Science Cloud (EOSC) to ensure reproducibility and long-term data storage. The platform enables AI model comparison, workflow validation, and community-led benchmarking through a secure Virtual Research Environment.

Orchestrator Node (ON)

Component of the central hub that hosts the elements designed to orchestrate a federated analysis within the EUCAIM infrastructure.

Ownership Control Assessment (OCA)

The process of evaluating and determining who has control and ownership of the data, tools, and results generated during the project. This assessment is crucial to ensure that all participants in the consortium understand and respect intellectual property rights, data access agreements, and the distribution of benefits derived from the developed technologies. [45]

↑ Back to glossary index

P

Patient Advisory Board

A group of patients or patient representatives who provide guidance and insights on the needs, experiences, and concerns related to the use of data and artificial intelligence technologies within the project framework.

Permission

In the realm of data governance, "permission" denotes the authorisation granted to Data Users to process non-personal data (DGA Art. 2(6)).

Personal data

According to Article 3 (1) of Regulation (EU) 2018/1725: "'personal data' means any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person" [46].

A name and a social security number are two examples of personal data which relate directly to a person. However, the definition extends further and also encompasses, for instance, e-mail addresses and the office phone number of an employee. Other examples of personal data can be found in information on physical disabilities, in medical records and in an employee's evaluation.

Personal data which is processed in relation to the work of the data subject remain personal/individual in the sense that they continue to be protected by the relevant data protection legislation, which strives to protect the privacy and integrity of natural persons.

As a consequence, data protection legislation does not address the situation of legal persons (apart from the exceptional cases where information on a legal person also relates to a physical person).

Personal data breach

A breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed [47].

Personal Identifiable Information (PII)

Any data that can be used to identify a person directly or indirectly, such as name, address, phone number, email address, or medical information. [5]

Platform Manager

A technical expert or team of experts that will operate the core services of the EUCAIM platform. The Platform Manager is responsible for managing and maintaining the underlying infrastructure of the central storage, including servers, databases, and other resources. The Platform Manager manages user accounts and access permissions, deploys applications and services, uploads new applications (provided by Software Providers) to the marketplace, and ensures their proper integration into the platform. They support the orchestration of federated processing, working with Data Holders and Software Providers to integrate metadata, tools, and services, and ensuring that Data Users' queries are properly executed. As a team of experts, it is possible to have multiple platform managers assigned to different roles such as security and data privacy, administration, development, system management, etc. Additionally, the Platform Manager provides user support, responds to inquiries, provides documentation, and troubleshoots issues that arise with the platform.

Platform User Roles

A user profile identified by a name and the user stories that the user can carry out, which determine the access permissions and any other authorised activity required to perform the actions in those user stories.

Preprocessing framework

Structured and organised set of software tools, methods and workflows used to prepare data before any interaction with the EUCAIM platform. They are used for tasks such as anonymisation or quality control by the DHs to prepare their datasets before ingesting them into the reference nodes or sharing them as a federated node. These tools also enable DUs to prepare data for the training and validation of AI algorithms.

Primary use of data

The use of any data for the purpose for which it was originally collected.

Privacy-preserving learning techniques

Techniques that help preserve patients' privacy when data are merged or analysed with AI. The building blocks of privacy-preserving machine learning are federated learning, homomorphic encryption, and differential privacy, which draw on methods from cryptography and statistics [48].
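
As a small, self-contained example of one building block, differential privacy, the sketch below adds Laplace noise to a cohort count before release. The function and its parameters are illustrative, not an EUCAIM API.

```python
import math
import random

def dp_count(true_count, epsilon, rng=None):
    """Release a count with Laplace noise of scale 1/epsilon (sensitivity 1),
    the basic mechanism for differentially private counting queries."""
    rng = rng or random.Random()
    u = rng.random() - 0.5
    scale = 1.0 / epsilon
    # Inverse-transform sampling of the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

noisy = dp_count(100, epsilon=1.0, rng=random.Random(0))
print(round(noisy, 2))  # close to 100, but perturbed
```

Smaller `epsilon` means stronger privacy and noisier answers; the parameter is a policy choice, not a technical constant.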

Processing (personal and non-personal)

Any operation or set of operations which is performed on data or on datasets, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction [49].

Profiling

Any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person's performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements [50].

Properly Anonymized Data

Personal data that has been irreversibly modified so that all personally identifiable information (PII) is removed or transformed in such a way that the individual it originally referred to can no longer be identified, either directly or indirectly, by any means reasonably likely to be used. Properly anonymised data is no longer considered personal data. This ensures compliance with data protection regulations and safeguards patient privacy. Anonymised data can be used in research and in the development of artificial intelligence tools without compromising confidentiality, promoting secure collaboration and information exchange between institutions [5].

Pseudonymisation

The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person [51].
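
One common pseudonymisation technique, shown here purely as an illustration, is a keyed hash (HMAC), where the secret key plays the role of the "additional information" that must be kept separately.

```python
import hashlib
import hmac

def pseudonymise(identifier: str, key: bytes) -> str:
    """Derive a stable pseudonym from an identifier with a keyed hash.
    Without the key, the pseudonym cannot be linked back to the subject."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()[:16]

key = b"kept-separately-under-strict-controls"  # never stored with the data
p1 = pseudonymise("patient-12345", key)
p2 = pseudonymise("patient-12345", key)
print(p1 == p2)                                   # True: stable per subject
print(p1 != pseudonymise("patient-67890", key))   # True: subjects differ
```

The same-subject stability is what allows records from different visits to be linked without ever exposing the original identifier.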

Public catalogue

Metadata catalogue available to anonymous and authenticated users, offering the visualisation of the datasets metadata, with basic centralised filtering/faceted search options. This catalogue stores metadata, offering the Data Users basic descriptive information about the available datasets and their data access conditions.

↑ Back to glossary index

Q

Query mapping

Translating an abstract search tree into the query language supported by the data store. Done by the Mapping Component.

↑ Back to glossary index

R

Raw Data

Original data from individuals as it was captured by imaging devices, without any processing, modification, or analysis.

Real World Data (RWD)

RWD is information about patients' health and healthcare delivery that is collected outside of clinical trials. It can come from many sources, including patients, clinicians, registries, and electronic medical records. Examples of RWD include patient demographics, diagnoses, treatments, lab results, patient outcomes, doctors' notes, discharge summaries, and patient feedback.

Real World Data Holders (RWDH)

Institutions, hospitals, research centers, or other entities that hold and manage real-world clinical and radiological patient data.

Reference Node

A special federated Tier-3 node that can host third-party datasets through Data Sharing Agreements, either permanently or temporarily for their processing. The reference nodes provide registry, federated search and federated processing capacities. Initially, EUCAIM offers two reference nodes (one at the UPV and one at Health-RI) with complementary capabilities, as described in deliverable D5.6.

Repository

A storage for digital information, typically organised in the form of a catalogue of datasets, that can be searchable and can provide access to the data under given conditions.

Research Community (RC)

A formal multidisciplinary team composed of researchers, clinicians, data scientists, engineers, and AI specialists, dedicated to exploring, innovating and advancing a specific field or topic, to improve the role of imaging in healthcare, fostering collaborations and improving outcomes.

Research software

It includes individual pieces of software (e.g. tools), analytical workflows (compositions of two or more individual tools and possibly other workflows), platforms (e.g. for federated learning) and other auxiliary software that is important to carry out the scientific activities expected in the project.

Responsible AI

AI that is designed, developed, evaluated, and monitored by employing an appropriate code of conduct and appropriate methods to achieve technical, clinical, ethical, and legal requirements (e.g., efficacy, safety, fairness, robustness, transparency) [52].

Restriction of processing (personal and non-personal data)

Defined by the GDPR, methods by which to restrict the processing of data could include, inter alia, temporarily moving the selected data to another processing system, making the selected personal data unavailable to users, or temporarily removing published data from a website. In automated filing systems, the restriction of processing should in principle be ensured by technical means in such a manner that the personal data are not subject to further processing operations and cannot be changed. The fact that the processing of data is restricted should be clearly indicated in the system [53].

↑ Back to glossary index

S

Scientific Advisory Board

A committee of experts that provides strategic guidance and scientific advice on the development and implementation of the project. Its role is to ensure that EUCAIM initiatives meet the highest scientific standards and address the needs of the medical, research, and clinical community in the use of biomedical imaging, artificial intelligence, and data analysis.

Scientific Coordinator (SCo)

The Scientific Coordinator (SCo) of the project is the person who leads the Central Hub operations in all scientific and technical aspects and provides strategic scientific guidance. The Scientific Coordinator is a central figure in conflict resolution and decision-making in the project management bodies and plays a central role in the monitoring of the Project's overall progress and strategic plans.

Secondary use of data/data re-use

Secondary use refers to using data for a different purpose than the one it was originally collected for (i.e. than the primary use).

According to the European Data Governance Act 2020, 're-use' means the use by natural or legal persons of data held by public sector bodies, for commercial or non-commercial purposes other than the initial purpose within the public task for which the data were produced, except for the exchange of data between public sector bodies purely in pursuit of their public tasks [6].

Clinical definition: Secondary use of health data applies personal health information (PHI) for uses outside of direct health care delivery [54].

Secure Processing Environment (SPE)

The physical or virtual environment and organisational means to provide the opportunity to re-use data in a manner that allows the operator of the secure processing environment to determine and supervise all data processing actions, including the display, storage, download and export of the data and the calculation of derivative data through computational algorithms [6].

Sensitive data

Information that is regulated by law due to possible risk for plants, animals, individuals and/or communities and for public and private organisations. Sensitive personal data include information related to racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership and data concerning the health or sex life of an individual. These data could be identifiable and potentially cause harm through their disclosure [55].

Service Level Agreement (SLA)

Document that establishes the terms and conditions for integrating a local node into the EUCAIM Federation. It will define the level of service, access to data and processing resources, technical interoperability requirements, and support supplied by the providers. The SLA will also outline service availability targets, constraints, and contact points for addressing any issues or inquiries related to the services within the federation.

Software Provider (SP)

The Software Provider refers to any entity (startups, enterprises, research institutions, government agencies, non-profit organisations) that wishes to contribute processing tools, services, or applications it has developed to EUCAIM's marketplace for use in the federated processing capabilities of the platform.

Stakeholder Forum (SF)

A collaboration space that brings together individuals or institutions, such as hospitals, research communities, and other entities involved in the project. Its goal is to facilitate the exchange of ideas and knowledge among the various key stakeholders, promoting cooperation and alignment of objectives. [21]

Steering Committee (SC)

The Steering Committee is the highest-level decision-making body of the infrastructure and project consortium. It currently consists of one representative of each project partner entity, being chaired by the Scientific Coordinator. The members of the SC are required to be duly authorised to deliberate, negotiate and decide on all matters which fall under the responsibility of the Steering Committee as laid out in the Infrastructure Statutes.

During the project duration, the SC will discuss and decide on major modifications of the consortium membership (e.g., entry of new partners, withdrawal of partners), as well as on the work plan, project budget, intellectual property rights, etc. A more detailed description of these matters is provided in Article 6.3.1 of the project's Consortium Agreement.

Upon project end, the SC is envisioned to have the last word in the decision-making of any unresolved matter at a lower level (e.g. Technical board, Access Committee). In this context, the SC will be convened ad-hoc by its Chair – the Scientific Coordinator. It is expected that each project partner should be represented at the meeting by its designated representative or by their proxy if the former is not available.

Subcontractor

An external entity or company contracted by one of the consortium partners to carry out specific tasks or activities within the project. It may be responsible for tasks such as software development, data collection and analysis, or the implementation of related technologies, among others.

Sustainability

The ability of an entity, service or process to be maintained continuously over time [57].

Synthetic data

Synthetic data generation takes an original data source (dataset) and creates new, artificial data with similar statistical properties from it.

Preserving the statistical properties means that anyone analysing the synthetic data, a data analyst for example, should be able to draw the same statistical conclusions from the analysis of a given synthetic dataset as they would from the real (original) data.

The use of synthetic data is growing in many fields: from training of artificial intelligence models within the health sector to computer vision, image recognition and robotics fields [56].
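
A deliberately simplistic sketch of the idea fits a single numeric column with a normal distribution and samples new values from it; real synthetic-data generators model joint distributions across many variables. All values below are invented for illustration.

```python
import random
import statistics

def synthesise(values, n, seed=42):
    """Sample n synthetic values preserving the mean and spread of the
    original column (a toy, single-column stand-in for a real generator)."""
    rng = random.Random(seed)
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]

original = [61.0, 58.5, 72.3, 66.1, 59.8, 70.2, 64.4, 68.9]
synthetic = synthesise(original, 1000)
print(round(statistics.mean(original), 1))
print(round(statistics.mean(synthetic), 1))  # similar mean, no real records
```

No synthetic value corresponds to a real individual, which is exactly what makes this approach attractive for privacy-sensitive sharing, provided the statistical fidelity is validated.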

↑ Back to glossary index

T

Technical and user management unit

Team responsible for coordinating the technical aspects of the project and managing user interaction. Its role includes maintaining and developing the technological infrastructure, supporting project participants, ensuring data interoperability and security, and managing user needs and expectations to optimise their experience with the platform.

Technical Board (TB)

A committee that provides technical guidance, supervision and control to the project, as part of the project governance structure.

The TB is first tasked with the review of the potential engagement of tools and service providers to EUCAIM. Technical partners have the responsibility to adopt a responsible research and innovation attitude when designing and developing their solutions, by following the guides and requirements of the ethical committees, with the lead and support of the Data Protection Task Force and Ethics Advisory Board.

Technical Showrooms

Workshops that provide insights into platforms and tools relevant to distributed and federated data analysis. These sessions aim to understand the capabilities and potential fit of different tools and platforms within the overall EUCAIM infrastructure.

Testing Data (TD)

Data used for providing an independent evaluation of the trained and validated AI system in order to confirm the expected performance of that system before its placing on the market or putting into service [8].

The General Data Protection Regulation (GDPR)

GDPR is an EU-wide regulation that aims to protect individuals' rights and freedoms concerning the processing of personal data and the free movement of such data across the EU. The GDPR establishes detailed guidelines for lawful data processing, ensuring the quality and legitimacy of the data, and requiring special protection for sensitive categories of data. It also mandates transparency, giving data subjects the right to be informed, access their data, object to its processing, and ensure the confidentiality and security of personal data. The GDPR includes rules for the transfer of personal data to third countries and requires each EU Member State to establish a supervisory authority to oversee compliance.

Tiers

To accommodate different levels of data compliance with the DFF, three technical tiers have been established. These tiers are scalable and allow data to be upgraded as the datasets are used in new research projects. Each tier offers increased visibility and usability of the data within the EUCAIM community.

  • Minimum compliance: public metadata catalogue search

  • Medium compliance: federated query functionality

  • Full compliance: distributed and federated processing
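The cumulative nature of the three tiers can be sketched as a simple lookup structure. This is an illustrative model only, not EUCAIM's actual implementation; the enum and capability names are invented for the example.

```python
from enum import Enum

class ComplianceTier(Enum):
    MINIMUM = 1   # public metadata catalogue search
    MEDIUM = 2    # federated query functionality
    FULL = 3      # distributed and federated processing

# Capabilities are cumulative: each tier also offers everything below it,
# so upgrading a dataset's tier only ever increases its visibility and use.
TIER_CAPABILITIES = {
    ComplianceTier.MINIMUM: ["metadata_catalogue_search"],
    ComplianceTier.MEDIUM: ["metadata_catalogue_search", "federated_query"],
    ComplianceTier.FULL: ["metadata_catalogue_search", "federated_query",
                          "federated_processing"],
}

def capabilities(tier: ComplianceTier) -> list[str]:
    """Return the operations available to a dataset at a given tier."""
    return TIER_CAPABILITIES[tier]
```

Under this sketch, moving a dataset from `MEDIUM` to `FULL` adds federated processing without removing any previously available functionality.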

Tools

A digital or computerised resource that assists, enhances or executes an action or process [58].

Training Data

Data used for training machine learning algorithms (e.g., an artificial intelligence (AI) system) through fitting its learnable parameters [8].

Training platform

EUCAIM's training platform is the training environment for all EUCAIM members. It is based on the Moodle platform and contains three training courses addressing the different target groups: (1) Internal - consortium partners only; (2) Public - public stakeholders, such as patient organisations, patients and the interested public; (3) EUCAIM users - registered and authenticated users only (data holders, Data Users, tool providers) [59].

Training Team

The team in Work Package 2 working on setting up the EUCAIM training platform as well as conceptualising the training environment and developing the training materials.

Training, communication and outreach unit

Team responsible for designing and implementing training programs for the users and participants of the project. Additionally, it handles internal and external communication, promoting the dissemination of the project's progress and results. This unit works to increase EUCAIM's visibility, foster collaboration among stakeholders, and ensure that information and resources are accessible and understandable to all involved, including the scientific and medical community and the general public.

Transparent liability status

Clarity and visibility of the legal and ethical responsibilities related to the use of data, technologies, and processes within the project. This concept ensures that all parties involved, from researchers to end users, clearly understand who is responsible for what in the event of issues, errors, or regulatory violations. [45]

Trusted research environment

A Trusted Research Environment is a secure processing environment where data remains protected and cannot be downloaded, ensuring compliance with privacy and governance policies. These environments are integrated with Virtual Research Environments (VREs), allowing researchers to perform analyses without direct access to raw data. Certain infrastructure providers enable controlled access to these environments, ensuring data security while supporting collaborative research.

Trustworthy AI

AI with proven characteristics such as efficacy, safety, fairness, robustness, transparency, which enable relevant stakeholders such as citizens, clinicians, health organisations and authorities to rely on it and adopt it in real-world practice [52].

↑ Back to glossary index

U

Use case

A use case refers to a specific scenario or situation in which the Cancer Image Europe platform is used to address a real-life scientific or clinical question where access and re-use of large amounts of cancer data can improve patient outcomes. The integration of use cases into the EUCAIM project will outline the technical, ethical and legal steps involved in the process, and will play a crucial role in defining requirements for the implementation of additional cancer image datasets, tools and AI algorithms to improve clinical outcomes. Example of use case: Identification of imaging biomarkers for early detection of breast cancer.

Scenario: The use case focuses on leveraging the Cancer Image Europe platform to enhance the early detection of breast cancer through the identification of imaging biomarkers. The objective is to develop a robust and efficient AI algorithm to help radiologists in identifying potential malignancies at an early stage.

Steps: 1) Data collection and integration; 2) Training the AI model; 3) Validation of the AI model; 4) Integration in clinical workflows.

Expected outcomes: 1) Improved early detection of breast cancer through the identification of relevant imaging biomarkers; 2) Enhanced efficiency in radiology workflows, reducing the time required for manual review; 3) Increased accuracy in distinguishing between benign and malignant lesions, reducing false positives and unnecessary interventions; 4) Empowerment of healthcare professionals with a valuable decision support tool for more informed clinical decisions.

Who could apply: 1) A data holder, e.g. a hospital with a repository of breast imaging data (mammography, MR and/or ultrasound images) committed to advancing early cancer detection, intending to incorporate this imaging data into the Cancer Image Europe Infrastructure to contribute to the development of a robust breast cancer detection model; 2) An AI developer, e.g. a MedTech SME specialised in the development of AI algorithms for healthcare applications aiming to leverage the Cancer Image Europe infrastructure for the development, training, and validation of advanced AI algorithms focused on breast cancer diagnostics.

User

Any of the actors that interact with the platform: Data Holders (preparation and management of the data-sharing lifecycle), Data Users (researchers and innovators accessing the data, tools and resources), or Software Providers (developing and maintaining software tools).

User actions

Specific tasks or interactions that users can perform within the platform. These actions are related to the technical use of the platform and are specific to each user role. They are initiated by users to achieve specific objectives within the context of the User Stories, which describe the situation or scenario in which the actions take place.

User sandbox

Secure environment where data users can experiment with datasets, test AI models, and validate federated queries without impacting the live EUCAIM infrastructure. It ensures compliance with data governance protocols while supporting innovation and reliable model development.

User story

Natural-language descriptions of full interactions of a User Role with the EUCAIM platform. User Stories define in general terms the needs, restrictions, performance limitations, desired features, innovation capabilities and business models for the repository.

User’s Library

Area of the Dashboard where the authenticated Data Users can add and remove the references of collections selected from the User's Catalogue (either filtered using the federated query mechanism or not), request access to them, and view and manage their approved or rejected access requests.

↑ Back to glossary index

V

Validation Data

Data used for providing an evaluation of the trained AI system and for tuning its non-learnable parameters and its learning process [8].
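The complementary roles of training, validation and testing data (each defined in this glossary) can be sketched as a disjoint split of one labelled dataset. The 70/15/15 proportions below are a common illustrative choice, not an EUCAIM requirement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset of 1000 labelled samples, referenced by index.
n = 1000
indices = rng.permutation(n)

# Illustrative partition: 70% training, 15% validation, 15% testing.
train_idx = indices[:700]    # fits the model's learnable parameters
val_idx = indices[700:850]   # tunes non-learnable parameters / learning process
test_idx = indices[850:]     # independent final evaluation before deployment

# The three subsets cover the data exactly once, so the test evaluation
# is independent of both training and tuning.
assert len(train_idx) + len(val_idx) + len(test_idx) == n
assert set(train_idx).isdisjoint(test_idx)
assert set(val_idx).isdisjoint(test_idx)
```

Keeping the test subset untouched until the end is what makes the evaluation "independent" in the sense of the Testing Data entry above.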

Verified dataset

A Verified Dataset is a dataset that has undergone validation to ensure compliance with EUCAIM's federated processing standards. It is made available through secure reference nodes, where verified software can be executed on the dataset while maintaining data integrity and security. This validation process ensures that datasets are properly structured, anonymised, and meet the necessary requirements for research and AI model training.

↑ Back to glossary index

W

Wizard Tool

An anonymisation planning tool to identify potential risks, suggest mitigation strategies, and promote a secure-by-design approach for data anonymisation. The tool helps data holders comply with EUCAIM's privacy and accountability requirements by ensuring structured anonymisation processes. However, because clinical data in Tier 1 are not standardised, its functionalities for clinical datasets are deactivated. The Wizard Tool enhances awareness of weak points in anonymisation workflows and supports compliance with federated data-sharing frameworks.

