With hundreds of projects funded by the European Commission running more or less at the same time, activities that used to be fairly easy have become a real nightmare for projects. Some of these activities are: i) understanding what other projects do and capitalizing on their developments and outcomes in your own work; ii) letting the wider audience know how your project is positioned in comparison to other projects working in similar fields, and thus understanding the complementarities, commonalities, differences and value proposition of each of them (in other words, using i) to create clear messages that help target communities understand the content); and iii) attracting the attention of your target audience, that is, capturing even a few minutes of researchers' or policy makers' time in an era characterized by information overload.

In this complex context, clusters emerge as a very useful instrument, not only to carry out the aforementioned activities effectively, but also to do so efficiently.

Graph-Massivizer understood this premise from the beginning. Instead of setting up many new channels with very limited reach, we designed a strategy built around the principles of collaboration and networking. This materialized in the set-up of two clusters that offer complementary opportunities to the project.

The first cluster, critical to the success of Graph-Massivizer, is called DataNexus. It brings together all the Research and Innovation Actions addressing the topic “Extreme data mining, aggregation and analytics technologies and solutions”. These projects aim to deliver ground-breaking advances in the performance, speed and/or accuracy, as well as the usefulness, of data discovery, collection, mining, filtering and processing when “extreme data” is involved.

Extreme data is defined here as “data that exhibits one or more of the following characteristics, to an extent that makes current technologies fail: increasing volume, speed, variety; complexity/diversity/multilingualism of data; the dispersed data sources; sparse/missing/insufficient data/extreme variations in values”.

According to IDC, the volume of data created each year is forecast to grow at a CAGR of 24.9% from 2024 to 2029, with unstructured data growing faster. Worldwide revenue for Data Integration and Intelligence software is projected to nearly double, from $6.4B in 2024 to $12.2B in 2029 (11.8% for EMEA), and, interestingly, AI Life-Cycle Software shows CAGRs above 27% over 2024-2029 (EMEA growing from $3B to $11B). Extreme data will influence all these new solutions and represents a huge potential market, since a large percentage of data falls under the definition of extreme data given above. These opportunities come with many challenges, such as complexity and integration, rising costs, regulatory and security concerns, talent and ecosystem gaps, and ROI and value extraction, to name a few. Addressing them requires a holistic approach and collaboration among the initiatives that focus on the different “pieces” of the larger problem. Graph-Massivizer, for example, targets the whole lifecycle of graph-based data.
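As a quick sanity check on these figures, the CAGR implied by a start value S and an end value E over n years is (E/S)^(1/n) - 1. The Python sketch below applies this standard formula to the numbers quoted above, assuming that 2024 to 2029 counts as five compounding years; the helper names `cagr` and `growth_factor` are illustrative and do not come from the IDC report. Under that assumption, the Data Integration and Intelligence figures imply roughly 13.8% per year, the EMEA AI Life-Cycle figures roughly 29.7% per year (consistent with "above 27%"), and a 24.9% CAGR means the volume of data created annually roughly triples over the period.

```python
# Hypothetical helpers (not from the IDC report) to check the growth
# figures quoted in the text.


def cagr(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.138 == 13.8%)."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


def growth_factor(rate: float, years: int) -> float:
    """Return how many times a quantity grows at `rate` per year over `years`."""
    return (1 + rate) ** years


if __name__ == "__main__":
    # Assumption: 2024 -> 2029 spans five compounding years.
    years = 2029 - 2024
    print(f"Data Integration and Intelligence SW: {cagr(6.4, 12.2, years):.1%}")
    print(f"AI Life-Cycle Software (EMEA): {cagr(3.0, 11.0, years):.1%}")
    print(f"Data volume growth at 24.9% CAGR: {growth_factor(0.249, years):.2f}x")
```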

DataNexus, as a cluster that connects the projects working on extreme-scale data challenges, has been instrumental in creating a knowledge base of technologies and developments, allowing project partners to collaborate on common aspects and to understand complementary ways of addressing similar problems. In addition, the portfolio as a whole gives a more complete picture of the challenges that arise when dealing with the computing continuum, processing data across different computing infrastructures, or addressing issues specific to diverse vertical sectors. Furthermore, the collective strength of the cluster, greater than that of any single project, has raised the profile of extreme data as a research topic. The following table summarizes key aspects of each project's positioning, the diversity of use cases covered, and each project's contribution to the cluster.

DataNexus Cluster Overview

| Project | Objectives | Focus Area | Contribution to DataNexus |
|---|---|---|---|
| Graph-Massivizer – Extreme and Sustainable Graph Processing | Graph-Massivizer develops methods and tools for extreme and sustainable graph processing to address urgent societal challenges that require extracting insights from complex relational data structures. | Digital twins for sustainable exascale computing; Green AI for automotive and industrial domains; Foresight modelling for environmental protection; Sustainable and green financial analytics | Graph-Massivizer’s expertise in scalable graph analytics enhances the cluster’s capacity to interpret complex, interrelated data at scale, facilitating advanced analytics for use cases where relational structures are central. |
| NEARDATA – Extreme Near-Data Processing Platform | NEARDATA aims to build platforms that enable near-data processing, minimising data movement and enhancing responsiveness for extreme data workloads. | High-performance processing of genomics and metabolic data; Surgical data analysis and real-time insights; Novel architectures to support privacy and performance in sensitive data environments | By pushing computation closer to where data resides, NEARDATA addresses critical performance and privacy challenges inherent in extreme data analytics. |
| EXA4MIND – EXtreme Analytics for Mining Data Spaces | EXA4MIND develops a platform for extreme data analytics, automation and integration, particularly on HPC and supercomputing infrastructures. | Automated data management integrated with European data ecosystems; Advanced analytics tools that support edge-to-HPC workflows; Analytics-as-a-Service (MAaaS) capabilities, e.g., for mobility risk forecasting and traffic flow analytics | EXA4MIND enhances the cluster’s capabilities in bridging high-performance computing with real-world analytics needs, particularly for mobility and large-scale event forecasting. |
| EXTRACT – Distributed Data-Mining Platform | EXTRACT focuses on distributed data-mining technologies that scale across heterogeneous infrastructures. | Personalised evacuation systems; Real-time distributed knowledge extraction; Cross-domain data mining for safety and resilience | EXTRACT brings scalable distributed mining capabilities, enabling the cluster to handle dynamic and geographically dispersed data sources. |
| SYCLOPS – Cross-Architecture AI/Data Acceleration | SYCLOPS is committed to democratising AI and data acceleration using open standards and cross-architecture solutions. | Hardware-agnostic acceleration frameworks; Standard-based AI/data toolchains; Accessibility and inclusivity in high-performance analytics | SYCLOPS strengthens the cluster’s technological foundation by lowering barriers to adopting accelerated computing across diverse hardware environments. |
| EMERALDS – Extreme-scale Urban Mobility Data Analytics | EMERALDS develops data-as-a-service and analytics platforms for urban mobility, emphasising scalability and privacy. | Intelligent mobility analytics; Event risk assessment and forecasting; Integrated traffic management and flow analytics | Through real-world urban mobility use cases, EMERALDS grounds the cluster’s technologies in practical, impactful deployments that inform smart city development. |
| EFRA – Extreme Food Risk Analytics | EFRA targets risk analytics for food safety and supply chain resilience, leveraging extreme data to predict and manage risks. | Predictive models for food pathogens; Pest and contamination forecasting; Decision-support intelligence for regulatory frameworks | EFRA’s domain-specific analytics enrich the cluster’s multi-sector relevance, particularly for safeguarding food systems using advanced predictive insights. |

The DataNexus cluster has produced many materials that provide more detailed insights into this work. See the links below for reference [1].

EUDATA+ Cluster Overview

| Project | Objectives | Focus Area | Contribution to EUDATA+ |
|---|---|---|---|
| Graph-Massivizer – Extreme and Sustainable Graph Processing | Extreme data processing and analytics for complex data structures using massive graphs. | Digital twins for sustainable exascale computing; Green AI for automotive and industrial domains; Foresight modelling for environmental protection; Sustainable and green financial analytics | Graph-Massivizer contributes scalable tools for data ingestion and analysis that help transform large, relational datasets into actionable knowledge pipelines, an essential component for data marketplaces and lifecycle orchestration within EUDATA+. |
| PISTIS – Promoting and Incentivising Federated, Trusted, and Fair Sharing and Trading of Interoperable Data Assets | Secure platform for sharing, trading and monetizing proprietary data, with technologies such as federated sharing and AI-driven quality assessment. | Mobility and urban planning; Energy; Automotive | PISTIS brings trusted data exchange and trading infrastructure, underpinning monetization and governance solutions across cluster activities. |
| FAME – Federated decentralized trusted dAta Marketplace for Embedded finance | Federated, trustworthy data marketplace facilitating the monetization and trading of data assets, especially in the embedded finance domain, with a strong emphasis on energy efficiency and security. | Financial recommendation engine for families; Embedding Finance Services in a Personalized Citizen Wallet; Personalized Collaborative Intelligence for Enhancing EmFi Services; The EU Funds Application Process Made Easy; ESG Scorecard Ranking & Sustainable Portfolio Optimisation; Embedding Climatic Predictions in Property Insurance Products; Assessing the Quality and Monetary Value of Data Assets | FAME strengthens multi-sided data marketplace frameworks that interconnect producers and consumers across sectors, supporting sustainable monetization models. |
| UPCAST – Universal Platform Components for Safe, Fair, Interoperable Data Exchange, Monetization and Trading | Tools and plugins to automate data-sharing agreements across multiple stakeholders, ensuring transparency and ease of use. | Digital marketing data and resources; Biomedical and genomic data sharing; Sharing Public Administration for climate across Thessaloniki cities; Health and fitness data sharing; Cactus marketing data | UPCAST brings practical workflow automation for contractual and technical data sharing, enabling seamless integration of distributed datasets in shared environments. |
| enRichMyData – Empower AI-driven business products and services | An open toolbox of scalable components for data enrichment, improving data quality, reusability and value creation. | Marketing Data Enrichment for Smart-Bidding Optimization; Artificial Intelligence-based Welding Analytics; Service Data Enrichment for Smart Maintenance; European Register of Entities from Known Actions; Innovation Knowledge Graph for Understanding the Innovation Lifecycle; Industrial Data Enrichment for Mineral Processing Optimization | By enhancing the quality and richness of data assets, enRichMyData supports the cluster’s mission to increase data value and utility for analytics and monetization pathways. |
| DATAMITE – Monetization, Interoperability, Trading & Exchange | Open-source framework to boost data monetization, interoperability and exchange for diverse stakeholders, including SMEs and public administrations. | Corporate Multi-Domain Data Exchange with DIH support; Corporate Multi-Site Data Exchange; Offering Data to Service Providers with DataSpaces; Leveraging Electricity Distribution Open Data; Connecting eDWIN to Data Markets; Connecting MISTRAL to the EU AI-ON-Demand Platform | DATAMITE contributes infrastructure and interoperability components for data-sharing frameworks and exchange ecosystems, integral to cluster demonstrations and standards engagement. |
| ExtremeXP – Experiment-driven and user-oriented analytics for extremely precise outcomes and decisions | Human-centred analytics framework that optimises complex data-driven workflows, integrating user preferences and feedback to deliver personalised insights. | Crisis management; Cybersecurity; Public safety; Mobility; Manufacturing | ExtremeXP adds an experience-driven analytics dimension to cluster outputs, focusing on impactful, trustworthy insights derived from advanced data workflows. |

The EUDATA+ cluster has produced many materials that provide more detailed insights into this work. See the links below for reference [2].

Conclusion

The set-up of the DataNexus and EUDATA+ clusters, and the activities carried out within them, have been instrumental in giving the outcomes of Graph-Massivizer visibility among a wide audience. In addition to increasing the number of dissemination opportunities (and thus Graph-Massivizer's exposure), they have enabled a clearer positioning of our project within a complex ecosystem of projects working in related fields. This has allowed us to derive concrete messages for our target audiences and to define a more accurate, fine-tuned value proposition, both of great importance for fostering the adoption of project results, which is one of the ultimate goals of the investments committed.

Author: Nuria de Lama (Consulting Director, IDC)

References

[1] https://www.youtube.com/watch?v=CLBs7Si0MNo
https://www.youtube.com/watch?v=7kTdhwvELB4&t=139s
https://extract-project.eu/introducing-datanexus/
https://emeralds-horizon.eu/synergies/data-nexus-cluster

[2] https://datamite-horizon.eu/eudata/
Working Groups of the EUDATA+ cluster (link to Zenodo)