With hundreds of projects funded by the European Commission running more or less at the same time, activities that used to be fairly easy have become a real nightmare for projects. Some of those activities are: i) understanding what other projects do and capitalizing on their developments and outcomes for your own work; ii) letting the wider audience know how your project is positioned relative to other projects working in similar fields, which means understanding the complementarities, commonalities, differences and value proposition of each of them, and then using the knowledge gained in i) to create clear messages that help target communities understand the content; and iii) attracting the attention of your target audience, which means capturing even a few minutes from researchers or policy makers in an era characterized by information overload.
In this complex context, clusters emerge as a very useful instrument, not only to carry out the aforementioned activities effectively, but also to do so efficiently.
Graph-Massivizer understood this premise from the beginning. Instead of setting up many new channels with a very limited reach, we created a strategy that revolves around the principles of collaboration and networking. This has materialized in the setup of two clusters that offer complementary opportunities to the project.
The first cluster, which is critical to the success of Graph-Massivizer, is called DataNexus. It brings together all the Research and Innovation Actions addressing the topic of “Extreme data mining, aggregation and analytics technologies and solutions”. These projects aim to provide ground-breaking advances in the performance, speed and/or accuracy, as well as the usefulness, of data discovery, collection, mining, filtering and processing when “extreme data” is involved.
Extreme data is defined here as “data that exhibits one or more of the following characteristics, to an extent that makes current technologies fail: increasing volume, speed, variety; complexity/diversity/multilingualism of data; the dispersed data sources; sparse/missing/insufficient data/extreme variations in values”.
According to IDC, the volume of data created each year is forecast to grow at a CAGR of 24.9% from 2024 to 2029, with unstructured data growing faster. Worldwide revenue for Data Integration and Intelligence Software is projected to nearly double, from $6.4B in 2024 to $12.2B in 2029 (11.8% for EMEA), and, interestingly, AI Life-Cycle Software shows CAGRs above 27% for 2024-2029 (EMEA from $3B to $11B). Extreme data will influence all these new solutions and has a huge potential market, as a large percentage of data falls under the above definition of extreme data. These opportunities come with many challenges, such as complexity and integration, rising costs, regulatory and security concerns, talent and ecosystem gaps, and ROI and value extraction, to name a few. Addressing them requires a holistic approach and collaboration between initiatives that focus on different “pieces” of the bigger problem. Graph-Massivizer, for example, targets the whole lifecycle of graph-based data.
DataNexus, as a cluster that connects the different projects working on extreme-scale data challenges, has been instrumental in creating a knowledge base of technologies and developments, allowing project partners to collaborate on common aspects and to understand complementary approaches to similar problems. In addition, the portfolio as a whole gives a more complete picture of the challenges that arise when dealing with the computing continuum, the processing of data across different computing infrastructures, or issues associated with diverse vertical sectors. Furthermore, the cluster has raised the profile of extreme data as a research topic, since its collective voice is stronger than that of any single project. The following table summarizes key aspects of each project’s positioning, the diversity of use cases covered, and its contribution to the cluster.
DataNexus Cluster Overview
| Project | Objectives | Focus Area | Contribution to DataNexus |
|---|---|---|---|
| Graph-Massivizer – Extreme and Sustainable Graph Processing | Graph-Massivizer develops methods and tools for extreme and sustainable graph processing to address urgent societal challenges that require extracting insights from complex relational data structures | Digital twins for sustainable exascale computing; Green AI for automotive and industrial domains; Foresight modelling for environmental protection; Sustainable and green financial analytics | Graph-Massivizer’s expertise in scalable graph analytics enhances the cluster’s capacity to interpret complex inter-related data at scale, facilitating advanced analytics for use cases where relational structures are central |
| NEARDATA – Extreme Near-Data Processing Platform | NEARDATA aims to build platforms that enable near-data processing, minimising data movement and enhancing responsiveness for extreme data workloads | High-performance processing of genomics and metabolic data; Surgical data analysis and real-time insights; Novel architectures to support privacy and performance in sensitive data environments | By pushing computation closer to where data resides, NEARDATA addresses critical performance and privacy challenges inherent in extreme data analytics |
| EXA4MIND – EXtreme Analytics for Mining Data Spaces | EXA4MIND develops a platform for extreme data analytics, automation, and integration, particularly on HPC and supercomputing infrastructures | Automated data management integrated with European data ecosystems; Advanced analytics tools that support edge-to-HPC workflows; Analytics-as-a-Service (MAaaS) capabilities, e.g., for mobility risk forecasting and traffic flow analytics | EXA4MIND enhances the cluster’s capabilities in bridging high-performance computing with real-world analytics needs, particularly for mobility and large-scale event forecasting |
| EXTRACT – Distributed Data-Mining Platform | EXTRACT focuses on distributed data-mining technologies that scale across heterogeneous infrastructures | Personalised evacuation systems; Real-time distributed knowledge extraction; Cross-domain data mining for safety and resilience | EXTRACT brings scalable distributed mining capabilities, enabling the cluster to handle dynamic and geographically dispersed data sources |
| SYCLOPS – Cross-Architecture AI/Data Acceleration | SYCLOPS is committed to democratising AI and data acceleration using open standards and cross-architecture solutions | Hardware-agnostic acceleration frameworks; Standard-based AI/data toolchains; Accessibility and inclusivity in high-performance analytics | SYCLOPS strengthens the cluster’s technological foundation by lowering barriers to adopting accelerated computing across diverse hardware environments |
| EMERALDS – Extreme-scale Urban Mobility Data Analytics | EMERALDS develops data-as-a-service and analytics platforms for urban mobility, emphasising scalability and privacy | Intelligent mobility analytics; Event risk assessment and forecasting; Integrated traffic management and flow analytics | Through real-world urban mobility use cases, EMERALDS grounds the cluster’s technologies in practical, impactful deployments that inform smart city development |
| EFRA – Extreme Food Risk Analytics | EFRA targets risk analytics for food safety and supply chain resilience, leveraging extreme data to predict and manage risks | Predictive models for food pathogens; Pest and contamination forecasting; Decision-support intelligence for regulatory frameworks | EFRA’s domain-specific analytics enrich the cluster’s multi-sector relevance, particularly for safeguarding food systems using advanced predictive insights |
The DataNexus cluster has produced extensive material that provides more detailed insights into this work; see the links listed under reference [1].
EUDATA+ Cluster Overview
| Project | Objectives | Focus Area | Contribution to EUDATA+ |
|---|---|---|---|
| Graph-Massivizer – Extreme and Sustainable Graph Processing | Extreme data processing and analytics for complex data structures using massive graphs | Digital twins for sustainable exascale computing; Green AI for automotive and industrial domains; Foresight modelling for environmental protection; Sustainable and green financial analytics | Graph-Massivizer contributes scalable tools for data ingestion and analysis that help transform large, relational datasets into actionable knowledge pipelines, an essential component for data marketplaces and lifecycle orchestration within EUDATA+ |
| PISTIS – Promoting and Incentivising Federated, Trusted, and Fair Sharing and Trading of Interoperable Data Assets | Secure platform for sharing, trading, and monetizing proprietary data with technologies such as federated sharing and AI-driven quality assessment | Mobility and Urban Planning; Energy; Automotive | PISTIS brings capabilities for trusted data exchange and trading infrastructure, underpinning monetization and governance solutions across cluster activities |
| FAME – Federated decentralized trusted dAta Marketplace for Embedded finance | Federated, trustworthy data marketplace facilitating monetization and trading of data assets, especially in the embedded finance domain, with a strong emphasis on energy efficiency and security | Financial recommendation engine for families; Embedding Finance Services in a Personalized Citizen Wallet; Personalized Collaborative Intelligence for Enhancing EmFi Services; The EU Funds Application Process Made Easy; ESG Scorecard Ranking & Sustainable Portfolio Optimisation; Embedding Climatic Predictions in Property Insurance Products; Assessing the Quality and Monetary Value of Data Assets | FAME strengthens multi-sided data marketplace frameworks that interconnect producers and consumers across sectors, supporting sustainable monetization models |
| UPCAST – Universal Platform Components for Safe, Fair, Interoperable Data Exchange, Monetization and Trading | Tools and plugins to automate data-sharing agreements across multiple stakeholders, ensuring transparency and ease of use | Digital Marketing data and resources; Biomedical and genomic data sharing; Sharing Public Administration for climate across Thessaloniki cities; Health and fitness data sharing; Cactus marketing data | UPCAST brings practical workflow automation for contractual and technical data sharing, enabling seamless integration of distributed datasets in shared environments |
| enRichMyData – Empower AI-driven business products and services | An open toolbox of scalable components for data enrichment, improving data quality, reusability, and value creation | Marketing data Enrichment for smart-bidding optimization; Artificial Intelligence-based Welding Analytics Service; Data Enrichment for Smart Maintenance; European Register of Entities from Known Actions; Innovation Knowledge Graph for understanding Innovation lifecycle; Industrial Data Enrichment for Mineral Processing Optimization | By enhancing the quality and richness of data assets, enRichMyData supports the cluster’s mission to strengthen data value and enhance utility for analytics and monetization pathways |
| DATAMITE – Monetization, Interoperability, Trading & Exchange | Open-source framework to boost data monetization, interoperability, and exchange for diverse stakeholders including SMEs and public administrations | Corporate Multi-Domain Data Exchange with DIH support; Corporate Multi-Site Data Exchange; Offering Data to Service Providers with DataSpaces; Leveraging Electricity Distribution Open Data; Connecting eDWIN to Data Markets; Connecting MISTRAL to the EU AI-ON-Demand Platform | DATAMITE contributes infrastructure and interoperability components for data sharing frameworks and exchange ecosystems, integral to cluster demonstrations and standards engagement |
| ExtremeXP – Experiment-driven and user-oriented analytics for extremely precise outcomes and decisions | Human-centred analytics framework optimising complex data-driven workflows with integration of user preferences and feedback for personalised insights | Crisis management; Cybersecurity; Public safety; Mobility; Manufacturing | ExtremeXP adds an experience-driven analytics dimension to cluster outputs, focusing on impactful, trustworthy insights derived from advanced data workflows |
The EUDATA+ cluster has produced extensive material that provides more detailed insights into this work; see the links listed under reference [2].
Conclusion
The setup of the DataNexus and EUDATA+ clusters, and the activities carried out within them, have been instrumental in giving the outcomes generated by Graph-Massivizer visibility among a wide audience. In addition to increasing the number of dissemination opportunities (and thus Graph-Massivizer’s exposure), they have enabled a clearer positioning of our project in a complex ecosystem of projects working in related fields. This has allowed us to derive concrete messages for our target audiences and to define a more accurate and fine-tuned value proposition. Both aspects are of great importance for fostering the adoption of project results, which is one of the ultimate goals of the committed investments.
Author: Nuria de Lama (Consulting Director, IDC)
References
[1] https://www.youtube.com/watch?v=CLBs7Si0MNo
https://www.youtube.com/watch?v=7kTdhwvELB4&t=139s
https://extract-project.eu/introducing-datanexus/
https://emeralds-horizon.eu/synergies/data-nexus-cluster
[2] https://datamite-horizon.eu/eudata/
Working Groups of the EUDATA+ cluster (link to zenodo)