E2Data

Project

European Extreme Performing Big Data Stacks

Duration:
01/01/2018 - 12/31/2020

Research Topics
Data Management & Analysis, Other

Application fields
Knowledge & Business Intelligence, Other

In today's world, data is streamed from the local network or edge devices to a cloud provider, where a customer rents resources to process it. The Big Data software stack, in an application- and hardware-agnostic manner, splits the execution stream into multiple tasks and sends them for processing to the nodes the customer has paid for. If the outcome does not meet a strict three-second business requirement, the customer has three options: 1) scale up (by upgrading processors at node level), 2) scale out (by adding nodes to their clusters), or 3) manually implement code optimizations specific to the underlying hardware. However, the customer may not have the financial capability for any of these options. Ideally, they would like to meet their business requirements without stretching their hardware budget.

To address these scalability concerns, both end users and cloud infrastructure vendors (such as Google, Microsoft, Amazon, and Alibaba) are investing in heterogeneous hardware resources that combine a diverse selection of architectures, such as CPUs, GPUs, FPGAs, and MICs, aiming to further increase performance while containing rising operational costs. Furthermore, in addition to current investments in heterogeneous resources, large companies such as Google develop in-house ASICs, with the Tensor Processing Unit (TPU), built for TensorFlow workloads, being the prime example.

E2Data proposes an end-to-end solution for Big Data deployments that will fully exploit and advance the state of the art in infrastructure services by delivering a performance increase of up to 10x while utilizing up to 50% fewer cloud resources. E2Data will provide a new Big Data software paradigm that achieves maximum resource utilization for heterogeneous cloud deployments without affecting current Big Data programming norms (i.e., no code changes in the original source). The proposed solution takes a cross-layer approach by allowing vertical communication between the four key layers of Big Data deployments (application, Big Data software, scheduler/cloud provider, and execution runtime).

Partners

The University of Manchester, Institute of Communications and Computer Systems, Neurocom Luxembourg, KALEAO Limited, Computer Technology Institute and Press "Diophantus" (CTI), Spark Works Limited, iProov Limited

Keyfacts

Involved research areas
Intelligent Analytics for Massive Data
Website
https://e2data.eu/

Publications about the project

Christos Kotselidis; Sotiris Diamantopoulos; Orestis Akrivopoulos; Viktor Rosenfeld; Katerina Doka; Hazeef Mohammed; Georgios Mylonas; Vassilis Spitadakis; Will Morgan; Juan Fumero; Foivos S. Zakkak; Michail Papadimitriou; Maria Xekalaki; Nikos Foutris; Athanasios Stratikopoulos; Nectarios Koziris; Ioannis Konstantinou; Ioannis Mytilinis; Constantinos Bitsakos; Christos Tsalidis; Christos Tselios; Nikolaos Kanakis; Clemens Lutz; Sebastian Breß; Volker Markl

In: Design, Automation & Test in Europe. Design, Automation & Test in Europe (DATE-2020), March 9-13, Grenoble, France, IEEE, 2020.

Clemens Lutz; Sebastian Breß; Steffen Zeuch; Tilmann Rabl; Volker Markl

In: David Maier; Rachel Pottinger (Eds.). Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. ACM SIGMOD International Conference on Management of Data (SIGMOD-2020), June 14-19, Portland, OR, USA, Pages 1633-1649, ISBN 978-1-4503-6735-6, The Association for Computing Machinery, 2020.

Steffen Zeuch; Bonaventura Del Monte; Jeyhun Karimov; Clemens Lutz; Manuel Renz; Jonas Traub; Sebastian Breß; Tilmann Rabl; Volker Markl

In: Proceedings of the VLDB Endowment (PVLDB), Vol. 12, No. 5, Pages 516-530, VLDB Endowment, 2019.

Sponsors

EU - European Union

780245

Dr. Steffen Zeuch

Publication titles (in the order listed above)

Efficient Compilation and Execution of JVM-Based Data Processing Frameworks on Heterogeneous Co-Processors

Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects

Analyzing Efficient Stream Processing on Modern Hardware