
Publication

PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing

Pratyush Agnihotri; Boris Koldehofe; Roman Heinrich; Carsten Binnig; Manisha Luthra
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2504.10704, Pages 1-22, arXiv, 2025.

Abstract

The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream Processing Systems (SPS) use operator parallelism and the available resources to process massive workloads of modern applications. Existing benchmarking systems focus on analyzing SPS using queries with sequential operator pipelines within a homogeneous centralized environment. Quite differently, PDSP-Bench emphasizes the aspects of parallel stream processing in a distributed heterogeneous environment and simultaneously allows the integration of machine learning models for SPS workloads. In our results, we benchmark a well-known SPS, Apache Flink, using parallel query structures derived from real-world applications and synthetic queries to show the capabilities of PDSP-Bench towards parallel stream processing. Moreover, we compare different learned cost models using generated SPS workloads on PDSP-Bench by showcasing their evaluations on model and training efficiency. We present key observations from our experiments using PDSP-Bench that highlight interesting trends given different query workloads, such as non-linearity and paradoxical effects of parallelism on the performance.
