Publication
PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing
Pratyush Agnihotri; Boris Koldehofe; Roman Heinrich; Carsten Binnig; Manisha Luthra
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2504.10704, Pages 1-22, arXiv, 2025.
Abstract
The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of the performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream Processing Systems (SPS) use operator parallelism and the available resources to process the massive workloads of modern applications. Existing benchmarking systems focus on analyzing SPS using queries with sequential operator pipelines within a homogeneous, centralized environment. In contrast, PDSP-Bench emphasizes parallel stream processing in a distributed, heterogeneous environment and also allows the integration of machine learning models for SPS workloads. In our results, we benchmark a well-known SPS, Apache Flink, using parallel query structures derived from real-world applications as well as synthetic queries to show the capabilities of PDSP-Bench for parallel stream processing. Moreover, we use SPS workloads generated on PDSP-Bench to compare different learned cost models, evaluating both model and training efficiency. We present key observations from our experiments with PDSP-Bench that highlight interesting trends across different query workloads, such as non-linear and paradoxical effects of parallelism on performance.
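To make the scaling vocabulary in the abstract more concrete, here is a minimal Python sketch. It assumes that throughput has already been measured for one query at several parallelism degrees. The sketch computes speedup relative to parallelism 1 and parallel efficiency (speedup divided by the degree). It also flags a degree as "paradoxical" when throughput drops compared with the next-lower measured degree. The function name `scaling_report`, the input format, and this reading of "paradoxical" are illustrative assumptions. They are not PDSP-Bench's API and not the paper's definitions.

```python
from typing import Dict, List


def scaling_report(throughput: Dict[int, float]) -> List[dict]:
    """Summarize how one query's throughput scales with operator parallelism.

    Hypothetical helper (not part of PDSP-Bench). `throughput` maps a
    parallelism degree to a measured throughput (e.g. events/s) and must
    include the baseline degree 1.
    """
    if 1 not in throughput:
        raise ValueError("baseline measurement at parallelism 1 is required")
    base = throughput[1]
    report = []
    prev = None
    for p in sorted(throughput):
        speedup = throughput[p] / base   # gain relative to parallelism 1
        efficiency = speedup / p         # 1.0 would mean perfectly linear scaling
        # Assumed notion of "paradoxical": more parallelism, lower throughput
        # than at the previous (smaller) measured degree.
        paradoxical = prev is not None and throughput[p] < throughput[prev]
        report.append({
            "parallelism": p,
            "speedup": speedup,
            "efficiency": efficiency,
            "paradoxical": paradoxical,
        })
        prev = p
    return report
```

For example, made-up throughputs of 1000, 1800, 2600 and 2400 events/s at parallelism 1, 2, 4 and 8 give efficiencies of 1.0, 0.9, 0.65 and 0.3, showing sub-linear (non-linear) scaling. The step from 4 to 8 is flagged as paradoxical. These numbers are invented for illustration and are not results from the paper.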
