Publication
Demonstrating PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing
Pratyush Agnihotri; Carsten Binnig
In: Volker Markl; Joseph M. Hellerstein; Azza Abouzied (Eds.). Companion of the 2025 International Conference on Management of Data, SIGMOD/PODS 2025, Berlin, Germany, June 22-27, 2025. ACM SIGMOD International Conference on Management of Data (SIGMOD), Pages 7-10, ACM, 2025.
Abstract
The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of the performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream Processing Systems (SPS) use operator parallelism and the available resources to process the massive workloads of modern applications. Existing benchmarking systems focus on analyzing SPS using queries with sequential operator pipelines within a homogeneous centralized environment. In contrast, PDSP-Bench emphasizes the aspects of parallel stream processing in a distributed heterogeneous environment and simultaneously allows the integration of machine learning models for SPS workloads. In our results, we benchmark a well-known SPS, Apache Flink, using parallel query structures derived from real-world applications as well as synthetic queries, to show the capabilities of PDSP-Bench for parallel stream processing. Moreover, we compare different learned cost models on SPS workloads generated with PDSP-Bench and evaluate them in terms of model and training efficiency. We present key observations from our experiments using PDSP-Bench that highlight interesting trends across different query workloads, such as the non-linear and paradoxical effects of parallelism on performance.
