Publication
The Stretto Execution Engine for LLM-Augmented Data Systems
Gabriele Sanmartino; Matthias Urban; Paolo Papotti; Carsten Binnig
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2602.04430, Pages 1-13, arXiv, 2026.
Abstract
LLM-augmented data systems enable semantic querying over structured and unstructured data, but executing queries with LLM-powered operators introduces a fundamental runtime–accuracy trade-off. In this paper, we present Stretto, a new execution engine that provides end-to-end query guarantees while navigating this trade-off efficiently and holistically. To this end, Stretto formulates query planning as a constrained optimization problem and uses a gradient-based optimizer to jointly select operator implementations and allocate error budgets across pipelines. Moreover, to enable fine-grained execution choices, Stretto introduces a novel way of using KV-caching to realize a spectrum of different physical operators, turning a sparse design space into a dense continuum of runtime–accuracy trade-offs. Experiments show that Stretto outperforms state-of-the-art systems while consistently meeting quality guarantees.
