Publication
Do GPUs Really Need New Tabular File Formats?
Jigao Luo; Qi Chen; Carsten Binnig
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2602.17335, Pages 1-3, arXiv, 2026.
Abstract
Over the past decade, Parquet has become the de facto columnar file format in modern analytics systems. It is widely adopted across analytical databases and query engines such as DuckDB [21], Velox [20], and DataFusion [14], all of which support direct querying of Parquet files. Parquet’s configuration practices – page counts, row group sizes, encoding and compression choices – were shaped by CPU-oriented assumptions about access patterns and I/O behavior [7, 8, 16, 29]. These inherited defaults now govern how files are generated, yet it remains unclear whether those defaults are appropriate for GPU scans.
