Publication

Do GPUs Really Need New Tabular File Formats?

Jigao Luo; Qi Chen; Carsten Binnig
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2602.17335, Pages 1-3, arXiv, 2026.

Abstract

Over the past decade, Parquet has become the de facto columnar file format in modern analytics systems. It is widely adopted across analytical databases and query engines such as DuckDB [21], Velox [20], and DataFusion [14], all of which support direct querying of Parquet files. Parquet’s configuration practices – page counts, row group sizes, encoding and compression choices – were shaped by CPU-oriented assumptions about access patterns and I/O behavior [7, 8, 16, 29]. These inherited defaults now govern how files are generated, yet it remains unclear whether those defaults are appropriate for GPU scans.
