Publication
ART: Adaptive Relation Tuning for Generalized Relation Prediction
Gopika Sudhakaran; Hikaru Shindo; Patrick Schramowski; Simone Schaub-Meyer; Kristian Kersting; Stefan Schroth
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2507.23543, Pages 1-17, Computing Research Repository, 2025.
Abstract
Visual relation detection (VRD) is the task of identifying the relationships between objects in a scene. VRD models trained solely on relation detection data struggle to generalize beyond the relations on which they are trained. While prompt tuning has been used to adapt vision-language models (VLMs) for VRD, it relies on handcrafted prompts and struggles with novel or complex relations. We argue that instruction tuning offers a more effective solution by fine-tuning VLMs on diverse instructional data. We thus introduce ART, an Adaptive Relation Tuning framework that adapts VLMs for VRD through instruction tuning and strategic instance selection. By converting VRD datasets into an instruction-tuning format and employing an adaptive sampling algorithm, ART directs the VLM to focus on informative relations while maintaining generalizability. Specifically, we focus on relation classification, where subject-object boxes are given and the model predicts the predicate between them. We tune on a held-in set and evaluate across multiple held-out datasets of varying complexity. Our approach strongly improves over its baselines and can infer unseen relation concepts, a capability absent in mainstream VRD methods. We demonstrate ART’s practical value by using the predicted relations for segmenting complex scenes.
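The abstract names two mechanisms without detailing them. The first recasts VRD annotations as instruction-tuning samples for relation classification, with the boxes in the prompt and the predicate as the answer. The second selects informative instances with an adaptive sampler. The Python sketch below shows one way both could fit into a data pipeline. It is not ART's implementation. The record type `RelationAnnotation`, the prompt template, and the functions `to_instruction_sample` and `sample_instances` are all hypothetical. Inverse-predicate-frequency weighting is a plainly named stand-in, because the abstract does not specify ART's actual sampling criterion.

```python
import json
import random
from collections import Counter
from dataclasses import dataclass


# Hypothetical annotation record: one subject-object pair and its predicate.
# Boxes are (x1, y1, x2, y2) in pixels; `obj` avoids shadowing the builtin `object`.
@dataclass
class RelationAnnotation:
    image_id: str
    subject: str
    subject_box: tuple[int, int, int, int]
    obj: str
    object_box: tuple[int, int, int, int]
    predicate: str


def to_instruction_sample(ann: RelationAnnotation) -> dict:
    """Turn one annotation into an instruction/response pair.

    Relation classification setting: the boxes are given in the instruction
    and the target response is the predicate. The wording is a hypothetical
    template, not ART's actual prompt.
    """
    instruction = (
        f"In the image, what is the relation between the {ann.subject} at "
        f"{list(ann.subject_box)} and the {ann.obj} at {list(ann.object_box)}? "
        "Answer with a single predicate."
    )
    return {"image_id": ann.image_id, "instruction": instruction, "response": ann.predicate}


def sample_instances(anns: list[RelationAnnotation], k: int, seed: int = 0) -> list[RelationAnnotation]:
    """Draw k annotations with replacement, favoring rare predicates.

    Hypothetical stand-in for ART's adaptive sampler: each instance is
    weighted by the inverse frequency of its predicate, so every predicate
    class gets the same total sampling mass.
    """
    counts = Counter(a.predicate for a in anns)
    weights = [1.0 / counts[a.predicate] for a in anns]
    rng = random.Random(seed)
    return rng.choices(anns, weights=weights, k=k)


if __name__ == "__main__":
    # Toy annotations (made up for illustration only).
    anns = [
        RelationAnnotation("img1", "man", (10, 20, 110, 220), "horse", (80, 100, 300, 260), "riding"),
        RelationAnnotation("img1", "man", (10, 20, 110, 220), "hat", (30, 5, 80, 40), "wearing"),
        RelationAnnotation("img2", "cup", (50, 50, 90, 100), "table", (0, 80, 400, 300), "on"),
        RelationAnnotation("img3", "book", (5, 5, 60, 40), "shelf", (0, 0, 200, 100), "on"),
    ]
    for ann in sample_instances(anns, k=3):
        print(json.dumps(to_instruction_sample(ann)))
```

A static weighting like this does not change during training. It only shows where a sampling criterion plugs into the pipeline that produces instruction-tuning data. An adaptive scheme would recompute the weights as tuning proceeds.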
