Publication
Rethinking progression of memory state in robotic manipulation: An object-centric perspective
Nhat Chung; Taisei Hanyu; Toan Nguyen; Huy Le; Frederick Bumgarner; Ho Minh Duy Nguyen; Khoa Vo; Kashu Yamazaki; Chase Rainwater; Tung Kieu; Anh Nguyen; Ngan Le
In: Proceedings of AAAI 2026. AAAI Conference on Artificial Intelligence (AAAI-2026), AAAI, 2026.
Abstract
As embodied agents operate in increasingly complex environments,
the ability to perceive, track, and reason about individual
object instances over time becomes essential, especially
in tasks requiring sequenced interactions with visually similar
objects. In these non-Markovian settings, key decision cues
are often hidden in object-specific histories rather than the
current scene. Without persistent memory of prior interactions
(what has been interacted with, where it has been, or how
it has changed), visuomotor policies may fail, repeat past actions,
or overlook completed ones. To surface this challenge,
we introduce LIBERO-Mem, a non-Markovian task suite for
stress-testing robotic manipulation under object-level partial
observability. It combines short- and long-horizon object tracking
with temporally sequenced subgoals, requiring reasoning
beyond the current frame. However, vision-language-action
(VLA) models often struggle in such settings, with token scaling
quickly becoming intractable even for tasks spanning just
a few hundred frames. We propose Embodied-SlotSSM, a slot-centric
VLA framework built for temporal scalability. It maintains
spatio-temporally consistent slot identities and leverages
them through two mechanisms: (1) slot-state-space modeling
for reconstructing short-term history, and (2) a relational encoder
to align the input tokens with action decoding. Together,
these components enable temporally grounded, context-aware
action prediction. Experiments show Embodied-SlotSSM’s
baseline performance on LIBERO-Mem and general tasks,
offering a scalable solution for non-Markovian reasoning in
object-centric robotic policies.
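
To make the scaling argument concrete, the following is a minimal sketch that contrasts the number of tokens needed to keep every past frame in context with the size of a fixed per-slot recurrent state. It is not the Embodied-SlotSSM architecture: the class SlotMemory, the function history_token_count, the figure of 256 tokens per frame, and the simple exponential-moving-average recurrence standing in for a state-space model are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


def history_token_count(num_frames: int, tokens_per_frame: int) -> int:
    """Tokens a policy must attend to if every past frame stays in context."""
    return num_frames * tokens_per_frame


@dataclass
class SlotMemory:
    """Hypothetical per-slot memory: one fixed-size state vector per object slot.

    The update is a diagonal linear recurrence,
        h_t = decay * h_{t-1} + (1 - decay) * x_t,
    used here only as a stand-in for a learned state-space model.
    """
    num_slots: int
    state_dim: int
    decay: float = 0.9
    states: List[List[float]] = field(init=False)

    def __post_init__(self) -> None:
        # Every slot starts from an all-zero state.
        self.states = [[0.0] * self.state_dim for _ in range(self.num_slots)]

    def update(self, slot_features: List[List[float]]) -> None:
        """Fold one frame of per-slot features into the running states."""
        if len(slot_features) != self.num_slots:
            raise ValueError("expected one feature vector per slot")
        for k, x in enumerate(slot_features):
            if len(x) != self.state_dim:
                raise ValueError("feature vector has the wrong dimension")
            self.states[k] = [
                self.decay * h + (1.0 - self.decay) * xi
                for h, xi in zip(self.states[k], x)
            ]

    def memory_size(self) -> int:
        """Number of stored values; independent of how many frames were seen."""
        return self.num_slots * self.state_dim


if __name__ == "__main__":
    frames = 300
    # Assumed figure: 256 visual tokens per frame.
    print("tokens with full history:", history_token_count(frames, 256))

    memory = SlotMemory(num_slots=4, state_dim=8)
    for _ in range(frames):
        memory.update([[1.0] * 8 for _ in range(4)])
    print("values held in slot memory:", memory.memory_size())
```

Under these assumptions the full-history context grows linearly with episode length, while the slot memory stays at num_slots × state_dim values regardless of how many frames have been processed. Keeping one state per slot also ties the history to individual object instances, which is the property the abstract relies on for visually similar objects.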
