Publication
AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong
Jannis Blüml; Johannes Czech; Kristian Kersting
In: Frontiers in Artificial Intelligence, Vol. 6, Pages 1-18, Frontiers, 2023.
Abstract
In recent years, deep neural networks for strategy games have made significant
progress. AlphaZero-like frameworks which combine Monte-Carlo tree search
with reinforcement learning have been successfully applied to numerous games
with perfect information. However, they have not been developed for domains
where uncertainty and unknowns abound, and are therefore often considered
unsuitable due to imperfect observations. Here, we challenge this view and argue
that they are a viable alternative for games with imperfect information—a domain
currently dominated by heuristic approaches or methods explicitly designed for
hidden information, such as oracle-based techniques. To this end, we introduce a
novel algorithm based solely on reinforcement learning, called AlphaZe∗∗, which
is an AlphaZero-based framework for games with imperfect information. We
examine its learning convergence on the games Stratego and DarkHex and show
that it is a surprisingly strong baseline while using a model-based approach: it
achieves similar win rates against other Stratego bots such as Pipeline Policy
Space Response Oracle (P2SRO), although it neither wins in direct comparison
against P2SRO nor reaches the much stronger results of DeepNash. Compared to heuristics
and oracle-based approaches, AlphaZe∗∗ can easily deal with rule changes, e.g.,
when more information than usual is given, and drastically outperforms other
approaches in this respect.
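
The abstract refers to AlphaZero-like frameworks that combine Monte-Carlo tree
search with a learned policy and value network. As a rough illustration only, the
minimal Python sketch below shows the PUCT selection rule typically used in such
frameworks; all names (Node, puct_select, C_PUCT) and values are illustrative
assumptions and are not taken from the AlphaZe∗∗ implementation.

# Minimal sketch of an AlphaZero-style PUCT selection step.
# All identifiers here are illustrative assumptions, not AlphaZe** code.
import math
from dataclasses import dataclass, field

C_PUCT = 1.5  # exploration constant (assumed value)

@dataclass
class Node:
    prior: float                                  # P(s, a) from the policy network
    visit_count: int = 0                          # N(s, a)
    value_sum: float = 0.0                        # W(s, a)
    children: dict = field(default_factory=dict)  # action -> Node

    @property
    def q_value(self) -> float:
        # Mean action value Q(s, a); zero for unvisited nodes.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def puct_select(node: Node) -> tuple:
    """Pick the child maximizing Q(s, a) + U(s, a), the PUCT rule."""
    total_visits = sum(child.visit_count for child in node.children.values())
    best_score, best = -float("inf"), None
    for action, child in node.children.items():
        u = C_PUCT * child.prior * math.sqrt(total_visits + 1) / (1 + child.visit_count)
        score = child.q_value + u
        if score > best_score:
            best_score, best = score, (action, child)
    return best

# Toy usage: a root with two actions whose priors come from a hypothetical policy net.
root = Node(prior=1.0)
root.children = {0: Node(prior=0.7), 1: Node(prior=0.3)}
print(puct_select(root))  # selects action 0 first, since both Q values are still 0

In imperfect-information settings such as Stratego or DarkHex, a search of this
kind would additionally have to cope with hidden state (for example by operating
on observations or sampled determinizations); how the paper addresses this is
described in the article itself, not in this sketch.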
