Explainable RL: Transparent Decision-Making in Complex Environments
Abstract
Reinforcement learning (RL) has achieved remarkable progress in complex environments, underpinning breakthroughs across robotics, gaming, finance, and autonomous systems. Nonetheless, the “black-box” nature of modern RL policies, particularly those based on deep learning, has hindered their adoption in safety-critical, regulated, or ethically sensitive domains due to a lack of transparency. Explainable RL (XRL) seeks to address this gap by generating human-interpretable rationales for agent actions and policy decisions. This paper presents a comprehensive review of, and a new methodology for, explainable RL. It critically examines diverse XRL methods, including model-agnostic post-hoc explainers, intrinsically interpretable architectures, reward decomposition, saliency mapping, and human-in-the-loop frameworks. The proposed system, XRL-Transp, integrates attention-based attribution and state-level policy summarization for transparent sequential decision-making. Empirical experiments are conducted on the OpenAI Gym CartPole and MinAtar Breakout benchmarks, with results demonstrating competitive task performance and high user-rated interpretability. The paper discusses open challenges, evaluation protocols, and societal impacts, offering actionable recommendations for practical deployment and future work.
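As a minimal illustration of the saliency-mapping idea mentioned in the abstract, the sketch below computes a finite-difference saliency score for each feature of a CartPole state under a simple linear policy. The policy weights and state values are hypothetical placeholders for illustration only; they are not taken from the paper's XRL-Transp system.

```python
import numpy as np

def policy_logits(state, W):
    """Linear policy: action logits for a 4-dim CartPole state (hypothetical weights)."""
    return W @ state

def saliency(state, W, action, eps=1e-4):
    """Finite-difference saliency: |d logit_action / d state_i| for each feature."""
    base = policy_logits(state, W)[action]
    grads = np.zeros_like(state)
    for i in range(len(state)):
        perturbed = state.copy()
        perturbed[i] += eps
        grads[i] = (policy_logits(perturbed, W)[action] - base) / eps
    return np.abs(grads)

# Hypothetical 2-action policy over [cart position, cart velocity,
# pole angle, pole angular velocity].
W = np.array([[0.5, -0.2, 1.5, 0.3],
              [-0.5, 0.2, -1.5, -0.3]])
state = np.array([0.02, -0.1, 0.15, 0.4])
sal = saliency(state, W, action=0)
```

For this linear policy the saliency reduces to the absolute weights of the chosen action, so the pole-angle feature (weight 1.5) dominates the explanation; for a deep policy the same perturbation scheme would yield state-dependent attributions.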
This work is licensed under a Creative Commons Attribution 4.0 International License.