A Minecraft Agent Based on a Hierarchical Deep Reinforcement Learning Model


Arjun Panwar

Abstract

Open-world games such as Minecraft pose significant challenges for reinforcement learning (RL) systems due to their long-horizon objectives, sparse rewards, and need for compositional skill learning. This study investigates how a Hierarchical Deep Reinforcement Learning (HDRL) approach can improve agent performance and sample efficiency in such complex environments. We develop a hierarchical agent composed of three interconnected levels: (i) a high-level planner that decomposes tasks into subtasks using the options framework for temporal abstraction, (ii) mid-level controllers that manage reusable subtasks such as resource gathering, crafting, and smelting, and (iii) a low-level visuomotor policy that interacts with the environment through human-like keyboard and mouse inputs. The agent’s learning pipeline integrates pretraining from human demonstration datasets (MineRL) and large-scale Video PreTraining (VPT) to establish behavioural priors before reinforcement learning fine-tuning. This design draws on modern hierarchical algorithms such as Option-Critic, FeUdal Networks (FuN), HIRO, and Hierarchical Actor-Critic (HAC), enabling the agent to operate across multiple temporal scales. Evaluation is conducted on ObtainDiamond-style benchmarks and the reward-free BASALT tasks to measure generalization and human alignment. Ablation studies assess the effect of each hierarchical layer, the inclusion of demonstrations, and large-scale video-based priors on overall performance. Results indicate that HDRL substantially improves task completion rates and sample efficiency compared with monolithic RL agents, particularly in long-horizon, reward-sparse scenarios. This research addresses the limitations of existing RL systems in complex, open-ended worlds and explores how hierarchical structures can bridge the gap between low-level control and high-level planning. The findings demonstrate that hierarchical reinforcement learning provides a scalable and interpretable framework for developing agents capable of long-term reasoning and adaptive skill composition. The proposed model advances the state of the art in game-based AI, offering insights applicable both to Minecraft research and to broader domains involving open-ended task learning and autonomous decision-making.
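To make the three-level decomposition concrete, the minimal Python sketch below wires a high-level planner, a mid-level controller, and a low-level action policy into a single control loop with a fixed option horizon. This is an illustrative sketch only: the class names, the subtask vocabulary, the ToyEnv dynamics, and the re-planning schedule are assumptions made for this example, not the paper's implementation; a real agent would replace the hand-written planner and the random primitive policy with learned Option-Critic/FuN-style controllers and a VPT-initialized visuomotor network, as described in the abstract.

```python
import random
from dataclasses import dataclass, field

# Illustrative sketch only: names, subtask list, and ToyEnv dynamics are
# assumptions for this example, not the agent or API used in the paper.

SUBTASKS = ["gather_wood", "craft_pickaxe", "mine_ore", "smelt_iron"]

@dataclass
class Observation:
    inventory: dict = field(default_factory=dict)  # item name -> count
    frame: tuple = ()                              # stand-in for the visual input

class ToyEnv:
    """Tiny stand-in environment so the control loop below actually runs."""
    ITEM_FOR = {"gather_wood": "wood", "craft_pickaxe": "pickaxe",
                "mine_ore": "ore", "smelt_iron": "iron"}

    def reset(self) -> Observation:
        self.obs = Observation()
        return self.obs

    def step(self, action: str, subtask: str):
        # With some probability the primitive action completes the current subtask.
        if random.random() < 0.1:
            item = self.ITEM_FOR[subtask]
            self.obs.inventory[item] = self.obs.inventory.get(item, 0) + 1
        done = "iron" in self.obs.inventory
        return self.obs, done

class HighLevelPlanner:
    """Decomposes the task: picks the next subtask (option) from the inventory."""
    def select_subtask(self, obs: Observation) -> str:
        for subtask in SUBTASKS:
            if ToyEnv.ITEM_FOR[subtask] not in obs.inventory:
                return subtask
        return SUBTASKS[-1]

class MidLevelController:
    """Turns a subtask into a short-horizon goal for the low-level policy."""
    def goal_for(self, subtask: str, obs: Observation) -> str:
        return f"goal:{subtask}"

class LowLevelPolicy:
    """Maps (observation, goal) to keyboard/mouse-style primitive actions."""
    PRIMITIVES = ["forward", "attack", "jump", "turn_left", "turn_right", "use"]
    def act(self, obs: Observation, goal: str) -> str:
        return random.choice(self.PRIMITIVES)  # placeholder for a learned visuomotor policy

def run_episode(env, planner, controller, policy, max_steps=500, option_horizon=10):
    """One rollout: the planner re-plans every `option_horizon` steps (temporal abstraction)."""
    obs = env.reset()
    subtask = planner.select_subtask(obs)
    for t in range(max_steps):
        if t % option_horizon == 0:          # option termination / re-selection
            subtask = planner.select_subtask(obs)
        goal = controller.goal_for(subtask, obs)
        action = policy.act(obs, goal)
        obs, done = env.step(action, subtask)
        if done:
            break
    return obs.inventory

if __name__ == "__main__":
    inv = run_episode(ToyEnv(), HighLevelPlanner(), MidLevelController(), LowLevelPolicy())
    print("final inventory:", inv)
```

In a learned system, select_subtask would be a trained policy over options, goal_for would emit subgoals in the style of HIRO or FuN, and act would be the pretrained visuomotor network fine-tuned with RL; the loop structure, however, stays the same.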

How to Cite

A. Panwar, “A Minecraft Agent Based on a Hierarchical Deep Reinforcement Learning Model,” IJITEE, vol. 14, no. 11, pp. 8–12, Oct. 2025, doi: 10.35940/ijitee.K1154.14111025.

References

Johnson, M., Hofmann, K., Hutton, T., & Bignell, D. (2016). The Malmo Platform for Artificial Intelligence Experimentation. IJCAI. DOI: https://doi.org/10.5555/3061053.3061259

Guss, W. H., et al. (2019). The MineRL 2019 Competition on Sample-Efficient Reinforcement Learning Using Human Priors. arXiv. DOI: https://doi.org/10.48550/arXiv.1904.10079

Guss, W. H., et al. (2019). MineRL: A Large-Scale Dataset of Minecraft Demonstrations. IJCAI-19. DOI: https://doi.org/10.24963/ijcai.2019/339

Guss, W. H., et al. (2021). The MineRL 2020 Competition on Sample-Efficient Reinforcement Learning Using Human Priors. arXiv. DOI: https://doi.org/10.48550/arXiv.2101.11071

Kanervisto, A., et al. (2022). MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned. arXiv. DOI: https://doi.org/10.48550/arXiv.2202.10583

Shah, R., et al. (2021). The MineRL BASALT Competition on Learning from Human Feedback. arXiv. DOI: https://doi.org/10.48550/arXiv.2107.01969

Shah, R., et al. (2022). Retrospective on the 2021 BASALT Competition. arXiv. DOI: https://doi.org/10.48550/arXiv.2204.07123

Fan, L., et al. (2022). MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge. arXiv. DOI: https://doi.org/10.48550/arXiv.2206.08853

Fan, L., et al. (2022). MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge. NeurIPS Datasets & Benchmarks. DOI: https://doi.org/10.5555/3600270.3601603

Baker, B., et al. (2022). Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos. arXiv. DOI: https://doi.org/10.48550/arXiv.2206.11795

Jucys, K., et al. (2024). Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent. arXiv. DOI: https://doi.org/10.48550/arXiv.2407.12161

Bacon, P.-L., Harb, J., & Precup, D. (2017). The Option-Critic Architecture. AAAI. DOI: https://doi.org/10.1609/aaai.v31i1.10916

Vezhnevets, A. S., et al. (2017). FeUdal Networks for Hierarchical Reinforcement Learning. PMLR 70. DOI: https://doi.org/10.48550/arXiv.1703.01161

Nachum, O., Gu, S., Lee, H., & Levine, S. (2018). Data-Efficient Hierarchical Reinforcement Learning (HIRO). NeurIPS. DOI: https://doi.org/10.48550/arXiv.1805.08296

Levy, A., Konidaris, G., Platt, R., & Saenko, K. (2018). Hierarchical Actor-Critic (HAC). arXiv. DOI: https://doi.org/10.48550/arXiv.1712.00948

Röder, F., et al. (2020). Curious Hierarchical Actor-Critic Reinforcement Learning. arXiv. DOI: https://doi.org/10.48550/arXiv.2005.03420

Milani, S., et al. (2023). BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset. OpenReview. Evaluation dataset Zenodo DOI: 10.5281/zenodo.8021960. https://openreview.net/forum?id=D1MOK2t2t2&noteId=4NBenmWacu

Wang, G., et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv. DOI: https://doi.org/10.48550/arXiv.2305.16291

Watanabe, K., et al. (2022). SHIRO: Soft Hierarchical Reinforcement Learning with Off-Policy Correction. arXiv. DOI: https://doi.org/10.48550/arXiv.2212.12786

Chunduru, R., et al. (2022). Attention Option-Critic. arXiv. DOI: https://doi.org/10.48550/arXiv.2201.02628

Scheller, C., Milani, S., et al. (2020). Sample-Efficient RL through Learning from Demonstrations (MineRL Competition Report). PMLR 123. (Describes the 8M-step sample budget.) https://proceedings.mlr.press/v123/scheller20a/scheller20a.pdf

Milani, S., et al. (2023). The MineRL BASALT Evaluation and Demonstrations Dataset. NeurIPS Datasets & Benchmarks. (Task details, including ObtainDiamondShovel.) DOI: https://doi.org/10.48550/arXiv.2312.02405
