[48] Todorov E, Erez T, Tassa Y. MuJoCo: A physics engine for model-based control. In: Proc. of the IEEE Int’l Conf. on Intelligent
Robots and Systems. 2012. 5026−5033.
[49] Lee K, Seo Y, Lee S, et al. Context-aware dynamics model for generalization in model-based reinforcement learning. In: Proc. of
the Int’l Conf. on Machine Learning. 2020. 5757−5766.
[50] Benjamins C, Eimer T, Schubert F, et al. CARL: A benchmark for contextual and adaptive reinforcement learning. arXiv:2110.02102, 2021.
[51] Duan Y, Schulman J, Chen X, et al. RL²: Fast reinforcement learning via slow reinforcement learning. arXiv:1611.02779, 2016.
[52] Nichol A, Pfau V, Hesse C, et al. Gotta learn fast: A new benchmark for generalization in RL. arXiv:1804.03720, 2018.
[53] Cobbe K, Klimov O, Hesse C, et al. Quantifying generalization in reinforcement learning. In: Proc. of the Int’l Conf. on Machine
Learning. 2019. 1282−1289.
[54] Alver S, Precup D. A brief look at generalization in visual meta-reinforcement learning. arXiv:2006.07262, 2020.
[55] Cobbe K, Hesse C, Hilton J, et al. Leveraging procedural generation to benchmark reinforcement learning. In: Proc. of the Int’l
Conf. on Machine Learning. 2020. 2048−2056.
[56] Chevalier-Boisvert M, Willems L, Pal S. Minimalistic Gridworld Environment for Gymnasium. 2018.
[57] Samvelyan M, Kirk R, Kurin V, et al. MiniHack the planet: A sandbox for open-ended reinforcement learning research. In: Proc. of
the Neural Information Processing Systems Track on Datasets and Benchmarks. 2021.
[58] Lin Z, Li J, Shi J, et al. JueWu-MC: Playing Minecraft with sample-efficient hierarchical reinforcement learning. In: Proc. of the
Int’l Joint Conf. on Artificial Intelligence. 2022. 3257−3263.
[59] Yu T, Quillen D, He Z, et al. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In: Proc. of
the Conf. on Robot Learning. 2019. 1094−1100.
[60] Zintgraf L, Feng L, Lu C, et al. Exploration in approximate hyper-state space for meta reinforcement learning. In: Proc. of the Int’l
Conf. on Machine Learning. 2021. 12991−13001.
[61] Berseth G, Zhang Z, Zhang G, et al. CoMPS: Continual meta policy search. In: Proc. of the Int’l Conf. on Learning Representations.
2022.
[62] Mitchell E, Rafailov R, Peng XB, et al. Offline meta-reinforcement learning with advantage weighting. In: Proc. of the Int’l
Conf. on Machine Learning. 2021. 7780−7791.
[63] Wang JX, King M, Porcel N, et al. Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents.
arXiv:2102.02926, 2021.
[64] Antoniou A, Storkey A, Edwards H. How to train your MAML. In: Proc. of the Int’l Conf. on Learning Representations. 2019.
[65] Song X, Gao W, Yang Y, et al. ES-MAML: Simple Hessian-free meta learning. arXiv:1910.01215, 2019.
[66] Rothfuss J, Lee D, Clavera I, et al. ProMP: Proximal meta-policy search. In: Proc. of the Int’l Conf. on Learning Representations.
2019.
[67] Liu H, Socher R, Xiong C. Taming MAML: Efficient unbiased meta-reinforcement learning. In: Proc. of the Int’l Conf. on Machine
Learning. 2019. 4061−4071.
[68] Fallah A, Mokhtari A, Ozdaglar A. On the convergence theory of gradient-based model-agnostic meta-learning algorithms. In: Proc.
of the Int’l Conf. on Artificial Intelligence and Statistics. 2020. 1082−1092.
[69] Khodak M, Balcan MF, Talwalkar A. Provable guarantees for gradient-based meta-learning. In: Proc. of the Int’l Conf. on Machine
Learning. 2019. 424−433.
[70] Molybog I, Lavaei J. When does MAML objective have benign landscape? In: Proc. of the IEEE Conf. on Control Technology and
Applications. 2021. 220−227.
[71] Wang L, Cai Q, Yang Z, et al. On the global optimality of model-agnostic meta-learning. In: Proc. of the Int’l Conf. on Machine
Learning. 2020. 9837−9846.
[72] Fallah A, Georgiev K, Mokhtari A, et al. On the convergence theory of debiased model-agnostic meta-reinforcement learning. In:
Proc. of the Advances in Neural Information Processing Systems, Vol.34. 2021. 3096−3107.
[73] Ji K, Yang J, Liang Y. Theoretical convergence of multi-step model-agnostic meta-learning. Journal of Machine Learning Research,
2022, 23: 1−41.
[74] Wang JX, Kurth-Nelson Z, Tirumala D, et al. Learning to reinforcement learn. arXiv:1611.05763, 2016.
[75] Mishra N, Rohaninejad M, Chen X, et al. A simple neural attentive meta-learner. In: Proc. of the Int’l Conf. on Learning
Representations. 2018.
[76] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proc. of the Advances in Neural Information Processing
Systems, Vol.30. 2017.
[77] Parisotto E. Meta reinforcement learning through memory [Ph.D. Thesis]. Carnegie Mellon University, 2021.
[78] Sæmundsson S, Hofmann K, Deisenroth MP. Meta reinforcement learning with latent variable Gaussian processes. In: Proc. of the
Conf. on Uncertainty in Artificial Intelligence. 2018. 642−652.