         [48]     Todorov E, Erez T, Tassa Y. MuJoCo: A physics engine for model-based control. In: Proc. of the IEEE/RSJ Int’l Conf. on
             Intelligent Robots and Systems. 2012. 5026−5033.
         [49]     Lee K, Seo Y, Lee S, et al. Context-aware dynamics model for generalization in model-based reinforcement learning. In: Proc. of
             the Int’l Conf. on Machine Learning. 2020. 5757−5766.
         [50]     Benjamins C, Eimer T, Schubert F, et al. CARL: A benchmark for contextual and adaptive reinforcement learning.
             arXiv:2110.02102, 2021.
         [51]     Duan Y, Schulman J, Chen X, et al. RL²: Fast reinforcement learning via slow reinforcement learning. arXiv:1611.02779, 2016.
         [52]     Nichol A, Pfau V, Hesse C, et al. Gotta learn fast: A new benchmark for generalization in RL. arXiv:1804.03720, 2018.
         [53]     Cobbe K, Klimov O, Hesse C, et al. Quantifying generalization in reinforcement learning. In: Proc. of the Int’l Conf. on Machine
             Learning. 2019. 1282−1289.
         [54]     Alver S, Precup D. A brief look at generalization in visual meta-reinforcement learning. arXiv:2006.07262, 2020.
         [55]     Cobbe K, Hesse C, Hilton J, et al. Leveraging procedural generation to benchmark reinforcement learning. In: Proc. of the Int’l
             Conf. on Machine Learning. 2020. 2048−2056.
         [56]     Chevalier-Boisvert M, Willems L, Pal S. Minimalistic Gridworld Environment for Gymnasium. 2018.
         [57]     Samvelyan M, Kirk R, Kurin V, et al. MiniHack the planet: A sandbox for open-ended reinforcement learning research. In: Proc. of
             the Neural Information Processing Systems Track on Datasets and Benchmarks. 2021.
         [58]     Lin Z, Li J, Shi J, et al. JueWu-MC: Playing Minecraft with sample-efficient hierarchical reinforcement learning. In: Proc. of the
             Int’l Joint Conf. on Artificial Intelligence. 2022. 3257−3263.
         [59]     Yu T, Quillen D, He Z, et al. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In: Proc. of
             the Conf. on Robot Learning. 2019. 1094−1100.
         [60]     Zintgraf L, Feng L, Lu C, et al. Exploration in approximate hyper-state space for meta reinforcement learning. In: Proc. of the Int’l
             Conf. on Machine Learning. 2021. 12991−13001.
         [61]    Berseth G, Zhang Z, Zhang G, et al. CoMPS: Continual meta policy search. In: Proc. of the Int’l Conf. on Learning Representations.
             2022.
         [62]     Mitchell E, Rafailov R, Peng XB, et al. Offline meta-reinforcement learning with advantage weighting. In: Proc. of the Int’l
             Conf. on Machine Learning. 2021. 7780−7791.
         [63]     Wang JX, King M, Porcel N, et al. Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents.
             arXiv:2102.02926, 2021.
         [64]     Antoniou A, Storkey A, Edwards H. How to train your MAML. In: Proc. of the Int’l Conf. on Learning Representations. 2019.
         [65]     Song X, Gao W, Yang Y, et al. ES-MAML: Simple Hessian-free meta learning. arXiv:1910.01215, 2019.
         [66]     Rothfuss J, Lee D, Clavera I, et al. ProMP: Proximal meta-policy search. In: Proc. of the Int’l Conf. on Learning Representations.
             2019.
         [67]     Liu H, Socher R, Xiong C. Taming MAML: Efficient unbiased meta-reinforcement learning. In: Proc. of the Int’l Conf. on Machine
             Learning. 2019. 4061−4071.
         [68]    Fallah A, Mokhtari A, Ozdaglar A. On the convergence theory of gradient-based model-agnostic meta-learning algorithms. In: Proc.
             of the Int’l Conf. on Artificial Intelligence and Statistics. 2020. 1082−1092.
         [69]     Khodak M, Balcan MF, Talwalkar A. Provable guarantees for gradient-based meta-learning. In: Proc. of the Int’l Conf. on Machine
             Learning. 2019. 424−433.
         [70]     Molybog I, Lavaei J. When does MAML objective have benign landscape? In: Proc. of the IEEE Conf. on Control Technology and
             Applications. 2021. 220−227.
         [71]     Wang L, Cai Q, Yang Z, et al. On the global optimality of model-agnostic meta-learning. In: Proc. of the Int’l Conf. on Machine
             Learning. 2020. 9837−9846.
         [72]     Fallah A, Georgiev K, Mokhtari A, et al. On the convergence theory of debiased model-agnostic meta-reinforcement learning. In:
             Proc. of the Advances in Neural Information Processing Systems, Vol.34. 2021. 3096−3107.
         [73]    Ji K, Yang J, Liang Y. Theoretical convergence of multi-step model-agnostic meta-learning. Journal of Machine Learning Research,
             2022, 23: 1−41.
         [74]     Wang JX, Kurth-Nelson Z, Tirumala D, et al. Learning to reinforcement learn. arXiv:1611.05763, 2016.
         [75]     Mishra N, Rohaninejad M, Chen X, et al. A simple neural attentive meta-learner. In: Proc. of the Int’l Conf. on Learning
             Representations. 2018.
         [76]     Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proc. of the Advances in Neural Information Processing
             Systems, Vol.30. 2017.
         [77]     Parisotto E. Meta Reinforcement Learning through Memory [Ph.D. Thesis]. Carnegie Mellon University, 2021.
         [78]     Sæmundsson S, Hofmann K, Deisenroth MP. Meta reinforcement learning with latent variable Gaussian processes. In: Proc. of
             the Conf. on Uncertainty in Artificial Intelligence. 2018. 642−652.