[109] Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Proc. of the Int’l Conf. on Machine Learning. 2014. 387−395.
[110] Houthooft R, Chen RY, Isola P, et al. Evolved policy gradients. In: Advances in Neural Information Processing Systems, Vol.31. 2018. 5400−5409.
[111] Kirsch L, Van Steenkiste S, Schmidhuber J. Improving generalization in meta reinforcement learning using learned objectives. In: Proc. of the Int’l Conf. on Learning Representations. 2020.
[112] Xu Z, Van Hasselt H, Hessel M, et al. Meta-gradient reinforcement learning with an objective discovered online. In: Advances in Neural Information Processing Systems, Vol.33. 2020. 15254−15264.
[113] Zhou W, Li Y, Yang Y, et al. Online meta-critic learning for off-policy actor-critic methods. In: Advances in Neural Information Processing Systems, Vol.33. 2020. 17662−17673.
[114] Oh J, Hessel M, Czarnecki WM, et al. Discovering reinforcement learning algorithms. In: Advances in Neural Information Processing Systems, Vol.33. 2020. 1060−1070.
[115] Veeriah V, Hessel M, Xu Z, et al. Discovery of useful questions as auxiliary tasks. In: Advances in Neural Information Processing Systems, Vol.32. 2019.
[116] Zheng Z, Oh J, Singh S. On learning intrinsic rewards for policy gradient methods. In: Advances in Neural Information Processing Systems, Vol.31. 2018. 4644−4654.
[117] Yang Y, Caluwaerts K, Iscen A, et al. NoRML: No-reward meta learning. In: Proc. of the Int’l Joint Conf. on Autonomous Agents and MultiAgent Systems, Vol.1. 2019. 323−331.
[118] Xu K, Ratner E, Dragan A, et al. Learning a prior over intent via meta-inverse reinforcement learning. In: Proc. of the Int’l Conf. on Machine Learning. 2019. 6952−6962.
[119] Yu L, Yu T, Finn C, et al. Meta-inverse reinforcement learning with probabilistic context variables. In: Advances in Neural Information Processing Systems, Vol.32. 2019. 1−15.
[120] Ghasemipour SKS, Gu S, Zemel R. SMILe: Scalable meta inverse reinforcement learning through context-conditional policies. In: Advances in Neural Information Processing Systems, Vol.32. 2019. 1−11.
[121] Pong VH, Nair A, Smith L, et al. Offline meta-reinforcement learning with online self-supervision. In: Proc. of the Int’l Conf. on Machine Learning. 2022. 17811−17829.
[122] Luo FM, Xu T, Lai H, et al. A survey on model-based reinforcement learning. arXiv:2206.09328, 2022.
[123] Clavera I, Rothfuss J, Schulman J, et al. Model-based reinforcement learning via meta-policy optimization. In: Proc. of the Conf. on Robot Learning. 2018. 617−629.
[124] Mendonca R, Geng X, Finn C, et al. Meta-reinforcement learning robust to distributional shift via model identification and experience relabeling. arXiv:2006.07178, 2020.
[125] Wang Q, Van Hoof H. Model-based meta reinforcement learning using graph structured surrogate models and amortized policy search. In: Proc. of the Int’l Conf. on Machine Learning. 2022. 23055−23077.
[126] Xu Z, Van Hasselt H, Silver D. Meta-gradient reinforcement learning. In: Advances in Neural Information Processing Systems, Vol.31. 2018. 2396−2407.
[127] Zahavy T, Xu Z, Veeriah V, et al. A self-tuning actor-critic algorithm. In: Advances in Neural Information Processing Systems, Vol.33. 2020. 20913−20924.
[128] Wang Y, Ni T. Meta-SAC: Auto-tune the entropy temperature of soft actor-critic via metagradient. arXiv:2007.01932, 2020.
[129] Beck J, Jackson MT, Vuorio R, et al. Hypernetworks in meta-reinforcement learning. In: Liu K, Kulic D, Ichnowski J, eds. Proc. of the 6th Conf. on Robot Learning, Vol.205. 2023. 1478−1487.
[130] Mehta B, Diaz M, Golemo F, et al. Active domain randomization. In: Proc. of the Conf. on Robot Learning. 2020. 1162−1176.
[131] Pan X, Seita D, Gao Y, et al. Risk averse robust adversarial reinforcement learning. In: Proc. of the IEEE Int’l Conf. on Robotics and Automation. 2019. 8522−8528.
[132] Mehta B, Deleu T, Raparthy SC, et al. Curriculum in gradient-based meta-reinforcement learning. arXiv:2002.07956, 2020.
[133] Gutierrez RL, Leonetti M. Information-theoretic task selection for meta-reinforcement learning. In: Advances in Neural Information Processing Systems, Vol.33. 2020. 20532−20542.
[134] Gupta A, Eysenbach B, Finn C, et al. Unsupervised meta-learning for reinforcement learning. arXiv:1806.04640, 2018.
[135] Eysenbach B, Ibarz J, Gupta A, et al. Diversity is all you need: Learning skills without a reward function. In: Proc. of the Int’l Conf. on Learning Representations. 2019.
[136] Jabri A, Hsu K, Eysenbach B, et al. Unsupervised curricula for visual meta-reinforcement learning. In: Advances in Neural Information Processing Systems, Vol.32. 2019.
[137] Rimon Z, Tamar A, Adler G. Meta reinforcement learning with finite training tasks - a density estimation approach. arXiv:2206.10716, 2022.
[138] Zhang J, Wang J, Hu H, et al. MetaCURE: Meta reinforcement learning with empowerment-driven exploration. In: Proc. of the Int’l Conf. on Machine Learning. 2021. 12600−12610.