[109] Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Proc. of the Int’l Conf. on Machine Learning. 2014.
387−395.
[110] Houthooft R, Chen RY, Isola P, et al. Evolved policy gradients. In: Advances in Neural Information Processing Systems, Vol.31.
2018. 5400−5409.
[111] Kirsch L, Van Steenkiste S, Schmidhuber J. Improving generalization in meta reinforcement learning using learned objectives. In:
Proc. of the Int’l Conf. on Learning Representations. 2020.
[112] Xu Z, Van Hasselt H, Hessel M, et al. Meta-gradient reinforcement learning with an objective discovered online. In: Advances in
Neural Information Processing Systems, Vol.33. 2020. 15254−15264.
[113] Zhou W, Li Y, Yang Y, et al. Online meta-critic learning for off-policy actor-critic methods. In: Advances in Neural Information
Processing Systems, Vol.33. 2020. 17662−17673.
[114] Oh J, Hessel M, Czarnecki WM, et al. Discovering reinforcement learning algorithms. In: Advances in Neural Information
Processing Systems, Vol.33. 2020. 1060−1070.
[115] Veeriah V, Hessel M, Xu Z, et al. Discovery of useful questions as auxiliary tasks. In: Advances in Neural Information Processing
Systems, Vol.32. 2019.
[116] Zheng Z, Oh J, Singh S. On learning intrinsic rewards for policy gradient methods. In: Advances in Neural Information Processing
Systems, Vol.31. 2018. 4644−4654.
[117] Yang Y, Caluwaerts K, Iscen A, et al. NoRML: No-reward meta learning. In: Proc. of the Int’l Joint Conf. on Autonomous Agents
and MultiAgent Systems, Vol.1. 2019. 323−331.
[118] Xu K, Ratner E, Dragan A, et al. Learning a prior over intent via meta-inverse reinforcement learning. In: Proc. of the Int’l Conf.
on Machine Learning. 2019. 6952−6962.
[119] Yu L, Yu T, Finn C, et al. Meta-inverse reinforcement learning with probabilistic context variables. In: Advances in Neural
Information Processing Systems, Vol.32. 2019. 1−15.
[120] Ghasemipour SKS, Gu S, Zemel R. SMILe: Scalable meta inverse reinforcement learning through context-conditional policies. In:
Advances in Neural Information Processing Systems, Vol.32. 2019. 1−11.
[121] Pong VH, Nair A, Smith L, et al. Offline meta-reinforcement learning with online self-supervision. In: Proc. of the Int’l Conf. on
Machine Learning. 2022. 17811−17829.
[122] Luo FM, Xu T, Lai H, et al. A survey on model-based reinforcement learning. arXiv:2206.09328, 2022.
[123] Clavera I, Rothfuss J, Schulman J, et al. Model-based reinforcement learning via meta-policy optimization. In: Proc. of the Conf.
on Robot Learning. 2018. 617−629.
[124] Mendonca R, Geng X, Finn C, et al. Meta-reinforcement learning robust to distributional shift via model identification and
experience relabeling. arXiv:2006.07178, 2020.
[125] Wang Q, Van Hoof H. Model-based meta reinforcement learning using graph structured surrogate models and amortized policy
search. In: Proc. of the Int’l Conf. on Machine Learning. 2022. 23055−23077.
[126] Xu Z, Van Hasselt H, Silver D. Meta-gradient reinforcement learning. In: Advances in Neural Information Processing Systems,
Vol.31. 2018. 2396−2407.
[127] Zahavy T, Xu Z, Veeriah V, et al. A self-tuning actor-critic algorithm. In: Advances in Neural Information Processing Systems,
Vol.33. 2020. 20913−20924.
[128] Wang Y, Ni T. Meta-SAC: Auto-tune the entropy temperature of soft actor-critic via metagradient. arXiv:2007.01932, 2020.
[129] Beck J, Jackson MT, Vuorio R, et al. Hypernetworks in meta-reinforcement learning. In: Liu K, Kulic D, Ichnowski J, eds. Proc. of
the 6th Conf. on Robot Learning, Vol.205. 2023. 1478−1487.
[130] Mehta B, Diaz M, Golemo F, et al. Active domain randomization. In: Proc. of the Conf. on Robot Learning. 2020. 1162−1176.
[131] Pan X, Seita D, Gao Y, et al. Risk averse robust adversarial reinforcement learning. In: Proc. of the IEEE Int’l Conf. on Robotics
and Automation. 2019. 8522−8528.
[132] Mehta B, Deleu T, Raparthy SC, et al. Curriculum in gradient-based meta-reinforcement learning. arXiv:2002.07956, 2020.
[133] Gutierrez RL, Leonetti M. Information-theoretic task selection for meta-reinforcement learning. In: Advances in Neural
Information Processing Systems, Vol.33. 2020. 20532−20542.
[134] Gupta A, Eysenbach B, Finn C, et al. Unsupervised meta-learning for reinforcement learning. arXiv:1806.04640, 2018.
[135] Eysenbach B, Gupta A, Ibarz J, et al. Diversity is all you need: Learning skills without a reward function. In: Proc. of the Int’l Conf.
on Learning Representations. 2019.
[136] Jabri A, Hsu K, Eysenbach B, et al. Unsupervised curricula for visual meta-reinforcement learning. In: Advances in Neural
Information Processing Systems, Vol.32. 2019.
[137] Rimon Z, Tamar A, Adler G. Meta reinforcement learning with finite training tasks - a density estimation approach. arXiv:2206.10716, 2022.
[138] Zhang J, Wang J, Hu H, et al. MetaCURE: Meta reinforcement learning with empowerment-driven exploration. In: Proc. of the
Int’l Conf. on Machine Learning. 2021. 12600−12610.