Learning Representations. San Juan, 2016. 1–16.
[13] Abel D, Jinnai Y, Guo SY, Konidaris GD, Littman ML. Policy and value transfer in lifelong reinforcement learning. In: Proc. of the 35th
Int’l Conf. on Machine Learning. Stockholm: PMLR, 2018. 20–29.
[14] Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proc. of the 30th AAAI Conf. on Artificial
Intelligence. Phoenix: AAAI, 2016. 2094–2100. [doi: 10.1609/aaai.v30i1.10295]
[15] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv:1707.06347, 2017.
[16] Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic
actor. In: Proc. of the 35th Int’l Conf. on Machine Learning. Stockholm: PMLR, 2018. 1861–1870.
[17] Sutton RS, Precup D, Singh S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning.
Artificial Intelligence, 1999, 112(1–2): 181–211. [doi: 10.1016/S0004-3702(99)00052-1]
[18] Bacon PL, Harb J, Precup D. The option-critic architecture. In: Proc. of the 31st AAAI Conf. on Artificial Intelligence. San Francisco:
AAAI, 2017. 1726–1734. [doi: 10.1609/aaai.v31i1.10916]
[19] Eysenbach B, Gupta A, Ibarz J, Levine S. Diversity is all you need: Learning skills without a reward function. In: Proc. of the 7th Int’l
Conf. on Learning Representations. New Orleans: OpenReview.net, 2019. 1–22.
[20] Sharma A, Gu SX, Levine S, Kumar V, Hausman K. Dynamics-aware unsupervised discovery of skills. In: Proc. of the 8th Int’l Conf. on
Learning Representations. Addis Ababa: OpenReview.net, 2020. 1–21.
[21] Frans K, Ho J, Chen X, Abbeel P, Schulman J. Meta learning shared hierarchies. In: Proc. of the 6th Int’l Conf. on Learning
Representations. Vancouver: OpenReview.net, 2018. 1–11.
[22] Achiam J, Edwards H, Amodei D, Abbeel P. Variational option discovery algorithms. arXiv:1807.10299, 2018.
[23] Kim J, Park S, Kim G. Unsupervised skill discovery with bottleneck option learning. In: Proc. of the 38th Int’l Conf. on Machine
Learning. PMLR, 2021. 5572–5582.
[24] Nachum O, Gu SX, Lee H, Levine S. Data-efficient hierarchical reinforcement learning. In: Proc. of the 32nd Int’l Conf. on Neural
Information Processing Systems. Montréal: Curran Associates Inc., 2018. 3307–3317.
[25] Levy A, Konidaris GD, Platt Jr R, Saenko K. Learning multi-level hierarchies with hindsight. In: Proc. of the 7th Int’l Conf. on Learning
Representations. New Orleans: OpenReview.net, 2019. 1–16.
[26] Li AC, Florensa C, Clavera I, Abbeel P. Sub-policy adaptation for hierarchical reinforcement learning. In: Proc. of the 8th Int’l Conf. on
Learning Representations. Addis Ababa: OpenReview.net, 2020. 1–15.
[27] Zhang J, Yu HN, Xu W. Hierarchical reinforcement learning by discovering intrinsic options. In: Proc. of the 9th Int’l Conf. on Learning
Representations. OpenReview.net, 2021. 1–19.
[28] Gregor K, Rezende DJ, Wierstra D. Variational intrinsic control. In: Proc. of the 5th Int’l Conf. on Learning Representations. Toulon:
OpenReview.net, 2017. 1–15.
[29] Hénaff OJ, Srinivas A, De Fauw J, Razavi A, Doersch C, Eslami SMA, van den Oord A. Data-efficient image recognition with
contrastive predictive coding. In: Proc. of the 37th Int’l Conf. on Machine Learning. JMLR.org, 2020. 391.
[30] He KM, Fan HQ, Wu YX, Xie SN, Girshick R. Momentum contrast for unsupervised visual representation learning. In: Proc. of the 2020
IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 9726–9735. [doi: 10.1109/CVPR42600.2020.00975]
[31] Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: Proc. of the 37th
Int’l Conf. on Machine Learning. JMLR.org, 2020. 149.
[32] Chen XL, He KM. Exploring simple Siamese representation learning. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and
Pattern Recognition. Nashville: IEEE, 2021. 15745–15753. [doi: 10.1109/CVPR46437.2021.01549]
[33] Grill JB, Strub F, Altché F, Tallec C, Richemond PH, Buchatskaya E, Doersch C, Pires BA, Guo ZD, Azar MG, Piot B, Kavukcuoglu K,
Munos R, Valko M. Bootstrap your own latent: A new approach to self-supervised learning. In: Proc. of the 34th Int’l Conf. on Neural
Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 1786.
[34] Kaelbling LP. Learning to achieve goals. In: Proc. of the 13th Int’l Joint Conf. on Artificial Intelligence. Chambéry: Morgan Kaufmann,
1993. 1094–1099.
[35] Schaul T, Horgan D, Gregor K, Silver D. Universal value function approximators. In: Proc. of the 32nd Int’l Conf. on Machine Learning.
Lille: JMLR.org, 2015. 1312–1320.
[36] Pong V, Gu SX, Dalal M, Levine S. Temporal difference models: Model-free deep RL for model-based control. In: Proc. of the 6th Int’l
Conf. on Learning Representations. Vancouver: OpenReview.net, 2018.
[37] Zhao R, Sun XD, Tresp V. Maximum entropy-regularized multi-goal reinforcement learning. In: Proc. of the 36th Int’l Conf. on Machine
Learning. Long Beach: PMLR, 2019. 7553–7562.