                     Learning Representations. San Juan, 2016. 1–16.
                 [13]  Abel D, Jinnai Y, Guo SY, Konidaris GD, Littman ML. Policy and value transfer in lifelong reinforcement learning. In: Proc. of the 35th
                     Int’l Conf. on Machine Learning. Stockholm: PMLR, 2018. 20–29.
                 [14]  Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proc. of the 30th AAAI Conf. on Artificial
                     Intelligence. Phoenix: AAAI, 2016. 2094–2100. [doi: 10.1609/aaai.v30i1.10295]
                 [15]  Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv:1707.06347, 2017.
                 [16]  Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic
                     actor. In: Proc. of the 35th Int’l Conf. on Machine Learning. Stockholm: PMLR, 2018. 1861–1870.
                 [17]  Sutton RS, Precup D, Singh S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning.
                     Artificial Intelligence, 1999, 112(1–2): 181–211. [doi: 10.1016/S0004-3702(99)00052-1]
                 [18]  Bacon PL, Harb J, Precup D. The option-critic architecture. In: Proc. of the 31st AAAI Conf. on Artificial Intelligence. San Francisco:
                     AAAI, 2017. 1726–1734. [doi: 10.1609/aaai.v31i1.10916]
                 [19]  Eysenbach B, Gupta A, Ibarz J, Levine S. Diversity is all you need: Learning skills without a reward function. In: Proc. of the 7th Int’l
                     Conf. on Learning Representations. New Orleans: OpenReview.net, 2019. 1–22.
                 [20]  Sharma A, Gu SX, Levine S, Kumar V, Hausman K. Dynamics-aware unsupervised discovery of skills. In: Proc. of the 8th Int’l Conf. on
                     Learning Representations. Addis Ababa: OpenReview.net, 2020. 1–21.
                 [21]  Frans K, Ho J, Chen X, Abbeel P, Schulman J. Meta learning shared hierarchies. In: Proc. of the 6th Int’l Conf. on Learning
                     Representations. Vancouver: OpenReview.net, 2018. 1–11.
                 [22]  Achiam J, Edwards H, Amodei D, Abbeel P. Variational option discovery algorithms. arXiv:1807.10299, 2018.
                 [23]  Kim J, Park S, Kim G. Unsupervised skill discovery with bottleneck option learning. In: Proc. of the 38th Int’l Conf. on Machine
                     Learning. PMLR, 2021. 5572–5582.
                 [24]  Nachum O, Gu SX, Lee H, Levine S. Data-efficient hierarchical reinforcement learning. In: Proc. of the 32nd Int’l Conf. on Neural
                     Information Processing Systems. Montréal: Curran Associates Inc., 2018. 3307–3317.
                 [25]  Levy A, Konidaris GD, Platt Jr R, Saenko K. Learning multi-level hierarchies with hindsight. In: Proc. of the 7th Int’l Conf. on Learning
                     Representations. New Orleans: OpenReview.net, 2019. 1–16.
                 [26]  Li AC, Florensa C, Clavera I, Abbeel P. Sub-policy adaptation for hierarchical reinforcement learning. In: Proc. of the 8th Int’l Conf. on
                     Learning Representations. Addis Ababa: OpenReview.net, 2020. 1–15.
                 [27]  Zhang J, Yu HN, Xu W. Hierarchical reinforcement learning by discovering intrinsic options. In: Proc. of the 9th Int’l Conf. on Learning
                     Representations. OpenReview.net, 2021. 1–19.
                 [28]  Gregor K, Rezende DJ, Wierstra D. Variational intrinsic control. In: Proc. of the 5th Int’l Conf. on Learning Representations. Toulon:
                     OpenReview.net, 2017. 1–15.
                 [29]  Hénaff OJ, Srinivas A, De Fauw J, Razavi A, Doersch C, Eslami SMA, van den Oord A. Data-efficient image recognition with
                     contrastive predictive coding. In: Proc. of the 37th Int’l Conf. on Machine Learning. JMLR.org, 2020. 391.
                 [30]  He KM, Fan HQ, Wu YX, Xie SN, Girshick R. Momentum contrast for unsupervised visual representation learning. In: Proc. of the 2020
                     IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 9726–9735. [doi: 10.1109/CVPR42600.2020.00975]
                 [31]  Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: Proc. of the 37th
                     Int’l Conf. on Machine Learning. JMLR.org, 2020. 149.
                 [32]  Chen XL, He KM. Exploring simple Siamese representation learning. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and
                     Pattern Recognition. Nashville: IEEE, 2021. 15745–15753. [doi: 10.1109/CVPR46437.2021.01549]
                 [33]  Grill JB, Strub F, Altché F, Tallec C, Richemond PH, Buchatskaya E, Doersch C, Pires BA, Guo ZD, Azar MG, Piot B, Kavukcuoglu K,
                     Munos R, Valko M. Bootstrap your own latent: A new approach to self-supervised learning. In: Proc. of the 34th Int’l Conf. on Neural
                     Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 1786.
                 [34]  Kaelbling LP. Learning to achieve goals. In: Proc. of the 13th Int’l Joint Conf. on Artificial Intelligence. Chambéry: Morgan Kaufmann,
                     1993. 1094–1099.
                 [35]  Schaul T, Horgan D, Gregor K, Silver D. Universal value function approximators. In: Proc. of the 32nd Int’l Conf. on Machine Learning.
                     Lille: JMLR.org, 2015. 1312–1320.
                 [36]  Pong V, Gu SX, Dalal M, Levine S. Temporal difference models: Model-free deep RL for model-based control. In: Proc. of the 6th Int’l
                     Conf. on Learning Representations. Vancouver: OpenReview.net, 2018.
                 [37]  Zhao R, Sun XD, Tresp V. Maximum entropy-regularized multi-goal reinforcement learning. In: Proc. of the 36th Int’l Conf. on Machine
                     Learning. Long Beach: PMLR, 2019. 7553–7562.