《软件学报》 (Journal of Software), 2025, No. 5

杨尚东 et al.: Sequence-aware skill discovery based on grouped contrastive learning




杨尚东 (born 1990), male, PhD, lecturer. His main research interests include reinforcement learning, multi-agent systems, and machine learning.

陈兴国 (born 1984), male, PhD, lecturer, CCF professional member. His main research interests include reinforcement learning, game AI, and machine learning.

余淼盈 (born 1998), female, master's student. Her main research interests include reinforcement learning, machine learning, and data mining.

陈蕾 (born 1975), male, PhD, professor, doctoral supervisor, CCF senior member. His main research interests include machine learning and pattern recognition.