《软件学报》 (Journal of Software), 2025, No. 5

杨尚东 et al.: Sequence-aware skill discovery based on grouped contrastive learning




杨尚东 (born 1990), male, PhD, lecturer. His main research interests include reinforcement learning, multi-agent systems, and machine learning.

陈兴国 (born 1984), male, PhD, lecturer, CCF professional member. His main research interests include reinforcement learning, game AI, and machine learning.

余淼盈 (born 1998), female, master's student. Her main research interests include reinforcement learning, machine learning, and data mining.

陈蕾 (born 1975), male, PhD, professor, doctoral supervisor, CCF senior member. His main research interests include machine learning and pattern recognition.