Page 193 - 《软件学报》2025年第5期
杨尚东 等: 基于分组对比学习的序贯感知技能发现 2093
[38] Eysenbach B, Salakhutdinov R, Levine S. C-Learning: Learning to achieve goals via recursive classification. In: Proc. of the 9th Int’l
Conf. on Learning Representations. OpenReview.net, 2021.
[39] Andrychowicz M, Wolski F, Ray A, Schneider J, Fong R, Welinder P, McGrew B, Tobin J, Abbeel P, Zaremba W. Hindsight experience
replay. In: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017.
5055–5065.
[40] Nasiriany S, Pong VH, Lin S, Levine S. Planning with goal-conditioned policies. In: Proc. of the 33rd Int’l Conf. on Neural Information
Processing Systems. Vancouver: Curran Associates Inc., 2019. 1329.
[41] Burda Y, Edwards H, Storkey AJ, Klimov O. Exploration by random network distillation. In: Proc. of the 7th Int’l Conf. on Learning
Representations. New Orleans: OpenReview.net, 2019.
[42] Ecoffet A, Huizinga J, Lehman J, Stanley KO, Clune J. First return, then explore. Nature, 2021, 590(7847): 580–586.
[doi: 10.1038/s41586-020-03157-9]
[43] Jiang YD, Liu EZ, Eysenbach B, Kolter JZ, Finn C. Learning options via compression. In: Proc. of the 36th Int’l Conf. on Neural
Information Processing Systems. New Orleans: Curran Associates Inc., 2022. 1540.
[44] Todorov E, Erez T, Tassa Y. MuJoCo: A physics engine for model-based control. In: Proc. of the 2012 IEEE/RSJ Int’l Conf. on
Intelligent Robots and Systems. Vilamoura-Algarve: IEEE, 2012. 5026–5033. [doi: 10.1109/IROS.2012.6386109]
[45] Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W. OpenAI Gym. arXiv:1606.01540, 2016.
[46] Neil D, Segler MHS, Guasch L, Ahmed M, Plumbley D, Sellwood M, Brown N. Exploring deep recurrent models with reinforcement
learning for molecule design. In: Proc. of the 6th Int’l Conf. on Learning Representations. Vancouver: OpenReview.net, 2018.
Chinese-language references (original citations):
[5] 余超, 董银昭, 郭宪, 冯旸赫, 卓汉逵, 张强. 结构交互驱动的机器人深度强化学习控制方法. 软件学报, 2023, 34(4): 1749–1764.
http://www.jos.org.cn/1000-9825/6708.htm [doi: 10.13328/j.cnki.jos.006708]
[6] 王金永, 黄志球, 杨德艳, Huang XW, 祝义, 华高洋. 面向无人驾驶时空同步约束制导的安全强化学习. 计算机研究与发展, 2021,
58(12): 2585–2603. [doi: 10.7544/issn1000-1239.2021.20211023]
[7] 轩书哲, 柯良军. 基于多智能体强化学习的无人机集群攻防对抗策略研究. 无线电工程, 2021, 51(5): 360–366.
[doi: 10.3969/j.issn.1003-3106.2021.05.004]
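杨尚东 (Yang Shangdong, born 1990), male, Ph.D., lecturer. His main research interests are reinforcement learning, multi-agent systems, and machine learning.
陈兴国 (Chen Xingguo, born 1984), male, Ph.D., lecturer, CCF professional member. His main research interests are reinforcement learning, game artificial intelligence, and machine learning.
余淼盈 (Yu Miaoying, born 1998), female, master's student. Her main research interests are reinforcement learning, machine learning, and data mining.
陈蕾 (Chen Lei, born 1975), male, Ph.D., professor, doctoral supervisor, CCF senior member. His main research interests are machine learning and pattern recognition.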