Liu Q, et al.: Offline reinforcement learning method via expectation maximization of diffusion models. Journal of Software (《软件学报》), 2025, Issue 10.
[26] Omura M, Osa T, Mukuta Y, Harada T. Symmetric Q-learning: Reducing skewness of Bellman error in online reinforcement learning.
Proc. of the AAAI Conf. on Artificial Intelligence, 2024, 38(13): 14474–14481. [doi: 10.1609/AAAI.V38I13.29362]
[27] Jin C, Krishnamurthy A, Simchowitz M, Yu TC. Reward-free exploration for reinforcement learning. In: Proc. of the 37th Int’l Conf. on
Machine Learning. JMLR.org, 2020. 4870–4879.
[28] Racaniere S, Lampinen AK, Santoro A, Reichert DP, Firoiu V, Lillicrap TP. Automated curricula through setter-solver interactions.
arXiv:1909.12892, 2020.
[29] Yin HL, Lin YJ, Yan J, Meng Q, Festl K, Schichler L, Watzenig D. AGV path planning using curiosity-driven deep reinforcement learning. In: Proc. of the 19th IEEE Int’l Conf. on Automation Science and Engineering. Auckland: IEEE, 2023. 1–6. [doi: 10.1109/CASE56687.2023.10260579]
[30] Li JN, Tang C, Tomizuka M, Zhan W. Hierarchical planning through goal-conditioned offline reinforcement learning. IEEE Robotics and
Automation Letters, 2022, 7(4): 10216–10223. [doi: 10.1109/LRA.2022.3190100]
[31] Isele D, Rahimi R, Cosgun A, Subramanian K, Fujimura K. Navigating occluded intersections with autonomous vehicles using deep reinforcement learning. In: Proc. of the 2018 IEEE Int’l Conf. on Robotics and Automation. Brisbane: IEEE, 2018. 2034–2039. [doi: 10.1109/ICRA.2018.8461233]
[32] Wang ZD, Hunt JJ, Zhou MY. Diffusion policies as an expressive policy class for offline reinforcement learning. arXiv:2208.06193, 2023.
[33] Kang BY, Ma X, Du C, Pang TY, Yan SC. Efficient diffusion policies for offline reinforcement learning. In: Proc. of the 37th Int’l Conf.
on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2024. 67195–67212.
[34] Chen HY, Lu C, Ying CY, Su H, Zhu J. Offline reinforcement learning via high-fidelity generative behavior modeling. arXiv:2209.14548, 2023.
[35] Jiang CX, Jiang M, Xu QF, Huang X. Expectile regression neural network model with applications. Neurocomputing, 2017, 247: 73–86.
[doi: 10.1016/J.NEUCOM.2017.03.040]
[36] Fu J, Kumar A, Nachum O, Tucker G, Levine S. D4RL: Datasets for deep data-driven reinforcement learning. arXiv:2004.07219, 2021.
Appended references in Chinese (translated):
[2] Liu Q, Zhai JW, Zhang ZZ, Zhong S, Zhou Q, Zhang P, Xu J. A survey on deep reinforcement learning. Chinese Journal of Computers, 2018, 41(1): 1–27 (in Chinese). [doi: 10.11897/SP.J.1016.2018.00001]
[3] Liu JW, Liu Y, Luo XL. Research and development on deep learning. Application Research of Computers, 2014, 31(7): 1921–1930, 1942 (in Chinese). [doi: 10.3969/j.issn.1001-3695.2014.07.001]
[7] Zhang BL, Liu ZR. Offline reinforcement learning algorithm based on adaptive uncertainty measurement. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, 44(4): 98–104 (in Chinese). [doi: 10.14132/j.cnki.1673-5439.2024.04.009]
LIU Quan (刘全, 1969–), male, Ph.D., professor, Ph.D. supervisor, CCF senior member. His research interests include reinforcement learning, deep reinforcement learning, and automated reasoning.
WU Lan (乌兰, 1999–), female, Ph.D. candidate. Her research interests include hierarchical reinforcement learning and offline reinforcement learning.
YAN Jie (颜洁, 2000–), female, master's student. Her research interests include offline reinforcement learning.

