Page 312 - Journal of Software (《软件学报》), 2025, No. 10

LIU Quan et al.: Offline reinforcement learning method with expectation maximization of diffusion models    4709


[26]  Omura M, Osa T, Mukuta Y, Harada T. Symmetric Q-learning: Reducing skewness of Bellman error in online reinforcement learning. Proc. of the AAAI Conf. on Artificial Intelligence, 2024, 38(13): 14474–14481. [doi: 10.1609/AAAI.V38I13.29362]
[27]  Jin C, Krishnamurthy A, Simchowitz M, Yu TC. Reward-free exploration for reinforcement learning. In: Proc. of the 37th Int’l Conf. on Machine Learning. JMLR.org, 2020. 4870–4879.
[28]  Racanière S, Lampinen AK, Santoro A, Reichert DP, Firoiu V, Lillicrap TP. Automated curricula through setter-solver interactions. arXiv:1909.12892, 2020.
[29]  Yin HL, Lin YJ, Yan J, Meng Q, Festl K, Schichler L, Watzenig D. AGV path planning using curiosity-driven deep reinforcement learning. In: Proc. of the 19th IEEE Int’l Conf. on Automation Science and Engineering. Auckland: IEEE, 2023. 1–6. [doi: 10.1109/CASE56687.2023.10260579]
[30]  Li JN, Tang C, Tomizuka M, Zhan W. Hierarchical planning through goal-conditioned offline reinforcement learning. IEEE Robotics and Automation Letters, 2022, 7(4): 10216–10223. [doi: 10.1109/LRA.2022.3190100]
[31]  Isele D, Rahimi R, Cosgun A, Subramanian K, Fujimura K. Navigating occluded intersections with autonomous vehicles using deep reinforcement learning. In: Proc. of the 2018 IEEE Int’l Conf. on Robotics and Automation. Brisbane: IEEE, 2018. 2034–2039. [doi: 10.1109/ICRA.2018.8461233]
[32]  Wang ZD, Hunt JJ, Zhou MY. Diffusion policies as an expressive policy class for offline reinforcement learning. arXiv:2208.06193, 2023.
[33]  Kang BY, Ma X, Du C, Pang TY, Yan SC. Efficient diffusion policies for offline reinforcement learning. In: Proc. of the 37th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2024. 67195–67212.
[34]  Chen HY, Lu C, Ying CY, Su H, Zhu J. Offline reinforcement learning via high-fidelity generative behavior modeling. arXiv:2209.14548, 2023.
[35]  Jiang CX, Jiang M, Xu QF, Huang X. Expectile regression neural network model with applications. Neurocomputing, 2017, 247: 73–86. [doi: 10.1016/J.NEUCOM.2017.03.040]
[36]  Fu J, Kumar A, Nachum O, Tucker G, Levine S. D4RL: Datasets for deep data-driven reinforcement learning. arXiv:2004.07219, 2021.

Chinese references:
[2]  Liu Q, Zhai JW, Zhang ZC, Zhong S, Zhou Q, Zhang P, Xu J. A survey on deep reinforcement learning. Chinese Journal of Computers, 2018, 41(1): 1–27 (in Chinese). [doi: 10.11897/SP.J.1016.2018.00001]
[3]  Liu JW, Liu Y, Luo XL. Research and development on deep learning. Application Research of Computers, 2014, 31(7): 1921–1930, 1942 (in Chinese). [doi: 10.3969/j.issn.1001-3695.2014.07.001]
[7]  Zhang BL, Liu ZR. Offline reinforcement learning algorithm based on adaptive uncertainty measurement. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, 44(4): 98–104 (in Chinese). [doi: 10.14132/j.cnki.1673-5439.2024.04.009]


LIU Quan (1969–), male, Ph.D., professor, Ph.D. supervisor, CCF senior member. His research interests include reinforcement learning, deep reinforcement learning, and automated reasoning.

WULAN (1999–), female, Ph.D. candidate. Her research interests include hierarchical reinforcement learning and offline reinforcement learning.

YAN Jie (2000–), female, master’s student. Her research interest is offline reinforcement learning.