References:
[1] Sutton RS, Barto AG. Reinforcement Learning: An Introduction. 2nd ed., Cambridge: The MIT Press, 2018.
[2] Liu Q, Zhai JW, Zhang ZC, Zhong S, Zhou Q, Zhang P, Xu J. A survey on deep reinforcement learning. Chinese Journal of Computers, 2018, 41(1): 1–27 (in Chinese with English abstract). [doi: 10.11897/SP.J.1016.2018.00001]
[3] Liu JW, Liu Y, Luo XL. Research and development on deep learning. Application Research of Computers, 2014, 31(7): 1921–1930, 1942 (in Chinese with English abstract). [doi: 10.3969/j.issn.1001-3695.2014.07.001]
[4] Levine S, Kumar A, Tucker G, Fu J. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv:2005.01643, 2020.
[5] Peng ZY, Han CL, Liu YD, Zhou ZT. Weighted policy constraints for offline reinforcement learning. Proc. of the AAAI Conf. on Artificial Intelligence, 2023, 37(8): 9435–9443. [doi: 10.1609/aaai.v37i8.26130]
[6] Mao YX, Zhang HC, Chen C, Xu Y, Ji XY. Supported value regularization for offline reinforcement learning. In: Proc. of the 37th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2024. 40587–40609.
[7] Zhang BL, Liu ZR. Adaptive uncertainty quantification for model-based offline reinforcement learning. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, 44(4): 98–104 (in Chinese with English abstract). [doi: 10.14132/j.cnki.1673-5439.2024.04.009]
[8] Moerland TM, Broekens J, Plaat A, Jonker CM. Model-based reinforcement learning: A survey. Foundations and Trends® in Machine Learning, 2023, 16(1): 1–67. [doi: 10.1561/2200000086]
[9] Fujimoto S, Meger D, Precup D. Off-policy deep reinforcement learning without exploration. In: Proc. of the 36th Int’l Conf. on Machine Learning. Long Beach: PMLR, 2019. 2052–2062.
[10] Fujimoto S, Van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. In: Proc. of the 35th Int’l Conf. on Machine Learning. Stockholm: PMLR, 2018. 1587–1596.
[11] Kumar A, Zhou A, Tucker G, Levine S. Conservative Q-learning for offline reinforcement learning. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 1179–1191.
[12] Yu TH, Thomas G, Yu LT, Ermon S, Zou J, Levine S, Finn C, Ma TY. MOPO: Model-based offline policy optimization. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 14129–14142.
[13] Kostrikov I, Nair A, Levine S. Offline reinforcement learning with implicit Q-learning. arXiv:2110.06169, 2021.
[14] Laskin M, Lee K, Stooke A, Pinto L, Abbeel P, Srinivas A. Reinforcement learning with augmented data. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 19884–19895.
[15] Zhu ZD, Lin KX, Jain AK, Zhou JY. Transfer learning in deep reinforcement learning: A survey. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2023, 45(11): 13344–13362. [doi: 10.1109/TPAMI.2023.3292075]
[16] Bhardwaj M, Xie TY, Boots B, Jiang N, Cheng CA. Adversarial model for offline reinforcement learning. In: Proc. of the 37th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2024. 1245–1269.
[17] Wang SY, Li XD, Qu H, Chen WY. State augmentation via self-supervision in offline multiagent reinforcement learning. IEEE Trans. on Cognitive and Developmental Systems, 2024, 16(3): 1051–1062. [doi: 10.1109/TCDS.2023.3326297]
[18] Qiao WD, Yang R. Soft adversarial offline reinforcement learning via reducing the attack strength for generalization. In: Proc. of the 16th Int’l Conf. on Machine Learning and Computing. Shenzhen: ACM, 2024. 498–505. [doi: 10.1145/3651671.3651762]
[19] Rengarajan D, Vaidya G, Sarvesh A, Kalathil D, Shakkottai S. Reinforcement learning with sparse rewards using guidance from offline demonstration. arXiv:2202.04628, 2022.
[20] Liu SF, Sun SL. Safe offline reinforcement learning through hierarchical policies. In: Proc. of the 26th Pacific-Asia Conf. on Knowledge Discovery and Data Mining. Chengdu: Springer, 2022. 380–391. [doi: 10.1007/978-3-031-05936-0_30]
[21] Lin QJ, Liu H, Sengupta B. Switch trajectory Transformer with distributional value approximation for multi-task reinforcement learning. arXiv:2203.07413, 2022.
[22] Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 6840–6851.
[23] Hong ZW, Kumar A, Karnik S, Bhandwaldar A, Srivastava A, Pajarinen J, Laroche R, Gupta A, Agrawal P. Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets. In: Proc. of the 37th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2024. 4985–5009.
[24] Xu HR, Jiang L, Li JX, Yang ZR, Wang ZR, Chan VWK, Zhan XY. Offline RL with no OOD actions: In-sample learning via implicit value regularization. arXiv:2303.15810, 2023.
[25] Garg D, Hejna J, Geist M, Ermon S. Extreme Q-learning: MaxEnt RL without entropy. arXiv:2301.02328, 2023.