References:
[1] Sutton RS, Barto AG. Reinforcement Learning: An Introduction. 2nd ed., Cambridge: The MIT Press, 2018.
[2] Liu Q, Zhai JW, Zhang ZC, Zhong S, Zhou Q, Zhang P, Xu J. A survey on deep reinforcement learning. Chinese Journal of Computers, 2018, 41(1): 1–27 (in Chinese with English abstract). [doi: 10.11897/SP.J.1016.2018.00001]
[3] Liu JW, Liu Y, Luo XL. Research and development on deep learning. Application Research of Computers, 2014, 31(7): 1921–1930, 1942 (in Chinese with English abstract). [doi: 10.3969/j.issn.1001-3695.2014.07.001]
[4] Levine S, Kumar A, Tucker G, Fu J. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv:2005.01643, 2020.
[5] Peng ZY, Han CL, Liu YD, Zhou ZT. Weighted policy constraints for offline reinforcement learning. Proc. of the AAAI Conf. on Artificial Intelligence, 2023, 37(8): 9435–9443. [doi: 10.1609/aaai.v37i8.26130]
[6] Mao YX, Zhang HC, Chen C, Xu Y, Ji XY. Supported value regularization for offline reinforcement learning. In: Proc. of the 37th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2024. 40587–40609.
[7] Zhang BL, Liu ZR. Adaptive uncertainty quantification for model-based offline reinforcement learning. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, 44(4): 98–104 (in Chinese with English abstract). [doi: 10.14132/j.cnki.1673-5439.2024.04.009]
[8] Moerland TM, Broekens J, Plaat A, Jonker CM. Model-based reinforcement learning: A survey. Foundations and Trends® in Machine Learning, 2023, 16(1): 1–67. [doi: 10.1561/2200000086]
[9] Fujimoto S, Meger D, Precup D. Off-policy deep reinforcement learning without exploration. In: Proc. of the 36th Int’l Conf. on Machine Learning. Long Beach: PMLR, 2019. 2052–2062.
[10] Fujimoto S, Van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. In: Proc. of the 35th Int’l Conf. on Machine Learning. Stockholm: PMLR, 2018. 1587–1596.
[11] Kumar A, Zhou A, Tucker G, Levine S. Conservative Q-learning for offline reinforcement learning. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 1179–1191.
[12] Yu TH, Thomas G, Yu LT, Ermon S, Zou J, Levine S, Finn C, Ma TY. MOPO: Model-based offline policy optimization. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 14129–14142.
[13] Kostrikov I, Nair A, Levine S. Offline reinforcement learning with implicit Q-learning. arXiv:2110.06169, 2021.
[14] Laskin M, Lee K, Stooke A, Pinto L, Abbeel P, Srinivas A. Reinforcement learning with augmented data. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 19884–19895.
[15] Zhu ZD, Lin KX, Jain AK, Zhou JY. Transfer learning in deep reinforcement learning: A survey. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2023, 45(11): 13344–13362. [doi: 10.1109/TPAMI.2023.3292075]
[16] Bhardwaj M, Xie TY, Boots B, Jiang N, Cheng CA. Adversarial model for offline reinforcement learning. In: Proc. of the 37th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2024. 1245–1269.
[17] Wang SY, Li XD, Qu H, Chen WY. State augmentation via self-supervision in offline multiagent reinforcement learning. IEEE Trans. on Cognitive and Developmental Systems, 2024, 16(3): 1051–1062. [doi: 10.1109/TCDS.2023.3326297]
[18] Qiao WD, Yang R. Soft adversarial offline reinforcement learning via reducing the attack strength for generalization. In: Proc. of the 16th Int’l Conf. on Machine Learning and Computing. Shenzhen: ACM, 2024. 498–505. [doi: 10.1145/3651671.3651762]
[19] Rengarajan D, Vaidya G, Sarvesh A, Kalathil D, Shakkottai S. Reinforcement learning with sparse rewards using guidance from offline demonstration. arXiv:2202.04628, 2022.
[20] Liu SF, Sun SL. Safe offline reinforcement learning through hierarchical policies. In: Proc. of the 26th Pacific-Asia Conf. on Knowledge Discovery and Data Mining. Chengdu: Springer, 2022. 380–391. [doi: 10.1007/978-3-031-05936-0_30]
[21] Lin QJ, Liu H, Sengupta B. Switch trajectory Transformer with distributional value approximation for multi-task reinforcement learning. arXiv:2203.07413, 2022.
[22] Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 6840–6851.
[23] Hong ZW, Kumar A, Karnik S, Bhandwaldar A, Srivastava A, Pajarinen J, Laroche R, Gupta A, Agrawal P. Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets. In: Proc. of the 37th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2024. 4985–5009.
[24] Xu HR, Jiang L, Li JX, Yang ZR, Wang ZR, Chan VWK, Zhan XY. Offline RL with no OOD actions: In-sample learning via implicit value regularization. arXiv:2303.15810, 2023.
[25] Garg D, Hejna J, Geist M, Ermon S. Extreme Q-learning: MaxEnt RL without entropy. arXiv:2301.02328, 2023.