References:
[1] Radanliev P, de Roure D, van Kleek M, Santos O, Ani U. Artificial intelligence in cyber physical systems. AI & Society, 2021, 36(3):
783–796. [doi: 10.1007/s00146-020-01049-0]
[2] Li SE. Deep reinforcement learning. In: Li SE, ed. Reinforcement Learning for Sequential Decision and Optimal Control. Singapore:
Springer, 2023. 365–402. [doi: 10.1007/978-981-19-7784-8_10]
[3] Junges S, Spaan MTJ. Abstraction-refinement for hierarchical probabilistic models. In: Proc. of the 34th Int’l Conf. on Computer Aided
Verification. Haifa: Springer, 2022. 102–123. [doi: 10.1007/978-3-031-13185-1_6]
[4] Devidze R, Kamalaruban P, Singla A. Exploration-guided reward shaping for reinforcement learning under sparse rewards. In: Proc. of
the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2022. 422.
[5] Kulkarni TD, Narasimhan K, Saeedi A, Tenenbaum J. Hierarchical deep reinforcement learning: Integrating temporal abstraction and
intrinsic motivation. In: Proc. of the 30th Int’l Conf. on Neural Information Processing Systems. Barcelona, 2016. 3675–3683.
[6] Li LH, Walsh TJ, Littman ML. Towards a unified theory of state abstraction for MDPs. 2006. http://anytime.cs.umass.edu/aimath06/proceedings/P21.pdf
[7] Castro PS. Scalable methods for computing state similarity in deterministic Markov decision processes. In: Proc. of the 34th AAAI Conf.
on Artificial Intelligence. New York: AAAI Press, 2020. 10069–10076. [doi: 10.1609/aaai.v34i06.6564]
[8] Rafati J, Noelle D. Unsupervised subgoal discovery method for learning hierarchical representations. 2019. http://rafati.net/papers/Rafati-Noelle-2019-SPiRL.pdf
[9] Abel D. A theory of abstraction in reinforcement learning. arXiv:2203.00397, 2022.
[10] Abel D, Arumugam D, Lehnert L, Littman ML. State abstractions for lifelong reinforcement learning. In: Proc. of the 35th Int’l Conf. on
Machine Learning. Stockholmsmässan: PMLR, 2018. 10–19.
[11] Altman E. Constrained Markov Decision Processes. New York: Routledge, 2021. [doi: 10.1201/9781315140223]
[12] Andreas J, Klein D, Levine S. Modular multitask reinforcement learning with policy sketches. In: Proc. of the 34th Int’l Conf. on
Machine Learning. Sydney: PMLR, 2017. 166–175.
[13] Oh J, Singh S, Lee H, Kohli P. Zero-shot task generalization with multi-task deep reinforcement learning. In: Proc. of the 34th Int’l Conf.
on Machine Learning. Sydney: PMLR, 2017. 2661–2670.
[14] Zhang TR, Guo SQ, Tan T, Hu XL, Chen F. Generating adjacency-constrained subgoals in hierarchical reinforcement learning. In: Proc.
of the 34th Int’l Conf. on Neural Information Processing Systems. 2020. 21579–21590.
[15] Allen C, Parikh N, Gottesman O, Konidaris G. Learning Markov state abstractions for deep reinforcement learning. In: Proc. of the 35th
Int’l Conf. on Neural Information Processing Systems. 2021. 8229–8241.
[16] Taïga AA, Courville A, Bellemare MG. Approximate exploration through state abstraction. arXiv:1808.09819, 2018.
[17] Taylor JJ, Precup D, Panangaden P. Bounding performance loss in approximate MDP homomorphisms. In: Proc. of the 22nd Int’l Conf. on
Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2008. 1649–1656.
[18] Feng S, Sun HW, Yan XT, Zhu HJ, Zou ZX, Shen SY, Liu HX. Dense reinforcement learning for safety validation of autonomous
vehicles. Nature, 2023, 615(7953): 620–627. [doi: 10.1038/s41586-023-05732-2]
[19] Abel D, Umbanhowar N, Khetarpal K, Arumugam D, Precup D, Littman ML. Value preserving state-action abstractions. In: Proc. of the
23rd Int’l Conf. on Artificial Intelligence and Statistics. Palermo: PMLR, 2020. 1639–1650.
[20] Song JY, Xie X, Ma L. SIEGE: A semantics-guided safety enhancement framework for AI-enabled cyber-physical systems. IEEE Trans.
on Software Engineering, 2023, 49(8): 4058–4080. [doi: 10.1109/TSE.2023.3282981]
[21] Guo SQ, Yan Q, Su X, Hu XL, Chen F. State-temporal compression in reinforcement learning with the reward-restricted geodesic metric.
IEEE Trans. on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5572–5589. [doi: 10.1109/TPAMI.2021.3069005]
[22] Bacon PL, Harb J, Precup D. The option-critic architecture. In: Proc. of the 31st AAAI Conf. on Artificial Intelligence. San Francisco:
AAAI Press, 2017. 1726–1734. [doi: 10.1609/aaai.v31i1.10916]
[23] Pearl J, Mackenzie D. The Book of Why: The New Science of Cause and Effect. New York: Basic Books, 2018.
[24] Sondhi A, Shojaie A. The reduced PC-algorithm: Improved causal structure learning in large random networks. Journal of Machine
Learning Research, 2019, 20(164): 1–31.
[25] Entner D, Hoyer PO. On causal discovery from time series data using FCI. In: Proc. of the 5th European Workshop on Probabilistic
Graphical Models. Helsinki, 2010.
[26] Huang BW, Lu CC, Liu LQ, Hernández-Lobato JM, Glymour C, Schölkopf B, Zhang K. Action-sufficient state representation learning
for control with structural constraints. In: Proc. of the 39th Int’l Conf. on Machine Learning. Baltimore: PMLR, 2022. 9260–9279.
[27] Wang ZZ, Xiao XS, Xu ZF, Zhu YK, Stone P. Causal dynamics learning for task-independent state abstraction. In: Proc. of the 39th Int’l Conf. on Machine Learning. Baltimore: PMLR, 2022.