Journal of Software (《软件学报》), 2025, Issue 8

Tian LL, et al.: Causal spatio-temporal semantics-driven abstraction modeling method for deep reinforcement learning        3653


                 References:
                  [1]  Radanliev P, de Roure D, van Kleek M, Santos O, Ani U. Artificial intelligence in cyber physical systems. AI & Society, 2021, 36(3):
                     783–796. [doi: 10.1007/s00146-020-01049-0]
                  [2]  Li SE. Deep reinforcement learning. In: Li SE, ed. Reinforcement Learning for Sequential Decision and Optimal Control. Singapore:
                     Springer, 2023. 365–402. [doi: 10.1007/978-981-19-7784-8_10]
                  [3]  Junges S, Spaan MTJ. Abstraction-refinement for hierarchical probabilistic models. In: Proc. of the 34th Int’l Conf. on Computer Aided
                     Verification. Haifa: Springer, 2022. 102–123. [doi: 10.1007/978-3-031-13185-1_6]
 [4]  Devidze R, Kamalaruban P, Singla A. Exploration-guided reward shaping for reinforcement learning under sparse rewards. In: Proc. of
     the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2022. 422.
                  [5]  Kulkarni TD, Narasimhan K, Saeedi A, Tenenbaum J. Hierarchical deep reinforcement learning: Integrating temporal abstraction and
                     intrinsic motivation. In: Proc. of the 29th Int’l Conf. on Neural Information Processing Systems. Barcelona, 2016. 3675–3683.
                  [6]  Li LH, Walsh TJ, Littman ML. Towards a unified theory of state abstraction for MDPs. 2006. http://anytime.cs.umass.edu/aimath06/
                     proceedings/P21.pdf
                  [7]  Castro PS. Scalable methods for computing state similarity in deterministic Markov decision processes. In: Proc. of the 34th AAAI Conf.
                     on Artificial Intelligence. New York: AAAI Press, 2020. 10069–10076. [doi: 10.1609/aaai.v34i06.6564]
                  [8]  Rafati J, Noelle D. Unsupervised subgoal discovery method for learning hierarchical representations. 2019. http://rafati.net/papers/Rafati-
                     Noelle-2019-SPiRL.pdf
                  [9]  Abel D. A theory of abstraction in reinforcement learning. arXiv:2203.00397, 2022.
                 [10]  Abel D, Arumugam D, Lehnert L, Littman ML. State abstractions for lifelong reinforcement learning. In: Proc. of the 35th Int’l Conf. on
                     Machine Learning. Stockholmsmässan: PMLR, 2018. 10–19.
                 [11]  Altman E. Constrained Markov Decision Processes. New York: Routledge, 2021. [doi: 10.1201/9781315140223]
[12]  Andreas J, Klein D, Levine S. Modular multitask reinforcement learning with policy sketches. In: Proc. of the 34th Int’l Conf. on
     Machine Learning. Sydney: PMLR, 2017. 166–175.
                 [13]  Oh J, Singh S, Lee H, Kohli P. Zero-shot task generalization with multi-task deep reinforcement learning. In: Proc. of the 34th Int’l Conf.
                     on Machine Learning. Sydney: PMLR, 2017. 2661–2670.
[14]  Zhang TR, Guo SQ, Tan T, Hu XL, Chen F. Generating adjacency-constrained subgoals in hierarchical reinforcement learning. In: Proc.
     of the 34th Int’l Conf. on Neural Information Processing Systems. 2020. 21579–21590.
                 [15]  Allen C, Parikh N, Gottesman O, Konidaris G. Learning Markov state abstractions for deep reinforcement learning. In: Proc. of the 35th
                     Int’l Conf. on Neural Information Processing Systems. 2021. 8229–8241.
                 [16]  Taïga AA, Courville A, Bellemare MG. Approximate exploration through state abstraction. arXiv:1808.09819, 2018.
[17]  Taylor JJ, Precup D, Panangaden P. Bounding performance loss in approximate MDP homomorphisms. In: Proc. of the 22nd Int’l Conf. on
     Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2008. 1649–1656.
                 [18]  Feng  S,  Sun  HW,  Yan  XT,  Zhu  HJ,  Zou  ZX,  Shen  SY,  Liu  HX.  Dense  reinforcement  learning  for  safety  validation  of  autonomous
                     vehicles. Nature, 2023, 615(7953): 620–627. [doi: 10.1038/s41586-023-05732-2]
                 [19]  Abel D, Umbanhowar N, Khetarpal K, Arumugam D, Precup D, Littman ML. Value preserving state-action abstractions. In: Proc. of the
                     23rd Int’l Conf. on Artificial Intelligence and Statistics. Palermo: PMLR, 2020. 1639–1650.
                 [20]  Song JY, Xie X, Ma L. SIEGE: A semantics-guided safety enhancement framework for AI-enabled cyber-physical systems. IEEE Trans.
                     on Software Engineering, 2023, 49(8): 4058–4080. [doi: 10.1109/TSE.2023.3282981]
                 [21]  Guo SQ, Yan Q, Su X, Hu XL, Chen F. State-temporal compression in reinforcement learning with the reward-restricted geodesic metric.
                     IEEE Trans. on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5572–5589. [doi: 10.1109/TPAMI.2021.3069005]
                 [22]  Bacon PL, Harb J, Precup D. The option-critic architecture. In: Proc. of the 31st AAAI Conf. on Artificial Intelligence. San Francisco:
                     AAAI Press, 2017. 1726–1734. [doi: 10.1609/aaai.v31i1.10916]
                 [23]  Pearl J, Mackenzie D. The Book of Why: The New Science of Cause and Effect. New York: Basic Books, 2018.
[24]  Sondhi A, Shojaie A. The reduced PC-algorithm: Improved causal structure learning in large random networks. Journal of Machine
     Learning Research, 2019, 20(164): 1–31.
                 [25]  Entner D, Hoyer PO. On causal discovery from time series data using FCI. In: Proc. of the 5th European Workshop on Probabilistic
                     Graphical Models. Helsinki, 2010.
                 [26]  Huang BW, Lu CC, Liu LQ, Hernández-Lobato JM, Glymour C, Schölkopf B, Zhang K. Action-sufficient state representation learning
                     for control with structural constraints. In: Proc. of the 39th Int’l Conf. on Machine Learning. Baltimore: PMLR, 2022. 9260–9279.
                 [27]  Wang ZZ, Xiao XS, Xu ZF, Zhu YK, Stone P. Causal dynamics learning for task-independent state abstraction. In: Proc. of the 39th Int’l