Page 231 - Journal of Software (《软件学报》), 2025, No. 8

3654 Journal of Software (软件学报), 2025, Vol. 36, No. 8

Conf. on Machine Learning. Baltimore: PMLR, 2022. 23151–23180.
[28] Jin P, Tian JX, Zhi DP, Wen XJ, Zhang M. Trainify: A CEGAR-driven training and verification framework for safe deep reinforcement
learning. In: Proc. of the 34th Int’l Conf. on Computer Aided Verification. Cham: Springer, 2022. 193–218.
[doi: 10.1007/978-3-031-13185-1_10]
[29] Hu QY, Liu JY. An Introduction to Markov Decision Processes. Xi’an: Xidian University Press, 2000 (in Chinese).
[30] Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V. CARLA: An open urban driving simulator. In: Proc. of the 1st Annual Conf. on Robot Learning. PMLR, 2017. 1–16.
[31] Huang Z, Shen X, Xing J, Liu TL, Tian XM, Li HQ. Revisiting knowledge distillation: An inheritance and exploration framework. In:
Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 3579–3588.
[doi: 10.1109/CVPR46437.2021.00358]
[32] Nachum O, Gu SX, Lee H, Levine S. Data-efficient hierarchical reinforcement learning. In: Proc. of the 31st Advances in Neural
Information Processing Systems. Montréal, 2018. 3307–3317.
[33] Reimann J, Mansion N, Haydon J, Bray B, Chattopadhyay A, Sato S, Waga M, André É, Hasuo I, Ueda N, Yokoyama Y. Temporal logic
formalisation of ISO 34502 critical scenarios: Modular construction with the RSS safety distance. In: Proc. of the 39th ACM/SIGAPP
Symp. on Applied Computing. Avila: ACM, 2024. 186–195.

Chinese-language reference:
[29] 胡奇英, 刘建庸. 马尔可夫决策过程引论. 西安: 西安电子科技大学出版社, 2000.

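田丽丽 (born 1994), female, Ph.D. candidate. Research interests: causal machine learning, model interpretability.

杜德慧 (born 1979), female, Ph.D., professor, doctoral supervisor, CCF senior member. Research interests: trustworthy software, modeling and verification of cyber-physical systems, theories and methods for safe and trustworthy artificial intelligence.

聂基辉 (born 1998), male, master's degree. Research interests: reinforcement learning, formal methods.

陈逸康 (born 2001), male, master's degree. Research interests: machine learning, causal inference.

李荥达 (born 2003), male, undergraduate student. Research interests: reinforcement learning, policy generation.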
