This paper proposes a novel graph convolution operator and verifies its effectiveness experimentally.
The planning network model proposed in this paper and the RL algorithm used to train it still have several limitations, which point to directions for future research. For example, a better method could be sought for defining the priority of each node in GAVIN's asynchronous value iteration, as well as the threshold used to select which nodes to update, so that the network can be applied to larger-scale scenarios with more complex internal structure and thereby achieve better generalization. In addition, because each round of GAVIN's asynchronous value iteration updates only a selected subset of nodes, the test results of GAVIN trained with the IL algorithm exhibit a degree of overfitting; future work could seek a better neural network architecture for the model, or apply data augmentation and data cleaning to eliminate this phenomenon. Finally, in the episodic weighted double Q-learning proposed in this paper, the value of the weighting function is still set manually; future work could seek an unsupervised parameter-setting method to set the weighting function automatically.
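To make the last point concrete, the listing below is a minimal Python sketch of one tabular weighted double Q-learning update in which the weight beta is governed by a hand-set constant c, i.e. the kind of manually chosen quantity the future work above would aim to set automatically. The function name, array layout, and the specific form of the weight are illustrative assumptions and do not reproduce the episodic variant proposed in this paper.

    import numpy as np

    def weighted_double_q_update(QA, QB, s, a, r, s_next,
                                 alpha=0.1, gamma=0.99, c=1.0):
        # QA, QB: value tables of shape [num_states, num_actions].
        # Randomly pick which estimator to update, as in double Q-learning.
        if np.random.rand() < 0.5:
            Q_upd, Q_other = QA, QB
        else:
            Q_upd, Q_other = QB, QA

        a_star = int(np.argmax(Q_upd[s_next]))  # greedy action under the updated estimator
        a_low = int(np.argmin(Q_upd[s_next]))   # lowest-valued action, used to scale the weight
        gap = abs(Q_other[s_next, a_star] - Q_other[s_next, a_low])
        # Weighting function: the hand-set constant c controls how quickly the
        # target shifts from the double-Q estimate toward the single-Q estimate.
        beta = gap / (c + gap)

        target = r + gamma * (beta * Q_upd[s_next, a_star]
                              + (1.0 - beta) * Q_other[s_next, a_star])
        Q_upd[s, a] += alpha * (target - Q_upd[s, a])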