
This paper has proposed a novel graph convolution operator and verified its effectiveness experimentally.

The planning network model proposed in this paper and the RL algorithm used to train it still have some shortcomings, and future work can pursue them further. For example, one could look for a better way to define the priority of each node in GAVIN's asynchronous value iteration process, as well as the threshold used to select which nodes to update, so that the network applies better to larger-scale scenarios with more complex internal structure and thus generalizes better. In addition, because each round of GAVIN's asynchronous value iteration updates only a selected subset of nodes, the test results of GAVIN trained with the IL algorithm show a degree of overfitting; future work could seek a better neural network architecture for the model, or apply data augmentation and data cleaning to remove this effect. Finally, in the episodic weighted double Q-learning proposed in this paper, the magnitude of the weighting function is still set by hand; future work could seek an unsupervised parameter-setting method that sets it automatically.
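To make the first direction concrete, the following is a minimal sketch of one plausible priority rule for asynchronous value iteration, in the spirit of prioritized sweeping: the Bellman residual serves as each node's priority, and a threshold tau selects the nodes to back up in a given round. The array layout, the residual-based priority, and tau itself are illustrative assumptions, not GAVIN's actual definitions.

    import numpy as np

    def async_vi_step(V, R, P, tau, gamma=0.99):
        # One asynchronous sweep: back up only the nodes whose Bellman
        # residual (used here as the priority) exceeds the threshold tau.
        # V : (N,) current value estimate per node
        # R : (N, A) reward for taking action a at node s
        # P : (N, A, N) transition probabilities
        # One-step lookahead: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
        targets = np.max(R + gamma * np.einsum('san,n->sa', P, V), axis=1)
        priority = np.abs(targets - V)     # Bellman residual as priority
        selected = priority > tau          # threshold picks the nodes to update
        V_new = V.copy()
        V_new[selected] = targets[selected]
        return V_new, selected

Under such a rule, tau trades per-round computation against convergence speed; the direction suggested above amounts to learning the priority measure and the threshold rather than fixing them by hand.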
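For the last direction, the sketch below shows a tabular weighted double Q-learning update of the kind the episodic variant builds on, with a hand-set constant c standing in for the "size" of the weighting function that the text suggests setting automatically. The episode-level bookkeeping of the episodic variant is omitted, and the names QA, QB and the update signature are assumptions for illustration.

    import numpy as np

    def wdq_update(QA, QB, s, a, r, s2, done, alpha=0.1, gamma=0.99, c=1.0):
        # One tabular weighted double Q-learning update of QA; swapping the
        # roles of QA and QB gives the symmetric update. Assumes c > 0.
        a_star = int(np.argmax(QA[s2]))   # greedy action under QA
        a_low = int(np.argmin(QA[s2]))    # worst action under QA
        spread = abs(QB[s2, a_star] - QB[s2, a_low])
        beta = spread / (c + spread)      # weighting function, in [0, 1)
        # Weighted target value: beta near 1 behaves like Q-learning's
        # (overestimating) target, beta near 0 like double Q-learning's
        # (underestimating) target.
        v = beta * QA[s2, a_star] + (1.0 - beta) * QB[s2, a_star]
        target = r if done else r + gamma * v
        QA[s, a] += alpha * (target - QA[s, a])

A larger c pushes beta toward 0 and a smaller c toward 1; an unsupervised scheme would adapt c, or beta directly, from the observed spread of the value estimates instead of fixing it manually.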

