
[56] Wu Q, Wang P, Shen C, Reid I, Van Den Hengel A. Are you talking to me? Reasoned visual dialog generation through adversarial learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 6106−6115. [doi: 10.1109/CVPR.2018.00639]
[57] Yu Z, Yu J, Cui Y, Tao D, Tian Q. Deep modular co-attention networks for visual question answering. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 6281−6290. [doi: 10.1109/CVPR.2019.00644]
[58] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. In: Proc. of the Advances in Neural Information Processing Systems. 2017. 5998−6008.
[59] Gao P, Jiang Z, You H, Lu P, Hoi S, Wang X, Li H. Dynamic fusion with intra- and inter-modality attention flow for visual question answering. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 6639−6648. [doi: 10.1109/CVPR.2019.00680]
[60] Teney D, Anderson P, He X, Van Den Hengel A. Tips and tricks for visual question answering: Learnings from the 2017 challenge. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 4223−4232. [doi: 10.1109/CVPR.2018.00444]
[61] Lu P, Li H, Zhang W, Wang J, Wang X. Co-attending free-form regions and detections with multi-modal multiplicative feature embedding for visual question answering. arXiv preprint arXiv:1711.06794, 2017.
[62] Wu C, Liu J, Wang X, Dong X. Object-difference attention: A simple relational attention for visual question answering. In: Proc. of the 26th ACM Int’l Conf. on Multimedia. 2018. 519−527.
[63] Cadene R, Ben-Younes H, Cord M, Thome N. MuRel: Multimodal relational reasoning for visual question answering. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 1989−1998. [doi: 10.1109/CVPR.2019.00209]
[64] Li L, Gan Z, Cheng Y, Liu J. Relation-aware graph attention network for visual question answering. In: Proc. of the IEEE Int’l Conf. on Computer Vision. 2019. 10313−10322. [doi: 10.1109/ICCV.2019.01041]
[65] Andreas J, Rohrbach M, Darrell T, Klein D. Neural module networks. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2016. 39−48.
[66] Klein D, Manning CD. Accurate unlexicalized parsing. In: Proc. of the 41st Annual Meeting of the Association for Computational Linguistics. 2003. 423−430.
[67] De Marneffe MC, Manning CD. The Stanford typed dependencies representation. In: Proc. of the Workshop on Cross-framework and Cross-domain Parser Evaluation. 2008. 1−8.
[68] Andreas J, Rohrbach M, Darrell T, Klein D. Learning to compose neural networks for question answering. In: Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics. 2016. 1545−1554.
[69] Hu R, Andreas J, Rohrbach M, Darrell T, Saenko K. Learning to reason: End-to-end module networks for visual question answering. In: Proc. of the IEEE Int’l Conf. on Computer Vision. 2017. 804−813.
[70] Kumar A, Irsoy O, Ondruska P, Iyyer M, Bradbury J, Gulrajani I, Zhong V, Paulus R, Socher R. Ask me anything: Dynamic memory networks for natural language processing. In: Proc. of the Int’l Conf. on Machine Learning. 2016. 1378−1387.
[71] Xiong C, Merity S, Socher R. Dynamic memory networks for visual and textual question answering. In: Proc. of the Int’l Conf. on Machine Learning. 2016. 2397−2406.
[72] Noh H, Han B. Training recurrent answering units with joint loss minimization for VQA. arXiv preprint arXiv:1606.03647, 2016.
[73] Wang P, Wu Q, Shen C, Van Den Hengel A, Dick A. Explicit knowledge-based reasoning for visual question answering. arXiv preprint arXiv:1511.02570, 2015.
[74] Wang P, Wu Q, Shen C, Dick A, Van Den Hengel A. FVQA: Fact-based visual question answering. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2018,40(10):2413−2427.
[75] Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives Z. DBpedia: A nucleus for a Web of open data. In: Proc. of the Semantic Web. Berlin, Heidelberg: Springer-Verlag, 2007. 722−735.
[76] Wu Q, Wang P, Shen C, Dick A, Van Den Hengel A. Ask me anything: Free-form visual question answering based on knowledge from external sources. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2016. 4622−4630. [doi: 10.1109/CVPR.2016.500]