Page 258 - Journal of Software (软件学报), 2021, No.8

2540                                   Journal of Software  软件学报 Vol.32, No.8,  August 2021

                 [15]    Wu Q, Teney D, Wang P, Shen CH, Dick A, Van Den Hengel A. Visual question answering: A survey of methods and datasets.
                      Computer Vision and Image Understanding, 2017,163:21−40.
                 [16]    Kafle K, Kanan C. Visual question answering: Datasets, algorithms, and future challenges. Computer Vision and Image
                      Understanding, 2017,163:3−20.
                 [17]    Goyal Y, Khot T, Summers-Stay D, Batra D, Parikh D. Making the V in VQA matter: Elevating the role of image understanding
                      in visual question answering. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2017. 6904−6913. [doi:
                      10.1109/CVPR.2017.670]
                 [18]    Ramakrishnan S, Agrawal A, Lee S. Overcoming language priors in visual question answering with adversarial regularization. In:
                      Proc. of the Advances in Neural Information Processing Systems. 2018. 1541−1551.
                 [19]    Yang Z, He X, Gao J, Deng L, Smola A. Stacked attention networks for image question answering. In: Proc. of the IEEE Conf. on
                      Computer Vision and Pattern Recognition. 2016. 21−29. [doi: 10.1109/CVPR.2016.10]
                 [20]    Deng J, Dong W, Socher R, Li L, Li K, Li F. ImageNet: A large-scale hierarchical image database. In: Proc. of the IEEE Conf. on
                      Computer Vision and Pattern Recognition. 2009. 248−255. [doi: 10.1109/CVPR.2009.5206848]
                 [21]    Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proc. of the Int’l Conf. on
                      Learning Representations. 2015.
                 [22]    He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proc. of the IEEE Conf. on Computer Vision and
                      Pattern Recognition. 2016. 770−778. [doi: 10.1109/CVPR.2016.90]
                 [23]    Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with
                      convolutions. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2015. 1−9. [doi: 10.1109/CVPR.2015.7298594]
                 [24]    Anderson P, He X, Buehler C, Teney D, Johnson M, Gould S, Zhang L. Bottom-up and top-down attention for image captioning
                      and visual question answering. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 6077−6086. [doi:
                      10.1109/CVPR.2018.00636]
                 [25]    Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Proc. of the
                      Advances in Neural Information Processing Systems. 2015. 91−99.
                 [26]    Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997,9(8):1735−1780.
                 [27]    Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using
                      RNN encoder-decoder for statistical machine translation. In: Proc. of the 2014 Conf. on Empirical Methods in Natural Language
                      Processing (EMNLP). 2014. 1724−1734.
                 [28]    Kiros R, Zhu Y, Salakhutdinov RR, Zemel R, Urtasun R, Torralba A, Fidler S. Skip-thought vectors. In: Proc. of the Advances in
                      Neural Information Processing Systems. 2015. 3294−3302.
                 [29]    Malinowski M, Rohrbach M, Fritz M. Ask your neurons: A neural-based approach to answering questions about images. In: Proc.
                      of the IEEE Int’l Conf. on Computer Vision. 2015. 1−9. [doi: 10.1109/ICCV.2015.9]
                 [30]    Gao H, Mao J, Zhou J, Huang Z, Wang L, Xu W. Are you talking to a machine? Dataset and methods for multilingual image
                      question answering. In: Proc. of the Advances in Neural Information Processing Systems. 2015. 2296−2304.
                 [31]    Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter
                      prediction. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2016. 30−38. [doi: 10.1109/CVPR.2016.11]
                 [32]    Fukui A, Park DH, Yang D, Rohrbach A, Darrell T, Rohrbach M. Multimodal compact bilinear pooling for visual question
                      answering and visual grounding. In: Proc. of the 2016 Conf. on Empirical Methods in Natural Language Processing. 2016.
                      457−468.
                 [33]    Krishna R, Zhu Y, Groth O, Johnson J, Hata K, Kravitz J, Chen S, Kalantidis Y, Li J, Shamma DA, Bernstein MS, Li F. Visual
                      genome: Connecting language and vision using crowdsourced dense image annotations. Int’l Journal of Computer Vision, 2017,
                      123(1):32−73.
                 [34]    Kim JH, On KW, Lim W, Kim J, Ha J, Zhang B. Hadamard product for low-rank bilinear pooling. In: Proc. of the Int’l Conf. on
                      Learning Representations. 2017.