《软件学报》 (Journal of Software), 2021, No. 8

包希港 et al.: A survey on visual question answering                                                                 2543


                 [77]    Wu Q, Shen C, Wang P, Dick A, Van Den Hengel A. Image captioning and visual question answering based on attributes and
                      external knowledge. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2018,40(6):1367−1381.
                 [78]    Agrawal A, Batra D, Parikh D. Analyzing the behavior of visual question answering models. In: Proc. of the 2016 Conf. on
                      Empirical Methods in Natural Language Processing. 2016. 1955−1960.
                 [79]    Zhang P, Goyal Y, Summers-Stay D, Batra D, Parikh D. Yin and yang: Balancing and answering binary visual questions. In: Proc.
                      of the IEEE Conf. on Computer Vision and Pattern Recognition. 2016. 5014−5022. [doi: 10.1109/CVPR.2016.542]
                 [80]    Shah M, Chen X, Rohrbach M, Parikh D. Cycle-consistency for robust visual question answering. In: Proc. of the IEEE Conf. on
                      Computer Vision and Pattern Recognition. 2019. 6649−6658.
                 [81]    Xu X, Chen X, Liu C, Rohrbach A, Darrell T, Song D. Fooling vision and language models despite localization and attention
                      mechanism. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 4951−4961. [doi: 10.1109/CVPR.
                      2018.00520]
                 [82]    Agrawal A, Batra D, Parikh D, Kembhavi A. Don’t just assume; look and answer: Overcoming priors for visual question
                      answering. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 4971−4980.
                 [83]    Chen L, Yan X, Xiao J, Zhang H, Pu S, Zhuang Y. Counterfactual samples synthesizing for robust visual question answering. In:
                      Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2020. 10800−10809.
                 [84]    Grand G, Belinkov Y. Adversarial regularization for visual question answering: Strengths, shortcomings, and side effects. In: Proc.
                      of the 57th Conf. on Computational Natural Language Learning. ACL, 2019. 1−13.
                 [85]    Belinkov Y, Poliak A, Shieber SM, Van Durme B, Rush AM. Don’t take the premise for granted: Mitigating artifacts in natural
                      language inference. In: Proc. of the 57th Annual Meeting of the Association for Computational Linguistics. ACL, 2019. 877−891.
                 [86]    Cadene R, Dancette C, Cord M, Parikh D. RUBi: Reducing unimodal biases for visual question answering. In: Proc. of the
                      Advances in Neural Information Processing Systems. 2019. 841−852.
                 [87]    Clark C, Yatskar M, Zettlemoyer L. Don’t take the easy way out: Ensemble based methods for avoiding known dataset biases. In:
                      Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing. 2019. 4069−4082.
                 [88]    Mahabadi RK, Henderson J. Simple but effective techniques to reduce biases. arXiv preprint arXiv:1909.06321, 2019.
                 [89]    Wu J, Mooney R. Self-critical reasoning for robust visual question answering. In: Proc. of the Advances in Neural Information
                      Processing Systems. 2019. 8604−8614.
                 [90]    Singh A, Natarajan V, Shah M, Jiang Y, Chen X. Towards VQA models that can read. In: Proc. of the IEEE Conf. on Computer
                      Vision and Pattern Recognition. 2019. 8317−8326. [doi: 10.1109/CVPR.2019.00851]
                 [91]    Biten AF, Tito R, Mafla A, Gomez L, Rusinol M, Valveny E, Jawahar CV, Karatzas D. Scene text visual question answering. In:
                      Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 4291−4301.
                 [92]    Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proc. of
                      the IEEE Int’l Conf. on Computer Vision. 2017. 2223−2232. [doi: 10.1109/ICCV.2017.244]
                 [93]    Zhang Y, Hare J, Prügel-Bennett A. Learning to count objects in natural images for visual question answering. In: Proc. of the
                      Int’l Conf. on Learning Representations. 2018.
                 [94]    Acharya M, Kafle K, Kanan C. TallyQA: Answering complex counting questions. In: Proc. of the AAAI Conf. on Artificial
                      Intelligence, Vol.33. 2019. 8076−8084.
                 [95]    Shrestha R, Kafle K, Kanan C. Answer them all! Toward universal visual question answering models. In: Proc. of the IEEE Conf.
                      on Computer Vision and Pattern Recognition. 2019. 10472−10481. [doi: 10.1109/CVPR.2019.01072]
                 [96]    Hudson D, Manning CD. Learning by abstraction: The neural state machine. In: Proc. of the Advances in Neural Information
                      Processing Systems. 2019. 5903−5916.
                 [97]    Shi Y, Furlanello T, Zha S, Anandkumar A. Question type guided attention in visual question answering. In: Proc. of the
                      European Conf. on Computer Vision (ECCV). 2018. 151−166.
                 [98]    Malinowski M, Fritz M. A multi-world approach to question answering about real-world scenes based on uncertain input. In: Proc.
                      of the Advances in Neural Information Processing Systems. 2014. 1682−1690.