Journal of Software (《软件学报》), 2021, Issue 8
Bao XG (包希港), et al. Survey on visual question answering. 2543
[77] Wu Q, Shen C, Wang P, Dick A, Van Den Hengel A. Image captioning and visual question answering based on attributes and
external knowledge. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2018,40(6):1367−1381.
[78] Agrawal A, Batra D, Parikh D. Analyzing the behavior of visual question answering models. In: Proc. of the 2016 Conf. on
Empirical Methods in Natural Language Processing. 2016. 1955−1960.
[79] Zhang P, Goyal Y, Summers-Stay D, Batra D, Parikh D. Yin and yang: Balancing and answering binary visual questions. In: Proc.
of the IEEE Conf. on Computer Vision and Pattern Recognition. 2016. 5014−5022. [doi: 10.1109/CVPR.2016.542]
[80] Shah M, Chen X, Rohrbach M, Parikh D. Cycle-consistency for robust visual question answering. In: Proc. of the IEEE Conf. on
Computer Vision and Pattern Recognition. 2019. 6649−6658.
[81] Xu X, Chen X, Liu C, Rohrbach A, Darrell T, Song D. Fooling vision and language models despite localization and attention
mechanism. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 4951−4961. [doi: 10.1109/CVPR.
2018.00520]
[82] Agrawal A, Batra D, Parikh D, Kembhavi A. Don’t just assume; look and answer: Overcoming priors for visual question
answering. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 4971−4980.
[83] Chen L, Yan X, Xiao J, Zhang H, Pu S, Zhuang Y. Counterfactual samples synthesizing for robust visual question answering. In:
Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2020. 10800−10809.
[84] Grand G, Belinkov Y. Adversarial regularization for visual question answering: Strengths, shortcomings, and side effects. In:
Proc. of the 2nd Workshop on Shortcomings in Vision and Language. ACL, 2019. 1−13.
[85] Belinkov Y, Poliak A, Shieber SM, Durme BV, Rush AM. Don’t take the premise for granted: Mitigating artifacts in natural
language inference. In: Proc. of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. 877−891.
[86] Cadene R, Dancette C, Cord M, Parikh D. RUBi: Reducing unimodal biases for visual question answering. In: Proc. of the
Advances in Neural Information Processing Systems. 2019. 841−852.
[87] Clark C, Yatskar M, Zettlemoyer L. Don’t take the easy way out: Ensemble based methods for avoiding known dataset biases. In:
Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing. 2019. 4069−4082.
[88] Mahabadi RK, Henderson J. Simple but effective techniques to reduce biases. arXiv preprint arXiv:1909.06321, 2019.
[89] Wu J, Mooney R. Self-critical reasoning for robust visual question answering. In: Proc. of the Advances in Neural Information
Processing Systems. 2019. 8604−8614.
[90] Singh A, Natarajan V, Shah M, Jiang Y, Chen X. Towards VQA models that can read. In: Proc. of the IEEE Conf. on Computer
Vision and Pattern Recognition. 2019. 8317−8326. [doi: 10.1109/CVPR.2019.00851]
[91] Biten AF, Tito R, Mafla A, Gomez L, Rusinol M, Valveny E, Jawahar CV, Karatzas D. Scene text visual question answering. In:
Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 4291−4301.
[92] Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proc. of
the IEEE Int’l Conf. on Computer Vision. 2017. 2223−2232. [doi: 10.1109/ICCV.2017.244]
[93] Zhang Y, Hare J, Prügel-Bennett A. Learning to count objects in natural images for visual question answering. In: Proc. of the
Int’l Conf. on Learning Representations. 2018.
[94] Acharya M, Kafle K, Kanan C. TallyQA: Answering complex counting questions. In: Proc. of the AAAI Conf. on Artificial
Intelligence, Vol.33. 2019. 8076−8084.
[95] Shrestha R, Kafle K, Kanan C. Answer them all! Toward universal visual question answering models. In: Proc. of the IEEE Conf.
on Computer Vision and Pattern Recognition. 2019. 10472−10481. [doi: 10.1109/CVPR.2019.01072]
[96] Hudson D, Manning CD. Learning by abstraction: The neural state machine. In: Proc. of the Advances in Neural Information
Processing Systems. 2019. 5903−5916.
[97] Shi Y, Furlanello T, Zha S, Anandkumar A. Question type guided attention in visual question answering. In: Proc. of the
European Conf. on Computer Vision (ECCV). 2018. 151−166.
[98] Malinowski M, Fritz M. A multi-world approach to question answering about real-world scenes based on uncertain input. In: Proc.
of the Advances in Neural Information Processing Systems. 2014. 1682−1690.