2542 Journal of Software 软件学报 Vol.32, No.8, August 2021
[56] Wu Q, Wang P, Shen C, Reid I, Van Den Hengel A. Are you talking to me? Reasoned visual dialog generation through
adversarial learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 6106−6115. [doi: 10.1109/CVPR.2018.00639]
[57] Yu Z, Yu J, Cui Y, Tao D, Tian Q. Deep modular co-attention networks for visual question answering. In: Proc. of the IEEE Conf.
on Computer Vision and Pattern Recognition. 2019. 6281−6290. [doi: 10.1109/CVPR.2019.00644]
[58] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proc.
of the Advances in Neural Information Processing Systems. 2017. 5998−6008.
[59] Gao P, Jiang Z, You H, Lu P, Hoi S, Wang X, Li H. Dynamic fusion with intra- and inter-modality attention flow for visual
question answering. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 6639−6648. [doi: 10.1109/CVPR.2019.00680]
[60] Teney D, Anderson P, He X, Van Den Hengel A. Tips and tricks for visual question answering: Learnings from the 2017
challenge. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 4223−4232. [doi: 10.1109/CVPR.2018.00444]
[61] Lu P, Li H, Zhang W, Wang J, Wang X. Co-attending free-form regions and detections with multi-modal multiplicative feature
embedding for visual question answering. arXiv preprint arXiv:1711.06794, 2017.
[62] Wu C, Liu J, Wang X, Dong X. Object-difference attention: A simple relational attention for visual question answering. In: Proc.
of the 26th ACM Int’l Conf. on Multimedia. 2018. 519−527.
[63] Cadene R, Ben-Younes H, Cord M, Thome N. MUREL: Multimodal relational reasoning for visual question answering. In: Proc. of
the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 1989−1998. [doi: 10.1109/CVPR.2019.00209]
[64] Li L, Gan Z, Cheng Y, Liu J. Relation-aware graph attention network for visual question answering. In: Proc. of the IEEE Int’l
Conf. on Computer Vision. 2019. 10313−10322. [doi: 10.1109/ICCV.2019.01041]
[65] Andreas J, Rohrbach M, Darrell T, Klein D. Neural module networks. In: Proc. of the IEEE Conf. on Computer Vision and
Pattern Recognition. 2016. 39−48.
[66] Klein D, Manning CD. Accurate unlexicalized parsing. In: Proc. of the 41st Annual Meeting of the Association for Computational
Linguistics. 2003. 423−430.
[67] De Marneffe MC, Manning CD. The Stanford typed dependencies representation. In: Proc. of the Workshop on Cross-framework
and Cross-domain Parser Evaluation. 2008. 1−8.
[68] Andreas J, Rohrbach M, Darrell T, Klein D. Learning to compose neural networks for question answering. In: Proc. of the Annual
Conf. of the North American Chapter of the Association for Computational Linguistics. 2016. 1545−1554.
[69] Hu R, Andreas J, Rohrbach M, Darrell T, Saenko K. Learning to reason: End-to-end module networks for visual question
answering. In: Proc. of the IEEE Int’l Conf. on Computer Vision. 2017. 804−813.
[70] Kumar A, Irsoy O, Ondruska P, Iyyer M, Bradbury J, Gulrajani I, Zhong V, Paulus R, Socher R. Ask me anything: Dynamic
memory networks for natural language processing. In: Proc. of the Int’l Conf. on Machine Learning. 2016. 1378−1387.
[71] Xiong C, Merity S, Socher R. Dynamic memory networks for visual and textual question answering. In: Proc. of the Int’l Conf. on
Machine Learning. 2016. 2397−2406.
[72] Noh H, Han B. Training recurrent answering units with joint loss minimization for VQA. arXiv preprint arXiv:1606.03647, 2016.
[73] Wang P, Wu Q, Shen C, Van Den Hengel A, Dick A. Explicit knowledge-based reasoning for visual question answering. arXiv
preprint arXiv:1511.02570, 2015.
[74] Wang P, Wu Q, Shen C, Dick A, Van Den Hengel A. FVQA: Fact-based visual question answering. IEEE Trans. on Pattern
Analysis and Machine Intelligence, 2018,40(10):2413−2427.
[75] Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives Z. DBpedia: A nucleus for a Web of open data. In: Proc. of the
Semantic Web. Berlin, Heidelberg: Springer-Verlag, 2007. 722−735.
[76] Wu Q, Wang P, Shen C, Dick A, Van Den Hengel A. Ask me anything: Free-form visual question answering based on knowledge
from external sources. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2016. 4622−4630. [doi: 10.1109/CVPR.2016.500]