Journal of Software (《软件学报》), 2025, No. 12
Yu Jianxing (余建兴) et al.: Multimodal title-content inconsistency detection based on commonsense reasoning question answering 5737
Annual Meeting of the Association for Computational Linguistics. Toronto: Association for Computational Linguistics, 2023. 5823–5840.
[doi: 10.18653/v1/2023.acl-long.320]
[66] Liu Z, Lin YT, Cao Y, Hu H, Wei YX, Zhang Z, Lin S, Guo BN. Swin Transformer: Hierarchical vision Transformer using shifted
windows. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision. Montreal: IEEE, 2021. 9992–10002.
[doi: 10.1109/ICCV48922.2021.00986]
[67] Lin TY, Goyal P, Girshick RB, He KM, Dollár P. Focal loss for dense object detection. In: Proc. of the 2017 IEEE Int’l Conf. on
Computer Vision. Venice: IEEE, 2017. 2999–3007. [doi: 10.1109/ICCV.2017.324]
[68] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proc.
of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Minneapolis: Association for Computational Linguistics, 2019. 4171–4186. [doi: 10.18653/v1/N19-1423]
[69] Lu JS, Batra D, Parikh D, Lee S. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In:
Proc. of the 33rd Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2019. 13–23.
[70] Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning
transferable visual models from natural language supervision. In: Proc. of the 38th Int’l Conf. on Machine Learning. 2021. 8748–8763.
[71] Neculoiu P, Versteegh M, Rotaru M. Learning text similarity with Siamese recurrent networks. In: Proc. of the 1st Workshop on
Representation Learning for NLP. Berlin: Association for Computational Linguistics, 2016. 148–157. [doi: 10.18653/v1/W16-1617]
[72] Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL. Microsoft COCO: Common objects in context. In:
Proc. of the 13th European Conf. on Computer Vision–ECCV. Zurich: Springer, 2014. 740–755. [doi: 10.1007/978-3-319-10602-1_48]
[73] Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel RS, Bengio Y. Show, attend and tell: Neural image caption generation
with visual attention. In: Proc. of the 32nd Int’l Conf. on Machine Learning. Lille: JMLR.org, 2015. 2048–2057.
[74] Hossain A, Karimuzzaman M, Hossain MM, Rahman A. Text mining and sentiment analysis of newspaper headlines. Information, 2021,
12(10): 414. [doi: 10.3390/info12100414]
[75] Alcântara C, Moreira V, Feijo D. Offensive video detection: Dataset and baseline results. In: Proc. of the 12th Language Resources and
Evaluation Conf. Marseille: European Language Resources Association, 2020. 4309–4319.
[76] Ha Y, Kim J, Won D, Cha M, Joo J. Characterizing clickbaits on Instagram. In: Proc. of the 12th Int’l AAAI Conf. on Web and Social
Media. Stanford: AAAI, 2018. 92–101. [doi: 10.1609/icwsm.v12i1.15019]
[77] Potthast M, Gollub T, Komlossy K, Schuster S, Wiegmann M, Fernandez EPG, Hagen M, Stein B. Crowdsourcing a large corpus of
clickbait on Twitter. In: Proc. of the 27th Int’l Conf. on Computational Linguistics. Santa Fe: Association for Computational Linguistics,
2018. 1498–1507.
[78] Shu K, Mahudeswaran D, Wang SH, Lee D, Liu H. FakeNewsNet: A data repository with news content, social context, and
spatiotemporal information for studying fake news on social media. Big Data, 2020, 8(3): 171–188. [doi: 10.1089/big.2020.0062]
[79] Shu K, Mahudeswaran D, Wang SH, Liu H. Hierarchical propagation networks for fake news detection: Investigation and exploitation.
In: Proc. of the 14th Int’l AAAI Conf. on Web and Social Media. AAAI, 2020. 626–637. [doi: 10.1609/icwsm.v14i1.7329]
[80] Yang J, Vega-Oliveros D, Seibt T, Rocha A. Explainable fact-checking through question answering. In: Proc. of the 2022 IEEE Int’l
Conf. on Acoustics, Speech and Signal Processing. Singapore: IEEE, 2022. 8952–8956. [doi: 10.1109/ICASSP43922.2022.9747214]
[81] Wu Y, Zhan PW, Zhang YJ, Wang LM, Xu Z. Multimodal fusion with co-attention networks for fake news detection. In: Findings of the
Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, 2021. 2560–2569.
[doi: 10.18653/v1/2021.findings-acl.226]
[82] Mowar P, Jain M, Goel R, Vishwakarma DK. Clickbait in YouTube prevention, detection and analysis of the bait using ensemble learning.
arXiv:2112.08611, 2021.
[83] Chen ZW, Hu LM, Li WX, Shao YX, Nie LQ. Causal intervention and counterfactual reasoning for multi-modal fake news detection. In:
Proc. of the 61st Annual Meeting of the Association for Computational Linguistics. Toronto: Association for Computational Linguistics,
2023. 627–638. [doi: 10.18653/v1/2023.acl-long.37]
[84] Wang JP, Ge YX, Yan R, Ge YY, Lin KQ, Tsutsui S, Lin XD, Cai GY, Wu JP, Shan Y, Qie XH, Shou MZ. All in one: Exploring unified
video-language pre-training. In: Proc. of the 2023 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Vancouver: IEEE,
2023. 6598–6608. [doi: 10.1109/CVPR52729.2023.00638]
[85] Randolph JJ. Free-marginal multirater Kappa (multirater κfree): An alternative to Fleiss’ fixed-marginal multirater Kappa. 2005.
https://www.researchgate.net/publication/224890485_Free-Marginal_Multirater_Kappa_multirater_kfree_An_Alternative_to_Fleiss_Fixed-Marginal_Multirater_Kappa
[86] Viera AJ, Garrett JM. Understanding interobserver agreement: The Kappa statistic. Family Medicine, 2005, 37(5): 360–363.

