1618 软件学报 (Journal of Software), 2025, Vol. 36, No. 4
without training group information. In: Proc. of the 38th Int’l Conf. on Machine Learning. PMLR, 2021. 6781–6792.
[26] Wu YX, Gardner M, Stenetorp P, Dasigi P. Generating data to mitigate spurious correlations in natural language inference datasets. In:
Proc. of the 60th Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers). Dublin: Association for
Computational Linguistics, 2022. 2660–2676. [doi: 10.18653/v1/2022.acl-long.190]
[27] Gardner M, Merrill W, Dodge J, Peters M, Ross A, Singh S, Smith NA. Competency problems: On finding and removing artifacts in
language data. In: Proc. of the 2021 Conf. on Empirical Methods in Natural Language Processing. Punta Cana: Association for
Computational Linguistics, 2021. 1801–1813. [doi: 10.18653/v1/2021.emnlp-main.135]
[28] Kaushik D, Hovy E, Lipton ZC. Learning the difference that makes a difference with counterfactually-augmented data. arXiv:1909.12434, 2020.
[29] Si CL, Zhang ZY, Qi FC, Liu ZY, Wang YS, Liu Q, Sun MS. Better robustness by more coverage: Adversarial and mixup data
augmentation for robust finetuning. In: Proc. of the 2021 Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Association for Computational Linguistics, 2021. 1569–1576. [doi: 10.18653/v1/2021.findings-acl.137]
[30] Nie YX, Williams A, Dinan E, Bansal M, Weston J, Kiela D. Adversarial NLI: A new benchmark for natural language understanding. In:
Proc. of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020.
4885–4901. [doi: 10.18653/v1/2020.acl-main.441]
[31] Lu KJ, Mardziel P, Wu FJ, Amancharla P, Datta A. Gender bias in neural natural language processing. In: Nigam V, Kirigin TB, Talcott
C, Guttman J, Kuznetsov S, Loo BT, Okada M, eds. Logic, Language, and Security: Essays Dedicated to Andre Scedrov on the Occasion
of His 65th Birthday. Cham: Springer, 2020. 189–202. [doi: 10.1007/978-3-030-62077-6_14]
[32] Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Blog, 2019,
1(8): 9.
[33] Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou YQ, Li W, Liu PJ. Exploring the limits of transfer learning with a
unified text-to-text Transformer. Journal of Machine Learning Research, 2020, 21(1): 5485–5551.
[34] Wei J, Wang XZ, Schuurmans D, Bosma M, Ichter B, Xia F, Chi EH, Le QV, Zhou D. Chain-of-thought prompting elicits reasoning in
large language models. In: Proc. of the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc.,
2022. 24824–24837.
[35] Lampinen A, Dasgupta I, Chan S, Mathewson K, Tessler M, Creswell A, McClelland J, Wang J, Hill F. Can language models learn from
explanations in context? In: Proc. of the 2022 Findings of the Association for Computational Linguistics: EMNLP 2022. Abu Dhabi:
Association for Computational Linguistics, 2022. 537–563. [doi: 10.18653/v1/2022.findings-emnlp.38]
[36] Stacey J, Belinkov Y, Rei M. Supervising model attention with human explanations for robust natural language inference. In: Proc. of the
36th AAAI Conf. on Artificial Intelligence. AAAI, 2022. 11349–11357. [doi: 10.1609/aaai.v36i10.21386]
[37] Chen H, He J, Narasimhan K, Chen DQ. Can rationalization improve robustness? In: Proc. of the 2022 Conf. of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies. Seattle: Association for Computational
Linguistics, 2022. 3792–3805. [doi: 10.18653/v1/2022.naacl-main.278]
[38] Schuster T, Fisch A, Barzilay R. Get your vitamin C! Robust fact verification with contrastive evidence. In: Proc. of the 2021 Conf. of the
North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for
Computational Linguistics, 2021. 624–643. [doi: 10.18653/v1/2021.naacl-main.52]
[39] Min S, Lyu XX, Holtzman A, Artetxe M, Lewis M, Hajishirzi H, Zettlemoyer L. Rethinking the role of demonstrations: What makes in-
context learning work? In: Proc. of the 2022 Conf. on Empirical Methods in Natural Language Processing. Abu Dhabi: Association for
Computational Linguistics, 2022. 11048–11064. [doi: 10.18653/v1/2022.emnlp-main.759]
[40] Bach S, Sanh V, Yong ZX, Webson A, Raffel C, Nayak NV, Sharma A, Kim T, Bari MS, Fevry T, Alyafeai Z, Dey M, Santilli A, Sun ZQ,
Ben-David S, Xu CW, Chhablani G, Wang H, Fries J, Al-Shaibani M, Sharma S, Thakker U, Almubarak K, Tang XR, Radev D, Jiang
MTJ, Rush A. PromptSource: An integrated development environment and repository for natural language prompts. In: Proc. of the 60th
Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Dublin: Association for Computational
Linguistics, 2022. 93–104. [doi: 10.18653/v1/2022.acl-demo.9]
[41] Sclar M, Choi Y, Tsvetkov Y, Suhr A. Quantifying language models’ sensitivity to spurious features in prompt design or: How I learned
to start worrying about prompt formatting. arXiv:2310.11324, 2023.
[42] Ludan JM, Meng YX, Nguyen T, Shah S, Lyu Q, Apidianaki M, Callison-Burch C. Explanation-based finetuning makes models more
robust to spurious cues. In: Proc. of the 61st Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers).
Toronto: Association for Computational Linguistics, 2023. 4420–4441. [doi: 10.18653/v1/2023.acl-long.242]
[43] Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A,