
1618                                                       软件学报 (Journal of Software), 2025, Vol. 36, No. 4


                      without training group information. In: Proc. of the 38th Int’l Conf. on Machine Learning. PMLR, 2021. 6781–6792.
                 [26]  Wu YX, Gardner M, Stenetorp P, Dasigi P. Generating data to mitigate spurious correlations in natural language inference datasets. In: Proc. of the 60th Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers). Dublin: Association for Computational Linguistics, 2022. 2660–2676. [doi: 10.18653/v1/2022.acl-long.190]
                 [27]  Gardner M, Merrill W, Dodge J, Peters M, Ross A, Singh S, Smith NA. Competency problems: On finding and removing artifacts in language data. In: Proc. of the 2021 Conf. on Empirical Methods in Natural Language Processing. Punta Cana: Association for Computational Linguistics, 2021. 1801–1813. [doi: 10.18653/v1/2021.emnlp-main.135]
                 [28]  Kaushik D, Hovy E, Lipton ZC. Learning the difference that makes a difference with counterfactually-augmented data. arXiv:1909.12434, 2020.
                 [29]  Si CL, Zhang ZY, Qi FC, Liu ZY, Wang YS, Liu Q, Sun MS. Better robustness by more coverage: Adversarial and mixup data augmentation for robust finetuning. In: Proc. of the 2021 Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, 2021. 1569–1576. [doi: 10.18653/v1/2021.findings-acl.137]
                 [30]  Nie YX, Williams A, Dinan E, Bansal M, Weston J, Kiela D. Adversarial NLI: A new benchmark for natural language understanding. In: Proc. of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020. 4885–4901. [doi: 10.18653/v1/2020.acl-main.441]
                 [31]  Lu KJ, Mardziel P, Wu FJ, Amancharla P, Datta A. Gender bias in neural natural language processing. In: Nigam V, Kirigin TB, Talcott C, Guttman J, Kuznetsov S, Loo BT, Okada M, eds. Logic, Language, and Security: Essays Dedicated to Andre Scedrov on the Occasion of His 65th Birthday. Cham: Springer, 2020. 189–202. [doi: 10.1007/978-3-030-62077-6_14]
                 [32]  Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Blog, 2019, 1(8): 9.
                 [33]  Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou YQ, Li W, Liu PJ. Exploring the limits of transfer learning with a unified text-to-text Transformer. Journal of Machine Learning Research, 2020, 21(1): 5485–5551.
                 [34]  Wei J, Wang XZ, Schuurmans D, Bosma M, Ichter B, Xia F, Chi EH, Le QV, Zhou D. Chain-of-thought prompting elicits reasoning in large language models. In: Proc. of the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2022. 24824–24837.
                 [35]  Lampinen A, Dasgupta I, Chan S, Mathewson K, Tessler M, Creswell A, McClelland J, Wang J, Hill F. Can language models learn from explanations in context? In: Proc. of the 2022 Findings of the Association for Computational Linguistics: EMNLP 2022. Abu Dhabi: Association for Computational Linguistics, 2022. 537–563. [doi: 10.18653/v1/2022.findings-emnlp.38]
                 [36]  Stacey J, Belinkov Y, Rei M. Supervising model attention with human explanations for robust natural language inference. In: Proc. of the 36th AAAI Conf. on Artificial Intelligence. AAAI, 2022. 11349–11357. [doi: 10.1609/aaai.v36i10.21386]
                 [37]  Chen H, He J, Narasimhan K, Chen DQ. Can rationalization improve robustness? In: Proc. of the 2022 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Seattle: Association for Computational Linguistics, 2022. 3792–3805. [doi: 10.18653/v1/2022.naacl-main.278]
                 [38]  Schuster T, Fisch A, Barzilay R. Get your vitamin C! Robust fact verification with contrastive evidence. In: Proc. of the 2021 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics, 2021. 624–643. [doi: 10.18653/v1/2021.naacl-main.52]
                 [39]  Min S, Lyu XX, Holtzman A, Artetxe M, Lewis M, Hajishirzi H, Zettlemoyer L. Rethinking the role of demonstrations: What makes in-context learning work? In: Proc. of the 2022 Conf. on Empirical Methods in Natural Language Processing. Abu Dhabi: Association for Computational Linguistics, 2022. 11048–11064. [doi: 10.18653/v1/2022.emnlp-main.759]
                 [40]  Bach S, Sanh V, Yong ZX, Webson A, Raffel C, Nayak NV, Sharma A, Kim T, Bari MS, Fevry T, Alyafeai Z, Dey M, Santilli A, Sun ZQ, Ben-David S, Xu CW, Chhablani G, Wang H, Fries J, Al-Shaibani M, Sharma S, Thakker U, Almubarak K, Tang XR, Radev D, Jiang MTJ, Rush A. PromptSource: An integrated development environment and repository for natural language prompts. In: Proc. of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Dublin: Association for Computational Linguistics, 2022. 93–104. [doi: 10.18653/v1/2022.acl-demo.9]


                 [41]  Sclar M, Choi Y, Tsvetkov Y, Suhr A. Quantifying language models’ sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting. arXiv:2310.11324, 2023.
                 [42]  Ludan JM, Meng YX, Nguyen T, Shah S, Lyu Q, Apidianaki M, Callison-Burch C. Explanation-based finetuning makes models more robust to spurious cues. In: Proc. of the 61st Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers). Toronto: Association for Computational Linguistics, 2023. 4420–4441. [doi: 10.18653/v1/2023.acl-long.242]
                 [43]  Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A,