Software Engineering Workshops. Seoul: ACM, 2020. 388–395. [doi: 10.1145/3387940.3391484]
[74] Ji P, Feng Y, Liu J, Zhao ZH, Xu BW. Automated testing for machine translation via constituency invariance. In: Proc. of the 36th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). Melbourne: IEEE, 2021. 468–479. [doi: 10.1109/ASE51524.2021.9678715]
[75] Cao JL, Li MZN, Li YT, Wen M, Cheung SC, Chen HM. SemMT: A semantic-based testing approach for machine translation systems. ACM Trans. on Software Engineering and Methodology, 2022, 31(2): 34e. [doi: 10.1145/3490488]
[76] Wang J, Li YH, Huang X, Chen L, Zhang XF, Zhou YM. Back deduction based testing for word sense disambiguation ability of machine translation systems. In: Proc. of the 32nd ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. Seattle: ACM, 2023. 601–613. [doi: 10.1145/3597926.3598081]
[77] Xu YH, Li YH, Wang J, Zhang XF. Evaluating terminology translation in machine translation systems via metamorphic testing. In: Proc. of the 39th IEEE/ACM Int’l Conf. on Automated Software Engineering. Sacramento: ACM, 2024. 758–769. [doi: 10.1145/3691620.3695069]
[78] Sun ZY, Chen ZP, Zhang J, Hao D. Fairness testing of machine translation systems. ACM Trans. on Software Engineering and Methodology, 2024, 33(6): 156. [doi: 10.1145/3664608]
[79] Zhang QJ, Zhai J, Fang CR, Liu JW, Sun WS, Hu HC, Wang QY. Machine translation testing via syntactic tree pruning. ACM Trans. on Software Engineering and Methodology, 2024, 33(5): 125. [doi: 10.1145/3640329]
[80] Xie XY, Jin S, Chen SQ, Cheung SC. Word closure-based metamorphic testing for machine translation. ACM Trans. on Software Engineering and Methodology, 2024, 33(8): 203. [doi: 10.1145/3675396]
[81] Chen SQ, Jin S, Xie XY. Testing your question answering software via asking recursively. In: Proc. of the 36th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). Melbourne: IEEE, 2021. 104–116. [doi: 10.1109/ASE51524.2021.9678670]
[82] Shen QC, Chen JJ, Zhang JM, Wang HY, Liu S, Tian MH. Natural test generation for precise testing of question answering software. In: Proc. of the 37th IEEE/ACM Int’l Conf. on Automated Software Engineering. Rochester: ACM, 2023. 71. [doi: 10.1145/3551349.3556953]
[83] Liu ZX, Feng Y, Yin YN, Sun JY, Chen ZY, Xu BW. QATest: A uniform fuzzing framework for question answering systems. In: Proc. of the 37th IEEE/ACM Int’l Conf. on Automated Software Engineering. Rochester: ACM, 2023. 81. [doi: 10.1145/3551349.3556929]
[84] Kann K, Ebrahimi A, Koh J, Dudy S, Roncone A. Open-domain dialogue generation: What we can do, cannot do, and should do next. In: Proc. of the 4th Workshop on NLP for Conversational AI. Dublin: ACL, 2022. 148–165. [doi: 10.18653/v1/2022.nlp4convai-1.13]
[85] Feng Y, Shi QK, Gao XY, Wan J, Fang CR, Chen ZY. DeepGini: Prioritizing massive tests to enhance the robustness of deep neural networks. In: Proc. of the 29th ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. ACM, 2020. 177–188. [doi: 10.1145/3395363.3397357]
[86] Papineni K, Roukos S, Ward T, Zhu WJ. BLEU: A method for automatic evaluation of machine translation. In: Proc. of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia: ACL, 2002. 311–318. [doi: 10.3115/1073083.1073135]
[87] Banerjee S, Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proc. of the 2005 ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor: ACL, 2005. 65–72.
[88] Przybocki M, Peterson K, Bronsart S, Sanders G. The NIST 2008 metrics for machine translation challenge—Overview, methodology, metrics, and results. Machine Translation, 2009, 23(2): 71–103. [doi: 10.1007/s10590-009-9065-6]
[89] Asyrofi MH, Yang Z, Yusuf INB, Kang HJ, Thung F, Lo D. BiasFinder: Metamorphic test generation to uncover bias for sentiment analysis systems. IEEE Trans. on Software Engineering, 2022, 48(12): 5087–5101. [doi: 10.1109/TSE.2021.3136169]
[90] Yagcioglu S, Erdem A, Erdem E, Ikizler-Cinbis N. RecipeQA: A challenge dataset for multimodal comprehension of cooking recipes. In: Proc. of the 2018 Conf. on Empirical Methods in Natural Language Processing. Brussels: ACL, 2018. 1358–1368. [doi: 10.18653/v1/D18-1166]
[91] Labied M, Belangour A, Banane M, Erraissi A. An overview of automatic speech recognition preprocessing techniques. In: Proc. of the 2022 Int’l Conf. on Decision Aid Sciences and Applications (DASA). Chiangrai: IEEE, 2022. 804–809. [doi: 10.1109/DASA54658.2022.9765043]
[92] Asyrofi MH, Thung F, Lo D, Jiang LX. CrossASR: Efficient differential testing of automatic speech recognition via text-to-speech. In: Proc. of the 2020 IEEE Int’l Conf. on Software Maintenance and Evolution (ICSME). Adelaide: IEEE, 2020. 640–650. [doi: 10.1109/ICSME46990.2020.00066]
[93] Asyrofi MH, Yang Z, Lo D. CrossASR++: A modular differential testing framework for automatic speech recognition. In: Proc. of the 29th ACM Joint Meeting European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. Athens: ACM,