《软件学报》 (Journal of Software), 2026, Issue 1

Ji P, et al.: A survey of test case generation methods for intelligent software systems


     Software Engineering Workshops. Seoul: ACM, 2020. 388–395. [doi: 10.1145/3387940.3391484]
[74] Ji P, Feng Y, Liu J, Zhao ZH, Xu BW. Automated testing for machine translation via constituency invariance. In: Proc. of the 36th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). Melbourne: IEEE, 2021. 468–479. [doi: 10.1109/ASE51524.2021.9678715]
[75] Cao JL, Li MZN, Li YT, Wen M, Cheung SC, Chen HM. SemMT: A semantic-based testing approach for machine translation systems. ACM Trans. on Software Engineering and Methodology, 2022, 31(2): 34. [doi: 10.1145/3490488]
[76] Wang J, Li YH, Huang X, Chen L, Zhang XF, Zhou YM. Back deduction based testing for word sense disambiguation ability of machine translation systems. In: Proc. of the 32nd ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. Seattle: ACM, 2023. 601–613. [doi: 10.1145/3597926.3598081]
[77] Xu YH, Li YH, Wang J, Zhang XF. Evaluating terminology translation in machine translation systems via metamorphic testing. In: Proc. of the 39th IEEE/ACM Int’l Conf. on Automated Software Engineering. Sacramento: ACM, 2024. 758–769. [doi: 10.1145/3691620.3695069]
[78] Sun ZY, Chen ZP, Zhang J, Hao D. Fairness testing of machine translation systems. ACM Trans. on Software Engineering and Methodology, 2024, 33(6): 156. [doi: 10.1145/3664608]
[79] Zhang QJ, Zhai J, Fang CR, Liu JW, Sun WS, Hu HC, Wang QY. Machine translation testing via syntactic tree pruning. ACM Trans. on Software Engineering and Methodology, 2024, 33(5): 125. [doi: 10.1145/3640329]
[80] Xie XY, Jin S, Chen SQ, Cheung SC. Word closure-based metamorphic testing for machine translation. ACM Trans. on Software Engineering and Methodology, 2024, 33(8): 203. [doi: 10.1145/3675396]
[81] Chen SQ, Jin S, Xie XY. Testing your question answering software via asking recursively. In: Proc. of the 36th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). Melbourne: IEEE, 2021. 104–116. [doi: 10.1109/ASE51524.2021.9678670]
[82] Shen QC, Chen JJ, Zhang JM, Wang HY, Liu S, Tian MH. Natural test generation for precise testing of question answering software. In: Proc. of the 37th IEEE/ACM Int’l Conf. on Automated Software Engineering. Rochester: ACM, 2023. 71. [doi: 10.1145/3551349.3556953]
[83] Liu ZX, Feng Y, Yin YN, Sun JY, Chen ZY, Xu BW. QATest: A uniform fuzzing framework for question answering systems. In: Proc. of the 37th IEEE/ACM Int’l Conf. on Automated Software Engineering. Rochester: ACM, 2023. 81. [doi: 10.1145/3551349.3556929]
[84] Kann K, Ebrahimi A, Koh J, Dudy S, Roncone A. Open-domain dialogue generation: What we can do, cannot do, and should do next. In: Proc. of the 4th Workshop on NLP for Conversational AI. Dublin: ACL, 2022. 148–165. [doi: 10.18653/v1/2022.nlp4convai-1.13]
[85] Feng Y, Shi QK, Gao XY, Wan J, Fang CR, Chen ZY. DeepGini: Prioritizing massive tests to enhance the robustness of deep neural networks. In: Proc. of the 29th ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. ACM, 2020. 177–188. [doi: 10.1145/3395363.3397357]
[86] Papineni K, Roukos S, Ward T, Zhu WJ. BLEU: A method for automatic evaluation of machine translation. In: Proc. of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia: ACL, 2002. 311–318. [doi: 10.3115/1073083.1073135]
[87] Banerjee S, Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proc. of the 2005 ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor: ACL, 2005. 65–72.
[88] Przybocki M, Peterson K, Bronsart S, Sanders G. The NIST 2008 metrics for machine translation challenge—Overview, methodology, metrics, and results. Machine Translation, 2009, 23(2): 71–103. [doi: 10.1007/s10590-009-9065-6]
[89] Asyrofi MH, Yang Z, Yusuf INB, Kang HJ, Thung F, Lo D. BiasFinder: Metamorphic test generation to uncover bias for sentiment analysis systems. IEEE Trans. on Software Engineering, 2022, 48(12): 5087–5101. [doi: 10.1109/TSE.2021.3136169]
[90] Yagcioglu S, Erdem A, Erdem E, Ikizler-Cinbis N. RecipeQA: A challenge dataset for multimodal comprehension of cooking recipes. In: Proc. of the 2018 Conf. on Empirical Methods in Natural Language Processing. Brussels: ACL, 2018. 1358–1368. [doi: 10.18653/v1/D18-1166]
[91] Labied M, Belangour A, Banane M, Erraissi A. An overview of automatic speech recognition preprocessing techniques. In: Proc. of the 2022 Int’l Conf. on Decision Aid Sciences and Applications (DASA). Chiangrai: IEEE, 2022. 804–809. [doi: 10.1109/DASA54658.2022.9765043]
[92] Asyrofi MH, Thung F, Lo D, Jiang LX. CrossASR: Efficient differential testing of automatic speech recognition via text-to-speech. In: Proc. of the 2020 IEEE Int’l Conf. on Software Maintenance and Evolution (ICSME). Adelaide: IEEE, 2020. 640–650. [doi: 10.1109/ICSME46990.2020.00066]
[93] Asyrofi MH, Yang Z, Lo D. CrossASR++: A modular differential testing framework for automatic speech recognition. In: Proc. of the 29th ACM Joint Meeting European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. Athens: ACM,