Page 162 - Journal of Software (《软件学报》), 2025, Issue 10

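Li Zhiqiang et al.: Impact of SZZ mislabeled changes on the performance and interpretation of just-in-time defect prediction for mobile APPs    4559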


3 (School of Computer Science, Wuhan University, Wuhan 430072, China)
4 (School of Computer, Guangdong University of Petrochemical Technology, Maoming 525011, China)
5 (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China)
Abstract: In recent years, SZZ, an algorithm for identifying bug-introducing changes, has been widely employed in just-in-time software
defect prediction. Previous studies show that the SZZ algorithm may mislabel data during annotation, which can degrade dataset
quality and, in turn, the performance of defect prediction models. Researchers have therefore improved the SZZ algorithm and
proposed multiple variants. However, no empirical study has explored how the quality of SZZ data annotation affects the
performance and interpretability of just-in-time defect prediction for mobile APPs. To investigate the influence of changes
mislabeled by SZZ on just-in-time defect prediction for mobile APPs, this study conducts an extensive and in-depth empirical
comparison of four SZZ algorithms. First, 17 large-scale mobile APP projects are selected from GitHub, and software metrics are
extracted with the PyDriller tool. Second, B-SZZ (the original SZZ), AG-SZZ, MA-SZZ, and RA-SZZ are employed for data
annotation. Third, just-in-time defect prediction models are built with random forest, naive Bayes, and logistic regression
classifiers on time-series partitioned data. Finally, model performance is evaluated with the traditional measures AUC, MCC, and
G-mean and the effort-aware measures F-measure@20% and IFA, and the results are subjected to a statistical significance test
with SKESD and an interpretability analysis with SHAP. Comparing the four SZZ algorithms yields the following results. (1) The
data annotation quality conforms to the progressive relationship among the SZZ variants. (2) The changes mislabeled by B-SZZ,
AG-SZZ, and MA-SZZ can reduce AUC and MCC to varying degrees, but do not reduce G-mean. (3) B-SZZ is likely to reduce
F-measure@20%, while B-SZZ, AG-SZZ, and MA-SZZ are unlikely to increase the effort required during code inspection. (4) In terms
of model interpretation, different SZZ algorithms affect which three metrics contribute most to the prediction, and the la metric
has a significant influence on the prediction results.
Key words: just-in-time software defect prediction; mobile APP; SZZ method; mining software repository; interpretability; effort aware;
empirical software engineering
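The abstract names MCC and G-mean among the traditional performance measures. As a minimal sketch, assuming the standard confusion-matrix definitions (this section does not state the paper's own formulas), the two measures can be computed as below. The function names `mcc` and `g_mean` and the example counts are illustrative only and are not taken from the study.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here rather than taken from the paper).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def g_mean(tp: int, fp: int, tn: int, fn: int) -> float:
    """Geometric mean of recall (true positive rate) and specificity
    (true negative rate), the usual G-mean for imbalanced data."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


# Hypothetical counts: 40 defective changes caught, 10 missed,
# 40 clean changes passed, 10 clean changes flagged by mistake.
example_mcc = mcc(tp=40, fp=10, tn=40, fn=10)
example_g_mean = g_mean(tp=40, fp=10, tn=40, fn=10)
```

Both measures use all four cells of the confusion matrix, which is why they are often preferred over plain accuracy when defective changes are a small minority of the data.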
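With the rapid development of the Internet, smartphones have become an indispensable tool in daily life. To date, the number of mobile users worldwide has reached 3 billion (https://newzoo.com/resources/trend-reports/newzoo-global-mobile-market-report-2019-light-version), which has greatly promoted the growth of the mobile application market. However, as user demands keep rising, the functions of applications need to be updated continuously. For example, during the version iteration of a mobile APP, uncontrollable factors may cause a newly released version to introduce defects, thereby affecting software quality. Therefore, detecting defects in time before a new version is released, and reporting them to the relevant developers for repair, has become an urgent problem [1−4].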
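To reduce the cost of software defects and improve software quality, researchers have proposed change-level software defect prediction [5]. In recent years, this technique has attracted increasing attention [6−8]. Kamei et al. [9] call it just-in-time defect prediction. Compared with predicting the defect proneness of files or modules [10,11], just-in-time defect prediction helps developers inspect a smaller amount of risky code. The prediction can be made as soon as a code change is committed, to determine whether it is a bug-introducing change; this makes defect localization easier, lets developers review code promptly, and allows defects to be found early, before the code is submitted [9]. Because just-in-time defect prediction is fine-grained, immediate, and easy to trace, it is particularly suitable for software products that are updated frequently and involve a large number of code commits, such as mobile APPs. Therefore, this paper focuses on just-in-time software defect prediction for mobile APPs. The main reasons are as follows: (1) mobile APPs usually have short release cycles and iterate quickly, so finding and fixing defects in time is essential to ensure the stability and quality of new versions; (2) users can download and use mobile APPs anytime and anywhere, which means that defects can occur at any time, and providing feedback as soon as possible after a defect appears helps the development team fix it promptly; (3) user experience is crucial, and finding and fixing defects in time prevents users from running into problems when using an APP, thereby improving user satisfaction.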
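In just-in-time software defect prediction, accurately locating bug-introducing changes in a project's code change history is one of the most critical steps. A software development process usually involves a large change history, and manually screening it for bug-introducing changes is time-consuming and tedious. Therefore, researchers proposed the SZZ algorithm to identify bug-introducing changes automatically [12−15]. The SZZ algorithm was proposed by three researchers, Sliwerski, Zimmermann, and Zeller [12]. The algorithm first uses defect keywords, such as bug, fix, crash, and fault, to locate bug-introducing changes. Specifically, SZZ first locates defects through the changes whose change logs contain these keywords, and labels the code lines modified in those changes as defective lines. Second, for these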