Page 162 - Journal of Software (《软件学报》), 2025, Issue 10
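Li Zhiqiang et al.: Impact of SZZ Mislabeled Changes on the Performance and Interpretation of Just-in-Time Defect Prediction for Mobile APPs (p. 4559)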
3 (School of Computer Science, Wuhan University, Wuhan 430072, China)
4 (School of Computer, Guangdong University of Petrochemical Technology, Maoming 525011, China)
5 (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China)
Abstract: In recent years, as an algorithm for identifying bug-introducing changes, SZZ has been widely employed in just-in-time software
defect prediction. Previous studies show that the SZZ algorithm may mislabel data during data annotation, which could influence the
dataset quality and consequently the performance of the defect prediction model. Therefore, researchers have made improvements to the
SZZ algorithm and proposed multiple variants of SZZ. However, there is no empirical study to explore the effect of data annotation
quality by SZZ on the performance and interpretability of just-in-time defect prediction for mobile APP. To investigate the influence of
mislabeled changes by SZZ on just-in-time defect prediction for mobile APP, this study conducts an extensive and in-depth empirical
comparison of four SZZ algorithms. First, 17 large-scale mobile APP projects are selected from GitHub, and software
metrics are extracted with the PyDriller tool. Second, B-SZZ (the original SZZ), AG-SZZ, MA-SZZ, and RA-SZZ are employed for data
annotation. Third, just-in-time defect prediction models are built with random forest, naive Bayes, and logistic regression classifiers
on data partitioned in time-series order. Finally, the models are evaluated with the traditional measures AUC, MCC, and
G-mean and the effort-aware measures F-measure@20% and IFA, and a statistical significance test and an interpretability analysis are
conducted on the results with SKESD and SHAP, respectively. Comparing the annotation performance of the four SZZ
algorithms yields the following results. (1) The data annotation quality conforms to the progressive relationship among the SZZ variants. (2)
Changes mislabeled by B-SZZ, AG-SZZ, and MA-SZZ can reduce AUC and MCC to varying degrees, but
do not reduce G-mean. (3) B-SZZ is likely to reduce F-measure@20%, while B-SZZ,
AG-SZZ, and MA-SZZ are unlikely to increase effort during code inspection. (4) In terms of model interpretation, different SZZ
algorithms affect which three metrics contribute most to the prediction, and the la metric has a significant influence
on the prediction results.
Key words: just-in-time software defect prediction; mobile APP; SZZ method; mining software repository; interpretability; effort aware;
empirical software engineering
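
As an illustration of the evaluation setup summarized in the abstract, the following minimal sketch shows how a time-ordered split and two of the traditional measures (MCC and G-mean) could be computed. It is not the authors' implementation: the function names, the `timestamp` field, and the 80/20 split ratio are assumptions made for this example, and G-mean is taken here in its common form as the geometric mean of recall and specificity.

```python
import math


def time_ordered_split(changes, train_fraction=0.8):
    """Split changes chronologically: older changes train, newer ones test.

    `changes` is a list of dicts with a hypothetical "timestamp" key;
    the 0.8 fraction is an assumption, not a value from the paper.
    """
    ordered = sorted(changes, key=lambda c: c["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = bug-introducing."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def g_mean(tp, tn, fp, fn):
    """Geometric mean of recall and specificity."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


if __name__ == "__main__":
    # Toy labels: ground truth vs. model predictions for six changes.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    counts = confusion_counts(y_true, y_pred)
    print("MCC:", mcc(*counts))
    print("G-mean:", g_mean(*counts))
```

Because the labels in `y_true` come from an SZZ variant, a mislabeled change alters the confusion counts directly, which is one way annotation quality can propagate into measures such as MCC.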
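With the rapid development of the Internet, smartphones have become an indispensable tool in daily life. To date, the number of mobile users worldwide has reached 3 billion (https://newzoo.com/resources/trend-reports/newzoo-global-mobile-market-report-2019-light-version), which has greatly promoted the growth of the mobile application market. However, as user requirements keep rising, application functions need to be updated continuously. For example, during the version iteration of a mobile APP, uncontrollable factors may cause defects to be introduced when a new version is released, which degrades software quality. Therefore, detecting defects in time before a new version is released, and reporting them to the relevant developers for repair, has become an urgent problem [1−4].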
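To reduce the cost of software defects and improve software quality, researchers have proposed change-level software defect prediction [5]. In recent years, this technique has attracted increasing attention [6−8]. Kamei et al. [9] refer to it as just-in-time defect prediction. Compared with predicting the defect proneness of files or modules [10,11], just-in-time defect prediction lets developers inspect a smaller amount of risky code. A prediction can be made as soon as a code change is submitted, to judge whether the change is bug-introducing. This makes defect localization easier, allows developers to review code promptly, and helps reveal defects early, before the code is committed [9]. Because just-in-time defect prediction is fine-grained, immediate, and easy to trace, it is particularly suitable for software products that are updated frequently and involve a large number of code commits, such as mobile APPs. Therefore, this paper focuses on just-in-time software defect prediction for mobile APPs. The main reasons are as follows: (1) mobile APPs usually have short release cycles and fast version iteration, so finding and fixing defects in time is essential to ensure the stability and quality of each new version; (2) users can download and use mobile APPs anytime and anywhere, which means that defects may occur at any time, and prompt feedback after a defect appears helps the development team fix it in time; (3) user experience is critical, and finding and fixing defects in time prevents users from running into problems while using an APP, which in turn improves user satisfaction.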
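In just-in-time software defect prediction, accurately locating bug-introducing changes in a project's code change history is one of the most critical steps. A software development process usually involves a large change history, and manually screening for bug-introducing changes is time-consuming and tedious. Researchers therefore proposed the SZZ algorithm, which aims to identify bug-introducing changes automatically [12−15]. The SZZ algorithm was proposed by three researchers, Sliwerski, Zimmermann, and Zeller [12]. The algorithm starts from defect-related keywords, such as bug, fix, crash, and fault, in order to locate bug-introducing changes. Specifically, SZZ first uses the changes whose change logs contain these keywords to locate defects (such changes are typically bug-fixing changes), and labels the code lines modified in these changes as defective lines. Second, for these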

