Page 144 - Journal of Software (《软件学报》), 2025, No. 5
2044 Journal of Software (软件学报), 2025, Vol. 36, No. 5
Abstract: The interactions between elements in contemporary software systems are intricate, encompassing relationships between packages, classes, and functions. Accurately understanding these relationships is pivotal for optimizing system structure and improving software quality. Analyzing inter-package relationships reveals dependencies between modules and helps developers manage and organize software architectures more effectively. A clear understanding of inter-class relationships contributes to code bases that are more scalable and maintainable. A clear understanding of inter-function relationships helps identify and resolve logical errors quickly, improving the robustness and reliability of the software. However, current interaction prediction for software systems faces challenges such as granularity disparities, inadequate features, and version changes. To address these challenges, this study constructs software network models at three granularities: packages, classes, and functions. It introduces a novel approach that combines local and global features to strengthen the analysis and prediction of software systems through feature extraction and link prediction on software networks. Specifically, the approach uses the node2vec method to learn local features of the software networks and combines them with Laplacian eigenvector encoding to represent the global positional information of nodes. A Graph Transformer model then further optimizes the node feature vectors, and the resulting representations are used to complete the interaction prediction task. Extensive experiments are conducted on three open-source Java projects, covering within-version and cross-version interaction prediction tasks. The experimental results show that, compared with benchmark methods, the proposed approach improves AUC and AP by an average of 8.2% and 8.5%, respectively, in within-version prediction tasks, and by an average of 3.5% and 2.4%, respectively, in cross-version prediction tasks.
Key words: software network; interaction prediction; Graph Transformer; granularity difference; software quality
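In today's software engineering practice, software systems are growing ever more complex, and an accurate understanding of the interaction calls between their elements has become especially important. These calls directly affect system structure, software quality, and overall performance. However, software development and maintenance currently face frequent staff turnover, insufficient accumulated experience, and incomplete interaction documentation. These factors lead to inconsistent interaction relationships, erroneous dependencies, and functional failures in the later stages of a system's life, which seriously damage its stability and reliability and raise maintenance costs [1,2]. Accurately predicting call relationships between elements therefore helps optimize code structure, reduce coupling, and improve code reusability and maintainability [3]. Prediction across version iterations and updates is equally important: it helps in understanding the impact of system evolution and reduces version compatibility problems, so that updates proceed smoothly [4]. In short, accurately predicting well-designed relationships between elements reduces erroneous dependencies, improves the design architecture and the quality of the software, and keeps the system evolving in a healthy direction throughout the update iterations of its life cycle.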
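It has long been established that a software system can be abstracted into a concise software network that exhibits the basic properties of complex networks [5,6]. Treating elements such as packages, classes, methods, interfaces, and attributes as nodes, and the interactions between them as edges [7], yields the corresponding software network structure. Predicting interactions between elements of a software system then maps onto the link prediction problem on graph-structured data: given the known associations and connections between elements, predict function calls, dependencies, inheritance relationships, and similar links between elements whose relationship is unknown. Abstracting a software system as a software network helps software designers see intuitively, and understand in depth, how structure determines function in software, so that they can follow the design principle of "high cohesion, low coupling". It also provides new metrics and evaluation criteria for the complexity, stability, and evolutionary characteristics of software structure [1,2].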
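To make the mapping from software elements to link prediction concrete, the following is a minimal sketch, not the method proposed in this paper. It assumes a hypothetical set of class-level dependencies, treats them as undirected edges for simplicity, and scores unconnected pairs with the Jaccard coefficient, a classical link prediction baseline. All class and function names are illustrative.

```python
from itertools import combinations

# Hypothetical class-level dependencies (caller, callee); illustrative names only.
edges = [
    ("OrderService", "OrderRepository"),
    ("OrderService", "PaymentClient"),
    ("InvoiceService", "OrderRepository"),
    ("InvoiceService", "PaymentClient"),
    ("InvoiceService", "Mailer"),
    ("ReportJob", "Mailer"),
]


def build_network(edge_list):
    """Build an undirected adjacency map: node -> set of neighbours.

    Direction is dropped here as a simplifying assumption.
    """
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Jaccard coefficient of the two neighbour sets (0.0 if both are empty)."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def rank_candidate_links(adj):
    """Score every currently unconnected node pair, highest score first."""
    scores = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores.append((jaccard(adj, u, v), u, v))
    return sorted(scores, key=lambda t: (-t[0], t[1], t[2]))


network = build_network(edges)
ranking = rank_candidate_links(network)
for score, u, v in ranking[:3]:
    print(f"{u} -- {v}: {score:.2f}")
```

In this toy network the two classes used by exactly the same services receive the highest score. A learned model would replace the hand-written `jaccard` score with one computed from node representations.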
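Early methods for predicting interactions in software systems relied mainly on static analysis and rule-based techniques, such as dependency graph analysis, static code analysis, and rule-based pattern matching [8]. These methods help in understanding software structure, but as systems grow more complex, static analysis often provides only limited information. It struggles to capture the complex interaction patterns and actual associations between elements, which can produce false positives or false negatives. Moreover, lacking contextual information, these methods cannot accurately capture subtle association features between nodes, and they adapt poorly to changes in complex systems. Existing methods are therefore ill-suited to the complexity and variability of interaction prediction in modern software systems, which limits how accurately and efficiently real association features can be captured and used.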
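In recent years, graph representation techniques have proved highly effective in graph data mining. Their core idea is to design a mapping function that converts each node of a graph into a low-dimensional, real-valued, dense latent representation [8], which can then serve various graph-based downstream tasks. Among these techniques, graph neural networks (GNNs) excel at mining node attributes and graph topology [9,10]. Graph feature learning strategies have also gradually moved from static transductive learning toward dynamic inductive learning, with substantial gains in both fitting ability and generalization ability [11-13]. These techniques effectively address the problems above and offer a new line of attack for interaction prediction in software systems.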
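One common way to obtain such node representations is node2vec, which the abstract names as the source of local features. It samples biased random walks over the graph and then trains a skip-gram model on them. The following is a minimal sketch of the walk-sampling step only, under stated assumptions: the toy graph and the function name `biased_walk` are hypothetical, and the embedding training step is omitted.

```python
import random

# Hypothetical undirected toy graph: node -> set of neighbours.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style second-order random walk.

    p controls the tendency to step back to the previous node,
    q controls the tendency to move away from it.
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted for reproducibility
        if not nbrs:
            break  # dead end: stop early
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay within distance 1 of prev
            else:
                weights.append(1.0 / q)  # move outward, distance 2 from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


walk = biased_walk(adj, "a", 6, p=0.5, q=2.0, rng=random.Random(42))
print(walk)
```

With a small `p` and a large `q`, as in the call above, walks tend to stay near their starting node, which emphasizes local neighbourhood structure. The sampled walks would then be treated as sentences for skip-gram training to produce the dense node vectors.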
Inspired by this, this paper proposes a multi-granularity interaction prediction method for software systems (local and global combined with Graph Transformer