Page 144 - Journal of Software (《软件学报》), 2025, Issue 5

2044    Journal of Software, 2025, Vol. 36, No. 5


Abstract: Interactions between elements in contemporary software systems are intricate and span relationships among packages, classes, and functions. Understanding these relationships accurately is pivotal for optimizing system structure and improving software quality. Analyzing inter-package relationships reveals dependencies between modules and helps developers manage and organize software architectures more effectively. A clear understanding of inter-class relationships contributes to code bases that are more scalable and maintainable, and a clear understanding of inter-function relationships facilitates rapid identification and resolution of logical errors, thereby improving the robustness and reliability of the software. However, current approaches to predicting software system interactions face challenges such as granularity disparities, inadequate features, and version changes. To address these challenges, this study constructs software network models at three granularities: packages, classes, and functions. It introduces a novel approach that combines local and global features to strengthen the analysis and prediction of software systems through feature extraction and link prediction on software networks. Specifically, the approach uses the node2vec method to learn local features of software networks and combines them with Laplacian eigenvector encoding to represent the global positional information of nodes. The Graph Transformer model is then employed to further optimize the node feature vectors, after which the interaction prediction task is completed. Extensive experiments are conducted on three open-source Java projects, covering within-version and cross-version interaction prediction tasks. The experimental results show that, compared with benchmark methods, the proposed approach achieves average increases of 8.2% and 8.5% in AUC and AP, respectively, in within-version prediction tasks, and average increases of 3.5% and 2.4% in AUC and AP, respectively, in cross-version prediction tasks.
Key words: software network; interaction prediction; Graph Transformer; granularity difference; software quality
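The abstract reports results as AUC and AP over predicted interactions. As a reading aid, the following is a minimal sketch of how these two metrics are commonly computed for a link prediction task. The function names `roc_auc` and `average_precision`, the labels, and the scores are illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive pair is scored
    above a random negative pair (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP as the mean of precision@k taken at the rank k of each positive.
    Tied scores keep their input order (Python's sort is stable)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Hypothetical candidate links: 1 = interaction exists, 0 = it does not.
example_labels = [1, 0, 1, 0, 1]
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
auc_value = roc_auc(example_labels, example_scores)
ap_value = average_precision(example_labels, example_scores)
```

The pairwise AUC loop is quadratic in the number of samples. It is used here only because it mirrors the definition directly; a rank-based formulation would be preferable for large test sets.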
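In software engineering today, as software systems grow increasingly complex, accurately understanding the interaction calls between elements of a software system has become especially important. These calls directly affect system structure, software quality, and overall performance. However, software development and maintenance currently face a series of problems, including frequent personnel turnover, insufficient accumulated experience, and incomplete interaction documentation. These factors lead to inconsistent interaction relationships, erroneous dependencies, and functional failures in the later stages of a software system, which seriously undermine its stability and reliability and also raise maintenance costs [1,2]. Accurately predicting call relationships between elements therefore helps optimize code structure, reduce coupling, and improve code reusability and maintainability [3]. In addition, predicting version iterations and updates is also crucial: it helps in understanding the impact of system evolution and reduces version compatibility problems, thereby ensuring a smooth update process [4]. Consequently, accurately predicting well-designed relationships between elements of a software system reduces erroneous dependencies, which in turn optimizes the design architecture, improves software quality, and keeps the system evolving in a healthy direction across the updates and iterations of its life cycle.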
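It has long been established that software systems can be abstracted as concise software networks that exhibit the basic properties of complex networks [5,6]. Accordingly, a software network can be constructed by treating elements such as packages, classes, methods, interfaces, and attributes as nodes and the interactions between elements as edges [7]. Predicting interaction relationships between elements of a software system can then be mapped to the link prediction problem on graph-structured data: given the associations and connections among the elements of a software system, predict function calls, dependency relationships, inheritance relationships, and so on between elements whose relationship is unknown. Abstracting a software system as a software network helps software engineers and designers to grasp intuitively, and understand in depth, what it means for structure to determine function in software, so that they can follow the design principle of "high cohesion, low coupling". It also provides new metrics and evaluation criteria for the complexity, stability, and evolutionary characteristics of software structure [1,2].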
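To make the mapping from software elements to a link prediction problem concrete, here is a minimal sketch under stated assumptions. The class names and edges are invented for illustration. The network is treated as undirected for simplicity, although real dependencies are directed. The common-neighbours score is a classical link prediction heuristic swapped in for illustration, not the method proposed in this paper.

```python
from itertools import combinations

# Hypothetical class-level interactions (caller, callee); all names are made up.
edges = [
    ("OrderService", "OrderRepository"),
    ("OrderService", "PaymentClient"),
    ("InvoiceService", "OrderRepository"),
    ("InvoiceService", "PaymentClient"),
    ("ReportJob", "OrderRepository"),
]


def build_network(edge_list):
    """Build undirected adjacency sets: nodes are software elements,
    edges are observed interactions between them."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def rank_missing_links(adj):
    """Score every currently unconnected node pair by the number of
    neighbours the two nodes share, highest score first."""
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scored.append((len(adj[u] & adj[v]), u, v))
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


network = build_network(edges)
ranking = rank_missing_links(network)
```

In this toy network, the candidate pairs that share two neighbours are ranked ahead of those that share one or none. A learned model would replace the common-neighbours count with a score derived from node representations.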
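Early methods for predicting interaction relationships in software systems relied mainly on static analysis and rule-based techniques, such as dependency graph analysis, static code analysis, and rule-based pattern matching [8]. Although these methods help in understanding software structure, as software systems grow more complex, static analysis often provides only limited information and struggles to capture the complex interaction patterns and true associations between elements, which may lead to false positives or false negatives. Moreover, because contextual information is lacking, subtle association features between nodes are hard to capture accurately, and these methods adapt poorly to changes in complex systems. Existing methods are thus ill-suited to the complexity and variability of interaction prediction in modern software systems, which limits how accurately the features of true associations can be captured and how efficiently they can be used.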
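In recent years, graph representation techniques have achieved remarkable results in graph data mining. Their core idea is to design a mapping function that converts each node in a graph into a low-dimensional, real-valued, dense latent representation [8], which can then be used for various graph-based downstream tasks. Among these techniques, graph neural networks (GNNs) excel at mining node attributes and graph topology information [9,10]. Graph feature learning strategies have also gradually shifted from static transductive learning to dynamic inductive learning, with substantial improvements in both fitting and generalization ability [11−13]. These techniques effectively address the problems above and offer a new approach to interaction prediction in software systems.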
Inspired by this, this paper proposes a multi-granularity interaction prediction method for software systems (local and global combined with Graph Transformer