Page 239 - Journal of Software (《软件学报》), 2021, No. 7


Niu Changan et al.: Automatic Code Comment Generation Model Based on Pointer-Generator Network 2157

the Attention weights over the input sequences before and after identifier splitting at each decoding step.

Table 2 A sample from the test dataset and the predictions of two models
Source code:
    public NinePatchBorder(Insets insets, NinePatch np){
        this.insets=insets;
        this.np=np;
    }
Source Encoder input: public ninepatchborder (insets insets, ninepatch np) {this. insets=insets; this. np=np;}
Code Encoder input: public nine patch border (insets insets, nine patch np) {this. insets=insets; this. np=np;}
Reference comment: instantiates a new nine patch border.
Hybrid-DeepCom output: instantiates the nine setting.
CodePtr-PGN output: instantiates a new nine patch border.
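Table 2 indicates that the two encoders receive the same code but treat identifiers differently. The Source Encoder input keeps `NinePatchBorder` as one lowercased token, while the Code Encoder input splits it into `nine patch border`. The Java sketch below reproduces only that identifier-level difference. The class `IdentifierSplitter`, its regular expression, and its handling of acronyms and digits are hypothetical illustrations, not the paper's actual preprocessing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: illustrates camelCase/PascalCase splitting only,
// not the preprocessing actually used by CodePtr-PGN or Hybrid-DeepCom.
public class IdentifierSplitter {
    // Subtokens: an acronym directly before a capitalized word ("HTTP" in "HTTPServer"),
    // an optionally capitalized lowercase run ("Nine"), an all-caps run, or digits.
    private static final Pattern SUBTOKEN =
        Pattern.compile("[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+");

    /** Splits an identifier into lowercase subtokens; separators such as '_' are skipped. */
    public static List<String> split(String identifier) {
        List<String> parts = new ArrayList<>();
        Matcher m = SUBTOKEN.matcher(identifier);
        while (m.find()) {
            parts.add(m.group().toLowerCase());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Source Encoder style: the identifier stays whole and is only lowercased
        System.out.println("NinePatchBorder".toLowerCase());
        // Code Encoder style: the identifier is split into subtokens
        System.out.println(String.join(" ", split("NinePatchBorder")));
    }
}
```

Running `main` prints `ninepatchborder` followed by `nine patch border`, which match the class-name tokens of the two encoder inputs in Table 2. Short identifiers such as `np` pass through unchanged because they form a single lowercase run.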

[Figure 5 panels: (a) Attention weights of source input in CodePtr (without PGN); (b) Attention weights of code input in CodePtr (without PGN); (c) Attention weights of code input in Hybrid-DeepCom]
Fig. 5 Attention weights over the input sequences at each decoding step in CodePtr-PGN and Hybrid-DeepCom
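Fig. 5(a) and Fig. 5(b) show the Attention weight distributions during decoding over the input sequences of the Source Encoder and the Code Encoder in CodePtr-PGN, respectively. Fig. 5(c) shows the weight distribution over the input sequence of the Code Encoder in Hybrid-DeepCom. The horizontal axis marks the decoding step at which each output word is generated, and the vertical axis lists the words of the input sequence. Fig. 5(c) shows that Hybrid-DeepCom correctly attends to “nine” in the input sequence when it generates “nine”. At the next step, however, the model still attends mostly to “nine” instead of the next input word “patch”, and it therefore generates “setting”, which appears unrelated to the input. In Fig. 5(b), by contrast, CodePtr-PGN correctly attends to the corresponding input word when generating each of “nine”, “patch” and “border”.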
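Each column of a heat map like Fig. 5 is one decoding step's Attention distribution over the input tokens. The token with the largest weight is the one the model "attends to" at that step. The Java sketch below computes such a column. It assumes plain dot-product scoring followed by softmax, although the models compared here may use a different scoring function, such as additive attention. The two-dimensional encoder and decoder states are made up for illustration, and `AttentionSketch` and its methods are hypothetical names.

```java
// Hypothetical sketch of one decoding step's attention column.
// Assumption: dot-product scores + softmax; the paper's models may score differently.
public class AttentionSketch {
    /** Softmax-normalized dot-product scores of one decoder state against every encoder state. */
    public static double[] attentionWeights(double[] decoderState, double[][] encoderStates) {
        int n = encoderStates.length;
        double[] weights = new double[n];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int k = 0; k < decoderState.length; k++) {
                s += decoderState[k] * encoderStates[i][k];
            }
            weights[i] = s;
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            weights[i] = Math.exp(weights[i] - max); // subtract the max for numerical stability
            sum += weights[i];
        }
        for (int i = 0; i < n; i++) {
            weights[i] /= sum; // weights now sum to 1
        }
        return weights;
    }

    /** Index of the input token that receives the largest attention weight. */
    public static int mostAttended(double[] weights) {
        int best = 0;
        for (int i = 1; i < weights.length; i++) {
            if (weights[i] > weights[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] input = {"nine", "patch", "border"};
        // Toy encoder states, one per input token (made-up values)
        double[][] enc = {{1.0, 0.0}, {0.0, 1.0}, {-1.0, 0.0}};
        // Toy decoder state that happens to align with "patch"
        double[] dec = {0.0, 2.0};
        double[] w = attentionWeights(dec, enc);
        System.out.println(input[mostAttended(w)]);
    }
}
```

With these toy values, `main` prints `patch`. In the Hybrid-DeepCom case described above, the failure would correspond to the step after “nine” still returning the index of “nine” rather than “patch”.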
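We attribute this to the Source Encoder in CodePtr-PGN taking over the task of matching AST information, which in Hybrid-DeepCom falls to the Code Encoder. Splitting the identifier “NinePatchBorder” breaks the syntactic structure of the source code, so the split-off “nine” takes the position of the original class name. During decoding, the decoder relies on the information extracted from the AST, according to which the class name should point to the position where “NinePatchBorder” appears. After splitting, however, that position holds “nine”. Moreover, the Code Encoder in Hybrid-DeepCom has to extract semantic information while also matching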
