Page 301 - Journal of Software (Ruan Jian Xue Bao), 2024, Issue 6
Zhou GY, et al.: Code search method based on relational graph convolutional networks 2877
5 Conclusion and Future Work
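Code search is key to code reuse and underpins the various applications of intelligent software development [47]. Code search methods based on traditional information retrieval techniques capture only shallow features of text and code. In contrast, using deep neural networks to obtain vector representations of code and queries, and then optimizing the spatial distance between those vectors, can mine higher-level features and lets the model retrieve more diverse results [48]. In the code search method based on relational graph convolutional networks proposed in this paper, the graph representation fully preserves the structural and semantic information of code snippets, and the proposed matching operation explores both the fine-grained matching relations and the global relations between the text graph and the code graph. Compared with the baseline models, the method achieves a considerable improvement in performance.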
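The retrieval step behind embedding-based code search can be illustrated with a minimal sketch: once a query and the candidate snippets are represented as vectors, candidates are ranked by their distance to the query. The example below assumes cosine similarity as the distance measure and uses hand-written toy vectors; the function names, snippet identifiers, and vector values are all hypothetical and are not taken from the paper, whose encoders and matching operation are considerably richer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_snippets(query_vec, snippet_vecs):
    """Return snippet ids ordered from most to least similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), sid) for sid, vec in snippet_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sid for _, sid in scored]


# Toy embeddings (assumed values; a real system would obtain them from trained encoders).
query = [1.0, 0.0, 1.0]
snippets = {
    "read_file": [0.9, 0.1, 0.8],
    "sort_list": [0.0, 1.0, 0.1],
    "open_socket": [0.5, 0.5, 0.5],
}
ranking = rank_snippets(query, snippets)
print(ranking)
```

During training, a model of this kind is typically optimized so that matching query/code pairs end up closer together than non-matching pairs; the sketch covers only the ranking at query time.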
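A candidate code base may contain multiple code snippets that are semantically identical but differ in form, which can also affect model performance. This problem deserves attention, yet existing work [8,15,22–24,44–46] has not solved it. In future work we will therefore focus on how to collect and annotate high-quality code search datasets, and we will investigate how semantically identical but syntactically different code snippets affect search results. We will also run experiments on datasets for other popular programming languages, such as JavaScript. The graph encoding module could adopt other neural networks that are effective for code structure, for example by designing meta-paths based on code data flow and encoding the graph structure with the metapath2vec [49] algorithm, which would further optimize the graph node embedding module. Finally, we will pay close attention to the interpretability of code search models, with a view to analyzing model performance more soundly.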
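To make the meta-path idea concrete, the sketch below shows the walk-generation step that metapath2vec relies on: a random walk over a heterogeneous graph that may only move to neighbors whose type matches the next type in a given meta-path. The node types ("var", "op"), the toy data-flow graph, and the function name are assumptions for illustration and do not come from the paper. The subsequent skip-gram training on the generated walks is omitted.

```python
import random


def metapath_walk(neighbors, node_type, start, metapath, walk_length, rng):
    """Generate one random walk that follows `metapath` cyclically.

    `metapath` is a list of node types whose first and last types are equal,
    e.g. ["var", "op", "var"], so that it can be repeated end to end.
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same node type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    cycle = metapath[:-1]  # drop the repeated last type before cycling
    walk = [start]
    for step in range(1, walk_length):
        wanted = cycle[step % len(cycle)]
        candidates = [n for n in neighbors.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical data-flow graph: variables connected through the operations that use them.
node_type = {"a": "var", "b": "var", "c": "var", "add": "op", "mul": "op"}
neighbors = {
    "a": ["add"],
    "b": ["add", "mul"],
    "c": ["add", "mul"],
    "add": ["a", "b", "c"],
    "mul": ["b", "c"],
}
walk = metapath_walk(neighbors, node_type, "a", ["var", "op", "var"], 7, random.Random(0))
print(walk)
```

Constraining the walk by node type is what distinguishes this from a plain random walk: the resulting sequences only pair nodes that are related through the chosen semantic pattern, here variables linked by a shared operation.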
References:
[1] Liu BB, Dong W, Wang J. Survey on intelligent search and construction methods of program. Ruan Jian Xue Bao/Journal of Software,
2018, 29(8): 2177–2197 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5529.htm [doi: 10.13328/j.cnki.jos.005529]
[2] Bajracharya S, Ngo T, Linstead E, Dou YM, Rigor P, Baldi P, Lopes C. Sourcerer: A search engine for open source code supporting
structure-based search. In: Proc. of the Companion to the 21st ACM SIGPLAN Symp. on Object-oriented Programming Systems,
Languages, and Applications. Portland: ACM, 2006. 681–682. [doi: 10.1145/1176617.1176671]
[3] McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C. Portfolio: A search engine for finding functions and their usages. In: Proc. of the
33rd Int’l Conf. on Software Engineering. Honolulu: IEEE, 2011. 1043–1045. [doi: 10.1145/1985793.1985991]
[4] Lu ML, Sun XB, Wang SW, Lo D, Duan YC. Query expansion via WordNet for effective code search. In: Proc. of the 22nd IEEE Int’l
Conf. on Software Analysis, Evolution, and Reengineering. Montreal: IEEE, 2015. 545–549. [doi: 10.1109/SANER.2015.7081874]
[5] Fellbaum C. WordNet: An Electronic Lexical Database. Cambridge: MIT Press, 1998.
[6] Lv F, Zhang HY, Lou JG, Wang SW, Zhang DM, Zhao JJ. CodeHow: Effective code search based on API understanding and extended
Boolean model (E). In: Proc. of the 30th IEEE/ACM Int’l Conf. on Automated Software Engineering. Lincoln: IEEE, 2015. 260–270.
[doi: 10.1109/ASE.2015.42]
[7] Li X, Wang QX, Jin Z. Description reinforcement based code search. Ruan Jian Xue Bao/Journal of Software, 2017, 28(6): 1405–1417
(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5226.htm [doi: 10.13328/j.cnki.jos.005226]
[8] Gu XD, Zhang HY, Kim S. Deep code search. In: Proc. of the 40th IEEE/ACM Int’l Conf. on Software Engineering. Gothenburg: IEEE,
2018. 933–944. [doi: 10.1145/3180155.3180167]
[9] Yao ZY, Peddamail JR, Sun H. CoaCor: Code annotation for code retrieval with reinforcement learning. In: Proc. of the 2019 World
Wide Web Conf. San Francisco: ACM, 2019. 2203–2214. [doi: 10.1145/3308558.3313632]
[10] Chen QY, Zhou MH. A neural framework for retrieval and summarization of source code. In: Proc. of the 33rd IEEE/ACM Int’l Conf. on
Automated Software Engineering. Montpellier: IEEE, 2018. 826–831. [doi: 10.1145/3238147.3240471]
[11] Wan Y, Shu JD, Sui YL, Xu GD, Zhao Z, Wu J, Yu P. Multi-modal attention network learning for semantic source code retrieval. In:
Proc. of the 34th IEEE/ACM Int’l Conf. on Automated Software Engineering. San Diego: IEEE, 2019. 13–25. [doi: 10.1109/ASE.2019.00012]
[12] Ling CY, Zou YZ, Lin ZQ, Xie B, Zhao JF. Approach to searching software source code with graph embedding. Ruan Jian Xue
Bao/Journal of Software, 2019, 30(5): 1481–1497 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5721.htm
[doi: 10.13328/j.cnki.jos.005721]
[13] Tang J, Qu M, Wang MZ, Zhang M, Yan J, Mei QZ. LINE: Large-scale information network embedding. In: Proc. of the 24th Int’l Conf.
on World Wide Web. Florence: ACM, 2015. 1067–1077. [doi: 10.1145/2736277.2741093]
[14] Huang SY, Zhao YH, Liang YM. Code search combining graph embedding and attention mechanism. Journal of Frontiers of Computer
Science and Technology, 2022, 16(4): 844–854 (in Chinese with English abstract). [doi: 10.3778/j.issn.1673-9418.2010087]
[15] Liu SQ, Xie XF, Siow J, Ma L, Meng ZG, Liu Y. GraphSearchNet: Enhancing GNNs via capturing global dependencies for semantic
code search. IEEE Trans. on Software Engineering, 2023, 49(4): 2839–2855. [doi: 10.1109/TSE.2022.3233901]