Page 301 - Journal of Software (Ruan Jian Xue Bao), 2024, Issue 6

Zhou Guangyou et al.: Code search method based on relational graph convolutional network          2877


                  5   Conclusion and Future Work

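                    Code search is the key to code reuse and the foundation on which the applications of intelligent software
                 development run [47]. Code search methods based on traditional information retrieval capture only shallow
                 features of text and code. In contrast, using deep neural networks to obtain vector representations of the
                 code and the query, and then optimizing the spatial distance between those vectors, mines higher-level
                 features and lets the model retrieve more diverse results [48]. In the relational graph convolutional
                 network based code search method proposed in this paper, the graph representation fully preserves the
                 structural and semantic information of code snippets, and the proposed matching operation explores both the
                 fine-grained matching relations and the global relations between the text graph and the code graph. Compared
                 with the baseline models, the method achieves a substantial improvement in performance.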
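The general idea of embedding queries and code into one vector space and optimizing the distance between them can be illustrated with a minimal sketch. It is not the model proposed in this paper: the cosine similarity, the margin-based ranking loss, the margin value, and the hand-made three-dimensional vectors are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def margin_ranking_loss(query, pos_code, neg_code, margin=0.5):
    """Hinge loss: zero once the matching snippet beats the non-matching one by `margin`."""
    return max(0.0, margin - cosine(query, pos_code) + cosine(query, neg_code))

def rank(query, candidates):
    """Return candidate names sorted by decreasing similarity to the query."""
    return sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)

# Toy embeddings with hypothetical values; a real system would learn them.
query_vec = [1.0, 0.0, 1.0]
candidates = {
    "read_file": [0.9, 0.1, 0.8],
    "sort_list": [0.0, 1.0, 0.1],
}

ranking = rank(query_vec, candidates)
loss = margin_ranking_loss(query_vec, candidates["read_file"], candidates["sort_list"])
```

During training, a loss of this kind is minimized over many (query, matching code, non-matching code) triples, so that matching pairs end up closer together than non-matching ones.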
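                    A candidate code base may contain multiple code snippets that are semantically identical but differ in
                 form, which can also degrade model performance. This is a problem that deserves attention but that existing
                 work [8,15,22–24,44–46] has not yet solved. In future work we will therefore focus on how to collect and
                 annotate high-quality code search datasets, and on how semantically identical but syntactically different
                 code snippets affect search results. We will also run experiments on datasets for more popular programming
                 languages, such as JavaScript. The graph encoding module could adopt other neural networks that are
                 effective for code structure, for example by designing meta-paths based on code data flow and encoding the
                 graph structure with the metapath2vec [49] algorithm, to further optimize the graph node embedding module.
                 Finally, we will pay close attention to the interpretability of code search models, so that model
                 behavior can be analyzed more soundly.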
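As a rough illustration of the meta-path idea mentioned above, the sketch below generates a meta-path-guided random walk, which is the walk-sampling stage of metapath2vec (the walks would then be fed to a skip-gram model, omitted here). The node types `var` and `stmt`, the toy data-flow graph, and the function name `metapath_walk` are hypothetical and are not taken from the paper.

```python
import random

def metapath_walk(graph, node_type, start, metapath, walk_length, rng):
    """Random walk that only steps to neighbors whose type follows the meta-path."""
    # The meta-path is assumed cyclic, e.g. ["var", "stmt", "var"].
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    pattern = metapath[:-1]  # drop the repeated last type so the pattern can cycle
    walk = [start]
    while len(walk) < walk_length:
        wanted = pattern[len(walk) % len(pattern)]
        options = [n for n in graph[walk[-1]] if node_type[n] == wanted]
        if not options:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(options))
    return walk

# Hypothetical data-flow graph: variables connected to the statements that use them.
node_type = {"x": "var", "y": "var", "s1": "stmt", "s2": "stmt"}
graph = {
    "x": ["s1"],
    "y": ["s1", "s2"],
    "s1": ["x", "y"],
    "s2": ["y"],
}

walk = metapath_walk(graph, node_type, "x", ["var", "stmt", "var"], 5, random.Random(0))
walk_types = [node_type[n] for n in walk]
```

Constraining each step by node type keeps the sampled contexts semantically consistent, which is the motivation for meta-paths over plain random walks on heterogeneous graphs.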

                 References:
                  [1]  Liu BB, Dong W, Wang J. Survey on intelligent search and construction methods of program. Ruan Jian Xue Bao/Journal of Software,
                     2018, 29(8): 2177–2197 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5529.htm [doi: 10.13328/j.cnki.jos.005529]
                  [2]  Bajracharya S, Ngo T, Linstead E, Dou YM, Rigor P, Baldi P, Lopes C. Sourcerer: A search engine for open source code supporting
                     structure-based search. In: Proc. of the Companion to the 21st ACM SIGPLAN Symp. on Object-oriented Programming Systems,
                     Languages, and Applications. Portland: ACM, 2006. 681–682. [doi: 10.1145/1176617.1176671]
                  [3]  McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C. Portfolio: A search engine for finding functions and their usages. In: Proc. of the
                     33rd Int’l Conf. on Software Engineering. Honolulu: IEEE, 2011. 1043–1045. [doi: 10.1145/1985793.1985991]
                  [4]  Lu ML, Sun XB, Wang SW, Lo D, Duan YC. Query expansion via WordNet for effective code search. In: Proc. of the 22nd IEEE Int’l
                     Conf. on Software Analysis, Evolution, and Reengineering. Montreal: IEEE, 2015. 545–549. [doi: 10.1109/SANER.2015.7081874]
                  [5]  Fellbaum C. WordNet: An Electronic Lexical Database. Cambridge: MIT Press, 1998.
                  [6]  Lv F, Zhang HY, Lou JG, Wang SW, Zhang DM, Zhao JJ. CodeHow: Effective code search based on API understanding and extended
                     Boolean model (E). In: Proc. of the 30th IEEE/ACM Int’l Conf. on Automated Software Engineering. Lincoln: IEEE, 2015. 260–270.
                     [doi: 10.1109/ASE.2015.42]
                  [7]  Li X, Wang QX, Jin Z. Description reinforcement based code search. Ruan Jian Xue Bao/Journal of Software, 2017, 28(6): 1405–1417
                     (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5226.htm [doi: 10.13328/j.cnki.jos.005226]
                  [8]  Gu XD, Zhang HY, Kim S. Deep code search. In: Proc. of the 40th IEEE/ACM Int’l Conf. on Software Engineering. Gothenburg: IEEE,
                     2018. 933–944. [doi: 10.1145/3180155.3180167]
                  [9]  Yao ZY, Peddamail JR, Sun H. CoaCor: Code annotation for code retrieval with reinforcement learning. In: Proc. of the 2019 World
                     Wide Web Conf. San Francisco: ACM, 2019. 2203–2214. [doi: 10.1145/3308558.3313632]
                 [10]  Chen QY, Zhou MH. A neural framework for retrieval and summarization of source code. In: Proc. of the 33rd IEEE/ACM Int’l Conf. on
                     Automated Software Engineering. Montpellier: IEEE, 2018. 826–831. [doi: 10.1145/3238147.3240471]
                 [11]  Wan Y, Shu JD, Sui YL, Xu GD, Zhao Z, Wu J, Yu P. Multi-modal attention network learning for semantic source code retrieval. In:
                     Proc. of the 34th IEEE/ACM Int’l Conf. on Automated Software Engineering. San Diego: IEEE, 2019. 13–25.
                     [doi: 10.1109/ASE.2019.00012]
                 [12]  Ling CY, Zou YZ, Lin ZQ, Xie B, Zhao JF. Approach to searching software source code with graph embedding. Ruan Jian Xue
                     Bao/Journal of Software, 2019, 30(5): 1481–1497 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5721.htm
                     [doi: 10.13328/j.cnki.jos.005721]
                 [13]  Tang J, Qu M, Wang MZ, Zhang M, Yan J, Mei QZ. LINE: Large-scale information network embedding. In: Proc. of the 24th Int’l Conf.
                     on World Wide Web. Florence: ACM, 2015. 1067–1077. [doi: 10.1145/2736277.2741093]
                 [14]  Huang SY, Zhao YH, Liang YM. Code search combining graph embedding and attention mechanism. Journal of Frontiers of Computer
                     Science and Technology, 2022, 16(4): 844–854 (in Chinese with English abstract). [doi: 10.3778/j.issn.1673-9418.2010087]
                 [15]  Liu SQ, Xie XF, Siow J, Ma L, Meng ZG, Liu Y. GraphSearchNet: Enhancing GNNs via capturing global dependencies for semantic
                     code search. IEEE Trans. on Software Engineering, 2023, 49(4): 2839–2855. [doi: 10.1109/TSE.2022.3233901]