Page 26 - Journal of Software (《软件学报》), No. 11, 2021

3352                                Journal of Software  软件学报 Vol.32, No.11, November 2021

developers according to the existing code lines, it would not only help developers complete their development tasks better but also
improve the efficiency of software development. However, most existing approaches focus only on code repair or code completion and
seldom consider how to recommend code lines based on contextual information. A feasible solution to this problem is to use deep learning
to extract the context factors relevant to code lines by mining hidden contextual information from the massive body of existing source code.
Therefore, this study proposes a novel deep-learning-based approach for onsite programming. In this approach, the contextual relationships
among code lines are learned from existing large-scale code datasets, and the Top-N code lines are then recommended to programmers.
The approach adopts the RNN Encoder-Decoder framework, which encodes several lines of code into a context-aware vector and then
obtains the Top-N new code lines from that context vector. Finally, the approach is evaluated empirically on a large-scale code line dataset
collected from an open-source platform. The results show that the proposed approach can recommend relevant code lines to developers
based on the existing context, with accuracy approaching 60%. In addition, the MRR is about 0.3, indicating that correct recommendations
generally appear near the top of the N recommended results.
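The two metrics reported in the abstract can be illustrated with a short sketch. It assumes that "accuracy" means the share of queries whose ground-truth next line appears anywhere in the Top-N list (a hit rate), and that MRR averages 1/rank of the ground-truth line with a miss counting as 0; the exact definitions used in the evaluation are not given in this excerpt, and the function names below are hypothetical.

```python
from typing import Sequence


def top_n_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], n: int) -> float:
    """Share of queries whose ground-truth next line is among the first n recommendations."""
    hits = sum(1 for recs, gold in zip(ranked, truth) if gold in recs[:n])
    return hits / len(truth)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], truth: Sequence[str], n: int) -> float:
    """Average of 1/rank of the ground-truth line within the top n; a miss contributes 0."""
    total = 0.0
    for recs, gold in zip(ranked, truth):
        top = list(recs[:n])
        if gold in top:
            total += 1.0 / (top.index(gold) + 1)  # ranks are 1-based
    return total / len(truth)


# Hypothetical example: three queries, each with a Top-3 list of placeholder code lines.
ranked = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
truth = ["b", "z", "s"]
print(f"{top_n_accuracy(ranked, truth, 3):.3f}")        # prints 0.667
print(f"{mean_reciprocal_rank(ranked, truth, 3):.3f}")  # prints 0.278
```

Under these definitions, an MRR of about 0.3 means the correct line sits, on average, around the third position when it is found, which is consistent with the abstract's reading that hits cluster near the top of the list.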
                 Key words:    onsite programming; source code context; code line; deep learning; RNN Encoder-Decoder

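   In practice, developers usually turn to search engines to find the code they need. Searching this way, however, requires a precise
description of the desired functionality, whereas a single line of code does not implement a complete function on its own. Moreover, owing
to the complexity and diversity of programming languages, such as data …
… results [4,5], so the query results are often unsatisfactory. Existing approaches mostly perform code repair or code completion. Such
work is finer-grained and places tight restrictions on auto-completion: it mainly completes or recommends specific APIs or already-defined
variables [7,8], and cannot …
… relevant context factors of code lines, mining hidden contextual information to lay the groundwork for accurate recommendation. Inspired
by this, this paper proposes a deep-learning-based code line recommendation approach with deep awareness of the onsite programming
context (deep awareness for code line recommendation, DA4CLR), which …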
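   DA4CLR uses the RNN Encoder-Decoder framework [10]. This is a Sequence-to-Sequence framework whose encoder-decoder structure is
particularly well suited to Sequence-to-Sequence problems. The encoder encodes the input sequence into a fixed-length …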
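The encoder's key property, folding an input sequence of any length into one fixed-length vector, can be shown with a minimal sketch. Several assumptions apply: the context code lines are taken to be already tokenized into integer ids; a plain tanh (Elman) recurrent cell stands in for the gated units usually used in RNN Encoder-Decoder models; the weights are random rather than trained; and the decoder that would generate the Top-N lines from the context vector is not shown. The class and helper names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a trained model would learn these values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ElmanEncoder:
    """Hypothetical encoder: h_t = tanh(W_x e(x_t) + W_h h_{t-1} + b)."""

    def __init__(self, vocab_size: int, emb_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.embedding = rand_matrix(vocab_size, emb_dim, rng)  # one row per token id
        self.w_x = rand_matrix(hidden_dim, emb_dim, rng)
        self.w_h = rand_matrix(hidden_dim, hidden_dim, rng)
        self.b = [0.0] * hidden_dim
        self.hidden_dim = hidden_dim

    def encode(self, token_ids: List[int]) -> Vector:
        """Run the recurrence over all tokens and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for t in token_ids:
            x = self.embedding[t]
            pre = [a + c + bias for a, c, bias in zip(matvec(self.w_x, x), matvec(self.w_h, h), self.b)]
            h = [math.tanh(z) for z in pre]
        return h  # the fixed-length context vector a decoder would consume


# Hypothetical token ids for two context windows of different lengths.
enc = ElmanEncoder(vocab_size=50, emb_dim=8, hidden_dim=16)
short_ctx = enc.encode([3, 7])
long_ctx = enc.encode([3, 7, 1, 9, 12, 4, 4, 30])
print(len(short_ctx), len(long_ctx))  # prints 16 16
```

Both context windows map to vectors of the same size, which is what lets the decoder condition on "several lines of code" regardless of how many tokens they contain.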
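… In order to test the effectiveness of the recommendation approach, this paper collected several million code lines together with their
context from high-profile projects on GitHub and from some well-regarded jar packages, selected part of the data as the test set, and
evaluated the approach on two metrics, accuracy and MRR …
… difficulties and shortcomings, the main contributions of this paper lie in the following three aspects.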
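   1)   A data quality assessment method for open-source code big data is proposed. Across different dimensions and granularity levels,
        it analyzes the quality issues arising at each step of splitting a Java project into multiple Java methods, and finally gives a quality
        … for each single method block …
   2)   A deep learning model is used to learn the latent, general code-line … from existing large-scale open-source datasets that carry
        code-line context …
   3)   Developer intent is captured from onsite programming task data, and semantic similarity matching is used to re-prioritize the
        recommendation results, so as to better …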
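Contribution 3 re-prioritizes recommendations by semantic similarity to the developer's task. The excerpt does not specify the similarity measure, so the sketch below is only one possible instance: it assumes a bag-of-subtokens cosine similarity between each candidate line and a textual task description, and a hypothetical linear blend weight `alpha` between the model's score (assumed to lie in [0, 1]) and that similarity. All names are illustrative.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokens(text: str) -> Counter:
    """Split identifiers such as readLine or read_line into lower-case sub-words."""
    parts = re.findall(r"[A-Za-z][a-z]*|[0-9]+", text)
    return Counter(p.lower() for p in parts)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(candidates: List[Tuple[str, float]], task: str, alpha: float = 0.5) -> List[str]:
    """Blend each candidate's model score with its similarity to the task text, then sort descending."""
    task_vec = tokens(task)
    scored = [(alpha * score + (1 - alpha) * cosine(tokens(line), task_vec), line)
              for line, score in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in scored]


# Hypothetical model output (line, score) and task description.
candidates = [("int count = 0;", 0.6), ("String line = reader.readLine();", 0.5)]
task = "read each line from the file reader"
print(rerank(candidates, task))
```

Here the second line scores lower in the model but shares "read", "line" and "reader" with the task text, so the blended score moves it to the front; setting `alpha=1.0` recovers the model's original order.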