Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software, 2020,31(11):3436−3447 [doi: 10.13328/j.cnki.jos.005863] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

Malware Detection Method Based on Subgraph Similarity∗

WANG Jie (汪洁), WANG Chang-Qing (王长青)

(School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, China)
Corresponding author: WANG Jie, E-mail: jwang@csu.edu.cn

Chinese citation format: 汪洁,王长青.子图相似性的恶意程序检测方法.软件学报,2020,31(11):3436−3447. http://www.jos.org.cn/1000-9825/5863.htm
English citation format: Wang J, Wang CQ. Malware detection method based on subgraph similarity. Ruan Jian Xue Bao/Journal of Software, 2020,31(11):3436−3447 (in Chinese). http://www.jos.org.cn/1000-9825/5863.htm

Abstract: Dynamic behavior analysis is a common approach to malware analysis. Such methods typically use graphs to represent a malicious program's system calls or resource dependencies, apply graph mining algorithms to find the malicious feature subgraphs shared by known malware samples, and then use these feature subgraphs to detect unknown programs. However, these methods usually rely on graph matching, which is inherently slow, and they ignore the relationships between subgraphs, even though taking those relationships into account helps improve detection performance. To address these two problems, a malware detection method based on subgraph similarity, called DMBSS, is proposed. The method uses a data flow graph to represent the system behaviors or events of a malicious program at run time, extracts malicious-behavior feature subgraphs from the data flow graph, and uses an “inverse topology identification” algorithm to represent each feature subgraph as a string. Because the string carries the structural information of the subgraph, string comparison replaces graph matching. A neural network is then used to compute the similarity between subgraphs by representing each subgraph structure as a high-dimensional vector, so that similar subgraphs lie close together in the vector space. Finally, the subgraph vectors are used to construct a similarity function for malicious programs, which is combined with an SVM classifier to detect malware. Experimental results show that, compared with other methods, DMBSS detects malware faster and with higher accuracy.
Key words: malware detection; neural network; distributed representation of subgraphs; graph similarity function
Chinese Library Classification number (中图法分类号): TP311
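
The idea of replacing graph matching with string comparison can be illustrated with a small sketch. The abstract does not specify the “inverse topology identification” algorithm, so the code below is not the authors' method: it is one possible encoding, assuming the feature subgraph is a small directed acyclic graph whose nodes carry type labels. Each node is labelled only after all of its successors (reverse topological order), and successor codes are sorted so that the result does not depend on node identifiers. The function name `encode_subgraph` and the labels `process`, `file`, and `socket` are hypothetical.

```python
from collections import defaultdict


def encode_subgraph(node_labels, edges):
    """Encode a small labelled DAG as a deterministic string (illustrative only).

    node_labels: dict mapping node id -> type label (hypothetical labels)
    edges: iterable of (source, target) pairs; the graph is assumed acyclic
    """
    edges = list(edges)
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    codes = {}

    def encode(node):
        # A node's code is built from its successors' codes, so every
        # successor is encoded first (reverse topological order).
        if node not in codes:
            child_codes = sorted(encode(child) for child in successors[node])
            codes[node] = node_labels[node] + "(" + ",".join(child_codes) + ")"
        return codes[node]

    # Start from nodes that have no incoming edge.
    targets = {dst for _, dst in edges}
    sources = [node for node in node_labels if node not in targets]
    return "|".join(sorted(encode(node) for node in sources))


# Two subgraphs with the same structure but different node ids and edge order.
a = encode_subgraph({1: "process", 2: "file", 3: "socket"}, [(1, 2), (1, 3)])
b = encode_subgraph({"x": "process", "y": "socket", "z": "file"},
                    [("x", "y"), ("x", "z")])
```

Here `a` and `b` are both `process(file(),socket())`, so the two subgraphs can be compared with a plain string equality test instead of a graph matching routine. A graph containing cycles would need extra handling that this sketch does not attempt.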

∗ Foundation item: National Natural Science Foundation of China (61202495)
Received 2018-12-10; Revised 2019-01-17, 2019-03-23; Accepted 2019-04-22
