Page 292 - Journal of Software (软件学报), 2021, Issue 9


Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software,2021,32(9):2916−2934 [doi: 10.13328/j.cnki.jos.005983] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

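Malicious URL Detection Based on Multiple Feature Fusion ∗

WU Sen-Yan¹, LUO Xi², WANG Wei-Ping¹, QIN Yan¹
¹ (School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China)
² (Department of Information Technology, Hunan Police Academy, Changsha, Hunan 410083, China)
Corresponding author: WANG Wei-Ping, E-mail: wpwang@csu.edu.cn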

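Abstract: As Web applications become ever more widespread, malicious webpages cause increasingly serious harm to users
during Web browsing. A malicious URL is one whose corresponding webpage contains malicious code that is harmful to users;
such code exploits vulnerabilities in the browser or its plugins to attack users, causing the browser to download malware
automatically. Based on a statistical analysis of the features of a large number of live malicious URLs, and with particular
attention to the detection-evasion features of malicious URLs such as redirection and client environment probing, 25 features
are designed from three aspects: page content, JavaScript function parameters, and Web session flow. A malicious URL
detection method based on multi-feature fusion and machine learning, HADMW, is proposed. Test results show that the
method achieves a precision of 96.2% and a recall of 94.6% and can detect malicious URLs effectively. Compared with the
detection results of open-source projects and security software, HADMW achieves better results.
Key words: Web security; malicious URL detection; multiple feature fusion; machine learning
CLC number: TP393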

Citation: Wu SY, Luo X, Wang WP, Qin Y. Malicious URL detection based on multiple feature fusion. Ruan Jian Xue
Bao/Journal of Software, 2021,32(9):2916−2934 (in Chinese). http://www.jos.org.cn/1000-9825/5983.htm
Malicious URL Detection Based on Multiple Feature Fusion

WU Sen-Yan¹, LUO Xi², WANG Wei-Ping¹, QIN Yan¹
¹ (School of Computer Science and Engineering, Central South University, Changsha 410083, China)
² (Department of Information Technology, Hunan Police Academy, Changsha 410083, China)
Abstract: With the growing popularity of Web applications, malicious webpages are increasingly harmful to users during Web
browsing. In this paper, a malicious URL is a URL whose corresponding webpage contains malicious code that is harmful to
users. This malicious code exploits vulnerabilities in browsers or plugins to attack users and download malware automatically.
Based on a statistical analysis of a large number of live malicious URLs, and considering the anti-detection techniques
increasingly used in malicious webpages, such as client environment detection and redirection, 25 features are designed in
three aspects: webpage content, JavaScript function parameters, and Web session flows. A detection method, HADMW, is
proposed based on these 25 features and machine learning. The experimental results show that HADMW achieves 96.2%
precision and 94.6% recall and can detect malicious URLs effectively. Compared with the detection results of open-source
projects and security software, HADMW achieves better results.
Key words: Web security; malicious URL detection; multiple feature fusion; machine learning
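
The detection metrics reported in the abstract can be made concrete with a small worked example. The sketch below is illustrative only: the confusion-matrix counts are hypothetical and are not taken from the HADMW experiments, and the function name `detection_metrics` is an assumption introduced here. It shows how precision and recall are derived from true/false positives and negatives, and why neither is the same quantity as accuracy.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, and accuracy from confusion-matrix counts.

    tp: malicious URLs correctly flagged as malicious
    fp: benign URLs wrongly flagged as malicious
    fn: malicious URLs that were missed
    tn: benign URLs correctly passed
    """
    # Precision: of all URLs flagged as malicious, the share that really are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all truly malicious URLs, the share that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all URLs, malicious or benign, classified correctly.
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical counts chosen for illustration, NOT results from the paper.
metrics = detection_metrics(tp=90, fp=10, fn=30, tn=870)
print(metrics)
```

With these made-up counts, precision is 90/100, recall is 90/120, and accuracy is 960/1000, so a detector can score high on accuracy while still missing a quarter of the malicious URLs. This is why precision and recall are reported separately for an imbalanced task such as malicious URL detection.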

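With the rapid spread of the Internet, users enjoy the convenient services it brings to daily life and work, but they also
face security threats from malicious webpages. A malicious URL is one whose corresponding webpage contains malicious code
that is harmful to users. While the user is visiting the webpage, this malicious code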

∗ Foundation item: National Natural Science Foundation of China (61672543); Open Research Fund of Key Laboratory of Network
Crime Investigation of Hunan Provincial Colleges (2017WLFZZC002)
Received 2019-06-19; Revised 2019-09-06, 2019-10-10; Accepted 2019-11-25
