Page 319 - Journal of Software (软件学报), 2020, Issue 10

Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW                  E-mail: jos@iscas.ac.cn
         Journal of Software, 2020,31(10):3295−3308 [doi: 10.13328/j.cnki.jos.005800]   http://www.jos.org.cn
         © Institute of Software, Chinese Academy of Sciences. All rights reserved.        Tel: +86-10-62562563


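         A Method for Detecting Abnormal Log Traffic Patterns in Multi-Node Systems∗

         王晓东¹,²,   赵一宁¹,   肖海力¹,   迟学斌¹,²,   王小宁¹
         ¹ (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190)
         ² (University of Chinese Academy of Sciences, Beijing 100049)
         Corresponding author: 赵一宁, E-mail: zhaoyn@sccas.cn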

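         Abstract:  As the number of logs produced by the nodes of the national high performance computing environment keeps growing,
         traditional manual analysis of abnormal logs can no longer meet day-to-day analysis needs. This paper proposes a definition of an
         abnormal log traffic pattern: the ordered arrangement of log types within the same time slice on the same node represents one log
         traffic pattern. Starting from this definition, a detection method is implemented to mine abnormal log traffic patterns
         automatically. The method starts from system logs and classifies them automatically according to the text similarity of their
         content. The number of occurrences of each log type within the same time slice is then taken as the input feature, and an anomaly
         detection method based on principal component analysis is applied to this input, yielding a large number of abnormal log type
         sequences. Next, these sequences are hierarchically clustered using a distance metric based on the longest common subsequence, and
         the adaptive K-itemset algorithm is applied to the clustering results to obtain a representative sequence for each abnormal log
         traffic pattern. Logs produced by the national high performance computing environment over half a year were analyzed with this
         method by time period (morning, evening, night), yielding the abnormal log traffic patterns of each period and the relationships
         among them. The method can also be extended to the system logs of other distributed systems.
         Key words:  abnormal log traffic; principal component analysis; hierarchical clustering; longest common subsequence; adaptive
         K-itemset algorithm
         Chinese Library Classification number: TP316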
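
The abstract names two building blocks that are easy to illustrate: grouping log types by node and time slice, and comparing the resulting sequences with a distance based on the longest common subsequence (LCS). The following is a minimal sketch of both, not the authors' implementation. It makes several assumptions that the abstract does not state: the function names are hypothetical, events are `(node, timestamp, log_type)` tuples, the time slice is one hour, types within a slice are ordered by timestamp, and the distance is normalized as `1 - LCS / max(len(a), len(b))`.

```python
from collections import defaultdict


def slice_sequences(events, slice_seconds=3600):
    """Group (node, timestamp, log_type) events into one type sequence
    per (node, time slice). The one-hour slice is an assumed default."""
    buckets = defaultdict(list)
    # Assumption: the "ordered arrangement" is the order of arrival in time.
    for node, ts, log_type in sorted(events, key=lambda e: e[1]):
        buckets[(node, int(ts // slice_seconds))].append(log_type)
    return dict(buckets)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming
    over two rows of the standard LCS table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Assumed normalization: 0.0 for identical sequences, 1.0 when the
    sequences share no element; two empty sequences have distance 0.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest
```

A pairwise matrix of `lcs_distance` values over the abnormal sequences could then be passed to any hierarchical clustering routine that accepts precomputed distances. The choice of linkage and cut threshold is not given in the abstract and would have to come from the body of the paper.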

         Citation format (Chinese):  王晓东,赵一宁,肖海力,迟学斌,王小宁.多节点系统异常日志流量模式检测方法.软件学报,2020,31(10):
         3295−3308. http://www.jos.org.cn/1000-9825/5800.htm
         Citation format (English): Wang XD, Zhao YN, Xiao HL, Chi XB, Wang XN. Multi-node system abnormal log flow mode detection method.
         Ruan Jian Xue Bao/Journal of Software, 2020,31(10):3295−3308 (in Chinese). http://www.jos.org.cn/1000-9825/5800.htm

         Multi-node System Abnormal Log Flow Mode Detection Method
         WANG Xiao-Dong¹,²,   ZHAO Yi-Ning¹,   XIAO Hai-Li¹,   CHI Xue-Bin¹,²,   WANG Xiao-Ning¹
         ¹ (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)
         ² (University of Chinese Academy of Sciences, Beijing 100049, China)
         Abstract:    With the increasing number of logs produced by nodes in CNGrid, traditional manual methods for abnormal log analysis can
         no longer meet the needs of daily analysis. This study proposes a definition of the abnormal log traffic pattern: the ordered
         arrangement of log types on the same node within the same time slice represents a log traffic pattern. Based on this definition, a
         detection method is implemented that automatically mines abnormal log traffic patterns. The method starts from system logs and
         classifies them automatically according to the text similarity of their content. Then, the frequency of each type of log within the
         same time slice is taken as the input feature, and an anomaly detection method based on principal component analysis (PCA) is used to
         detect abnormal inputs, yielding a large number of abnormal log type sequences. A distance metric based on the longest common
         subsequence is used to cluster these sequences hierarchically. The adaptive K-itemset algorithm is then applied to the clustering
         results to obtain representatives of the abnormal log traffic patterns. The above method was used to analyze the logs generated over
         half a year in the national high performance computing environment CNGrid, according to different time periods (morning,

            ∗  Foundation item: National Key Research and Development Program of China (2018YFB0204002); National Natural Science
         Foundation of China (61702477)
              Received: 2018-06-08;  Revised: 2018-09-10;  Accepted: 2018-12-27;  Published online by JOS: 2019-11-06
             CNKI online-first publication: 2019-11-06 11:48:54, http://kns.cnki.net/kcms/detail/11.2560.TP.20191106.1148.002.html