Page 114 - Journal of Software (《软件学报》), 2021, No. 11

Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW                 E-mail: jos@iscas.ac.cn
                 Journal of Software, 2021,32(11):3440−3451 [doi: 10.13328/j.cnki.jos.006041]   http://www.jos.org.cn
                 © Institute of Software, Chinese Academy of Sciences. All rights reserved.        Tel: +86-10-62562563


                 Feature Selection Algorithm for Noise Data (噪音数据的属性选择算法) ∗

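                 XU Hang (许航)¹, ZHANG Shi-Chao (张师超)¹, WU Zhao-Jiang (吴兆江)¹, LI Jia-Ye (李佳烨)²
                 1 (School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China)
                 2 (School of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China)
                 Corresponding author: ZHANG Shi-Chao, E-mail: zhangsc@csu.edu.cn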

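                 Abstract: Regularized feature selection algorithms do little to reduce the influence of noisy data, and they
                 almost never take the local structure of the sample space into account. After samples are mapped to the
                 feature subspace, the relationships between samples no longer match those in the original space, so the
                 results of data mining algorithms are unsatisfactory. This paper proposes a noise-resistant feature selection
                 method that addresses both shortcomings of traditional algorithms. The method first adopts self-paced
                 learning as its training scheme, which not only greatly lowers the likelihood that outliers enter training but
                 also helps the model converge quickly. It then performs embedded feature selection with a regression learner
                 that includes an $\ell_{2,1}$ regularization term, taking into account both "obtaining a sparse solution" and
                 "resolving over-fitting" and making the model more robust. Finally, it integrates locality preserving
                 projections, converting the projection matrix into the regression parameter matrix of the model so that the
                 original local structure among samples is preserved while features are selected. The algorithm is tested on a
                 series of benchmark data sets, and the experimental results on aCC and aRMSE demonstrate the effectiveness of
                 the feature selection method.
                 Key words: feature selection; self-paced learning; locality preserving projections
                 Chinese Library Classification (CLC) number: TP18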

                 Chinese citation format: 许航,张师超,吴兆江,李佳烨.噪音数据的属性选择算法.软件学报,2021,32(11):3440−3451.
                 http://www.jos.org.cn/1000-9825/6041.htm
                 English citation format: Xu H, Zhang SC, Wu ZJ, Li JY. Feature selection algorithm for noise data. Ruan Jian Xue Bao/Journal of
                 Software, 2021,32(11):3440−3451 (in Chinese). http://www.jos.org.cn/1000-9825/6041.htm

                 Feature Selection Algorithm for Noise Data
                 XU Hang¹, ZHANG Shi-Chao¹, WU Zhao-Jiang¹, LI Jia-Ye²
                 1 (School of Computer Science and Engineering, Central South University, Changsha 410083, China)
                 2 (School of Computer Science and Information Engineering, Guangxi Normal University, Guilin 541004, China)
                 Abstract: Regularization-based feature selection algorithms are not effective at reducing the impact of noisy data. Moreover, they
                 hardly consider the local structure of the sample space: after the samples are mapped to the feature subspace, the relationships
                 between samples are inconsistent with those in the original space, which leads to unsatisfactory results from data mining algorithms.
                 This study proposes an anti-noise feature selection method that addresses both shortcomings of traditional algorithms. The method
                 first trains with self-paced learning, which not only greatly reduces the possibility of outliers entering training but also
                 facilitates rapid convergence of the model. Then, a regression learner with an $\ell_{2,1}$ regularization term is used for embedded
                 feature selection, taking into account both "obtaining a sparse solution" and "solving over-fitting" to make the model more robust.
                 Finally, the technique of locality preserving projections is integrated, and its projection matrix is transformed into the regression
                 parameter matrix of the model, so that the original local structure between the samples is maintained while the features are
                 selected. Experiments on a series of benchmark data sets show the effectiveness of the proposed algorithm in terms of aCC and aRMSE.
                 Key words: feature selection; self-paced learning; locality preserving projections
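
The three ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the hard-threshold form of the self-paced weights, the affinity matrix `S`, and the trade-off parameters `alpha` and `beta` are all assumptions chosen for illustration, and the way the terms are summed in `objective` is one generic combination rather than the objective defined in the paper.

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def self_paced_weights(losses, lam):
    """Hard self-paced weights (assumed form): 1 if loss < lam, else 0.

    Samples with a large loss (likely outliers) are left out of training
    until the threshold lam is raised.
    """
    return (np.asarray(losses, dtype=float) < lam).astype(float)

def lpp_penalty(X, W, S):
    """Locality preserving term tr(W^T X^T L X W), with L = D - S.

    X: (n_samples, n_features), W: (n_features, n_outputs),
    S: symmetric (n_samples, n_samples) affinity matrix (assumed given).
    """
    L = np.diag(S.sum(axis=1)) - S   # graph Laplacian
    Z = X @ W                        # samples after projection
    return float(np.trace(Z.T @ L @ Z))

def objective(X, Y, W, v, S, alpha, beta):
    """Assumed combination: weighted squared loss + l2,1 term + LPP term."""
    sample_loss = np.sum((X @ W - Y) ** 2, axis=1)
    return float(v @ sample_loss) + alpha * l21_norm(W) + beta * lpp_penalty(X, W, S)

def rank_features(W):
    """Order features by the norm of their row in W, largest first."""
    return np.argsort(-np.linalg.norm(W, axis=1))
```

Rows of `W` that the $\ell_{2,1}$ term drives toward zero correspond to features that can be dropped, which is why `rank_features` sorts by row norm; a full self-paced scheme would also alternate between updating `W` and raising `lam`, which this sketch leaves out.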


                   ∗  Foundation item: National Natural Science Foundation of China (61836016, 61672177); Fundamental Research Funds for the
                 Central Universities (2019zzts964)
                      Received: 2019-12-26; Revised: 2020-01-17; Accepted: 2020-03-27; Published online in JOS: 2020-12-02