
3504                                Journal of Software  软件学报 Vol.31, No.11, November 2020

Because the average clustering time on low-dimensional data differs from that on higher-dimensional data by a large margin, plotting the raw average times directly yields a poorly readable figure. We therefore take the logarithm of the average clustering time throughout; the vertical axis denotes the average running time (ms) of each algorithm on the real data sets.
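The log transform described above can be sketched as follows. This is a minimal illustration; the timing values are invented for demonstration and are not the paper's measurements:

```python
import math

# Hypothetical average running times (ms); clustering times on data sets of
# very different dimensionality can span several orders of magnitude.
avg_time_ms = {"WKM": 12.0, "MWKM": 15.0, "KKM": 310.0, "KSCC": 4800.0}

# Taking log10 compresses the span so all algorithms fit on one readable axis.
log_time = {name: math.log10(t) for name, t in avg_time_ms.items()}

for name, v in sorted(log_time.items(), key=lambda kv: kv[1]):
    print(f"{name}: log10(time) = {v:.2f}")
```

Plotting `log_time` instead of `avg_time_ms` preserves the ordering of the algorithms while keeping fast and slow methods visually comparable.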
As Fig. 9 shows, WKM [12] and MWKM [20] achieve high clustering efficiency. This is one advantage of mode-based clustering algorithms: they consider only the modes of the categorical attributes and ignore the frequency statistics of the remaining categorical symbols, which greatly reduces running time. The KKM [11] algorithm, which unlike KSCC has no attribute-weighting step, also runs in less time.
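The efficiency advantage of mode-based methods comes from summarizing each cluster by its per-attribute modes alone. A minimal sketch of that representative-update step (illustrative only; not the exact WKM/MWKM implementation):

```python
from collections import Counter

def cluster_mode(objects):
    """Return the per-attribute mode of a cluster of categorical objects.

    Only the most frequent symbol of each attribute is kept; the frequencies
    of all other symbols are discarded, which is what keeps the update cheap.
    """
    n_attrs = len(objects[0])
    return tuple(
        Counter(obj[a] for obj in objects).most_common(1)[0][0]
        for a in range(n_attrs)
    )

cluster = [("red", "round"), ("red", "square"), ("blue", "round")]
print(cluster_mode(cluster))  # ('red', 'round')
```

Updating such a mode vector is linear in the cluster size and the number of attributes, which explains the short running times observed for the mode-based algorithms.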
5    Conclusion

To address the problem that existing subspace clustering methods for categorical data measure inter-object similarity linearly, this paper proposes a new kernel subspace clustering method for categorical data, intended for unsupervised statistical learning on such data. Using the kernel idea, we define a new similarity measure; based on this measure, we then propose a kernel subspace clustering algorithm, KSCC, to optimize the objective function. The new method not only overcomes the shortcomings of linear similarity measures but also performs automatic attribute weighting. Experiments on synthetic and real data sets show that, compared with other existing subspace clustering algorithms for categorical data, KSCC achieves clearly better clustering quality on the experimental data.
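To illustrate the kernel idea on categorical data, one generic construction embeds each attribute via a matching (delta) kernel and combines attributes with learned weights. This is a hedged, illustrative stand-in; the paper's actual kernel and weighting scheme may differ:

```python
def weighted_kernel_similarity(x, y, weights):
    """Attribute-weighted matching-kernel similarity of two categorical objects.

    Per attribute a, k_a(x_a, y_a) = 1 if the symbols match, else 0 (a simple
    delta kernel); the weights w_a express each attribute's contribution to
    the subspace. Illustrative only -- not the exact KSCC measure.
    """
    assert len(x) == len(y) == len(weights)
    return sum(w * (1.0 if xa == ya else 0.0)
               for xa, ya, w in zip(x, y, weights))

x = ("red", "round", "small")
y = ("red", "square", "small")
print(weighted_kernel_similarity(x, y, (0.5, 0.3, 0.2)))  # 0.7
```

Unlike a plain (unweighted) matching count, the weights let the measure emphasize the attributes that define each cluster's subspace, which is the role attribute weighting plays in subspace clustering.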
Future work falls into two directions. First, since the distributional form of categorical data cannot be determined in advance, we will look for a method that adaptively learns the form of the kernel function (i.e., adaptively learns the kernel matrix). Second, clustering analysis of high-dimensional data is a difficult problem in data mining, and much categorical data is high-dimensional (e.g., sequence data); we will extend the clustering analysis to high-dimensional categorical data.

                 References:
[1]    Han JW, Kamber M, Pei J, Wrote; Fan M, Meng XF, Trans. Data Mining: Concepts and Techniques. 3rd ed., Beijing: China Machine Press, 2012 (in Chinese). [doi: 10.3969/j.issn.1674-6511.2008.03.043]
                 [2]    Chen LF, Wu T. Feature Reduction in Data Mining. Beijing: Science Press, 2016 (in Chinese).
                 [3]    Cai XY, Dai GZ, Yang LB. Survey on spectral clustering algorithms. Computer Science, 2008,35(7):14−18 (in Chinese with English
                     abstract). [doi: 10.3969/j.issn.1002-137X.2008.07.004]
                 [4]    Jain AK, Murty MN, Flynn PJ. Data clustering: A review. ACM Computing Surveys, 1999,31(3):264−323.
                 [5]    Perona P, Freeman W. A factorization approach to grouping. In: Proc. of the European Conf. on Computer Vision. 1998. 655−670.
                 [6]    Huang JZ, Ng MK, Rong H, et al. Automated variable weighting in k-means type clustering. IEEE Trans. on Pattern Analysis &
                     Machine Intelligence, 2005,27(5):657−668. [doi: 10.1109/TPAMI.2005.95]
                 [7]    Chen LF, Guo GD, Jiang QS. Adaptive algorithm for soft subspace clustering. Ruan Jian Xue Bao/Journal of Software, 2010,21(10):
                     2513−2523 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/3763.htm [doi: 10.3724/SP.J.1001.2010.03763]
                 [8]    Ng MK, Li MJ, Huang JZ, et al. On the impact of dissimilarity measure in k-modes clustering algorithm. IEEE Trans. on Pattern
                     Analysis & Machine Intelligence, 2007,29(3):503−507. [doi: 10.1109/TPAMI.2007.53]
                 [9]     Boriah S, Chandola V, Kumar V. Similarity measures for categorical data: A comparative evaluation. In: Proc. of the 2008 SIAM
                     Int’l Conf. on Data Mining. 2008. 243−254. [doi: 10.1137/1.9781611972788.22]
                [10]     Knippenberg RW. Orthogonalization of categorical data: How to fix a measurement problem in statistical distance metrics. Ssrn
                     Electronic Journal, 2013. [doi: 10.2139/ssrn.2357607]
                [11]     Kong R, Zhang GX, Shi ZS, et al. Kernel-based K-means clustering. Computer Engineering, 2004,30(11):12−13,80 (in Chinese
                     with English abstract). [doi: 10.3969/j.issn.1000-3428.2004.11.005]
[12]    Chan E, Ching W, Ng M, et al. An optimization algorithm for clustering using weighted dissimilarity measures. Pattern Recognition, 2004,37(5):943−952. [doi: 10.1016/j.patcog.2003.11.003]
                [13]     Cao F, Liang J, Li D, et al. A weighting k-modes algorithm for subspace clustering of categorical data. Neurocomputing, 2013,
                     108(5):23−30. [doi: 10.1016/j.neucom.2012.11.009]
                [14]     Chen L, Wang S, Wang K, et al. Soft subspace clustering of categorical data with probabilistic distance. Pattern Recognition, 2016,
                     51(C):322−332. [doi: 10.1016/j.patcog.2015.09.027]
[15]    Huang Z, Ng MK. A note on K-modes clustering. Journal of Classification, 2003,20(2):257−261. [doi: 10.1007/s00357-003-0014-4]