
Journal of Software (软件学报), ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@iscas.ac.cn
2025, 36(10): 4735–4752 [doi: 10.13328/j.cnki.jos.007306] [CSTR: 32375.14.jos.007306] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563



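Twin Support Function Machine for Set-valued Data*

LIANG Zhi-Zhen (梁志贞), MIN Yu-Han (闵玉寒), DING Shi-Fei (丁世飞)

(School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China)
Corresponding author: LIANG Zhi-Zhen, E-mail: liang@cumt.edu.cn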

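Abstract: The twin support vector machine (TSVM) can effectively handle data of crossing or XOR type. When processing set-valued data, however, TSVM usually relies on statistics of the set-valued objects such as the mean or the median. Unlike TSVM, this paper proposes the twin support function machine (TSFM), which handles set-valued data directly. Based on support functions defined for set-valued objects, TSFM obtains nonparallel hyperplanes in a Banach space. To suppress outliers in set-valued data, TSFM adopts the pinball loss function and introduces weights for the set-valued objects. Because TSFM is an optimization problem in an infinite-dimensional space, the measure is taken to be a linear combination of Dirac measures, which yields an optimization model in a finite-dimensional space. To solve this model efficiently, a sampling strategy transforms it into quadratic programming (QP) problems, and the dual forms of the QP problems are derived, which provides a theoretical basis for determining which sampling points are support vectors. To classify set-valued data, the distance from a set-valued object to a hyperplane in the Banach space is defined, and the decision rule follows from it. Kernelization of the support functions is also considered in order to capture nonlinear features of the data, which makes the proposed model applicable to indefinite kernels. Experimental results show that TSFM captures the intrinsic structure of crossing-type set-valued data and achieves good classification performance in the presence of outliers or when set-valued objects contain a small number of high-dimensional instances.
Keywords: support function; sampling strategy; kernel function; decision rule; set-valued data
CLC number: TP18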

Citation format (Chinese): 梁志贞, 闵玉寒, 丁世飞. 面向集值数据的孪生支持函数机. 软件学报, 2025, 36(10): 4735–4752. http://www.jos.org.cn/1000-9825/7306.htm
Citation format (English): Liang ZZ, Min YH, Ding SF. Twin Support Function Machine for Set-valued Data. Ruan Jian Xue Bao/Journal of Software, 2025, 36(10): 4735–4752 (in Chinese). http://www.jos.org.cn/1000-9825/7306.htm

                 Twin Support Function Machine for Set-valued Data
                 LIANG Zhi-Zhen, MIN Yu-Han, DING Shi-Fei
                 (School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China)
Abstract: The twin support vector machine (TSVM) can effectively handle data such as cross-plane or XOR data. However, when handling set-valued data, TSVM usually relies on statistical information about the set-valued objects, such as the mean and the median. Unlike TSVM, this study proposes a twin support function machine (TSFM) that deals with set-valued data directly. Based on support functions defined for set-valued objects, TSFM obtains nonparallel hyperplanes in a Banach space. To suppress outliers in set-valued data, TSFM adopts the pinball loss function and introduces weights for the set-valued objects. Because TSFM involves optimization problems in an infinite-dimensional space, the measure is taken to be a linear combination of Dirac measures, which yields an optimization model in a finite-dimensional space. To solve this model effectively, the study employs a sampling strategy that transforms it into quadratic programming (QP) problems. The dual formulations of the QP problems are derived, providing a theoretical foundation for determining which sampling points are support vectors. To classify set-valued data, the distance from a set-valued object to a hyperplane in the Banach space is defined, and the decision rule is derived from it. The study also considers the kernelization of support functions to capture nonlinear features of the data, which makes the proposed model applicable to indefinite kernels. Experimental results demonstrate that TSFM captures the intrinsic structure of cross-plane set-valued data and achieves good classification performance in the presence of outliers or when set-valued objects contain only a few high-dimensional examples.
                 Key words:  support function; sampling strategy; kernel function; decision rule; set-valued data
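
As a rough illustration of the ingredients named in the abstract, the Python sketch below evaluates the support function of one set-valued object along a few directions and defines a pinball loss. It rests on assumptions that this page does not confirm: a set-valued object is treated as a finite bag of points, the support function is the standard convex-analysis one (the maximum inner product over the set), directions are sampled uniformly on the unit sphere, and the pinball loss uses a common form from the pinball-loss SVM literature. All function names and data are hypothetical, and this is not the authors' TSFM formulation or solver.

```python
import math
import random


def support_function(points, direction):
    """Support function of a finite set: max over points of <direction, point>."""
    return max(sum(d * p for d, p in zip(direction, point)) for point in points)


def sample_unit_directions(dim, count, seed=0):
    """Draw `count` directions on the unit sphere by normalising Gaussian vectors."""
    rng = random.Random(seed)
    directions = []
    while len(directions) < count:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            directions.append([x / norm for x in v])
    return directions


def pinball_loss(z, tau):
    """Assumed pinball loss: z for z >= 0, and -tau * z otherwise (0 <= tau <= 1)."""
    return z if z >= 0 else -tau * z


# One set-valued object: a bag of three 2-D instances (made-up numbers).
bag = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

# Support function along three hand-picked axis directions.
axis_values = [support_function(bag, u) for u in [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]]

# Support function along five sampled directions, a finite stand-in for the
# infinitely many directions on the sphere.
sampled_values = [support_function(bag, u) for u in sample_unit_directions(2, 5)]
```

Evaluating the support function only at sampled directions is what turns a function-valued description of each object into a finite feature vector, which is the general idea behind reducing the model to a finite-dimensional problem; how the paper weights the samples and builds the QP is not shown here.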


* Received: 2024-03-17; Revised: 2024-07-18; Accepted: 2024-10-12; Published online by jos: 2025-02-26
CNKI online-first publication: 2025-02-27