Page 338 - Journal of Software (软件学报), 2025, Issue 10

Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
2025,36(10):4735−4752 [doi: 10.13328/j.cnki.jos.007306] [CSTR: 32375.14.jos.007306] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

Twin Support Function Machine for Set-valued Data*

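LIANG Zhi-Zhen (梁志贞), MIN Yu-Han (闵玉寒), DING Shi-Fei (丁世飞)

(School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China)
Corresponding author: LIANG Zhi-Zhen, E-mail: liang@cumt.edu.cn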

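Abstract (translated from the Chinese): The twin support vector machine (TSVM) can effectively handle data of
cross-plane or XOR type. However, when processing set-valued data, TSVM usually relies on statistics of the
set-valued objects, such as the mean or the median. Unlike TSVM, this paper proposes the twin support function
machine (TSFM), which handles set-valued data directly. Based on the support functions defined for set-valued
objects, TSFM obtains nonparallel hyperplanes in a Banach space. To suppress outliers in set-valued data, TSFM
adopts the pinball loss function and introduces weights for the set-valued objects. Because TSFM is an
optimization problem in an infinite-dimensional space, the measure is taken to be a linear combination of Dirac
measures, which yields an optimization model in a finite-dimensional space. To solve this model efficiently, a
sampling strategy converts it into quadratic programming (QP) problems, and the dual forms of the QP problems
are derived, which provides a theoretical basis for determining which sampling points are support vectors. To
classify set-valued data, the distance from a set-valued object to a hyperplane in the Banach space is defined,
and a decision rule is derived from it. Kernelization of the support function is also considered in order to
capture nonlinear features of the data, which makes the proposed model applicable to indefinite kernels.
Experimental results show that TSFM captures the intrinsic structure of cross-plane set-valued data and achieves
good classification performance when outliers are present or when set-valued objects contain only a few
high-dimensional examples.
Keywords: support function; sampling strategy; kernel function; decision rule; set-valued data
Chinese Library Classification (CLC) number: TP18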

Citation format (Chinese): 梁志贞, 闵玉寒, 丁世飞. 面向集值数据的孪生支持函数机. 软件学报, 2025, 36(10): 4735–4752. http://www.jos.org.cn/1000-9825/7306.htm
Citation format (English): Liang ZZ, Min YH, Ding SF. Twin Support Function Machine for Set-valued Data. Ruan Jian Xue Bao/Journal of Software, 2025, 36(10): 4735–4752 (in Chinese). http://www.jos.org.cn/1000-9825/7306.htm

Twin Support Function Machine for Set-valued Data
LIANG Zhi-Zhen, MIN Yu-Han, DING Shi-Fei
(School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China)
Abstract: The twin support vector machine (TSVM) can effectively handle data with cross-plane or XOR structure. However, when
set-valued data are handled, TSVM usually relies on summary statistics of the set-valued objects, such as the mean or the median. Unlike
TSVM, this study proposes the twin support function machine (TSFM), which deals with set-valued data directly. Using the support
functions defined for set-valued objects, TSFM obtains nonparallel hyperplanes in a Banach space. To suppress outliers in set-valued
data, TSFM adopts the pinball loss function and introduces weights for the set-valued objects. Because TSFM involves optimization
problems in an infinite-dimensional space, the measure is taken to be a linear combination of Dirac measures, which yields an
optimization model in a finite-dimensional space. To solve this model efficiently, this study employs a sampling strategy to transform
it into quadratic programming (QP) problems. The dual formulations of the QP problems are derived, which provide a theoretical
foundation for determining which sampling points are support vectors. To classify set-valued data, the distance from a set-valued
object to a hyperplane in the Banach space is defined, and the decision rule is derived from it. This study also considers the
kernelization of support functions to capture nonlinear features of the data, which makes the proposed model applicable to indefinite
kernels. Experimental results demonstrate that TSFM captures the intrinsic structure of cross-plane set-valued data and achieves good
classification performance when outliers are present or when set-valued objects contain only a few high-dimensional examples.
Key words: support function; sampling strategy; kernel function; decision rule; set-valued data
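
As a small illustration of two ingredients named in the abstract, the following Python sketch evaluates the support function of a finite set-valued object and a pinball loss. It is not the authors' implementation. It assumes the standard definition of the support function of a set A in direction w, namely the supremum of <w, x> over x in A (a maximum when A is finite), and one common convention for the pinball loss (u for u >= 0, -tau * u otherwise); the form used in the paper may differ. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def support_function(points: Sequence[Sequence[float]], w: Sequence[float]) -> float:
    """Support function of a finite set: the largest value of <w, x> over x in the set."""
    return max(dot(x, w) for x in points)


def pinball_loss(u: float, tau: float = 0.5) -> float:
    """Pinball loss (assumed convention): u if u >= 0, otherwise -tau * u."""
    return u if u >= 0 else -tau * u


# A set-valued object: a bag of three 2-D examples (made-up values).
bag = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
w = [1.0, 1.0]

upper = support_function(bag, w)                  # largest projection of the bag onto w
lower = -support_function(bag, [-v for v in w])   # smallest projection of the bag onto w
print(upper, lower, pinball_loss(-1.0, tau=0.5))
```

The pair (lower, upper) gives the extent of the whole bag along direction w, which is the kind of information a mean or median of the bag discards.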

* Received: 2024-03-17; Revised: 2024-07-18; Accepted: 2024-10-12; JOS online publication: 2025-02-26
CNKI online-first publication: 2025-02-27
