Journal of Software (《软件学报》), 2020, No. 12
Journal of Software, ISSN 1000-9825, CODEN RUXUEW, E-mail: jos@iscas.ac.cn
Journal of Software, 2020, 31(12): 3808−3822 [doi: 10.13328/j.cnki.jos.005890] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
Adaptive Active Semi-Supervised Learning Method∗
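李延超¹, 肖甫¹, 陈志¹, 李博²
¹(School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China)
²(School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China)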
Corresponding author: LI Yan-Chao (李延超), E-mail: yanchao@njupt.edu.cn
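Abstract: Active learning selects samples from a large pool of unlabeled data and submits them to experts for labeling. Existing batch-mode active learning algorithms face three main limitations: (1) some active learning methods rely on a single selection criterion or make assumptions about the data or the model, which makes it hard for them to find unlabeled samples that are both uncertain and representative; (2) the performance of existing batch-mode methods depends heavily on the accuracy of the similarity measure between samples, such as a predefined function or a diversity measure; (3) noisy labels have persistently degraded the performance of batch-mode active learning algorithms. This paper proposes a batch-mode active learning method based on deep learning. A deep neural network generates learned representations of labeled and unlabeled samples, and a label-cycle scheme links each labeled sample to unlabeled samples and then back to labeled samples with the same label. In this way the method accounts for both the uncertainty and the representativeness of samples, and the algorithm is robust to noisy labels. In the proposed batch-mode method, a submodular function ensures that the selected sample set is diverse. In addition, optimizing adaptive parameters lets the algorithm automatically balance uncertainty against representativeness. The proposed method is applied to semi-supervised classification and semi-supervised clustering, and experimental results show that it outperforms several existing state-of-the-art methods.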
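The abstract describes the label cycle and the submodular diversity term only in words. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes the label cycle follows the association-walk formulation of Haeusser et al. (2017), where the transition probabilities from labeled to unlabeled embeddings (and back) are softmaxes over embedding dot products. The per-sample scores are hypothetical choices for this example: uncertainty is the entropy of the class distribution reached on the return step from an unlabeled sample, and representativeness is the average probability that a walk from a labeled sample visits it. Diversity is modeled with a facility-location function, a standard monotone submodular objective, maximized greedily. The fixed weights `lam` and `beta` are placeholders for the adaptive parameters the paper optimizes.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def label_cycle(labeled: List[Vec], unlabeled: List[Vec]) -> Tuple[List[List[float]], List[List[float]]]:
    """Association-walk transitions (assumed formulation):
    p_ab[i][j] = P(labeled i -> unlabeled j), p_ba[j][i] = P(unlabeled j -> labeled i)."""
    sim = [[dot(a, b) for b in unlabeled] for a in labeled]
    p_ab = [softmax(row) for row in sim]  # normalize over unlabeled targets
    p_ba = [softmax([sim[i][j] for i in range(len(labeled))])  # normalize over labeled targets
            for j in range(len(unlabeled))]
    return p_ab, p_ba


def cycle_scores(labeled: List[Vec], labels: List[int], unlabeled: List[Vec],
                 num_classes: int) -> Tuple[List[float], List[float]]:
    """Hypothetical (uncertainty, representativeness) scores for each unlabeled sample."""
    p_ab, p_ba = label_cycle(labeled, unlabeled)
    n_l, n_u = len(labeled), len(unlabeled)
    # Representativeness: probability that a walk from a uniformly chosen labeled sample visits j.
    visit = [sum(p_ab[i][j] for i in range(n_l)) / n_l for j in range(n_u)]
    # Uncertainty: entropy of the class the walk returns to when it leaves unlabeled sample j.
    unc = []
    for j in range(n_u):
        q = [0.0] * num_classes
        for i in range(n_l):
            q[labels[i]] += p_ba[j][i]
        unc.append(-sum(p * math.log(p) for p in q if p > 0))
    return unc, visit


def gaussian_sim(a: Vec, b: Vec) -> float:
    # Nonnegative similarity, so facility location stays monotone submodular.
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_batch(unlabeled: List[Vec], unc: List[float], visit: List[float], k: int,
                 lam: float = 0.5, beta: float = 1.0) -> List[int]:
    """Greedily maximize sum_{s in S} (lam*unc[s] + (1-lam)*visit[s]) + beta * F(S),
    where F(S) = sum_j max_{s in S} sim(j, s) is the facility-location function."""
    n = len(unlabeled)
    sim = [[gaussian_sim(a, b) for b in unlabeled] for a in unlabeled]
    cover = [0.0] * n  # cover[j] = max similarity between j and the current batch
    chosen: List[int] = []
    for _ in range(min(k, n)):
        best, best_gain = -1, -math.inf
        for c in range(n):
            if c in chosen:
                continue
            # Marginal facility-location gain of adding candidate c.
            fl_gain = sum(max(sim[j][c] - cover[j], 0.0) for j in range(n))
            gain = lam * unc[c] + (1 - lam) * visit[c] + beta * fl_gain
            if gain > best_gain:
                best, best_gain = c, gain
        chosen.append(best)
        cover = [max(cover[j], sim[j][best]) for j in range(n)]
    return chosen
```

On toy data with two labeled points (1, 0) and (0, 1) and a duplicated unlabeled point at (1, 1), the duplicates receive the highest uncertainty (ln 2), yet the facility-location term keeps the greedy step from placing both copies in one batch. Because `unc` and `visit` are nonnegative, the objective is a nonnegative modular term plus a monotone submodular function, so greedy selection under a batch-size constraint keeps the usual (1 − 1/e) approximation guarantee.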
Key words: active learning; semi-supervised learning; classification; clustering
CLC number: TP181
Citation (Chinese): 李延超,肖甫,陈志,李博.自适应主动半监督学习方法.软件学报,2020,31(12):3808−3822. http://www.jos.org.cn/1000-9825/5890.htm
Citation (English): Li YC, Xiao F, Chen Z, Li B. Adaptive active learning for semi-supervised learning. Ruan Jian Xue Bao/Journal of Software, 2020,31(12):3808−3822 (in Chinese). http://www.jos.org.cn/1000-9825/5890.htm
Adaptive Active Learning for Semi-supervised Learning
LI Yan-Chao¹, XIAO Fu¹, CHEN Zhi¹, LI Bo²
¹(School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
²(School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China)
Abstract: Active learning algorithms attempt to overcome the labeling bottleneck by querying labels for examples drawn from a large collection of unlabeled data. Existing batch-mode active learning algorithms suffer from three limitations: (1) models that rely on assumptions about the data have difficulty finding examples that are both informative and representative; (2) methods based on a similarity function or on optimizing a particular diversity measure may yield suboptimal performance and select sets containing redundant examples; (3) noisy labels have long been an obstacle for active learning algorithms. This study proposes a novel batch-mode active learning method based on deep learning. A deep neural network generates representations (embeddings) of labeled and unlabeled examples, and a label-cycle scheme connects the embeddings of labeled examples to those of unlabeled examples and back to labeled examples of the same class. This accounts for both the informativeness and the representativeness of examples and makes the method robust to noisy labels. The proposed active learning method is applied to semi-supervised classification and clustering. A submodular function is designed to reduce redundancy among the selected examples. Moreover, the weights that combine the losses of the query criteria are optimized during active learning, which automatically trades off the
∗ Foundation item: National Natural Science Foundation of China (61932013); Natural Science Foundation of Jiangsu Province of China (BK20200739); Research Foundation of Jiangsu for 333 High Level Talents Training Project (BRA2020065)
Received 2019-07-07; Revised 2019-07-28; Accepted 2019-09-16