
Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW                     E-mail: jos@iscas.ac.cn
Journal of Software, 2021,32(12):3884−3900 [doi: 10.13328/j.cnki.jos.006118]   http://www.jos.org.cn
© Copyright by Institute of Software, Chinese Academy of Sciences.             Tel: +86-10-62562563


Incremental Data Sampling Method Using Affinity Propagation with Dynamic Weighting∗
(基于动态赋权近邻传播的数据增量采样方法)

CHEN Xiao-Qi (陈晓琪) 1,2,  XIE Zhen-Ping (谢振平) 1,2,  LIU Yuan (刘渊) 1,2,  ZHAN Qian-Yi (詹千熠) 1,2
1 (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China)
2 (Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi 214122, China)
Corresponding author: XIE Zhen-Ping, E-mail: xiezp@jiangnan.edu.cn

Abstract: Data sampling is an important means of quickly extracting useful information from large-scale datasets. To better meet the demand for efficiently processing ever larger-scale data, this work builds on the excellent performance of the affinity propagation algorithm and, by introducing hierarchical incremental processing and a dynamic weighting strategy for sample points, realizes a new method that balances processing efficiency and sampling quality very effectively. The hierarchical incremental processing strategy partitions the original large-scale dataset into batches, processes them separately, and then composites the results; the dynamic weighting strategy assigns reasonable, dynamically updated weights to sample points during affinity propagation, so as to obtain better global consistency of the samples over the data space. In the experiments, artificial datasets, UCI benchmark datasets, and image datasets are used for performance analysis. The results show that the new method matches existing related methods in sampling partition quality while greatly improving computational efficiency. The new method is further applied to the data augmentation task in deep learning; the corresponding experimental results show that, when efficient incremental sampling is combined with the original data augmentation method, the resulting model performance improves significantly while the total size of the training dataset is kept unchanged.
Keywords: data sampling; affinity propagation; dynamic weighting; incremental sampling; data augmentation
CLC number: TP311

Citation format (Chinese): 陈晓琪,谢振平,刘渊,詹千熠.基于动态赋权近邻传播的数据增量采样方法.软件学报,2021,32(12):3884−3900. http://www.jos.org.cn/1000-9825/6118.htm
Citation format (English): Chen XQ, Xie ZP, Liu Y, Zhan QY. Incremental data sampling method using affinity propagation with dynamic weighting. Ruan Jian Xue Bao/Journal of Software, 2021,32(12):3884−3900 (in Chinese). http://www.jos.org.cn/1000-9825/6118.htm

         Incremental Data Sampling Method Using Affinity Propagation with Dynamic Weighting
CHEN Xiao-Qi 1,2,  XIE Zhen-Ping 1,2,  LIU Yuan 1,2,  ZHAN Qian-Yi 1,2
1 (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China)
2 (Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi 214122, China)
Abstract: Data sampling is an important means of efficiently extracting useful information from huge original datasets. To meet the requirement of efficiently processing ever larger-scale data, a novel incremental data sampling method derived from the affinity propagation algorithm is proposed, into which two integrated algorithmic strategies, hierarchical incremental processing and dynamic weighting of data samples, are introduced. The proposed method can balance computational efficiency and sampling quality very well. The hierarchical incremental processing strategy first samples data items in batches and then composites the samples in a hierarchical way. The dynamic weighting strategy dynamically re-weights the preference of data samples during the incremental sampling procedure, so as to retain better global consistency of the selected samples over the data space. In the experiments, artificial datasets, UCI datasets, and image datasets are used to analyze the sampling performance. The experimental results, compared with those of several existing algorithms, indicate that the proposed method attains similar sampling quality with notably higher computational efficiency, especially on larger-scale datasets. This study further applies the new method to the data augmentation task in deep learning, and the corresponding experimental results show that it performs excellently: when the basic training dataset is processed by the efficient incremental sampling, the resulting model performance improves significantly while the total size of the training dataset is kept unchanged.
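The two strategies summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn's AffinityPropagation on precomputed similarities, and the specific weighting rule (dividing the base preference by 1 + log(1 + weight), so that exemplars already representing many points get a higher preference and are more likely to survive the next batch) is an assumption standing in for the paper's dynamic weighting formula.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def ap_exemplars(points, weights):
    """One affinity propagation pass with weight-boosted preferences."""
    # Similarity = negative squared Euclidean distance (AP's usual choice).
    sim = -((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    # Base preference is the median similarity (a negative scalar);
    # dividing by a larger factor moves it toward 0, i.e. raises it,
    # so heavily weighted carried-over exemplars are favored.
    pref = np.median(sim) / (1.0 + np.log1p(weights))
    ap = AffinityPropagation(affinity="precomputed", preference=pref,
                             damping=0.9, max_iter=500,
                             random_state=0).fit(sim)
    idx = ap.cluster_centers_indices_
    # Each new exemplar inherits the total weight of the points it represents.
    new_w = np.array([weights[ap.labels_ == k].sum() for k in range(len(idx))])
    return points[idx], new_w

def incremental_sample(X, batch_size=150):
    """Hierarchical incremental sampling: process X in batches, carrying
    the current exemplars (with accumulated weights) into the next pass."""
    exemplars = np.empty((0, X.shape[1]))
    weights = np.empty(0)
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]
        pool = np.vstack([exemplars, batch])          # old exemplars + new batch
        w = np.concatenate([weights, np.ones(len(batch))])
        exemplars, weights = ap_exemplars(pool, w)
    return exemplars, weights
```

Because each pass only clusters one batch plus the running exemplar set, the quadratic cost of affinity propagation applies to small pools rather than the whole dataset, which is the source of the efficiency gain claimed in the abstract.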

   ∗ Foundation item: National Natural Science Foundation of China (61872166); Six Talent Peaks Project of Jiangsu Province (XYDXX-161); Science and Technology Planning Project of Jiangsu Province (BE2018056)
     Received 2019-08-01; Revised 2019-11-23, 2020-06-15; Accepted 2020-07-03