
Ruan Jian Xue Bao/Journal of Software ISSN 1000-9825, CODEN RUXUEW                        E-mail: jos@iscas.ac.cn
Journal of Software, 2021,32(11):3482−3495 [doi: 10.13328/j.cnki.jos.006058]              http://www.jos.org.cn
© Copyright by Institute of Software, Chinese Academy of Sciences. All rights reserved.   Tel: +86-10-62562563


Unsupervised Fine-grained Video Categorization via Adaptation Learning Across Domains and Modalities∗

HE Xiang-Teng,  PENG Yu-Xin


(Wangxuan Institute of Computer Technology, Peking University, Beijing 100080, China)
Corresponding author: PENG Yu-Xin, E-mail: pengyuxin@pku.edu.cn

Abstract: Fine-grained video categorization aims to recognize fine-grained subcategories within a coarse-grained basic-level category, and is a highly challenging task in computer vision. Considering that labeling video data is extremely expensive while labeling images is relatively cheap, and that fine-grained image categorization has already made significant progress, a natural idea is to adaptively transfer the knowledge learned in fine-grained image categorization to fine-grained video categorization in an unsupervised manner, without any annotation. However, images and videos from different sources exhibit both domain distinction and modality distinction, so models trained for fine-grained image categorization cannot be applied directly to fine-grained video categorization. To achieve unsupervised fine-grained video categorization, this study proposes an unsupervised discriminative adaptation network (UDAN), which transfers the discriminative localization ability from fine-grained image categorization to fine-grained video categorization. Furthermore, a progressive pseudo labeling strategy is proposed to iteratively guide UDAN to learn the data distribution of the target-domain videos. The cross-domain and cross-modality adaptation ability of the proposed method is verified on the CUB-200-2011 and Cars-196 image datasets and the YouTube Birds and YouTube Cars video datasets, and the experimental results demonstrate its advantage for unsupervised fine-grained video categorization.
Key words: fine-grained video categorization; unsupervised discriminative adaptation network; domain distinction; modality distinction; domain adaptation
CLC number: TP181

Citation: He XT, Peng YX. Unsupervised fine-grained video categorization via adaptation learning across domains and modalities. Ruan Jian Xue Bao/Journal of Software, 2021,32(11):3482−3495 (in Chinese). http://www.jos.org.cn/1000-9825/6058.htm

Unsupervised Fine-grained Video Categorization via Adaptation Learning Across Domains and Modalities

                 HE Xiang-Teng,    PENG Yu-Xin
                 (Wangxuan Institute of Computer Technology, Peking University, Beijing 100080, China)

Abstract: Fine-grained video categorization is a highly challenging task that aims to discriminate similar subcategories belonging to the same basic-level category. Due to the significant advances in fine-grained image categorization and the expensive cost of labeling video data, it is intuitive to adapt the knowledge learned from images to videos in an unsupervised manner. However, there is a clear gap when directly applying models learned from images to recognize fine-grained instances in videos, due to the domain distinction and modality distinction between images and videos. Therefore, this study proposes the unsupervised discriminative adaptation network (UDAN), which transfers the discriminative localization ability from images to videos. A progressive pseudo labeling strategy is adopted to iteratively guide UDAN to approximate the distribution of the target video data. To verify the effectiveness of the proposed UDAN approach, adaptation tasks between images and videos are performed, adapting the knowledge learned from the CUB-200-2011/Cars-196 (image) datasets to the YouTube Birds/YouTube Cars (video) datasets. Experimental results illustrate the advantage of the proposed UDAN approach for unsupervised fine-grained video categorization.
Key words: fine-grained video categorization; unsupervised discriminative adaptation network; domain distinction; modality distinction; domain adaptation
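
The progressive pseudo labeling strategy mentioned in the abstract boils down to an assign-then-retrain loop: confidently predicted target videos receive pseudo labels and are added to training, so the model gradually approximates the target distribution. The following minimal PyTorch sketch illustrates this general idea only; it is not the authors' UDAN implementation, and model, source_loader, target_loader, optimizer, and the confidence-threshold schedule are hypothetical placeholders.

import torch
import torch.nn.functional as F

def progressive_pseudo_label_adaptation(model, source_loader, target_loader,
                                         optimizer, rounds=5, init_threshold=0.95):
    """Illustrative progressive pseudo labeling loop (assumed schedule, not UDAN itself)."""
    for r in range(rounds):
        # Gradually relax the confidence threshold so more target samples join training.
        threshold = init_threshold - 0.05 * r

        # 1) Assign pseudo labels to confident target (video) samples; no target labels are used.
        pseudo_set = []
        model.eval()
        with torch.no_grad():
            for frames, _ in target_loader:
                probs = F.softmax(model(frames), dim=1)
                conf, pred = probs.max(dim=1)
                keep = conf >= threshold
                if keep.any():
                    pseudo_set.append((frames[keep], pred[keep]))

        # 2) Retrain on labeled source (image) data plus pseudo-labeled target (video) data.
        model.train()
        for (img, y_src), (vid, y_pseudo) in zip(source_loader, pseudo_set):
            loss = F.cross_entropy(model(img), y_src) + \
                   F.cross_entropy(model(vid), y_pseudo)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model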


∗ Foundation item: National Natural Science Foundation of China (61925201, 61771025)
Received 2019-09-09; Revised 2020-03-09; Accepted 2020-04-19