Journal of Software (软件学报), ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@iscas.ac.cn
Journal of Software, 2021, 32(11): 3482−3495 [doi: 10.13328/j.cnki.jos.006058] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

Unsupervised Fine-Grained Video Classification via Cross-Domain and Cross-Modal Adaptation Learning∗

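HE Xiang-Teng (何相腾), PENG Yu-Xin (彭宇新)

(Wangxuan Institute of Computer Technology, Peking University, Beijing 100080, China)
Corresponding author: PENG Yu-Xin, E-mail: pengyuxin@pku.edu.cn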

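Abstract: Fine-grained video classification aims to recognize fine-grained subcategories within a coarse-grained basic category, and it is a highly challenging task in computer vision. Annotating video data is very costly, whereas annotating images is comparatively cheap, and fine-grained image classification has already made notable progress. A natural idea is therefore to adaptively transfer the knowledge learned in fine-grained image classification to fine-grained video classification in an unsupervised manner, without any video annotation. However, images and videos from different sources exhibit both a domain discrepancy and a modality discrepancy, so models trained for fine-grained image classification cannot be applied directly to fine-grained video classification. To achieve unsupervised fine-grained video classification, an unsupervised discriminative adaptation network is proposed, which transfers the discriminative localization ability from fine-grained image classification to fine-grained video classification. Furthermore, a progressive pseudo-labeling strategy is proposed to iteratively guide the unsupervised discriminative adaptation network in learning the data distribution of the target-domain videos. The cross-domain and cross-modal adaptation ability of the method is verified on the CUB-200-2011 and Cars-196 image datasets and the YouTube Birds and YouTube Cars video datasets, and the experimental results demonstrate its advantage in unsupervised fine-grained video classification.
Keywords: fine-grained video classification; unsupervised discriminative adaptation network; domain discrepancy; modality discrepancy; domain adaptation
Chinese Library Classification (CLC) number: TP181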

Citation format (Chinese): 何相腾,彭宇新.跨域和跨模态适应学习的无监督细粒度视频分类.软件学报,2021,32(11):3482−3495. http://www.jos.org.cn/1000-9825/6058.htm
Citation format (English): He XT, Peng YX. Unsupervised fine-grained video categorization via adaptation learning across domains and modalities. Ruan Jian Xue Bao/Journal of Software, 2021, 32(11): 3482−3495 (in Chinese). http://www.jos.org.cn/1000-9825/6058.htm

Unsupervised Fine-grained Video Categorization via Adaptation Learning Across Domains
and Modalities

HE Xiang-Teng, PENG Yu-Xin
(Wangxuan Institute of Computer Technology, Peking University, Beijing 100080, China)

Abstract: Fine-grained video categorization is a highly challenging task that aims to discriminate similar subcategories belonging to the same basic-level category. Given the significant advances in fine-grained image categorization and the high cost of labeling video data, it is natural to adapt the knowledge learned from images to videos in an unsupervised manner. However, models learned from images cannot be applied directly to recognize fine-grained instances in videos, because of the domain distinction and the modality distinction between images and videos. Therefore, this study proposes the unsupervised discriminative adaptation network (UDAN), which transfers the ability of discriminative localization from images to videos. A progressive pseudo-labeling strategy is adopted to iteratively guide UDAN to approximate the distribution of the target video data. To verify the effectiveness of the proposed UDAN approach, adaptation tasks between image and video are performed, adapting the knowledge learned from the CUB-200-2011 and Cars-196 datasets (image) to the YouTube Birds and YouTube Cars datasets (video). Experimental results illustrate the advantage of the proposed UDAN approach for unsupervised fine-grained video categorization.
Key words: fine-grained video categorization; unsupervised discriminative adaptation network; domain distinction; modality distinction; domain adaptation
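
The abstract names a progressive pseudo-labeling strategy but gives no details of it at this point. As a rough illustration only, the sketch below shows one common generic form of the idea: target samples are given pseudo labels only when the prediction confidence exceeds a threshold, and the threshold is relaxed over successive rounds. The function names, the linear schedule, and the threshold values are all assumptions made for this example and are not taken from UDAN.

```python
from typing import List, Sequence, Tuple


def threshold_schedule(start: float, end: float, rounds: int) -> List[float]:
    """Linearly move the confidence threshold from `start` to `end`.

    Hypothetical helper: the linear schedule is an assumption, not the paper's.
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    if rounds == 1:
        return [start]
    step = (end - start) / (rounds - 1)
    return [start + i * step for i in range(rounds)]


def select_pseudo_labels(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) for confident predictions only."""
    selected = []
    for idx, p in enumerate(probs):
        # Predicted class is the index of the largest probability.
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= threshold:
            selected.append((idx, label))
    return selected


if __name__ == "__main__":
    # Toy class probabilities for three unlabeled target videos (made-up values).
    # In a real loop the model would be retrained on the selected samples and
    # the probabilities recomputed before each round; here they stay fixed.
    video_probs = [
        [0.95, 0.03, 0.02],
        [0.40, 0.35, 0.25],
        [0.10, 0.75, 0.15],
    ]
    for round_id, tau in enumerate(threshold_schedule(0.9, 0.3, 3)):
        chosen = select_pseudo_labels(video_probs, tau)
        print(f"round {round_id}: threshold={tau:.2f}, pseudo-labeled={chosen}")
```

Starting with a strict threshold keeps early pseudo labels mostly correct, while relaxing it later lets more target videos contribute once the model has adapted. Whether UDAN uses a threshold of this kind is not stated in the abstract.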

∗ Foundation item: National Natural Science Foundation of China (61925201, 61771025)
Received: 2019-09-09; Revised: 2020-03-09; Accepted: 2020-04-19
