
Abstract: With the development of technologies such as big data, computing, and the Internet, artificial intelligence techniques represented by machine learning and deep learning have achieved tremendous success. In particular, the emergence of various large-scale models has greatly accelerated the application of artificial intelligence in many fields. However, the success of these techniques relies heavily on massive training data and abundant computing resources, which significantly limits their application in data- or resource-scarce domains. Therefore, how to learn from limited samples, known as few-shot learning, has become a crucial research problem in the new wave of industrial transformation led by artificial intelligence. The most commonly used approaches in few-shot learning are based on meta-learning. Such methods learn meta-knowledge for solving similar tasks by training on a series of related training tasks, which enables fast learning on new testing tasks using the acquired meta-knowledge. Although these methods have achieved promising results on few-shot classification tasks, they assume that the training and testing tasks come from the same distribution. This implies that a sufficient number of training tasks is required for the model to generalize the learned meta-knowledge to continuously changing testing tasks. However, in real-world scenarios where data are truly limited, ensuring an adequate number of training tasks is challenging. To address this issue, this study proposes a robust few-shot classification method based on diverse and authentic task generation (DATG). The method generates additional training tasks by applying Mixup to a small number of existing tasks, thereby aiding the model's learning. By constraining the diversity and authenticity of the generated tasks, the method effectively improves the generalization of few-shot classification. Specifically, the base classes in the training set are first clustered into different clusters, and tasks are then selected from different clusters for Mixup to increase task diversity. Moreover, performing Mixup between tasks from different clusters alleviates the learning of spurious discriminative features that are highly correlated with specific categories. To prevent the generated tasks from deviating too far from the real distribution and misleading the model's learning, the maximum mean discrepancy (MMD) between the generated tasks and real tasks is minimized, which ensures the authenticity of the generated tasks. Finally, a theoretical analysis explains why the inter-cluster task Mixup strategy improves the model's generalization performance. Experimental results on multiple datasets further demonstrate the effectiveness of the proposed method.
         Key words: few-shot learning; meta-learning; task Mixup; diversity; authenticity
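
As a rough illustration of the two constraints described in the abstract, the sketch below (Python; not the authors' code) clusters base-class mean features with k-means, so that Mixup pairs can later be drawn from different clusters for diversity, and computes a Gaussian-kernel MMD between generated and real task features for authenticity. The feature representation, the kernel bandwidth sigma, and all function names are assumptions made for illustration.

    import numpy as np

    def kmeans(class_feats, k, iters=20, seed=0):
        """Cluster base-class mean features into k clusters (plain k-means);
        Mixup pairs are later sampled from different clusters for diversity."""
        rng = np.random.default_rng(seed)
        centers = class_feats[rng.choice(len(class_feats), k, replace=False)].copy()
        for _ in range(iters):
            dist = ((class_feats[:, None] - centers[None]) ** 2).sum(-1)
            labels = dist.argmin(1)
            for j in range(k):
                if (labels == j).any():          # guard against empty clusters
                    centers[j] = class_feats[labels == j].mean(0)
        return labels

    def mmd2(x, y, sigma=1.0):
        """Squared MMD with a Gaussian kernel between generated task features x
        and real task features y; minimizing it keeps generated tasks realistic."""
        def gram(a, b):
            d2 = ((a[:, None] - b[None]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * sigma ** 2))
        return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

A training step would then sample two tasks whose classes fall into different clusters, interpolate them with Mixup, and add a weighted mmd2 term between the interpolated task's features and those of real tasks to the meta-training loss.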

    With the continuous progress of information technologies such as big data, computing, and the Internet, artificial intelligence techniques represented by machine learning and deep learning have developed rapidly and have made major advances in application scenarios such as image classification, game playing, speech recognition, question answering, and autonomous driving. For example, the residual network ResNet [1] has surpassed human-level classification accuracy on the ImageNet [2] dataset, AlphaGo [3] has defeated human champions at the game of Go, and ChatGPT [4], a large model in natural language processing, can converse like a human. However, the successful application of current deep learning algorithms is inseparable from massive training data and powerful computing resources. For instance, the ImageNet training data contain 14 million images, AlphaGo learned from 60 million game records, and the training data of GPT-3 amount to 45 TB and require thousands of high-end GPUs training in parallel, which greatly limits the application of deep learning models in some domains. For example, owing to the specialized nature of the data or to security concerns, data in the medical or military domains are usually hard to obtain or very expensive to annotate, and building large-scale training datasets in these domains is extremely difficult. Therefore, how to learn with only a small number of labeled training samples, namely few-shot learning, is a crucial research problem in the new wave of industrial transformation led by artificial intelligence.
    Meta-learning-based methods are the mainstream approach in few-shot learning [5]. These methods learn, over a large number of similar few-shot tasks, the meta-knowledge for solving this kind of task, and then use that meta-knowledge to help the target few-shot task learn quickly. For example, the representative method MAML [6] learns, as its meta-knowledge, an initialization that suits different few-shot tasks, from which only a few gradient steps are needed to reach a good solution on a new task; prototypical networks [7] learn a metric space that applies across tasks, in which a query sample is classified by comparing its distance to each class prototype. Following these two lines of thought, a series of optimization-based [8−16] and metric-based [17−21] few-shot learning methods have been proposed.
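
To make the prototypical-network idea concrete, the following is a minimal sketch of nearest-prototype classification in one episode, assuming an embedding function embed is supplied by the caller; the shapes and names are illustrative, not the cited implementation.

    import numpy as np

    def proto_classify(support_x, support_y, query_x, n_way, embed):
        """Classify queries by squared Euclidean distance to class prototypes.

        support_x: (n_support, ...) support samples; support_y: (n_support,)
        integer labels in {0, ..., n_way-1}; query_x: (n_query, ...) queries.
        """
        zs = embed(support_x)                               # (n_support, dim)
        zq = embed(query_x)                                 # (n_query, dim)
        protos = np.stack([zs[support_y == c].mean(axis=0)  # one prototype per class
                           for c in range(n_way)])          # (n_way, dim)
        dist = ((zq[:, None] - protos[None]) ** 2).sum(-1)  # (n_query, n_way)
        return dist.argmin(axis=1)                          # nearest prototype wins

For instance, proto_classify(xs, ys, xq, 5, embed=lambda x: x) runs the sketch with raw features standing in for the learned embedding.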
Although these methods have achieved good results on few-shot classification tasks, they implicitly assume that the training and testing tasks follow the same distribution. This requires the number of training tasks to be large enough to represent the overall task distribution effectively. In reality, however, because training samples are scarce in some domains, a sufficient number of training tasks is also hard to guarantee, so the meta-knowledge learned on the training tasks does not necessarily transfer well to the testing tasks. Moreover, in few-shot classification the classes of the training tasks and the testing tasks are usually disjoint, so a distribution shift between them arises easily, which further increases the difficulty of transferring meta-knowledge. To address this, some studies reweight the training tasks to correct their distribution so that it is more consistent with the distribution of the testing tasks [22,23]. However, since the distribution of the testing tasks is hard to know in advance, and in practical scenarios the testing tasks may change over time, correcting the training distribution by reweighting is not always effective. Other studies therefore propose to increase the diversity of the training distribution through task augmentation, thereby improving generalization. For example, Meta-MaxUp [24] combines different data augmentation strategies to augment the support sets, query sets, and tasks, while MLTI [25] expands the training tasks by applying Mixup to randomly sampled tasks.
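
To make the task-Mixup idea concrete, here is a minimal sketch of interpolating two sampled tasks in feature space, in the spirit of MLTI [25]. Treating the two tasks as equally sized (features, soft-label) pairs and mixing the labels linearly are simplifying assumptions for this sketch, not necessarily the exact construction in the cited paper.

    import numpy as np

    def task_mixup(xa, ya, xb, yb, alpha=2.0, rng=np.random.default_rng()):
        """Interpolate two sampled tasks of equal size into a new training task.

        xa, xb: (n, dim) sample features; ya, yb: (n, n_way) one-hot or soft labels.
        """
        lam = rng.beta(alpha, alpha)   # Mixup coefficient, lam in (0, 1)
        return lam * xa + (1 - lam) * xb, lam * ya + (1 - lam) * yb

Under the inter-cluster strategy proposed in this paper, the two tasks would be drawn from different base-class clusters rather than sampled at random.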