Journal of Software (Ruan Jian Xue Bao), ISSN 1000-9825, CODEN RUXUEW, E-mail: jos@iscas.ac.cn
Journal of Software, 2024, 35(4): 1618−1650 [doi: 10.13328/j.cnki.jos.007011] http://www.jos.org.cn
© Copyright Institute of Software, Chinese Academy of Sciences. Tel: +86-10-62562563
Survey of Meta-reinforcement Learning Research*
CHEN Yi-Yu 1,2, HUO Jing 1,2, DING Tian-Yu 3, GAO Yang 1,2
1 (Department of Computer Science and Technology, Nanjing University, Nanjing 210043, China)
2 (State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210043, China)
3 (Applied Sciences Group, Microsoft, Redmond, WA 98034, USA)
Corresponding author: GAO Yang, E-mail: gaoy@nju.edu.cn
Abstract: In recent years, deep reinforcement learning (DRL) has achieved remarkable success in many sequential decision-making tasks. However, its success currently depends heavily on massive training data and computing resources, and poor sample efficiency and limited policy generality are the key factors restricting its further development. Meta-reinforcement learning (Meta-RL) is devoted to adapting to a wider range of tasks with fewer samples, and research on it is expected to alleviate the above limitations and advance the field of reinforcement learning. Organized around the research objects and applicable scenarios of Meta-RL work, this paper comprehensively reviews the research progress in the field of meta-reinforcement learning: first, the background of deep reinforcement learning and meta-learning is briefly introduced; then, meta-reinforcement learning is formally defined, its common scenario settings are summarized, and existing research progress is presented from the perspective of the applicability of Meta-RL research results; finally, the research challenges and development prospects of the Meta-RL field are analyzed.
Keywords: meta-reinforcement learning; reinforcement learning; deep reinforcement learning; meta-learning
CLC Number: TP18
Citation (in Chinese): Chen YY, Huo J, Ding TY, Gao Y. Survey of Meta-reinforcement Learning Research. Ruan Jian Xue Bao/Journal of Software, 2024, 35(4): 1618−1650. http://www.jos.org.cn/1000-9825/7011.htm
Citation (in English): Chen YY, Huo J, Ding TY, Gao Y. Survey of Meta-reinforcement Learning Research. Ruan Jian Xue Bao/Journal of
Software, 2024, 35(4): 1618−1650 (in Chinese). http://www.jos.org.cn/1000-9825/7011.htm
Survey of Meta-reinforcement Learning Research
CHEN Yi-Yu 1,2, HUO Jing 1,2, DING Tian-Yu 3, GAO Yang 1,2
1 (Department of Computer Science and Technology, Nanjing University, Nanjing 210043, China)
2 (State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210043, China)
3 (Applied Sciences Group, Microsoft, Redmond, WA 98034, USA)
Abstract: In recent years, deep reinforcement learning (DRL) has achieved remarkable success in many sequential decision-making tasks.
However, this success currently relies heavily on massive learning data and computing resources, and poor sample efficiency and limited
policy generalization are the key factors restricting DRL's further development. Meta-reinforcement learning (Meta-RL) aims to adapt to
a wider range of tasks with fewer samples, and research in this area is expected to alleviate the above limitations and promote the
development of reinforcement learning. Organized by the research objects and application scopes of current works, this study
comprehensively reviews the research progress in the field of meta-reinforcement learning. Firstly, a basic introduction is given to the
background of deep reinforcement learning and meta-learning. Then, meta-reinforcement learning is formally defined, its common
scenario settings are summarized, and the current research progress is surveyed from the perspective of the applicability of the research
results. Finally, the research challenges and potential future development directions are discussed.
∗ Foundation items: Science and Technology Innovation 2030 Major Project on "New Generation Artificial Intelligence" (2021ZD0113303); National Natural Science Foundation of China (62192783, 62276128)
This paper was recommended by Prof. FENG Ju-Fu, Prof. YU Yang, and Prof. LIU Qi, guest editors of the special issue on "Green and Low-carbon Machine Learning Research and Applications".
Received 2023-05-14; Revised 2023-07-07; Accepted 2023-08-24; published online in JOS 2023-09-11
Published online first on CNKI: 2023-11-24