[167] Papoudakis G, Christianos F, Schäfer L, et al. Benchmarking multi-agent deep reinforcement learning algorithms in cooperative
tasks. In: Proc. of the Advances in Neural Information Processing Systems Track on Datasets and Benchmarks. 2021.
[168] Li Q, Peng Z, Feng L, et al. MetaDrive: Composing diverse driving scenarios for generalizable reinforcement learning. IEEE Trans.
on Pattern Analysis and Machine Intelligence, 2022.
[169] Mordatch I, Abbeel P. Emergence of grounded compositional language in multi-agent populations. Proc. of the AAAI Conf. on
Artificial Intelligence, 2018, 32(1): 1495−1502.
[170] Samvelyan M, Rashid T, De Witt CS, et al. The StarCraft multi-agent challenge. In: Proc. of the Int’l Conf. on Autonomous Agents
and MultiAgent Systems. 2019. 2186−2188.
[171] Bergstrom CT, Godfrey-Smith P. On the evolution of behavioral heterogeneity in individuals and populations. Biology and
Philosophy, 1998, 13(2): 205−231.
[172] Brown N, Sandholm T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 2018, 359(6374):
418−424.
[173] Rosa M, Afanasjeva O, Andersson S, et al. BADGER: Learning to (learn [learning algorithms] through multi-agent communication). arXiv:1912.01513, 2019.
[174] Zintgraf L, Devlin S, Ciosek K, et al. Deep interactive Bayesian reinforcement learning via meta-learning. In: Proc. of the Int’l Conf. on Autonomous Agents and MultiAgent Systems. 2021. 1712−1714.
[175] Huang J, Huang W, Wu D, et al. Meta actor-critic framework for multi-agent reinforcement learning. In: Proc. of the Int’l Conf. on
Artificial Intelligence and Pattern Recognition. Association for Computing Machinery, 2021. 636−643.
[176] Schäfer L, Christianos F, Storkey A, et al. Learning task embeddings for teamwork adaptation in multi-agent reinforcement learning. arXiv preprint, 2022. 1−23.
[177] Harris K, Anagnostides I, Farina G, et al. Meta-learning in games. arXiv:2209.14110, 2022.
[178] Muglich D, Zintgraf L, De Witt CS, et al. Generalized beliefs for cooperative AI. In: Proc. of the Int’l Conf. on Machine Learning.
2022. 16062−16082.
[179] Yun WJ, Park J, Kim J. Quantum multi-agent meta reinforcement learning. Proc. of the AAAI Conf. on Artificial Intelligence, 2023,
37(9): 11087−11095.
[180] James S, Wohlhart P, Kalakrishnan M, et al. Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-
canonical adaptation networks. In: Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition. 2019. 12627−12637.
[181] Zhao Z, Nagabandi A, Rakelly K, et al. MELD: Meta-reinforcement learning from images via latent state models. In: Proc. of the
Conf. on Robot Learning, Vol.155. 2020. 1246−1261.
[182] Yu T, Finn C, Xie A, et al. One-shot imitation from observing humans via domain-adaptive meta-learning. In: Proc. of the Int’l
Conf. on Learning Representations—Workshop Track Proceedings. 2018.
[183] Schoettler G, Nair A, Ojea JA, et al. Meta-reinforcement learning for robotic industrial insertion tasks. In: Proc. of the IEEE/RSJ
Int’l Conf. on Intelligent Robots and Systems. 2020. 9728−9735.
[184] Arndt K, Hazara M, Ghadirzadeh A, et al. Meta reinforcement learning for sim-to-real domain adaptation. In: Proc. of the IEEE
Int’l Conf. on Robotics and Automation. 2020. 2725−2731.
[185] Jang E, Irpan A, Khansari M, et al. BC-Z: Zero-shot task generalization with robotic imitation learning. In: Proc. of the Conf. on
Robot Learning. 2022. 991−1002.
[186] Harrison J, Sharma A, Calandra R, et al. Control adaptation via meta-learning dynamics. In: Proc. of the Workshop on Meta-
Learning at the Conf. on Neural Information Processing Systems. 2018.
[187] Ghadirzadeh A, Chen X, Poklukar P, et al. Bayesian meta-learning for few-shot policy adaptation across robotic platforms. In: Proc. of the IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems. 2021. 1274−1280.
[188] Tiboni G, Arndt K, Kyrki V. DROPO: Sim-to-real transfer with offline domain randomization. arXiv:2201.08434, 2022.
[189] Bing Z, Koch A, Yao X, et al. Meta-reinforcement learning via language instructions. In: Proc. of the IEEE Int’l Conf. on Robotics and Automation. 2023. 5985−5991.
[190] Ross S, Bagnell JA. Agnostic system identification for model-based reinforcement learning. In: Proc. of the Int’l Conf. on Machine
Learning, Vol.2. 2012. 1703−1710.
[191] Yu W, Tan J, Liu CK, et al. Preparing for the unknown: Learning a universal policy with online system identification. In: Proc. of the Robotics: Science and Systems, Vol.13. 2017.
[192] Liang J, Saxena S, Kroemer O. Learning active task-oriented exploration policies for bridging the sim-to-real gap. In: Proc. of the
Robotics: Science and Systems. 2020.
[193] Farid K, Sakr N. Few-shot system identification for reinforcement learning. In: Proc. of the Asia-Pacific Conf. on Intelligent Robot
Systems. IEEE, 2021. 1−7.
[194] Zhu Y, Mottaghi R, Kolve E, et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In: Proc. of
the IEEE Int’l Conf. on Robotics and Automation. 2017. 3357−3364.