        [167]     Papoudakis G, Christianos F, Schäfer L, et al. Benchmarking multi-agent deep reinforcement learning algorithms in cooperative
             tasks. In: Proc. of the Advances in Neural Information Processing Systems Track on Datasets and Benchmarks. 2021.
        [168]     Li Q, Peng Z, Feng L, et al. MetaDrive: Composing diverse driving scenarios for generalizable reinforcement learning. IEEE Trans.
             on Pattern Analysis and Machine Intelligence, 2022.
        [169]     Mordatch I, Abbeel P. Emergence of grounded compositional language in multi-agent populations. Proc. of the AAAI Conf. on
             Artificial Intelligence, 2018, 32(1): 1495−1502.
        [170]     Samvelyan M, Rashid T, De Witt CS, et al. The StarCraft multi-agent challenge. In: Proc. of the Int’l Conf. on Autonomous Agents
             and MultiAgent Systems. 2019. 2186−2188.
        [171]     Bergstrom CT, Godfrey-Smith P. On the evolution of behavioral heterogeneity in individuals and populations. Biology and
             Philosophy, 1998, 13(2): 205−231.
        [172]     Brown N, Sandholm T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 2018, 359(6374):
             418−424.
        [173]     Rosa M, Afanasjeva O, Andersson S, et al. BADGER: Learning to (learn [learning algorithms] through multi-agent
             communication). arXiv:1912.01513, 2019.
        [174]     Zintgraf L, Devlin S, Ciosek K, et al. Deep interactive Bayesian reinforcement learning via meta-learning. In: Proc. of the Int’l
             Conf. on Autonomous Agents and MultiAgent Systems. 2021. 1712−1714.
        [175]     Huang J, Huang W, Wu D, et al. Meta actor-critic framework for multi-agent reinforcement learning. In: Proc. of the Int’l Conf. on
             Artificial Intelligence and Pattern Recognition. Association for Computing Machinery, 2021. 636−643.
        [176]     Schäfer L, Christianos F, Storkey A, et al. Learning task embeddings for teamwork adaptation in multi-agent reinforcement
             learning. 2022. 1−23.
        [177]     Harris K, Anagnostides I, Farina G, et al. Meta-learning in games. arXiv:2209.14110, 2022.
        [178]     Muglich D, Zintgraf L, De Witt CS, et al. Generalized beliefs for cooperative AI. In: Proc. of the Int’l Conf. on Machine Learning.
             2022. 16062−16082.
        [179]    Yun WJ, Park J, Kim J. Quantum multi-agent meta reinforcement learning. Proc. of the AAAI Conf. on Artificial Intelligence, 2023,
             37(9): 11087−11095.
        [180]     James S, Wohlhart P, Kalakrishnan M, et al. Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-
             canonical adaptation networks. In: Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition. 2019. 12627−12637.
        [181]     Zhao Z, Nagabandi A, Rakelly K, et al. MELD: Meta-reinforcement learning from images via latent state models. In: Proc. of the
             Conf. on Robot Learning, Vol.155. 2020. 1246−1261.
        [182]     Yu T, Finn C, Xie A, et al. One-shot imitation from observing humans via domain-adaptive meta-learning. In: Proc. of the Int’l
             Conf. on Learning Representations, Workshop Track. 2018.
        [183]     Schoettler G, Nair A, Ojea JA, et al. Meta-reinforcement learning for robotic industrial insertion tasks. In: Proc. of the IEEE/RSJ
             Int’l Conf. on Intelligent Robots and Systems. 2020. 9728−9735.
        [184]     Arndt K, Hazara M, Ghadirzadeh A, et al. Meta reinforcement learning for sim-to-real domain adaptation. In: Proc. of the IEEE
             Int’l Conf. on Robotics and Automation. 2020. 2725−2731.
        [185]     Jang E, Irpan A, Khansari M, et al. BC-Z: Zero-shot task generalization with robotic imitation learning. In: Proc. of the Conf. on
             Robot Learning. 2022. 991−1002.
        [186]     Harrison J, Sharma A, Calandra R, et al. Control adaptation via meta-learning dynamics. In: Proc. of the Workshop on Meta-
             Learning at the Conf. on Neural Information Processing Systems. 2018.
        [187]     Ghadirzadeh A, Chen X, Poklukar P, et al. Bayesian meta-learning for few-shot policy adaptation across robotic platforms. In:
             Proc. of the IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems. 2021. 1274−1280.
        [188]     Tiboni G, Arndt K, Kyrki V. DROPO: Sim-to-real transfer with offline domain randomization. arXiv:2201.08434, 2022.
        [189]     Bing Z, Koch A, Yao X, et al. Meta-reinforcement learning via language instructions. In: Proc. of the IEEE Int’l Conf. on
             Robotics and Automation. 2023. 5985−5991.
        [190]     Ross S, Bagnell JA. Agnostic system identification for model-based reinforcement learning. In: Proc. of the Int’l Conf. on Machine
             Learning, Vol.2. 2012. 1703−1710.
        [191]     Yu W, Tan J, Liu CK, et al. Preparing for the unknown: Learning a universal policy with online system identification. In: Proc. of
             the Robotics: Science and Systems, Vol.13. 2017.
        [192]     Liang J, Saxena S, Kroemer O. Learning active task-oriented exploration policies for bridging the sim-to-real gap. In: Proc. of the
             Robotics: Science and Systems. 2020.
        [193]     Farid K, Sakr N. Few-shot system identification for reinforcement learning. In: Proc. of the Asia-Pacific Conf. on Intelligent Robot
             Systems. IEEE, 2021. 1−7.
        [194]     Zhu Y, Mottaghi R, Kolve E, et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In: Proc. of
             the IEEE Int’l Conf. on Robotics and Automation. 2017. 3357−3364.