
         [14]     Finn C, Levine S. Meta-learning: From few-shot learning to rapid reinforcement learning. In: Proc. of the Int’l Conf. on Machine
             Learning. 2019.
         [15]     Vanschoren J. Meta-learning: A survey. arXiv:1810.03548, 2018.
         [16]     Huisman M, Van Rijn JN, Plaat A. A survey of deep meta-learning. Artificial Intelligence Review, 2021, 54(6): 4483−4541.
         [17]     Li FC, Liu Y, Wu PX, et al. A survey on recent advances in meta-learning. Chinese Journal of Computers, 2021, 44(2): 422−446
             (in Chinese with English abstract).
         [18]     Tan XY, Zhang Z. Review on meta reinforcement learning. Journal of Nanjing University of Aeronautics & Astronautics, 2021,
             53(5): 653−663 (in Chinese with English abstract).
         [19]     Yadav P, Mishra A, Lee J, et al. A survey on deep reinforcement learning-based approaches for adaptation and generalization.
              arXiv:2202.08444, 2022.
         [20]     Levine S. Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv:1805.00909, 2018.
         [21]     Zhao CY, Lai J. Survey on meta reinforcement learning. Application Research of Computers, 2023, 40(1): 1−10 (in Chinese with
             English abstract).
         [22]     Beck J, Vuorio R, Liu EZ, et al. A survey of meta-reinforcement learning. arXiv:2301.08028, 2023.
         [23]     Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540):
              529−533.
         [24]     Wang Z, Schaul T, Hessel M, et al. Dueling network architectures for deep reinforcement learning. In: Proc. of the Int’l Conf. on
             Machine Learning, Vol.48. 2016. 1995−2003.
         [25]     Hausknecht M, Stone P. Deep recurrent Q-learning for partially observable MDPs. In: Proc. of the AAAI Fall Symp. 2015. 29−37.
         [26]     Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992,
              8(3−4): 229−256.
         [27]     Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. In: Proc. of the Int’l Conf. on Learning
             Representations. 2016.
         [28]     Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms. arXiv:1707.06347, 2017.
         [29]     Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic
             actor. In: Proc. of the Int’l Conf. on Machine Learning, Vol.80. 2018. 1856−1865.
         [30]     Fujimoto S, Van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. In: Proc. of the Int’l Conf. on
             Machine Learning. 2018. 1587−1596.
         [31]     Finn C. Learning to Learn with Gradients [Ph.D. Thesis]. Berkeley: University of California, 2018.
         [32]     Rakelly K, Zhou A, Finn C, et al. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In: Proc. of
             the Int’l Conf. on Machine Learning. 2019. 5331−5340.
         [33]     Fakoor R, Chaudhari P, Soatto S, et al. Meta-Q-learning. In: Proc. of the Int’l Conf. on Learning Representations. 2020.
         [34]     Zintgraf L, Schulze S, Lu C, et al. VariBAD: Variational Bayes-adaptive deep RL via meta-learning. The Journal of Machine
              Learning Research, 2021, 22(1): 13198−13236.
         [35]     Nichol A, Achiam J, Schulman J. On first-order meta-learning algorithms. arXiv:1803.02999, 2018.
         [36]     Gordon J, Bronskill J, Nowozin S, et al. Meta-learning probabilistic inference for prediction. In: Proc. of the Int’l Conf. on
              Learning Representations. 2018.
         [37]     Santoro A, Bartunov S, Botvinick M, et al. Meta-learning with memory-augmented neural networks. In: Proc. of the Int’l Conf. on
             Machine Learning. 2016. 1842−1850.
         [38]     Ramalho T, Garnelo M. Adaptive posterior learning: Few-shot learning with a surprise-based memory module. In: Proc. of the Int’l
              Conf. on Learning Representations. 2019.
         [39]     Qiao S, Liu C, Shen W, et al. Few-shot image recognition by predicting parameters from activations. In: Proc. of the IEEE/CVF
              Conf. on Computer Vision and Pattern Recognition. IEEE, 2018. 7229−7238.
         [40]     Gidaris S, Komodakis N. Dynamic few-shot visual learning without forgetting. In: Proc. of the IEEE Computer Society Conf. on
              Computer Vision and Pattern Recognition. 2018. 4367−4375.
         [41]     Koch G. Siamese Neural Networks for One-Shot Image Recognition [MS. Thesis]. University of Toronto, 2015.
         [42]     Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning. In: Proc. of the Advances in Neural Information
             Processing Systems (NIPS). 2016. 3637−3645.
         [43]     Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning. In: Proc. of the Advances in Neural Information
              Processing Systems, Vol.30. 2017. 4077−4087.
         [44]     Varghese NV, Mahmoud QH. A survey of multi-task deep reinforcement learning. Electronics, 2020, 9(9): 1363.
         [45]     Khetarpal K, Riemer M, Rish I, et al. Towards continual reinforcement learning: A review and perspectives. Journal of Artificial
             Intelligence Research, 2022, 75: 1401−1476.
         [46]     Wang M, Deng W. Deep visual domain adaptation: A survey. Neurocomputing, 2018, 312: 135−153.
         [47]     Zhou K, Liu Z, Qiao Y, et al. Domain generalization in vision: A survey. arXiv:2103.02503, 2021.