Page 264 - Journal of Software (《软件学报》), 2025, No. 5


2164 Journal of Software (软件学报), 2025, Vol. 36, No. 5

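model builds a parallel multi-scale network that uses DDA-STGConv as its basic unit. By constructing a cross-domain
adjacency matrix, it captures dynamic discriminative features in the spatial and temporal domains simultaneously, and it
uses an attention mechanism to explore the correlations between nodes and strengthen the information exchange among them.
In addition, a graph topology aggregation function is designed to build parallel multi-scale sub-network modules based on
different graph topologies; these modules aggregate node features at different scales and effectively extract both local
and global feature information from the skeleton joints. A multi-scale feature cross-fusion module (MFEB) is also designed
to strengthen the interaction of multi-scale information across the parallel networks and to enhance the representational
capacity of the model. Comparisons with current mainstream methods on two major human pose estimation datasets show that
the proposed network achieves competitive pose estimation results. The flexibility of the PMST-GNet model can provide
technical support for work in areas such as action recognition, motion prediction, and scene understanding.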
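The conclusion credits a cross-domain adjacency matrix with letting a single graph convolution capture spatial and temporal dependencies together. This section does not give the construction, so the following is only a minimal sketch under an assumption: the skeleton adjacency (with self-loops) is tiled across a window of consecutive frames and then symmetrically normalized in the manner of Kipf and Welling [8]. The function names (`skeleton_adjacency`, `cross_domain_adjacency`, `normalize`, `aggregate`), the `window` parameter, and the toy 3-joint skeleton are hypothetical and are not taken from PMST-GNet.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Hypothetical sketch, not PMST-GNet's code: it ASSUMES the cross-domain
# adjacency is the skeleton adjacency tiled over a window of frames.


def skeleton_adjacency(num_joints: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Undirected joint adjacency with self-loops, i.e. A + I."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return a


def cross_domain_adjacency(a: Matrix, window: int) -> Matrix:
    """Tile A into a (window x window) block matrix.

    Node (frame t, joint i) is flattened to index t * N + i and is linked to
    (frame s, joint j) whenever joints i and j are linked in A, for every pair
    of frames in the window, so one hop reaches spatial and temporal neighbours.
    """
    n = len(a)
    size = n * window
    return [[a[r % n][c % n] for c in range(size)] for r in range(size)]


def normalize(a: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 A D^-1/2 (degrees > 0 thanks to self-loops)."""
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(len(a))]
            for i in range(len(a))]


def aggregate(a_hat: Matrix, x: Matrix) -> Matrix:
    """One propagation step A_hat @ X; the learnable weight W is omitted."""
    return [[sum(a_hat[i][k] * x[k][f] for k in range(len(x)))
             for f in range(len(x[0]))]
            for i in range(len(a_hat))]


if __name__ == "__main__":
    # Toy 3-joint chain 0-1-2 observed over a 2-frame window.
    a = skeleton_adjacency(3, [(0, 1), (1, 2)])
    st = cross_domain_adjacency(a, window=2)
    print(len(st))  # 6 nodes = 3 joints x 2 frames
    print(st[0])    # [1.0, 1.0, 0.0, 1.0, 1.0, 0.0]
    features = [[float(k)] for k in range(len(st))]  # one scalar feature per node
    print(aggregate(normalize(st), features))
```

Tiling every frame pair in the window is only one possible design; linking just adjacent frames, or replacing the fixed entries with learned or attention-derived weights, would be equally plausible readings of "cross-domain adjacency".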
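The conclusion also says that an attention mechanism explores correlations between nodes and strengthens their information exchange. Its exact formulation is not given in this section; as a minimal sketch, assume plain scaled dot-product attention in the sense of Vaswani et al. [11], applied across joints, with the query, key, and value projections fixed to the identity. The names `attention_adjacency` and `attend` and the toy features are hypothetical illustrations, not the paper's module.

```python
import math
from typing import List

Matrix = List[List[float]]

# Hypothetical sketch, not PMST-GNet's layer: ASSUMES plain scaled dot-product
# attention over joints with identity query/key/value projections.


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_adjacency(x: Matrix) -> Matrix:
    """Input-dependent node-to-node weights softmax(X X^T / sqrt(d))."""
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    scores = [[scale * sum(xi[f] * xj[f] for f in range(d)) for xj in x]
              for xi in x]
    return [softmax(row) for row in scores]


def attend(x: Matrix) -> Matrix:
    """Mix node features with the attention weights: W @ X."""
    w = attention_adjacency(x)
    return [[sum(w[i][k] * x[k][f] for k in range(len(x)))
             for f in range(len(x[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    # Three joints with 2-D features; joints 0 and 2 have similar features.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
    w = attention_adjacency(x)
    print([round(sum(row), 6) for row in w])  # every row sums to 1
    print(w[0][2] > w[0][1])  # joint 0 attends more to joint 2 than to joint 1
    print(attend(x))
```

Unlike a fixed skeleton adjacency, these weights depend on the input, so joints that are not physically connected can still exchange information; how PMST-GNet combines such weights with its cross-domain adjacency is not specified in this section.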

References:
[1] Zhang Y, Wen GZ, Mi SY, Zhang ML, Geng X. Overview on 2D human pose estimation based on deep learning. Ruan Jian Xue Bao/Journal of Software, 2022, 33(11): 4173–4191 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6390.htm [doi: 10.13328/j.cnki.jos.006390]
[2] Ding J, Shu XB, Huang P, Yao YZ, Song Y. Multimodal and multi-granularity graph convolutional networks for elderly daily activity recognition. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2350–2364 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6439.htm [doi: 10.13328/j.cnki.jos.006439]
[3] Yang HH, Liu HX, Zhang YM, Wu XJ. FMR-GNet: Forward mix-hop spatial-temporal residual graph network for 3D pose estimation. Chinese Journal of Electronics, 2024, 33(6): 1–14.
[4] Moon G, Lee KM. I2L-MeshNet: Image-to-lixel prediction network for accurate 3D human pose and mesh estimation from a single RGB image. In: Proc. of the 16th European Conf. on Computer Vision (ECCV). Glasgow: Springer, 2020. 752–768. [doi: 10.1007/978-3-030-58571-6_44]
[5] Zhao L, Peng X, Tian Y, Kapadia M, Metaxas DN. Semantic graph convolutional networks for 3D human pose regression. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019. 3420–3430. [doi: 10.1109/CVPR.2019.00354]
[6] Liu JF, Rojas J, Li YH, Liang ZJ, Guan YS, Xi N, Zhu HF. A graph attention spatio-temporal convolutional network for 3D human pose estimation in video. In: Proc. of the 2021 IEEE Int’l Conf. on Robotics and Automation. Xi’an: IEEE, 2021. 3374–3380. [doi: 10.1109/ICRA48506.2021.9561605]
[7] Cai YJ, Ge LH, Liu J, Cai JF, Cham TJ, Yuan JS, Thalmann NM. Exploiting spatial-temporal relationships for 3D pose estimation via graph convolutional networks. In: Proc. of the 2019 IEEE/CVF Int’l Conf. on Computer Vision (ICCV). Seoul: IEEE, 2019. 2272–2281. [doi: 10.1109/ICCV.2019.00236]
[8] Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. In: Proc. of the 5th Int’l Conf. on Learning Representations. Toulon: OpenReview.net, 2017.
[9] Wu YP, Kong DH, Wang SF, Li JH, Yin BC. HPGCN: Hierarchical poselet-guided graph convolutional network for 3D pose estimation. Neurocomputing, 2022, 487: 243–256. [doi: 10.1016/j.neucom.2021.11.007]
[10] Wang WG, Shen JB, Jia YD. Review of visual attention detection. Ruan Jian Xue Bao/Journal of Software, 2019, 30(2): 416–439 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5636.htm [doi: 10.13328/j.cnki.jos.005636]
[11] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 6000–6010.
[12] Pavllo D, Feichtenhofer C, Grangier D, Auli M. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019. 7745–7754. [doi: 10.1109/CVPR.2019.00794]
[13] Sun K, Xiao B, Liu D, Wang JD. Deep high-resolution representation learning for human pose estimation. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019. 5686–5696. [doi: 10.1109/CVPR.2019.00584]
[14] Ionescu C, Papava D, Olaru V, Sminchisescu C. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1325–1339. [doi: 10.1109/TPAMI.2013.248]
[15] Mehta D, Rhodin H, Casas D, Fua P, Sotnychenko O, Xu WP, Theobalt C. Monocular 3D human pose estimation in the wild using improved CNN supervision. In: Proc. of the 2017 Int’l Conf. on 3D Vision. Qingdao: IEEE, 2017. 506–516. [doi: 10.1109/3DV.2017.00064]
