

The proposed model constructs a parallel multi-scale network with DDA-STGConv as its basic unit. By building a cross-domain adjacency matrix, it captures dynamic discriminative features in the spatial and temporal domains simultaneously, and it uses an attention mechanism to explore the relationships among joints and strengthen the information exchange between nodes. In addition, a graph topology aggregation function is designed to build parallel multi-scale sub-network modules based on different graph topologies, which aggregate node features at different scales and effectively extract both local and global feature information of the skeleton joints. A multi-scale feature cross-fusion module (MFEB) is further designed to strengthen the interaction of multi-scale information between the parallel networks and enhance the feature representation capability of the model. Comparisons with current mainstream methods on two major human pose estimation datasets show that the proposed network model achieves better pose estimation results. The flexibility of the PMST-GNet model can provide technical support for work in fields such as action recognition, motion prediction, and scene understanding.
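The paper does not include source code for the architecture summarized above. The following is a minimal, illustrative PyTorch sketch of the general idea of a spatio-temporal graph convolution unit that combines a fixed skeleton adjacency with a learnable data-driven adjacency and a simple joint-attention weighting. The class name STGraphConvUnit, the tensor layout (batch, channels, frames, joints), and all hyperparameters are assumptions for illustration only; this is not the authors' DDA-STGConv, graph topology aggregation function, or MFEB module.

```python
# Illustrative sketch only (not the authors' implementation): one graph-conv
# unit over joint features with a fixed skeleton adjacency, a learnable
# adjacency for data-driven joint relations, and simple joint attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class STGraphConvUnit(nn.Module):
    """Operates on joint features of shape (batch, channels, frames, joints)."""

    def __init__(self, in_channels: int, out_channels: int,
                 num_joints: int, skeleton_adj: torch.Tensor):
        super().__init__()
        # Fixed skeleton topology (num_joints x num_joints), normalized elsewhere.
        self.register_buffer("A_skeleton", skeleton_adj)
        # Learnable adjacency intended to capture dynamic joint relations.
        self.A_dynamic = nn.Parameter(torch.zeros(num_joints, num_joints))
        # 1x1 convolution acts as the per-node feature transform.
        self.theta = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal convolution over the frame axis (kernel 3, same padding).
        self.temporal = nn.Conv2d(out_channels, out_channels,
                                  kernel_size=(3, 1), padding=(1, 0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Simple attention over joints from channel- and time-pooled features.
        pooled = x.mean(dim=(1, 2))                      # (batch, joints)
        attn = torch.softmax(pooled, dim=-1)             # (batch, joints)
        # Combine fixed and learned adjacency, then propagate joint features.
        A = self.A_skeleton + self.A_dynamic             # (joints, joints)
        x = torch.einsum("bctj,jk->bctk", x, A)          # graph aggregation
        x = x * attn[:, None, None, :]                   # re-weight joints
        x = self.theta(x)                                # feature transform
        return F.relu(self.temporal(x))                  # temporal modeling


if __name__ == "__main__":
    J = 17                                               # e.g. Human3.6M joints
    adj = torch.eye(J)                                   # placeholder topology
    unit = STGraphConvUnit(2, 64, J, adj)
    out = unit(torch.randn(8, 2, 243, J))                # (batch, C, frames, joints)
    print(out.shape)                                     # torch.Size([8, 64, 243, 17])
```

Stacking several such units built on different graph topologies and fusing their outputs would only coarsely approximate the parallel multi-scale design described above; the fusion strategy and adjacency construction in PMST-GNet differ in detail.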

