Page 475, 《软件学报》 (Journal of Software), 2024, No. 4

He Jianhang et al.: Multi-person 3D pose estimation based on human and scene context    2053


noise propagation and improves the model's relative-pose recovery. The bottom-up branch extracts scene context from the bird's-eye-view plane rather than the image plane to obtain a quasi-3D layout of human positions in space, and by fusing human and scene context it reliably predicts the absolute depth of each person. Extensive comparative experiments on the MuPoTS-3D and Human3.6M datasets show that the proposed multi-person 3D pose estimation model, HSC-Pose, outperforms current state-of-the-art models while markedly reducing computational complexity. However, the depth that HSC-Pose can predict is currently limited to 100 m; future work will strengthen the model's ability to generalize to more distant targets. In addition, this work is currently limited to single-frame images; subsequent research will focus on multi-person 3D human pose estimation from multi-view and multi-frame images.

