Page 475, 《软件学报》 (Journal of Software), 2024, No. 4

He Jianhang et al.: Multi-person 3D pose estimation based on human and scene context    2053


noise propagation and improves the model's relative-pose recovery. The bottom-up branch extracts scene context from the bird's-eye-view plane rather than the image plane to obtain a quasi-3D layout of human positions in space, and by fusing human and scene context it reliably predicts the absolute depth of each person. Extensive comparative experiments on the MuPoTS-3D and Human3.6M datasets show that the proposed multi-person 3D pose estimation model, HSC-Pose, outperforms current state-of-the-art models while markedly reducing computational complexity. However, the depth that HSC-Pose can predict is currently limited to 100 m; future work will strengthen the model's ability to generalize to more distant targets. In addition, this work is currently limited to single-frame images; subsequent research will focus on multi-person 3D human pose estimation from multi-view and multi-frame images.

