Journal of Software (软件学报), 2025, Vol. 36, No. 4, p. 1710

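(4) Public MVS datasets are limited in number and are evaluated with inconsistent metrics. Because of equipment constraints, existing real-captured datasets offer only a limited number of samples for training and testing and cover a narrow range of scenes. They cannot support the training of general-purpose models with stronger generalization, nor can they provide labels for hard-to-reconstruct regions such as the sky. Synthetic datasets lower the cost of data acquisition but cannot faithfully reproduce the lighting and noise of natural images. Moreover, the variety of dataset types means that evaluation metrics differ from one dataset to another. If larger-scale datasets that are more comprehensive, realistic, and diverse can be built for model training and testing, the performance of 3D reconstruction is expected to improve further.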
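With the rapid development of computing technology and the continual refinement of models, more powerful neural network architectures such as the Transformer have emerged in recent years. The advent of NeRF and GS has likewise brought a leap forward in scene representation, and multi-view 3D reconstruction combined with these techniques has great potential. In addition, building broader, more general standard datasets and establishing consistent evaluation metrics is a direction for future work. It is foreseeable that multi-view stereo will continue to mature and will be applied in more fields.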
