Page 140 - Journal of Software (软件学报), 2024, No. 4


Journal of Software (软件学报), 2024, Vol. 35, No. 4, p. 1718

and complexity, making it difficult to achieve efficient, low-latency real-time microscopic 3D shape reconstruction. To address this problem, this study proposes GPLWS-Net, a lightweight real-time microscopic 3D shape reconstruction method based on group parallelism. GPLWS-Net builds a lightweight backbone network on a U-shaped network and accelerates the 3D shape reconstruction process with parallel group querying. In addition, the neural network structure is re-parameterized to avoid accuracy loss when reconstructing microstructures. Furthermore, to address the shortage of microscopic 3D reconstruction datasets, this study publicly releases a multi-focus microscopic 3D reconstruction dataset called Micro 3D, whose label data is produced by multi-modal data fusion to obtain a high-precision 3D structure of each scene. The results show that, compared with five other deep learning-based methods, GPLWS-Net maintains reconstruction accuracy while reducing the average time by 39.15% on three public datasets and by 50.55% on the Micro 3D dataset, which enables real-time 3D shape reconstruction of complex microscopic scenes.
Key words: microscopic 3D shape reconstruction; lightweight neural network; group parallelism

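As an important branch of 3D reconstruction, microscopic 3D shape reconstruction is widely used in quality control for precision manufacturing, structural analysis of new materials, and biological observation and identification [1]. Existing microscopic 3D shape reconstruction methods fall into two categories, active optical and passive optical. Typical active optical methods include laser confocal microscopy and white-light interferometry, but they depend on expensive hardware and are difficult to deploy in large-scale industrial applications. Passive optical methods are represented by 3D shape reconstruction from multi-focus images, which recovers the 3D structure of a scene from a multi-focus image sequence using micrometer-level optical imaging; its relatively high reconstruction efficiency and low hardware cost have attracted wide attention from academia and industry [2].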
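Existing multi-focus image 3D shape reconstruction methods are mainly divided into two categories, model-design and data-driven [3]. Model-design methods design a focus measure operator to evaluate the focus level of the image sequence, then select the frames at which the focus level is maximal and aggregate them into the depth information of the scene. The quality of the focus measure operator therefore determines whether a model-design method is effective. Existing focus measure operators are better suited to texture-rich scenes and cannot accurately reconstruct weakly textured or low-contrast scenes; because of this scene bias, model-design methods generally lack good scene adaptability. Data-driven methods, represented by deep learning-based multi-focus image 3D shape reconstruction, learn the depth information of a scene directly from the multi-focus image sequence [4]. However, existing deep learning methods mainly target macroscopic scenes. Because macroscopic scenes are typically low-resolution and sparsely sampled, and their datasets are relatively small, research on these deep network models generally does not address the computational burden of the high-resolution, dense data found in microscopic scenes, or the increased network inference time under limited resources.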
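The model-design pipeline described above (score every frame with a focus measure, then keep the index of the sharpest frame) can be sketched as follows. This is a minimal illustration, not the method of any cited work: the modified-Laplacian focus measure, the per-pixel argmax, and the names `modified_laplacian` and `depth_from_focus` are assumptions chosen for the example.

```python
import numpy as np

def modified_laplacian(img):
    """Focus measure (assumed choice): |d2/dx2| + |d2/dy2| at each pixel."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    c = p[1:-1, 1:-1]
    lx = np.abs(2 * c - p[1:-1, :-2] - p[1:-1, 2:])   # horizontal second difference
    ly = np.abs(2 * c - p[:-2, 1:-1] - p[2:, 1:-1])   # vertical second difference
    return lx + ly

def depth_from_focus(stack):
    """stack: (N, H, W) multi-focus sequence.

    Returns an (H, W) integer map holding, for each pixel, the index of the
    frame with the largest focus measure (a proxy for depth).
    """
    focus = np.stack([modified_laplacian(frame) for frame in stack])
    return np.argmax(focus, axis=0)
```

On a flat (textureless) region every frame scores zero, so the argmax carries no information there, which is the weak-texture limitation the paragraph above attributes to model-design methods.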
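At present, building deeper and larger convolutional neural networks (CNNs) is gradually becoming the trend in multi-focus image 3D shape reconstruction [5]. Mainstream deep network models typically involve hundreds of convolutional layers and thousands of channels, their computational cost (FLOPs) typically reaches millions or even tens of millions of operations, and a single inference from the input image sequence to the 3D structure often takes a long time. Figure 1 compares the running time of five advanced deep learning-based multi-focus image 3D shape reconstruction algorithms, FVNet (2022/CVPR) [5], DFVNet (2022/CVPR) [5], DDFF (2018/ACCV) [6], DefocusNet (2020/CVPR) [8], and AiFDepthNet (2021/ICCV) [7], on input data at four scales: 128×128×10, 256×256×10, 512×512×10, and 1024×1024×10. As Figure 1 shows, the inference time of all these methods increases with the amount of input data. This high time cost leads to longer inference and higher computational complexity when these methods are applied to microscopic scenes with high-resolution, dense data, so new real-time microscopic 3D shape reconstruction models urgently need to be explored from the perspective of network lightweighting.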
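To make the four input scales concrete, the short sketch below counts the elements in each image stack and their size relative to the smallest one. It only does arithmetic on the shapes quoted in the text; it does not model any network's actual running time, and the helper names `stack_elements` and `relative_input_sizes` are hypothetical.

```python
# Input scales quoted in the text: (height, width, number of frames).
SCALES = [(128, 128, 10), (256, 256, 10), (512, 512, 10), (1024, 1024, 10)]

def stack_elements(height, width, n_frames):
    """Number of pixel values in one multi-focus image stack."""
    return height * width * n_frames

def relative_input_sizes(scales=SCALES):
    """Size of each stack relative to the first (smallest) one."""
    base = stack_elements(*scales[0])
    return [stack_elements(*s) / base for s in scales]

if __name__ == "__main__":
    for shape, ratio in zip(SCALES, relative_input_sizes()):
        print(shape, stack_elements(*shape), f"{ratio:.0f}x")
```

Each doubling of the side length quadruples the input, so the largest stack holds 64 times as many values as the smallest; any model that processes every pixel has at least that much more data to read.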

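Figure 1. Running time of five typical deep learning-based multi-focus image 3D shape reconstruction algorithms on input data of different scales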
