
Journal of Software, ISSN 1000-9825, CODEN RUXUEW                            E-mail: jos@iscas.ac.cn
                 2025,36(5):2094−2113 [doi: 10.13328/j.cnki.jos.007187] [CSTR: 32375.14.jos.007187]  http://www.jos.org.cn
                 © Institute of Software, Chinese Academy of Sciences. All rights reserved.              Tel: +86-10-62562563



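                 Multi-modal Self-supervised Point Cloud Representation Learning Based on Bidirectional Fit
                 Mask Reconstruction*

                 CHENG Hao-Zhe, ZHU Ji-Hua, SHI Peng-Cheng, HU Nai-Wen, XIE Yi-Fan, LI Shi-Qi


                 (School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China)
                 Corresponding author: ZHU Ji-Hua, E-mail: zhujh@xjtu.edu.cn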

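                 Abstract: Self-supervised point cloud representation learning uses label-free pre-training to explore the
                 structural relationships of 3D topological and geometric space and to capture feature representations. It
                 can be applied to downstream tasks such as point cloud classification, segmentation, and object detection.
                 To improve the generalization and robustness of pretrained models, a multi-modal self-supervised point
                 cloud representation learning method based on bidirectional fit mask reconstruction is proposed. It
                 consists of three main parts: (1) A “bad teacher” model guided by the inverse density scale uses a
                 bidirectional fit strategy, based on inverse density noise representation and global feature
                 representation, to accelerate the approach of masked regions toward the ground truth. (2) A
                 StyleGAN-based auxiliary point cloud generation model generates stylized point clouds from local geometric
                 information and fuses them with the mask reconstruction results under a threshold constraint, aiming to
                 resist the adverse effect of reconstruction noise on representation learning. (3) A multi-modal teacher
                 model, aiming to increase the diversity of the 3D feature space and prevent the collapse of modal
                 information, relies on a triple feature contrastive loss function to fully absorb the latent information
                 contained in the point cloud-image-text sample space. The proposed method is evaluated on fine-tuning
                 tasks on three point cloud datasets: ModelNet, ScanObjectNN, and ShapeNet. Experimental results show that
                 the pretrained model achieves state-of-the-art performance on point cloud recognition tasks, including
                 point cloud classification, linear support vector machine classification, few-shot classification,
                 zero-shot classification, and part segmentation.
                 Key words: 3D point cloud; self-supervised representation learning; multi-modal features; density scale;
                 generative adversarial network
                 CLC number: TP18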

                 Citation format: Cheng HZ, Zhu JH, Shi PC, Hu NW, Xie YF, Li SQ. Multi-modal Self-supervised Point Cloud
                 Representation Learning Based on Bidirectional Fit Mask Reconstruction. Ruan Jian Xue Bao/Journal of
                 Software, 2025, 36(5): 2094–2113 (in Chinese). http://www.jos.org.cn/1000-9825/7187.htm

                 Multi-modal Self-supervised Point Cloud Representation Learning Based on Bidirectional Fit
                 Mask Reconstruction

                 CHENG Hao-Zhe, ZHU Ji-Hua, SHI Peng-Cheng, HU Nai-Wen, XIE Yi-Fan, LI Shi-Qi
                 (School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China)
                 Abstract: Point cloud self-supervised representation learning is conducted in an unlabeled pre-training manner, exploring the structural
                 relationships of 3D topological geometric spaces and capturing feature representations. This approach can be applied to downstream tasks,
                 such as point cloud classification, segmentation, and object detection. To enhance the generalization and robustness of the pretrained
                 models, this study proposes a multi-modal self-supervised method for learning point cloud representations. The method is based on
                 bidirectional fit mask reconstruction and comprises three main components: (1) The “bad teacher” model, guided by the inverse density
                 scale, employs a bidirectional fit strategy that utilizes inverse density noise representation and global feature representation to expedite the
                 convergence of the mask region towards the true value. (2) The StyleGAN-based auxiliary point cloud generation model, grounded in local
                 geometric information, generates stylized point clouds and fuses them with mask reconstruction results while adhering to threshold
                 constraints. The objective is to mitigate the adverse effects of noise on representation learning during the reconstruction process. (3) The
                 multi-modal teacher model aims to enhance the diversity of the 3D feature space and prevent the collapse of modal information. It relies
                 on the triple feature contrast loss function to fully extract the latent information contained in the point cloud-image-text sample space. The


                 *    Funding: Key Research and Development Program of Shaanxi Province (2021GY-025, 2021GXLHZ-097)
                  Received: 2023-11-02; Revised: 2024-03-15; Accepted: 2024-03-26; Published online by jos: 2024-09-11
                  CNKI online first: 2024-09-12