2112 Journal of Software (软件学报), 2025, Vol. 36, No. 5
the 17th European Conf. on Computer Vision. Tel Aviv: Springer, 2022. 543–560. [doi: 10.1007/978-3-031-19824-3_32]
[27] Jing LL, Zhang L, Tian YL. Self-supervised feature learning by cross-modality and cross-view correspondences. In: Proc. of the
2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops. Nashville: IEEE, 2021. 1581–1591.
[doi: 10.1109/CVPRW53098.2021.00174]
[28] Zhang RR, Wang LH, Qiao Y, Gao P, Li HS. Learning 3D representations from 2D pre-trained models via image-to-point masked
autoencoders. In: Proc. of the 2023 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023.
21769–21780. [doi: 10.1109/CVPR52729.2023.02085]
[29] Wang ZY, Yu XM, Rao YM, Zhou J, Lu JW. P2P: Tuning pre-trained image models for point cloud analysis with point-to-pixel
prompting. In: Proc. of the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans, 2022. 14388–14402.
[30] Dong RP, Qi ZK, Zhang LF, Zhang JB, Sun JJ, Ge Z, Yi L, Ma KS. Autoencoders as cross-modal teachers: Can pretrained 2D image
Transformers help 3D representation learning? In: Proc. of the 11th Int’l Conf. on Learning Representations. Kigali: OpenReview.net,
2023.
[31] Zhang RR, Guo ZY, Zhang W, Li KC, Miao XP, Cui B, Qiao Y, Gao P, Li HS. PointCLIP: Point cloud understanding by CLIP. In:
Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 8552–8562.
[doi: 10.1109/CVPR52688.2022.00836]
[32] Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning
transferable visual models from natural language supervision. In: Proc. of the 38th Int’l Conf. on Machine Learning. PMLR, 2021.
8748–8763.
[33] Zhu XY, Zhang RR, He BW, Guo ZY, Zeng ZY, Qin ZP, Zhang SH, Gao P. PointCLIP V2: Prompting CLIP and GPT for powerful
3D open-world learning. In: Proc. of the 2023 IEEE/CVF Int’l Conf. on Computer Vision. Paris: IEEE, 2023. 2639–2650.
[doi: 10.1109/ICCV51070.2023.00249]
[34] Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. In: Proc. of the 34th Int’l Conf. on Neural Information
Processing Systems. Vancouver: Curran Associates Inc., 2020. 159.
[35] Qi ZK, Dong RP, Fan GF, Ge Z, Zhang XY, Ma KS, Yi L. Contrast with reconstruct: Contrastive 3D representation learning guided by
generative pretraining. In: Proc. of the 40th Int’l Conf. on Machine Learning. Honolulu: JMLR.org, 2023. 1171.
[36] Chen HN, Zhu YY, Zhao JQ, Tian Q. 3D shape recognition based on multimodal relation modeling. Ruan Jian Xue Bao/Journal of
Software, 2024, 35(5): 2208–2219 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/7026.htm
[doi: 10.13328/j.cnki.jos.007026]
[37] Xie CL, Wang CX, Zhang B, Yang H, Chen D, Wen F. Style-based point generator with adversarial rendering for point cloud
completion. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 4619–4628.
[doi: 10.1109/CVPR46437.2021.00459]
[38] Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. In: Proc. of the 2019 IEEE/CVF Conf.
on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 4401–4410. [doi: 10.1109/CVPR.2019.00453]
[39] Zhou LQ, Du YL, Wu JJ. 3D shape generation and completion through point-voxel diffusion. In: Proc. of the 2021 IEEE/CVF Int’l Conf.
on Computer Vision. Montreal: IEEE, 2021. 5826–5835. [doi: 10.1109/ICCV48922.2021.00577]
[40] Pan L, Chen XY, Cai ZG, Zhang JZ, Zhao HY, Yi S, Liu ZW. Variational relational point completion network. In: Proc. of the
2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 8524–8533.
[doi: 10.1109/CVPR46437.2021.00842]
[41] Bardes A, Ponce J, LeCun Y. VICReg: Variance-invariance-covariance regularization for self-supervised learning. In: Proc. of the 10th
Int’l Conf. on Learning Representations. OpenReview.net, 2022.
[42] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai XH, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S,
Uszkoreit J, Houlsby N. An image is worth 16x16 words: Transformers for image recognition at scale. In: Proc. of the 9th Int’l Conf. on
Learning Representations. OpenReview.net, 2021.
[43] Ma X, Qin C, You HX, Ran HX, Fu Y. Rethinking network design and local geometry in point cloud: A simple residual MLP framework.
In: Proc. of the 10th Int’l Conf. on Learning Representations. OpenReview.net, 2022.
[44] Qian GC, Li YC, Peng HW, Mai JJ, Hammoud H, Elhoseiny M, Ghanem B. PointNeXt: Revisiting PointNet++ with improved training
and scaling strategies. In: Proc. of the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans, 2022. 23192–23204.
[45] Sanghi A. Info3D: Representation learning on 3D objects using mutual information maximization and contrastive learning. In: Proc. of
the 16th European Conf. on Computer Vision. Glasgow: Springer, 2020. 626–642. [doi: 10.1007/978-3-030-58526-6_37]
[46] Gadelha M, RoyChowdhury A, Sharma G, Kalogerakis E, Cao LL, Learned-Miller E, Wang R, Maji S. Label-efficient learning on point
clouds using approximate convex decompositions. In: Proc. of the 16th European Conf. on Computer Vision. Glasgow: Springer, 2020.
473–491. [doi: 10.1007/978-3-030-58607-2_28]