Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

Youquan Liu1,*    Lingdong Kong1,2,*    Jun Cen3    Runnan Chen4    Wenwei Zhang1,5
Liang Pan5    Kai Chen1    Ziwei Liu5
1Shanghai AI Laboratory   2National University of Singapore   3The Hong Kong University of Science and Technology   4The University of Hong Kong  
5S-Lab, Nanyang Technological University  
[Figure: overview of the Seal pipeline.]

Abstract

Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception. In this work, we introduce Seal, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. Seal exhibits three appealing properties: (i) Scalability: VFMs are directly distilled into point clouds, eliminating the need for annotations in either 2D or 3D during pretraining. (ii) Consistency: Spatial and temporal relationships are enforced at both the camera-to-LiDAR and point-to-segment stages, facilitating cross-modal representation learning. (iii) Generalizability: Seal enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets. Extensive experiments on eleven different point cloud datasets showcase the effectiveness and superiority of Seal. Notably, Seal achieves a remarkable 45.0% mIoU on nuScenes after linear probing, surpassing random initialization by 36.9% mIoU and outperforming prior art by 6.1% mIoU. Moreover, Seal demonstrates significant performance gains over existing methods across 20 few-shot fine-tuning tasks on all eleven tested point cloud datasets.
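To make the camera-to-LiDAR distillation concrete, the sketch below shows an InfoNCE-style contrastive loss between pooled 2D VFM features and pooled 3D point features. This is a minimal PyTorch sketch under the assumption that row i of each tensor describes the same region; the function name and temperature value are illustrative, not the released implementation.

import torch
import torch.nn.functional as F

def contrastive_distillation_loss(point_feats, pixel_feats, temperature=0.07):
    """InfoNCE-style loss between paired superpoint and superpixel embeddings.

    point_feats: (S, D) pooled 3D features, one row per superpoint.
    pixel_feats: (S, D) pooled 2D VFM features, one row per matched superpixel.
    Row i of each tensor is assumed to describe the same region (positive pair).
    """
    point_feats = F.normalize(point_feats, dim=1)
    pixel_feats = F.normalize(pixel_feats, dim=1)
    logits = point_feats @ pixel_feats.t() / temperature  # (S, S) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)  # diagonal entries are positives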


2D-3D Correspondence

[Figure: examples of 2D-3D correspondence between camera images and LiDAR point clouds.]
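The correspondence is obtained by projecting each LiDAR point into the image plane with the sensor calibration, then pairing the visible points with the pixels (and superpixels) they land on. Below is a minimal NumPy sketch assuming a nuScenes-style 4x4 LiDAR-to-camera extrinsic matrix and a 3x3 intrinsic matrix; all names are illustrative.

import numpy as np

def project_points_to_image(points, lidar2cam, intrinsics, img_h, img_w):
    """Map LiDAR points (N, 3) to pixel coordinates, keeping in-view points.

    lidar2cam: (4, 4) extrinsic matrix from the LiDAR to the camera frame.
    intrinsics: (3, 3) camera intrinsic matrix.
    Returns pixel coords (M, 2) and the indices of the M visible points.
    """
    homo = np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)
    cam = (lidar2cam @ homo.T).T[:, :3]      # points in the camera frame
    in_front = cam[:, 2] > 1e-6              # keep points ahead of the camera
    uv = (intrinsics @ cam[in_front].T).T
    uv = uv[:, :2] / uv[:, 2:3]              # perspective division
    visible = ((uv[:, 0] >= 0) & (uv[:, 0] < img_w) &
               (uv[:, 1] >= 0) & (uv[:, 1] < img_h))
    idx = np.flatnonzero(in_front)[visible]  # indices into the original N points
    return uv[visible], idx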


Framework

[Figure: overview of the Seal framework.]
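A key step in the framework is pooling dense features into the VFM-generated segments so that the two modalities can be compared region by region. The following sketch mean-pools per-pixel (or per-point) features by segment ID using plain PyTorch; the function and argument names are assumptions.

import torch

def pool_by_segment(feats, seg_ids, num_segments):
    """Mean-pool per-element features into per-segment embeddings.

    feats: (N, D) pixel or point features; seg_ids: (N,) segment id per element.
    Returns (num_segments, D) pooled embeddings.
    """
    D = feats.size(1)
    pooled = feats.new_zeros(num_segments, D).index_add_(0, seg_ids, feats)
    counts = feats.new_zeros(num_segments).index_add_(
        0, seg_ids, torch.ones_like(seg_ids, dtype=feats.dtype))
    return pooled / counts.clamp(min=1).unsqueeze(1)  # avoid divide-by-zero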


Cosine Similarity

[Figure: cosine similarity maps of the learned point features.]
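Similarity maps like these can be reproduced by comparing one query point's embedding against all other point embeddings from the pretrained backbone. A minimal sketch (variable names are illustrative):

import torch.nn.functional as F

def cosine_similarity_map(point_feats, query_idx):
    """Cosine similarity between one query point and all points.

    point_feats: (N, D) per-point embeddings from the pretrained backbone.
    Returns (N,) similarities in [-1, 1], suitable for colorizing the cloud.
    """
    feats = F.normalize(point_feats, dim=1)
    return feats @ feats[query_idx]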


Linear Probing

[Figure: linear probing results.]
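Linear probing freezes the pretrained 3D backbone and trains only a pointwise linear classification head on the labeled data, so the score directly reflects the quality of the learned representations. A hedged PyTorch sketch under that setup; the data loader, feature dimension, and ignore index are assumptions:

import torch
import torch.nn as nn

def linear_probe(backbone, train_loader, feat_dim, num_classes, epochs=50):
    """Train a single linear head on top of a frozen pretrained backbone.

    `backbone` and `train_loader` are illustrative stand-ins; only the
    head's parameters receive gradient updates.
    """
    for p in backbone.parameters():
        p.requires_grad = False           # keep pretrained weights frozen
    backbone.eval()

    head = nn.Linear(feat_dim, num_classes)  # the only trainable module
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255: unlabeled points

    for _ in range(epochs):
        for points, labels in train_loader:  # labels: (N,) per-point class ids
            with torch.no_grad():
                feats = backbone(points)     # (N, D) frozen features
            loss = criterion(head(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return head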


Downstream Tasks

[Figure: downstream task results.]
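For downstream fine-tuning, the pretrained weights initialize the segmentation backbone and the whole network is then trained end-to-end on a small labeled fraction (e.g., 1% of the scans). The sketch below shows one way to load such a checkpoint; the file name and state-dict layout are assumptions, not the released file format.

import torch

def load_pretrained_backbone(model, ckpt_path="seal_pretrained.pth"):
    """Initialize a segmentation model with pretrained weights.

    Unlike linear probing, every parameter stays trainable afterwards;
    strict=False tolerates heads that exist only in the downstream model.
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)  # assumed checkpoint layout
    missing, unexpected = model.load_state_dict(state, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
    return model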



BibTeX

@article{liu2023segment,
    title = {Segment Any Point Cloud Sequences by Distilling Vision Foundation Models},
    author = {Liu, Youquan and Kong, Lingdong and Cen, Jun and Chen, Runnan and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei},
    journal = {arXiv preprint arXiv:2306.09347}, 
    year = {2023},
}