Table 1. Comparisons of state-of-the-art LiDAR pretraining methods pretrained on nuScenes and fine-tuned on nuScenes, SemanticKITTI, and Waymo Open datasets, respectively, with specified data portions. LP denotes linear probing with frozen backbones. All scores are given in percentage (%).
Method | Venue | Backbone (2D) | Backbone (3D) | Expert | nuScenes LP | nuScenes 1% | nuScenes 5% | nuScenes 10% | nuScenes 25% | nuScenes Full | SemanticKITTI 1% | Waymo 1% |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Random | - | - | - | - | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 |
PPKT | arXiv'21 | ResNet-50 | MinkUNet | Single | 35.90 | 37.80 | 53.74 | 60.25 | 67.14 | 74.52 | 44.00 | 47.60 |
SLidR | CVPR'22 | ResNet-50 | MinkUNet | Single | 38.80 | 38.30 | 52.49 | 59.84 | 66.91 | 74.79 | 44.60 | 47.12 |
ST-SLidR | CVPR'23 | ResNet-50 | MinkUNet | Single | 40.48 | 40.75 | 54.69 | 60.75 | 67.70 | 75.14 | 44.72 | 44.93 |
TriCC | CVPR'23 | ResNet-50 | MinkUNet | Single | 38.00 | 41.20 | 54.10 | 60.40 | 67.60 | 75.60 | 45.90 | - |
Seal | NeurIPS'23 | ResNet-50 | MinkUNet | Single | 44.95 | 45.84 | 55.64 | 62.97 | 68.41 | 75.60 | 46.63 | 49.34 |
CSC | CVPR'24 | ResNet-50 | MinkUNet | Single | 46.00 | 47.00 | 57.00 | 63.30 | 68.60 | 75.70 | 47.20 | - |
HVDistill | IJCV'24 | ResNet-50 | MinkUNet | Single | 39.50 | 42.70 | 56.60 | 62.90 | 69.30 | 76.60 | 49.70 | - |
SLidR | CVPR'22 | ViT-S | MinkUNet | Single | 44.70 | 41.16 | 53.65 | 61.47 | 66.71 | 74.20 | 44.67 | 47.57 |
+LiMoE | Ours | ViT-S | MinkUNet | Multi | 45.80 | 46.82 | 57.54 | 63.85 | 68.61 | 75.64 | 46.81 | 48.81 |
Seal | NeurIPS'23 | ViT-S | MinkUNet | Single | 45.16 | 44.27 | 55.13 | 62.46 | 67.64 | 75.58 | 46.51 | 48.67 |
SuperFlow | ECCV'24 | ViT-S | MinkUNet | Single | 46.44 | 47.81 | 59.44 | 64.47 | 69.20 | 76.54 | 47.97 | 49.94 |
+LiMoE | Ours | ViT-S | MinkUNet | Multi | 48.20 | 49.60 | 60.54 | 65.65 | 71.39 | 77.27 | 49.53 | 51.42 |
SLidR | CVPR'22 | ViT-B | MinkUNet | Single | 45.35 | 41.64 | 55.83 | 62.68 | 67.61 | 74.98 | 45.50 | 48.32 |
+LiMoE | Ours | ViT-B | MinkUNet | Multi | 46.56 | 46.89 | 58.09 | 63.87 | 69.02 | 75.87 | 47.96 | 49.50 |
Seal | NeurIPS'23 | ViT-B | MinkUNet | Single | 46.59 | 45.98 | 57.15 | 62.79 | 68.18 | 75.41 | 47.24 | 48.91 |
SuperFlow | ECCV'24 | ViT-B | MinkUNet | Single | 47.66 | 48.09 | 59.66 | 64.52 | 69.79 | 76.57 | 48.40 | 50.20 |
+LiMoE | Ours | ViT-B | MinkUNet | Multi | 49.07 | 50.23 | 61.51 | 66.17 | 71.56 | 77.81 | 50.30 | 51.77 |
SLidR | CVPR'22 | ViT-L | MinkUNet | Single | 45.70 | 42.77 | 57.45 | 63.20 | 68.13 | 75.51 | 47.01 | 48.60 |
+LiMoE | Ours | ViT-L | MinkUNet | Multi | 47.43 | 46.92 | 58.41 | 64.54 | 69.69 | 76.32 | 48.25 | 50.23 |
Seal | NeurIPS'23 | ViT-L | MinkUNet | Single | 46.81 | 46.27 | 58.14 | 63.27 | 68.67 | 75.66 | 47.55 | 50.02 |
SuperFlow | ECCV'24 | ViT-L | MinkUNet | Single | 48.01 | 49.95 | 60.72 | 65.09 | 70.01 | 77.19 | 49.07 | 50.67 |
+LiMoE | Ours | ViT-L | MinkUNet | Multi | 49.35 | 51.41 | 62.07 | 66.64 | 71.59 | 77.85 | 50.69 | 51.93 |
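The LP column reports linear probing: the pretrained backbone is frozen and only a linear classifier on its features is trained. The following is a minimal pure-Python sketch of that protocol, not the LiMoE or benchmark code. `frozen_backbone` is a hypothetical stand-in for a pretrained 3D network, and the toy 2D points, labels, learning rate, and epoch count are illustrative assumptions.

```python
import math
import random

def frozen_backbone(point):
    # Hypothetical stand-in for a pretrained, frozen 3D backbone: it maps a
    # raw point to a fixed feature vector and has no trainable parameters.
    x, y = point
    return [x, y, x * y, 1.0]  # the constant 1.0 acts as a bias feature

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_for(W, feat):
    return [sum(w_d * f_d for w_d, f_d in zip(row, feat)) for row in W]

def train_linear_probe(points, labels, num_classes, epochs=200, lr=0.1):
    """Fit only a linear classifier W on top of frozen backbone features."""
    feats = [frozen_backbone(p) for p in points]  # extracted once; never updated
    dim = len(feats[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for feat, label in zip(feats, labels):
            probs = softmax(logits_for(W, feat))
            # Softmax cross-entropy gradient for row c: (p_c - 1[c == label]) * feat
            for c in range(num_classes):
                g = probs[c] - (1.0 if c == label else 0.0)
                for d in range(dim):
                    W[c][d] -= lr * g * feat[d]
    return W

def predict(W, point):
    scores = logits_for(W, frozen_backbone(point))
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data (assumption): class 0 where x*y > 0, class 1 otherwise;
# points too close to the axes are skipped so the classes are separable.
random.seed(0)
points, labels = [], []
while len(points) < 200:
    p = (random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0))
    if abs(p[0] * p[1]) < 0.05:
        continue
    points.append(p)
    labels.append(0 if p[0] * p[1] > 0 else 1)

W = train_linear_probe(points, labels, num_classes=2)
accuracy = sum(predict(W, p) == y for p, y in zip(points, labels)) / len(points)
print(f"linear-probe training accuracy: {accuracy:.2f}")
```

The key design point is that `frozen_backbone` is only used to extract features and gradients update `W` alone, which is why linear probing is commonly read as a probe of representation quality rather than of the fine-tuning procedure.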
Table 2. Domain generalization study of different LiDAR pretraining methods pretrained on the nuScenes dataset and fine-tuned on each of seven different LiDAR semantic segmentation datasets with specified data portions. All scores are given in percentage (%).
Method | Venue | ScribbleKITTI 1% | ScribbleKITTI 10% | RELLIS-3D 1% | RELLIS-3D 10% | SemanticPOSS Half | SemanticPOSS Full | SemanticSTF Half | SemanticSTF Full | SynLiDAR 1% | SynLiDAR 10% | DAPS-3D Half | DAPS-3D Full | Synth4D 1% | Synth4D 10% |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Random | - | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 | 20.22 | 66.87 |
PPKT | arXiv'21 | 36.50 | 51.67 | 49.71 | 54.33 | 50.18 | 56.00 | 50.92 | 54.69 | 37.57 | 46.48 | 78.90 | 84.00 | 61.10 | 62.41 |
SLidR | CVPR'22 | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 | 63.10 | 62.67 |
+LiMoE | Ours | 41.48 | 53.41 | 51.28 | 55.21 | 53.14 | 56.42 | 53.16 | 55.51 | 43.72 | 49.57 | 81.70 | 85.76 | 64.69 | 66.79 |
Seal | NeurIPS'23 | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 | 64.50 | 66.96 |
SuperFlow | ECCV'24 | 42.70 | 54.00 | 52.83 | 55.71 | 54.41 | 57.33 | 54.72 | 56.57 | 44.85 | 51.38 | 82.43 | 86.21 | 65.31 | 69.43 |
+LiMoE | Ours | 43.95 | 55.96 | 53.74 | 56.67 | 55.42 | 57.83 | 55.60 | 57.31 | 45.79 | 52.27 | 83.24 | 86.68 | 66.54 | 71.07 |
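To read Table 2 at a glance, the per-column gain of the second +LiMoE row over the SuperFlow row can be computed directly from the reported scores. This sketch assumes, following the row order of Table 1, that this +LiMoE row is LiMoE applied on top of SuperFlow; `column_gains` is a hypothetical helper, and the numbers are copied verbatim from Table 2.

```python
# Column labels follow Table 2: (dataset, data portion).
COLUMNS = [
    ("ScribbleKITTI", "1%"), ("ScribbleKITTI", "10%"),
    ("RELLIS-3D", "1%"), ("RELLIS-3D", "10%"),
    ("SemanticPOSS", "Half"), ("SemanticPOSS", "Full"),
    ("SemanticSTF", "Half"), ("SemanticSTF", "Full"),
    ("SynLiDAR", "1%"), ("SynLiDAR", "10%"),
    ("DAPS-3D", "Half"), ("DAPS-3D", "Full"),
    ("Synth4D", "1%"), ("Synth4D", "10%"),
]

# Scores (%) copied from Table 2.
# Assumption: the +LiMoE row after SuperFlow is SuperFlow + LiMoE (row order as in Table 1).
SUPERFLOW = [42.70, 54.00, 52.83, 55.71, 54.41, 57.33, 54.72,
             56.57, 44.85, 51.38, 82.43, 86.21, 65.31, 69.43]
WITH_LIMOE = [43.95, 55.96, 53.74, 56.67, 55.42, 57.83, 55.60,
              57.31, 45.79, 52.27, 83.24, 86.68, 66.54, 71.07]

def column_gains(baseline, improved, columns):
    """Return {(dataset, portion): improved - baseline} in percentage points."""
    if not (len(baseline) == len(improved) == len(columns)):
        raise ValueError("score rows and column labels must have the same length")
    return {col: round(new - old, 2) for col, old, new in zip(columns, baseline, improved)}

gains = column_gains(SUPERFLOW, WITH_LIMOE, COLUMNS)
for (dataset, portion), delta in gains.items():
    print(f"{dataset:>13} {portion:>4}: {delta:+.2f}")

mean_gain = sum(gains.values()) / len(gains)
print(f"mean gain over {len(gains)} columns: {mean_gain:+.2f}")
```

For these two rows every delta is positive, ranging from +0.47 (DAPS-3D, Full) to +1.96 (ScribbleKITTI, 10%) percentage points.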
@article{xu2025limoe,
  title={LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes},
  author={Xu, Xiang and Kong, Lingdong and Shuai, Hui and Pan, Liang and Liu, Ziwei and Liu, Qingshan},
  journal={arXiv preprint arXiv:2501.04004},
  year={2025}
}