Figure 1. A well-calibrated 3D scene understanding model should deliver low uncertainty when its predictions are accurate and high uncertainty when they are not. Existing 3D models (UnCal) and prior calibration methods struggle to provide proper uncertainty estimates. Our proposed depth-aware scaling (DeptS) produces accurate estimates, highlighting its potential for real-world use. The plots show the point-wise expected calibration error (ECE), with the colormap running from dark (low error) to light (high error). Best viewed in color.
Table 2. The expected calibration error (ECE, lower is better) of state-of-the-art 3D scene understanding models on nuScenes (left six result columns) and SemanticKITTI (right six result columns). UnCal, TempS, LogiS, DiriS, MetaC, and DeptS denote the uncalibrated, temperature, logistic, Dirichlet, meta, and our proposed depth-aware scaling calibrations, respectively.
Method | Modal | UnCal | TempS | LogiS | DiriS | MetaC | DeptS | UnCal | TempS | LogiS | DiriS | MetaC | DeptS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RangeNet++ | Range 🔵 | 4.57% | 2.74% | 2.79% | 2.73% | 2.78% | 2.61% | 4.01% | 3.12% | 3.16% | 3.59% | 2.38% | 2.33% |
SalsaNext | Range 🔵 | 3.27% | 2.59% | 2.58% | 2.57% | 2.52% | 2.42% | 5.37% | 4.29% | 4.31% | 4.11% | 3.35% | 3.19% |
FIDNet | Range 🔵 | 4.89% | 3.35% | 2.89% | 2.61% | 4.55% | 4.33% | 5.89% | 4.04% | 4.15% | 3.82% | 3.25% | 3.14% |
CENet | Range 🔵 | 4.44% | 2.47% | 2.53% | 2.58% | 2.70% | 2.44% | 5.95% | 3.93% | 3.79% | 4.28% | 3.31% | 3.09% |
RangeViT | Range 🔵 | 2.52% | 2.50% | 2.57% | 2.56% | 2.46% | 2.38% | 5.47% | 3.16% | 4.84% | 8.80% | 3.14% | 3.07% |
RangeFormer | Range 🔵 | 2.44% | 2.40% | 2.41% | 2.44% | 2.27% | 2.15% | 3.99% | 3.67% | 3.70% | 3.69% | 3.55% | 3.30% |
FRNet | Range 🔵 | 2.27% | 2.24% | 2.22% | 2.28% | 2.22% | 2.17% | 3.46% | 3.53% | 3.54% | 3.49% | 2.83% | 2.75% |
PolarNet | BEV 🔴 | 4.21% | 2.47% | 2.54% | 2.59% | 2.56% | 2.45% | 2.78% | 3.54% | 3.71% | 3.70% | 2.67% | 2.59% |
MinkUNet18 | Voxel 🟡 | 2.45% | 2.34% | 2.34% | 2.42% | 2.29% | 2.23% | 3.04% | 3.01% | 3.08% | 3.30% | 2.69% | 2.63% |
MinkUNet34 | Voxel 🟡 | 2.50% | 2.38% | 2.38% | 2.53% | 2.32% | 2.24% | 4.11% | 3.59% | 3.62% | 3.63% | 2.81% | 2.73% |
Cylinder3D | Voxel 🟡 | 3.19% | 2.58% | 2.62% | 2.58% | 2.39% | 2.29% | 5.49% | 4.36% | 4.48% | 4.42% | 3.40% | 3.09% |
SpUNet18 | Voxel 🟡 | 2.58% | 2.41% | 2.46% | 2.59% | 2.36% | 2.25% | 3.77% | 3.47% | 3.44% | 3.61% | 3.37% | 3.21% |
SpUNet34 | Voxel 🟡 | 2.60% | 2.52% | 2.47% | 2.66% | 2.41% | 2.29% | 4.41% | 4.33% | 4.34% | 4.39% | 4.20% | 4.11% |
RPVNet | Fusion 🟢 | 2.81% | 2.70% | 2.73% | 2.79% | 2.68% | 2.60% | 4.67% | 4.12% | 4.23% | 4.26% | 4.02% | 3.75% |
2DPASS | Fusion 🟢 | 2.74% | 2.53% | 2.51% | 2.51% | 2.62% | 2.46% | 2.32% | 2.35% | 2.45% | 2.30% | 2.73% | 2.27% |
SPVCNN18 | Fusion 🟢 | 2.57% | 2.44% | 2.49% | 2.54% | 2.40% | 2.31% | 3.46% | 2.90% | 3.07% | 3.41% | 2.36% | 2.32% |
SPVCNN34 | Fusion 🟢 | 2.61% | 2.49% | 2.54% | 2.61% | 2.37% | 2.28% | 3.61% | 3.03% | 3.07% | 3.10% | 2.99% | 2.86% |
CPGNet | Fusion 🟢 | 3.33% | 3.11% | 3.17% | 3.15% | 3.07% | 2.98% | 3.93% | 3.81% | 3.83% | 3.78% | 3.70% | 3.59% |
GFNet | Fusion 🟢 | 2.88% | 2.71% | 2.70% | 2.73% | 2.55% | 2.41% | 3.07% | 3.01% | 2.99% | 3.05% | 2.88% | 2.73% |
UniSeg | Fusion 🟢 | 2.76% | 2.61% | 2.63% | 2.65% | 2.45% | 2.37% | 3.93% | 3.73% | 3.78% | 3.67% | 3.51% | 3.43% |
KPConv | Point ⚫ | 3.37% | 3.27% | 3.34% | 3.32% | 3.28% | 3.20% | 4.97% | 4.88% | 4.90% | 4.91% | 4.78% | 4.68% |
PIDS1.25× | Point ⚫ | 3.46% | 3.40% | 3.43% | 3.41% | 3.37% | 3.28% | 4.77% | 4.65% | 4.66% | 4.64% | 4.57% | 4.49% |
PIDS2.0× | Point ⚫ | 3.53% | 3.47% | 3.49% | 3.51% | 3.34% | 3.27% | 4.91% | 4.83% | 4.72% | 4.89% | 4.66% | 4.47% |
PTv2 | Point ⚫ | 2.42% | 2.34% | 2.46% | 2.55% | 2.48% | 2.19% | 4.95% | 4.78% | 4.71% | 4.94% | 4.69% | 4.62% |
WaffleIron | Point ⚫ | 4.01% | 2.65% | 3.06% | 2.59% | 2.54% | 2.46% | 3.91% | 2.57% | 2.86% | 2.67% | 2.58% | 2.51% |
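For reference, the ECE values reported above follow the standard binned definition: partition predictions by confidence, then average the gap between per-bin accuracy and per-bin mean confidence, weighted by bin occupancy. A minimal sketch in NumPy (the 15 equal-width bins are a common default, not necessarily the exact protocol used in the benchmark):

```python
import numpy as np

def expected_calibration_error(confidences, correctness, n_bins=15):
    """Binned ECE: occupancy-weighted gap between accuracy and confidence.

    confidences: max softmax probability per point, shape (N,)
    correctness: 1 if the prediction matches the label, else 0, shape (N,)
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correctness[mask].mean()   # per-bin accuracy
            conf = confidences[mask].mean()  # per-bin mean confidence
            ece += mask.sum() / n * abs(acc - conf)
    return ece
```

A perfectly calibrated predictor (per-bin accuracy equal to per-bin confidence) attains zero ECE; a predictor that is 90% confident but always wrong attains an ECE of 0.9.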
Table 2. The expected calibration error (ECE) of MinkUNet under eight domain-shift scenarios from nuScenes-C (left six result columns) and SemanticKITTI-C (right six result columns) in the Robo3D benchmark. UnCal, TempS, LogiS, DiriS, MetaC, and DeptS denote the uncalibrated, temperature, logistic, Dirichlet, meta, and our depth-aware scaling calibrations, respectively.
Type | UnCal | TempS | LogiS | DiriS | MetaC | DeptS | UnCal | TempS | LogiS | DiriS | MetaC | DeptS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Clean | 2.45% | 2.34% | 2.34% | 2.42% | 2.29% | 2.23% | 3.04% | 3.01% | 3.08% | 3.30% | 2.69% | 2.63% |
Fog | 5.52% | 5.42% | 5.49% | 5.43% | 4.77% | 4.72% | 12.66% | 12.55% | 12.67% | 12.48% | 11.08% | 10.94% |
Wet Ground | 2.63% | 2.54% | 2.54% | 2.64% | 2.55% | 2.52% | 3.55% | 3.46% | 3.54% | 3.72% | 3.33% | 3.28% |
Snow | 13.79% | 13.32% | 13.53% | 13.59% | 11.37% | 11.31% | 7.10% | 6.96% | 6.95% | 7.26% | 5.99% | 5.63% |
Motion Blur | 9.54% | 9.29% | 9.37% | 9.01% | 8.32% | 8.29% | 11.31% | 11.16% | 11.24% | 12.13% | 9.00% | 8.97% |
Beam Missing | 2.58% | 2.48% | 2.49% | 2.57% | 2.53% | 2.47% | 2.87% | 2.83% | 2.84% | 2.98% | 2.83% | 2.79% |
Crosstalk | 13.64% | 13.00% | 12.97% | 13.44% | 9.98% | 9.73% | 4.93% | 4.83% | 4.86% | 4.81% | 3.54% | 3.48% |
Incomplete Echo | 2.44% | 2.33% | 2.33% | 2.42% | 2.32% | 2.21% | 3.21% | 3.19% | 3.25% | 3.48% | 2.84% | 2.19% |
Cross Sensor | 4.25% | 4.15% | 4.20% | 4.28% | 4.06% | 3.20% | 3.15% | 3.13% | 3.18% | 3.43% | 3.17% | 2.96% |
Average | 6.78% | 6.57% | 6.62% | 6.67% | 5.74% | 5.56% | 6.10% | 6.01% | 6.07% | 6.29% | 5.22% | 5.03% |
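The TempS baseline in both tables is standard temperature scaling: a single scalar T divides the logits before the softmax and is fit on held-out data to minimize the negative log-likelihood. A minimal grid-search sketch (the search range and grid resolution are illustrative choices, not the benchmark's exact fitting procedure):

```python
import numpy as np

def fit_temperature(logits, labels, grid=np.linspace(0.05, 5.0, 200)):
    """Pick the scalar temperature T minimizing NLL of softmax(logits / T)."""
    def nll(t):
        z = logits / t
        z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()
    return min(grid, key=nll)
```

Because dividing all logits by the same T does not change the arg-max, temperature scaling leaves the predicted labels (and hence accuracy) untouched; it only softens or sharpens the confidence distribution.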
```bibtex
@inproceedings{kong2025calib3d,
  author    = {Lingdong Kong and Xiang Xu and Jun Cen and Wenwei Zhang and Liang Pan and Kai Chen and Ziwei Liu},
  title     = {Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages     = {1965-1978},
  year      = {2025},
}
```