The robustness of 3D perception systems under natural corruptions from environments and sensors is pivotal for safety-critical applications. Existing large-scale 3D perception datasets are often meticulously cleaned, a setting that cannot reflect the reliability of perception models at deployment. In this work, we present Robo3D, the first comprehensive benchmark for probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios caused by natural corruptions that occur in real-world environments. Specifically, we consider eight corruption types stemming from severe weather conditions, external disturbances, and internal sensor failures. We uncover that, despite the steady progress reported on standard benchmarks, state-of-the-art 3D perception models remain vulnerable to corruptions. We draw key observations on how choices of data representation, augmentation scheme, and training strategy can severely affect a model's robustness. To pursue better robustness, we propose a density-insensitive training framework together with a simple yet flexible voxelization strategy to enhance model resiliency. We hope our benchmark and approach can inspire future research on designing more robust and reliable 3D perception models.
We simulate eight corruption types from three categories: 1) severe weather conditions, such as fog, rain, and snow; 2) external disturbances, such as motion blur and missing LiDAR beams; and 3) internal sensor failures, including LiDAR crosstalk, incomplete echoes, and cross-sensor scenarios. Each corruption is further split into three severity levels: light, moderate, and heavy.
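As a concrete illustration, the NumPy sketch below simulates one plausible variant of the beam-missing corruption by binning points into pseudo-beams along the elevation angle and dropping a fraction of them. The function name, drop ratios, and beam count are illustrative assumptions, not the benchmark's official parameters.

import numpy as np

def drop_lidar_beams(points: np.ndarray, severity: int, num_beams: int = 64,
                     seed: int = 0) -> np.ndarray:
    """Simulate a beam-missing corruption by removing whole LiDAR beams,
    approximated here via vertical-angle binning.

    points:   (N, 4) array of [x, y, z, intensity].
    severity: 1 (light), 2 (moderate), or 3 (heavy).
    Drop ratios below are illustrative, not Robo3D's official settings.
    """
    drop_ratio = {1: 0.25, 2: 0.50, 3: 0.75}[severity]
    rng = np.random.default_rng(seed)

    # Elevation angle of each point, binned into pseudo-beams.
    elevation = np.arctan2(points[:, 2], np.linalg.norm(points[:, :2], axis=1))
    bins = np.linspace(elevation.min(), elevation.max(), num_beams + 1)
    beam_id = np.clip(np.digitize(elevation, bins) - 1, 0, num_beams - 1)

    # Randomly select the beams to keep and filter the point cloud.
    keep = rng.choice(num_beams, size=int(num_beams * (1 - drop_ratio)),
                      replace=False)
    return points[np.isin(beam_id, keep)]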
Benchmarking results of 34 LiDAR-based detection and segmentation models on the six robustness sets of Robo3D (SemanticKITTI-C, KITTI-C, and the detection and segmentation splits of nuScenes-C and WOD-C). Figures from top to bottom: task-specific accuracy (mAP, mIoU, NDS, mAPH) vs. [first row] mean corruption error (mCE), [second row] mean resilience rate (mRR), and [third row] a sensitivity analysis across corruption types.
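For readers unfamiliar with these metrics: mCE and mRR follow the corruption-robustness conventions popularized by the ImageNet-C benchmark. Per corruption type, CE normalizes a model's accumulated error over the three severity levels by that of a baseline model, and RR measures the fraction of clean-data accuracy that is retained; both are then averaged over the eight corruption types. A minimal sketch, assuming this standard formulation (Robo3D's exact baseline choice and aggregation may differ in detail):

import numpy as np

def corruption_error(acc: np.ndarray, acc_baseline: np.ndarray) -> float:
    """CE for one corruption type: the model's error, normalized by the
    baseline model's error, summed over the three severity levels.
    acc, acc_baseline: per-level task accuracies (e.g., mIoU) in [0, 1]."""
    return (1.0 - acc).sum() / (1.0 - acc_baseline).sum()

def resilience_rate(acc: np.ndarray, acc_clean: float) -> float:
    """RR for one corruption type: average accuracy over the severity
    levels, relative to the model's clean-data accuracy."""
    return acc.sum() / (len(acc) * acc_clean)

# mCE / mRR then average CE / RR over all eight corruption types.
# Example with illustrative numbers:
acc = np.array([0.55, 0.48, 0.40])       # model under fog, 3 levels
acc_base = np.array([0.50, 0.40, 0.30])  # baseline under fog
print(corruption_error(acc, acc_base))   # < 1.0 means more robust than baseline
print(resilience_rate(acc, acc_clean=0.62))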
Coming soon!
The 3D object detection realization of the density-insensitive training framework. The “full” and “partial” point clouds are fed into the teacher and student branches, respectively, for feature learning, where the partial cloud is generated by randomly masking the original point cloud. To encourage cross-density consistency, we compute completion and confirmation losses that measure the distance between each branch's prediction (a BEV map) and the output of the other branch.
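To make the consistency objective concrete, here is a minimal PyTorch sketch of the two loss terms for the detection branch. Writing both as L2 distances between BEV maps, with the target branch detached in each term, is an assumption for illustration; the paper's exact loss form may differ.

import torch
import torch.nn.functional as F

def cross_density_losses(bev_teacher: torch.Tensor,
                         bev_student: torch.Tensor) -> torch.Tensor:
    """Cross-density consistency for detection (hedged sketch).

    bev_teacher: (B, C, H, W) BEV prediction from the full point cloud.
    bev_student: (B, C, H, W) BEV prediction from the masked partial cloud.
    """
    # Completion: pull the student's (partial-input) prediction toward the
    # teacher's richer prediction; the teacher receives no gradient here.
    loss_completion = F.mse_loss(bev_student, bev_teacher.detach())
    # Confirmation: keep the teacher consistent with what the student can
    # confirm from the sparser input.
    loss_confirmation = F.mse_loss(bev_teacher, bev_student.detach())
    return loss_completion + loss_confirmation

In practice, this term would be added to the standard detection loss, with the masking ratio used to create the partial cloud as a hyperparameter.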
The 3D semantic segmentation realization of the density-insensitive training framework. The “full” and “partial” point clouds are fed into the teacher and student branches, respectively, for feature learning, where the partial cloud is generated by randomly masking the original point cloud. To encourage cross-density consistency, we compute completion and confirmation losses that measure the distance between each branch's prediction and the other branch's output, with the teacher's prediction sub-sampled and the student's prediction interpolated so that the two point sets align.
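Because the two branches predict point-wise logits over different point sets, one branch's prediction must be resampled onto the other's points before a distance can be computed. The sketch below uses nearest-neighbor lookup in both directions, which reduces to exact sub-sampling when the partial cloud is a subset of the full one; the actual resampling scheme is not spelled out here, so treat this as an assumption.

import torch
import torch.nn.functional as F

def nn_interpolate(src_xyz, src_logits, dst_xyz):
    """Copy each destination point's logits from its nearest source point.
    src_xyz: (N, 3), src_logits: (N, K), dst_xyz: (M, 3) -> (M, K)."""
    dist = torch.cdist(dst_xyz, src_xyz)   # (M, N) pairwise distances
    idx = dist.argmin(dim=1)               # nearest source point per destination
    return src_logits[idx]

def seg_consistency(full_xyz, full_logits, part_xyz, part_logits):
    """Cross-density consistency for segmentation (hedged sketch).
    The teacher sees the full cloud, the student the masked partial cloud."""
    # Sub-sample the teacher's prediction onto the student's (partial) points.
    teacher_on_part = nn_interpolate(full_xyz, full_logits, part_xyz)
    # Interpolate the student's prediction back onto the full point set.
    student_on_full = nn_interpolate(part_xyz, part_logits, full_xyz)
    loss_completion = F.mse_loss(student_on_full, full_logits.detach())
    loss_confirmation = F.mse_loss(part_logits, teacher_on_part.detach())
    return loss_completion + loss_confirmation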
@article{kong2023robo3d,
title = {Robo3D: Towards Robust and Reliable 3D Perception against Corruptions},
author = {Kong, Lingdong and Liu, Youquan and Li, Xin and Chen, Runnan and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
journal = {arXiv preprint arXiv:2303.17597},
year = {2023},
}