Figure 2. Laser partition example. Points whose inclinations ϕ fall within the same inclination range are grouped into the same area.
Leveraging the Spatial Prior for SSL
The distribution of real-world objects and backgrounds correlates strongly with their spatial positions in LiDAR scans, as shown in Fig. 1 (a). Objects and backgrounds inside a specified spatial area of a LiDAR point cloud follow similar patterns, e.g., the close-range area mostly consists of road, while the long-range area is dominated by buildings, vegetation, etc. In other words, given a spatial area a ∈ A, the LiDAR points and semantic labels inside it (denoted as X_in and Y_in, respectively) exhibit relatively low variation. Formally, the conditional entropy H(X_in, Y_in | A) is small.
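To make the claim concrete, the conditional entropy can be expanded over the area set (a standard identity, shown here for illustration rather than taken from the paper's derivation):

$$H(X_{\mathrm{in}}, Y_{\mathrm{in}} \mid A) \;=\; \sum_{a \in A} p(a)\, H(X_{\mathrm{in}}, Y_{\mathrm{in}} \mid A = a) \;\le\; H(X_{\mathrm{in}}, Y_{\mathrm{in}}),$$

so a strong spatial prior corresponds to every per-area term being small, i.e., the contents of an area become highly predictable once the area is known.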
In this work, we propose to encourage the segmentation model to make confident and consistent predictions within a predefined area, regardless of the data outside that area. The predefined area set A determines the "strength" of the prior.
Figure 3. Illustration of the LiDAR point partition.
Laser Beam Partition & Mixing
LiDAR sensors have a fixed number of laser beams which are
emitted isotropically around the ego-vehicle with predefined inclination angles. To obtain
a proper set of spatial areas A, we propose to partition the LiDAR point cloud based on laser beams.
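A minimal numpy sketch of this inclination-based partition, assuming the scan is an N×4 array of (x, y, z, intensity); the inclination range and number of areas below are placeholder hyperparameters, not the authors' settings:

    import numpy as np

    def laser_partition(points, num_areas=6, phi_min=-25.0, phi_max=3.0):
        """Assign each point to an inclination-range area.

        points: (N, 4) array of x, y, z, intensity.
        num_areas: number of inclination bins (hyperparameter).
        phi_min, phi_max: inclination field of view in degrees
            (sensor-dependent; these values are placeholders).
        Returns an (N,) array of area indices in [0, num_areas - 1].
        """
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        # Inclination phi: angle between the beam and the sensor's horizontal plane.
        phi = np.degrees(np.arctan2(z, np.sqrt(x ** 2 + y ** 2)))
        # Evenly spaced inclination boundaries; out-of-range points are clipped to the edge bins.
        edges = np.linspace(phi_min, phi_max, num_areas + 1)
        return np.clip(np.digitize(phi, edges) - 1, 0, num_areas - 1)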
We mix the aforementioned laser-partitioned areas A from two scans in an intertwined manner, i.e., one scan contributes the odd-indexed areas A_1 = {a_1, a_3, ...} and the other contributes the even-indexed areas A_2 = {a_2, a_4, ...}, so that each area neighbors areas from the other scan.
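A sketch of this intertwined mixing, reusing the laser_partition helper (and its numpy import) from above; it assumes per-point labels for both scans (pseudo-labels for the unlabeled one) and is an illustrative realization of the interleaving, not necessarily the authors' exact implementation:

    def laser_mix(points_a, labels_a, points_b, labels_b, num_areas=6):
        """Swap alternating inclination areas between two scans.

        Returns two mixed scans (points and labels); in each one, every
        area borders an area taken from the other scan. Note the 0-based
        indexing here, versus the 1-based a_1, a_2, ... in the text.
        """
        idx_a = laser_partition(points_a, num_areas)
        idx_b = laser_partition(points_b, num_areas)
        keep_a, swap_a = idx_a % 2 == 0, idx_a % 2 == 1
        keep_b, swap_b = idx_b % 2 == 0, idx_b % 2 == 1
        # Mixed scan 1: even-parity areas of A + odd-parity areas of B.
        mix1_pts = np.concatenate([points_a[keep_a], points_b[swap_b]])
        mix1_lbl = np.concatenate([labels_a[keep_a], labels_b[swap_b]])
        # Mixed scan 2: odd-parity areas of A + even-parity areas of B.
        mix2_pts = np.concatenate([points_a[swap_a], points_b[keep_b]])
        mix2_lbl = np.concatenate([labels_a[swap_a], labels_b[keep_b]])
        return (mix1_pts, mix1_lbl), (mix2_pts, mix2_lbl)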
We find that the laser partition & mixing effectively "excites" a strong spatial prior in the LiDAR data; it significantly outperforms other partition choices, including random points (MixUp-like partition), random areas (CutMix-like partition), and other heuristics such as azimuth α (sensor horizontal direction) or radius r (sensor range direction) partitions.
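For reference, the azimuth and radius heuristics compared against can be sketched analogously to laser_partition above (again illustrative, with placeholder ranges):

    def azimuth_partition(points, num_areas=6):
        # Bin points by horizontal angle alpha around the ego-vehicle, in [-pi, pi].
        alpha = np.arctan2(points[:, 1], points[:, 0])
        edges = np.linspace(-np.pi, np.pi, num_areas + 1)
        return np.clip(np.digitize(alpha, edges) - 1, 0, num_areas - 1)

    def radius_partition(points, num_areas=6, r_max=50.0):
        # Bin points by planar range r from the sensor (r_max is a placeholder).
        r = np.sqrt(points[:, 0] ** 2 + points[:, 1] ** 2)
        edges = np.linspace(0.0, r_max, num_areas + 1)
        return np.clip(np.digitize(r, edges) - 1, 0, num_areas - 1)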
Figure 4. Framework overview. The labeled scan is fed into the Student net to compute the supervised loss (with ground truth). The unlabeled scan and its generated pseudo-labels are mixed with the labeled scan and its labels via LaserMix to produce mixed data, which is then fed into the Student net to compute the mix loss. Additionally, we adopt the EMA update for the Teacher net and compute the mean teacher loss over the Student net's and Teacher net's predictions.
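A minimal PyTorch-style sketch of the EMA teacher update and the overall loss composition described above; the momentum value and loss weights are assumed placeholders, not the paper's hyperparameters:

    import torch

    @torch.no_grad()
    def ema_update(teacher, student, momentum=0.99):
        # Teacher parameters track an exponential moving average of the Student's.
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

    # Per-iteration objective (lambda_mix and lambda_mt are placeholder weights):
    #   loss = loss_sup + lambda_mix * loss_mix + lambda_mt * loss_mean_teacher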
Figure 5. Qualitative results from the LiDAR top view and range view. Correct and incorrect predictions are painted in green and red, respectively, to highlight the difference. Best viewed in color.