I am a Ph.D. candidate in the Department of Computer Science at the National University of Singapore, advised by Prof. Wei Tsang Ooi, Prof. Benoit Cottereau, and Dr. Lai Xing Ng. I also collaborate closely with Prof. Ziwei Liu from Nanyang Technological University, Singapore.
I am an intern at Apple, working with Dr. Afshin Dehghan and Dr. Josh Susskind.
My research focuses on spatial intelligence, multimodal large language models, and 3D/4D world modeling and evaluation.
I am a recipient of the National Scholarship (Ministry of Education, 2019), the Research Achievement Award (NUS Computing, 2023), the Dean's Graduate Research Excellence Award (NUS Computing, 2024), the DAAD AInet Fellowship (DAAD, 2025), and the Apple Scholars in AI/ML Ph.D. Fellowship (Apple, 2025).
I have been fortunate to collaborate with Apple Machine Learning Research, NVIDIA Research, ByteDance AI Lab, OpenMMLab, MMLab@NTU, and Motional.
* equal contributions ‡ project lead § corresponding author

U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

AdaSFormer: Adaptive Serialized Transformers for Monocular Semantic Scene Completion from Indoor Environments

Veila: Scaling Diffusion Models for Panoramic LiDAR Point Cloud Generation from a Single Image

See4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting

Enhanced Spatiotemporal Consistency for Image-to-LiDAR Data Pretraining

Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration

FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies

MonoMRN: Monocular Semantic Scene Completion via Masked Recurrent Networks

SafeMap: Robust HD Map Construction from Incomplete Observations

EventFly: Event Camera Perception from Ground to the Sky

PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning

LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

FRNet: Frustum-Range Networks for Scalable LiDAR-Based Semantic Segmentation

NUC-Net: Non-Uniform Cylindrical Partition Networks for Efficient LiDAR Semantic Segmentation

Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

Is Your LiDAR Placement Optimized for 3D Scene Understanding?

Is Your HD Map Constructor Reliable under Sensor Corruptions?

Learning to Adapt SAM for Segmenting Cross-Domain Point Clouds

OpenESS: Event-Based Scene Understanding with Open Vocabularies

Multi-Space Alignments Towards Universal LiDAR Segmentation

Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks

Towards Label-Free Scene Understanding by Vision Foundation Models

Rethinking Range View Representation for LiDAR Segmentation

UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase

CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP

ConDA: Unsupervised Domain Adaptation for LiDAR Segmentation via Regularized Domain Concatenation

Benchmarking 3D Robustness to Common Corruptions and Sensor Failure