I am a Ph.D. candidate in the Department of Computer Science at the National University of Singapore, advised by Prof. Wei Tsang Ooi, Prof. Benoit Cottereau, and Dr. Lai Xing Ng. I also collaborate closely with Prof. Ziwei Liu from Nanyang Technological University, Singapore.
I am an intern at Apple, working with Dr. Afshin Dehghan and Dr. Josh Susskind.
My research focuses on spatial intelligence, multimodal large language models, and 3D/4D world modeling and evaluation.
I am the recipient of the National Scholarship (Ministry of Education, 2019), Research Achievement Award (NUS Computing, 2023), Dean's Graduate Research Excellence Award (NUS Computing, 2024), DAAD AInet Fellowship (DAAD, 2025), and Apple Scholars in AI/ML Ph.D. Fellowship (Apple, 2025).
I have been fortunate to collaborate with Apple Machine Learning Research, NVIDIA Research, ByteDance AI Lab, OpenMMLab, MMLab@NTU, and Motional.
* equal contributions ‡ project lead § corresponding author
Don't Overthink with Pixels: Efficient Reasoning for Segmentation
Open-o3 Video: Grounded Video Reasoning with Spatio-Temporal Evidence
U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing
AadSFormer: Adaptive Serialized Transformers for Monocular Semantic Scene Completion from Indoor Environments
EventDrive: Event Cameras for Vision-Language Driving Intelligence
AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models
Veila: Scaling Diffusion Models for Panoramic LiDAR Point Cloud Generation from a Single Image
See4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting
Enhanced Spatiotemporal Consistency for Image-to-LiDAR Data Pretraining
Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration
FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies
MonoMRN: Monocular Semantic Scene Completion via Masked Recurrent Networks
SafeMap: Robust HD Map Construction from Incomplete Observations
EventFly: Event Camera Perception from Ground to the Sky
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving
Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving
FRNet: Frustum-Range Networks for Scalable LiDAR-Based Semantic Segmentation
NUC-Net: Non-Uniform Cylindrical Partition Networks for Efficient LiDAR Semantic Segmentation
Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation
Is Your LiDAR Placement Optimized for 3D Scene Understanding?
Is Your HD Map Constructor Reliable under Sensor Corruptions?
Learning to Adapt SAM for Segmenting Cross-Domain Point Clouds
OpenESS: Event-Based Scene Understanding with Open Vocabularies
Multi-Space Alignments Towards Universal LiDAR Segmentation
Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks
Towards Label-Free Scene Understanding by Vision Foundation Models
Rethinking Range View Representation for LiDAR Segmentation
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase
CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP
ConDA: Unsupervised Domain Adaptation for LiDAR Segmentation via Regularized Domain Concatenation
Benchmarking 3D Robustness to Common Corruptions and Sensor Failure