Lingdong Kong


I am a Ph.D. candidate in the Department of Computer Science at the National University of Singapore, advised by Prof. Wei Tsang Ooi, Prof. Benoit Cottereau, and Dr. Lai Xing Ng. I also collaborate closely with Prof. Ziwei Liu from Nanyang Technological University, Singapore.

I am an intern at Apple, working with Dr. Afshin Dehghan and Dr. Josh Susskind.

My research focuses on spatial intelligence, multimodal large language models, and 3D/4D world modeling and evaluation.

I am the recipient of the National Scholarship (Ministry of Education, 2019), Research Achievement Award (NUS Computing, 2023), Dean's Graduate Research Excellence Award (NUS Computing, 2024), DAAD AInet Fellowship (DAAD, 2025), and Apple Scholars in AI/ML Ph.D. Fellowship (Apple, 2025).

I have been fortunate to collaborate with Apple Machine Learning Research, NVIDIA Research, ByteDance AI Lab, OpenMMLab, MMLab@NTU, and Motional.

News

Research Highlights

Industrial Experience

2026

Apple

Apple Scholar in AI/ML
2026

Xiaomi EV

Research Intern
  • Embodied foundation models for robotics and autonomous driving.
  • Mentor: Dr. Long Chen.
2025

TikTok Global E-Commerce

Machine Learning Engineer Intern
  • Semantic retrieval from multi-lingual multi-condition query.
  • Mentors: Tianshu Yang and Zhihui Zhang.
2024

NVIDIA Research

Research Intern
  • Vision-language-action models for autonomous systems.
  • Mentor: Dr. Boris Ivanovic.
2023

Shanghai AI Laboratory

Research Intern
  • 3D/4D generation and scene understanding.
  • Mentors: Dr. Liang Pan and Prof. Yu Qiao.
2023

OpenMMLab

Research Intern
  • MMDetection3D: open-source 3D perception toolbox and benchmark.
  • Mentors: Dr. Wenwei Zhang and Dr. Kai Chen.
2022

ByteDance AI Lab

Research Scientist Intern
  • Action recognition, video understanding, domain adaptation.
  • Mentor: Dr. Pengfei Wei.
2021

Motional

Autonomous Vehicle Intern
  • LiDAR-based 3D semantic segmentation for autonomous vehicles.
  • Mentor: Dr. Venice Erin Liong.

Recent Publications

* equal contributions     ‡ project lead     § corresponding author

Don't Overthink with Pixels: Efficient Reasoning for Segmentation

Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang

Open-o3 Video: Grounded Video Reasoning with Spatio-Temporal Evidence

Jiahao Meng, Xiangtai Li, Haochen Wang, Yue Tan, Tao Zhang, Lingdong Kong, Yunhai Tong, Anran Wang, Zhiyang Teng, Yujing Wang, Zhuochen Wang

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Lingdong Kong*,‡, ..., The WorldBench Team
Oral Presentation (0.8% = 141/16092)

U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Xiang Xu, Ao Liang, Youquan Liu, Linfeng Li, Lingdong Kong, Ziwei Liu, et al.
Highlight

EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing

Wei Chow*, Linfeng Li*, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, et al.

AadSFormer: Adaptive Serialized Transformers for Monocular Semantic Scene Completion from Indoor Environments

Xuzhi Wang, Xinran Wu, Song Wang, Lingdong Kong§, Ziping Zhao

ReasonMap: Towards Fine-Grained Visual Reasoning from Transit Maps

Sicheng Feng*, Song Wang*, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, Xinchao Wang

EventDrive: Event Cameras for Vision-Language Driving Intelligence

Dongyue Lu, Rong Li, Ao Liang, Lingdong Kong, Wei Yin, Lai Xing Ng, Benoit R. Cottereau, Camille Simon Chane, Wei Tsang Ooi

AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models

Tianyi Yan, Tao Tang, Xingtai Cui, Yongkang Li, ..., Lingdong Kong, et al.

Inverse Weather Editing: From Entangled Generative Effects to Decomposed Weather Control

Chenghao Qian, Lingdong Kong, Rui Song, Wenjing Li, Gustav Markkula, et al.
Preprint, 2026

Artic4D: Skeleton-Native Motion Diffusion Across Heterogeneous Rigs

Wei Chow, Lingdong Kong, Pengyu Li, Yongyuan Liang, Linfeng Li, Xian Sun, Jiayi Ji, Yicong Li, Leigang Qu, Tat-Seng Chua
Preprint, 2026

VEF: Velocity-Field Distillation for Capability Synthesis in Flow-Matching Models

Wei Chow, Xu Zelin, Bo Dong, Lixue Gong, Yongyuan Liang, Lingdong Kong, et al.
Preprint, 2026

AI for Auto-Research: Roadmap & User Guide

Lingdong Kong*, Xian Sun*, Wei Chow*, Linfeng Li, Kevin Qinghong Lin, Xuan Billy Zhang, Song Wang, Rong Li, Qing Wu, Wei Gao, Yingshuo Wang, et al.
Preprint, 2026

3D and 4D World Modeling: A Survey

Lingdong Kong*,‡, Wesley Yang*, Jianbiao Mei*, Youquan Liu*, Ao Liang*, Dekai Zhu*, Dongyue Lu*, Wei Yin*, Xiaotao Hu, Mingkai Jia, Junyuan Deng, et al.
Preprint, 2026

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Meng Chu*, Xuan Billy Zhang*, Kevin Qinghong Lin*, Lingdong Kong*, Jize Zhang*, Teng Tu*, Weijian Ma*, Ziqi Huang, Senqiao Yang, Wei Huang, et al.
Preprint, 2026

OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation

Youquan Liu, Weidong Yang, Ao Liang, Xiang Xu, Lingdong Kong, Yang Wu, et al.
Preprint, 2026

OneVL: One-Step Latent Reasoning with Vision-Language Explanation

Jinghui Lu, Jiayi Guan, Zhijian Huang, Jinlong Li, Guang Li, Lingdong Kong, Yingyan Li, Han Wang, Shaoqing Xu, Yuechen Luo, Fang Li, Chenxu Dang, et al.
Preprint, 2026

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

Lingdong Kong*,‡, ..., The WorldBench Team
Preprint, 2026

HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments

Yifei Dong, Fengyi Wu, Qi He, Lingdong Kong, Li Heng, Minghan Li, et al.
Preprint, 2026

FLUX: Accelerating Cross-Embodiment Generative Navigation Policies via Rectified Flow and Static-to-Dynamic Learning

Zeying Gong, Yangyi Zhong, Yiyi Ding, Tianshuai Hu, ..., Lingdong Kong, et al.
Preprint, 2026

NavThinker: Action-Conditioned World Models for Coupled Prediction and Planning in Social Navigation

Tianshuai Hu, Zeying Gong, Lingdong Kong, Ao Liang, Yiyi Ding, Qi Zeng, et al.
Preprint, 2026

Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight

Yifei Dong, Fengyi Wu, Guangyu Chen, Lingdong Kong, Xu Zhu, Qiyu Hu, et al.
Preprint, 2026

Language-Conditioned World Modeling for Visual Navigation

Yifei Dong, Fengyi Wu, Yilong Dai, Lingdong Kong, Guangyu Chen, Xu Zhu, Qiyu Hu, Tianyu Wang, Johnalbert Garnica, Feng Liu, Siyu Huang, Qi Dai, et al.
Preprint, 2026

Semantic-Aware, Physics-Informed, Geometry-Grounded Weather Synthesis

Chenghao Qian, Nedko Savov, Lingdong Kong, Yeying Jin, Rui Song, Wenjing Li, Zhun Zhong, Jiaqi Ma, Gustav Markkula, Luc Van Gool
Preprint, 2026

OneDay: Towards Full-Day Personalized Embodied Agents

Lingfeng Zhang, Kangyuan Zhou, Xiaoshuai Hao, Lingdong Kong, Lei Zhou, Yingbo Tang, Xinyu Zheng, Hangjun Ye, Xiaojun Liang, Jinglin Xu, et al.
Preprint, 2026

Learning to Remove Lens Flare in Event Camera

Haiqian Han, Lingdong Kong, Jianing Li, Chengtao Zhu, Jiacheng Lyu, Lai Xing Ng, Xiangyang Ji, Wei Tsang Ooi, Benoit R. Cottereau
Preprint, 2026

Veila: Scaling Diffusion Models for Panoramic LiDAR Point Cloud Generation from a Single Image

Youquan Liu, Lingdong Kong, Weidong Yang, Ao Liang, Jianxiong Gao, et al.

RewardMap: Tackling Sparse Rewards in Fine-Grained Visual Reasoning via Multi-Stage Reinforcement Learning

Sicheng Feng*, Kaiwen Tuo*, Song Wang, Lingdong Kong, Jianke Zhu, Huan Wang

LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences

Ao Liang, Youquan Liu, Yu Yang, Dongyue Lu, Linfeng Li, Lingdong Kong, et al.
Oral Presentation (4.0% = 920/22977)

La La LiDAR: Large-Scale Layout Generation from LiDAR Data

Youquan Liu, Lingdong Kong, Weidong Yang, Xin Li, Ao Liang, Runnan Chen, Ben Fei, Tongliang Liu

See4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting

Dongyue Lu, Ao Liang, Tianxin Huang, Xiao Fu, Yuyang Zhao, Baorui Ma, Liang Pan, Wei Yin, Lingdong Kong, Wei Tsang Ooi, Ziwei Liu

Enhanced Spatiotemporal Consistency for Image-to-LiDAR Data Pretraining

Xiang Xu*, Lingdong Kong*, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu

Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration

Zeying Gong, Rong Li, Tianshuai Hu, Ronghe Qiu, Lingdong Kong, et al.

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

Lingdong Kong, Dongyue Lu, Ao Liang, Rong Li, Yuhao Dong, et al.
Spotlight (3.2% = 688/21575)

VideoLucy: Deep Memory Backtracking for Long Video Understanding

Jialong Zuo, Yongtai Deng, Lingdong Kong, Jingkang Yang, Rui Jin, Yiwei Zhang, Nong Sang, Liang Pan, Ziwei Liu, Changxin Gao

3EED: Ground Everything Everywhere in 3D

Rong Li, Yuhao Dong, Tianshuai Hu, Ao Liang, Youquan Liu, Dongyue Lu, Liang Pan, Lingdong Kong, Junwei Liang, Ziwei Liu

MERIT: Multilingual Semantic Retrieval with Interleaved Condition Query

Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, et al.

SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Dekai Zhu, Yixuan Hu, Youquan Liu, Dongyue Lu, Lingdong Kong, Slobodan Ilic

FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies

Dongyue Lu, Lingdong Kong, Gim Hee Lee, Camille Simon Chane, Wei Tsang Ooi

Perspective-Invariant 3D Object Detection

Ao Liang*, Lingdong Kong*,‡, Dongyue Lu*, Youquan Liu, Jian Fang, Huaici Zhao, Wei Tsang Ooi

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Shaoyuan Xie, Lingdong Kong, Yuhao Dong, Chonghao Sima, et al.

Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations

Xiang Xu, Lingdong Kong, Song Wang, Chuanwei Zhou, Qingshan Liu

MonoMRN: Monocular Semantic Scene Completion via Masked Recurrent Networks

Xuzhi Wang, Xinran Wu, Song Wang, Lingdong Kong§, Ziping Zhao

SafeMap: Robust HD Map Construction from Incomplete Observations

Xiaoshuai Hao, Lingdong Kong, Rong Yin, Pengwei Wang, Jing Zhang, Yunfeng Diao, Shu Zhao

EventFly: Event Camera Perception from Ground to the Sky

Lingdong Kong, Dongyue Lu, Xiang Xu, Lai Xing Ng, Wei Tsang Ooi, Benoit R. Cottereau

LiMoE: Mixture of LiDAR Data Representation Learners from Automotive Scenes

Xiang Xu*, Lingdong Kong*, Hui Shuai, Liang Pan, Ziwei Liu, Qingshan Liu

GEAL: Generalizable 3D Object Affordance Learning with Cross-Modal Consistency

Dongyue Lu, Lingdong Kong, Tianxin Huang, Gim Hee Lee

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong, Xulei Yang, Junwei Liang

PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning

Song Wang, Xiaolu Liu, Lingdong Kong, Jianyun Xu, Chunyong Hu, et al.

DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes

Hengwei Bian, Lingdong Kong, Haozhe Xie, Liang Pan, Yu Qiao, Ziwei Liu
Spotlight (5.1% = 580/11372)

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

Lingdong Kong*, Xiang Xu*, Jun Cen, Wenwei Zhang, Kai Chen, Ziwei Liu
Oral Presentation (4.9% = 120/2458)

LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving

Lingdong Kong*, Xiang Xu*, Youquan Liu*, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

FRNet: Frustum-Range Networks for Scalable LiDAR-Based Semantic Segmentation

Xiang Xu, Lingdong Kong, Hui Shuai, Qingshan Liu

NUC-Net: Non-Uniform Cylindrical Partition Networks for Efficient LiDAR Semantic Segmentation

Xuzhi Wang, Wei Feng, Lingdong Kong, Liang Wan

Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

Jingyi Xu, Weidong Yang, Lingdong Kong, Youquan Liu, et al.

Is Your LiDAR Placement Optimized for 3D Scene Understanding?

Ye Li, Lingdong Kong, Hanjiang Hu, Xiaohao Xu, Xiaonan Huang
Spotlight (2.5% = 388/15671)

Is Your HD Map Constructor Reliable under Sensor Corruptions?

Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, Hui Zhang, Yi Zhou, Qiang Wang, Weiming Li, Lingdong Kong§, Jing Zhang

4D Contrastive Superflows are Dense 3D Representation Learners

Xiang Xu*, Lingdong Kong*, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu

Learning to Adapt SAM for Segmenting Cross-Domain Point Clouds

Xidong Peng, Runnan Chen, Feng Qiao, Lingdong Kong, Youquan Liu, Tai Wang, Xinge Zhu, Yuexin Ma

OpenESS: Event-Based Scene Understanding with Open Vocabularies

Lingdong Kong, Youquan Liu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi
Highlight (2.8% = 324/11532)

Multi-Space Alignments Towards Universal LiDAR Segmentation

Youquan Liu*, Lingdong Kong*, Xiaoyang Wu, Runnan Chen, Xin Li, Liang Pan, Ziwei Liu, Yuexin Ma

Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks

Fangzhou Hong, Lingdong Kong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu

Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving

Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, et al.

RoboDepth: Robust Out-of-Distribution Depth Estimation under Corruptions

Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

Youquan Liu*, Lingdong Kong*, Jun Cen, Runnan Chen, Wenwei Zhang, et al.
Spotlight (3.0% = 378/12343)

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective

Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, et al.

Towards Label-Free Scene Understanding by Vision Foundation Models

Runnan Chen, Youquan Liu, Lingdong Kong, Nenglun Chen, Xinge Zhu, Yuexin Ma, Tongliang Liu, Wenping Wang

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

Lingdong Kong*, Youquan Liu*, Xin Li*, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

Rethinking Range View Representation for LiDAR Segmentation

Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, Ziwei Liu

UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase

Youquan Liu, Runnan Chen, Xin Li, Lingdong Kong, Yuchen Yang, et al.

LaserMix for Semi-Supervised LiDAR Semantic Segmentation

Lingdong Kong*, Jiawei Ren*, Liang Pan, Ziwei Liu
Highlight (2.5% = 235/9155)

CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP

Runnan Chen, Youquan Liu, Lingdong Kong, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao, Wenping Wang

ConDA: Unsupervised Domain Adaptation for LiDAR Segmentation via Regularized Domain Concatenation

Lingdong Kong, Niamul Quader, Venice Erin Liong

Benchmarking 3D Robustness to Common Corruptions and Sensor Failure

Lingdong Kong*, Youquan Liu*, Xin Li*, Runnan Chen, Wenwei Zhang, et al.
Best Workshop Paper Award

Tech Reports


The RoboWorld Challenge: Embodied World Modeling for Robotics and Autonomous Driving

Lingdong Kong, Shaoyuan Xie, Xiaoshuai Hao, Zeying Gong, Yifei Dong, et al.
Technical Report, 2026

The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms

Lingdong Kong, Shaoyuan Xie, Zeying Gong, Ye Li, Meng Chu, Ao Liang, et al.
Technical Report, 2025

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Kai Chen, et al.
Technical Report, 2024

The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation

Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, et al.
Technical Report, 2023

Workshop Organizers

Academic Services

Conference Reviewer

  • IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • IEEE/CVF International Conference on Computer Vision (ICCV)
  • IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
  • European Conference on Computer Vision (ECCV)
  • Conference on Neural Information Processing Systems (NeurIPS)
  • International Conference on Learning Representations (ICLR)
  • International Conference on Machine Learning (ICML)
  • IEEE International Conference on Robotics and Automation (ICRA)
  • IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • AAAI Conference on Artificial Intelligence (AAAI)

Journal Reviewer

  • International Journal of Computer Vision (IJCV)
  • International Journal of Robotics Research (IJRR)
  • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
  • IEEE Transactions on Intelligent Vehicles (TIV)
  • IEEE Transactions on Intelligent Transportation Systems (TITS)
  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
  • IEEE Transactions on Multimedia (TMM)
  • IEEE Transactions on Knowledge and Data Engineering (TKDE)
  • IEEE Robotics and Automation Letters (RA-L)
  • ISPRS Journal of Photogrammetry and Remote Sensing (P&RS)