|
Research
I'm working on World Models for Physical AI, exploring a scalable recipe for robotics.
|
 |
Cosmos 3: Omnimodal World Models for Physical AI
NVIDIA. Contributed to Action Modality.
arXiv preprint, 2026.
paper /
project /
code
Cosmos 3 is an omnimodal world model for Physical AI that unifies understanding, generation, simulation, and action across language, images, video, audio, and robot actions in a single architecture.
|
 |
MLP Splatting: Object-Centric Neural Fields
Shinjeong Kim*, Yuzhou Cheng*, Xin Kong*, Paul H. J. Kelly, Andrew J. Davison
arXiv preprint, 2026.
paper /
project
MLP-Splatting decomposes scenes into a few object-centric light-field primitives, each an independent compact MLP with localized spatial support, enabling photorealistic novel-view synthesis and interactive object-level editing from RGB supervision alone.
|
 |
KV-Tracker: Real-Time Pose Tracking with Transformers
Marwan Taher, Ignacio Alzugaray, Kirill Mazur, Xin Kong, Andrew J. Davison
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR),
2026.
paper /
project /
video
KV-Tracker caches key-value pairs from multi-view geometry transformers to enable real-time 6-DoF pose tracking and online scene reconstruction from monocular RGB, achieving up to 15× speedup and ~27 FPS without drift or catastrophic forgetting.
|
 |
CausNVS: Autoregressive Multi-view Diffusion for Flexible 3D Novel View Synthesis
Xin Kong, Daniel Watson, Yannick Strümpler, Michael Niemeyer,
Federico Tombari
arXiv preprint, 2025.
paper
CausNVS is an autoregressive diffusion model for next novel view synthesis with relative pose encoded attention (CaPE) and efficient KV cache inference, towards real-time world modelling, AR streaming and interactive online generation.
|
 |
EscherNet: A Generative Model for Scalable View Synthesis
Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher,
Xiaojuan Qi, Andrew J. Davison
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR),
2024. Seattle WA, USA. Oral (0.78%)
paper /
project /
code /
video /
demo
EscherNet is a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with the camera positional encoding (CaPE), allowing continuous relative camera control between an arbitrary number of reference and target views.
|
 |
vMAP: Vectorised Object Mapping for Neural Field SLAM
Xin Kong, Shikun Liu, Marwan Taher, Andrew J. Davison
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR),
2023. Vancouver, Canada.
paper /
project /
video /
code
We present vMAP, an object-level real-time mapping system in which each object
is represented by a separate MLP neural field model and all object models are optimised in parallel via vectorised training.
|
 |
Efficient Pedestrian Following by Quadruped Robots
Guangyao Zhai, Zhen Zhang, Xin Kong, Yong Liu.
IEEE International Conference on Robotics and Automation (ICRA), Workshop on
Legged Robots, 2021. Xi'an, China. (Best Extended Abstract Award Finalist)
paper /
video /
certificate
We use a quadruped robot to complete a pedestrian-following task in
challenging scenarios. The system consists of two modules, perception and planning,
both relying on the onboard sensors.
|
 |
SA-LOAM: Semantic-aided LiDAR SLAM with Loop Closure
Lin Li, Xin Kong, Xiangrui Zhao, Yong Liu.
IEEE International Conference on Robotics and Automation (ICRA), 2021. Xi'an, China.
paper /
video
We present a novel semantic-aided LiDAR SLAM with loop closure based on LOAM,
named SA-LOAM, which leverages semantics in odometry as well as loop closure detection.
|
 |
HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation
Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, et al.
The 35th AAAI Conference on Artificial Intelligence (AAAI), 2021. Virtual.
paper /
code
Based on theoretical and empirical evidence, we present HR-Depth, for
high-resolution self-supervised monocular depth estimation.
|
 |
Semantic Graph Based Place Recognition for 3D Point Clouds
Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Yong Liu, et al.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020. Las Vegas, USA.
paper /
code /
video /
presentation
We propose a novel semantic graph based approach for large-scale place
recognition in 3D point clouds. A novel semantic graph representation and a fast and effective graph
similarity network are presented.
|
 |
PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud
Xin Kong, Guangyao Zhai, Baoquan Zhong, Yong Liu.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019. Macau, China.
paper /
video
We propose a framework to achieve point-wise semantic segmentation for 3D LiDAR point clouds.
|
 |
Zero123-hf: a diffusers implementation of zero123
Xin Kong
code
A Hugging Face Diffusers (merged)
implementation of the original Zero-1-to-3. Zero-1-to-3
is a large-scale diffusion model that can control the camera perspective, enabling zero-shot novel
view synthesis and 3D reconstruction from a single image.
|
 |
Awesome Point Cloud Place Recognition
Xin Kong, Lin Li
code
A list of papers about point cloud based place recognition,
also known as loop closure detection in SLAM.
|
 |
ICRA 2018 DJI RoboMaster AI Challenge
Team: I Hiter. Xingguang Zhong, Xin Kong, Xiaoyang Lyu, Le Qi, Hao Huang, Linrui Tian, Songwei Li
IEEE International Conference on Robotics and Automation (ICRA), 2018. Brisbane, Australia.
Global Champion /
Ranking: 1st/21 /
Certificate /
Video /
Rules
Our team built two fully automatic robots, covering
mechanical design, circuitry, control, and algorithms. I was responsible for the robots' visual servoing, localization, navigation, and decision-making.
|
 |
2017 & 2018 RoboMaster Robotics Competition
Team: I Hiter. Wei Chen, Yufei Liu, Xin Kong, Xiaoyang Lyu, et al.
China University Robot Competition (全国大学生机器人大赛), 2017 & 2018. Shenzhen, China.
First Prize /
Ranking: 4th/200+ /
Certificate /
Highlights
Our team built more than 10 complex automatic or semi-automatic robots.
I was responsible for visual servoing, which involved computer vision, RGB-D camera calibration, machine learning,
multithreaded programming, ballistic modeling, etc.
|
 |
2017 The Mathematical Contest in Modeling (MCM)
Shengqi Li, Xin Kong, Shuaishuai Liu
The Consortium for Mathematics and Its Applications (COMAP), 2017. Online.
Meritorious Winner (Top 10%) /
Paper /
Problems
Our team formulated the practical problem (Managing the Zambezi River) posed by COMAP as a mathematical
model. Through background research, reasonable assumptions, and optimization analysis, we obtained a solution to the problem.
|
 |
2016 The Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM)
Shengqi Li, Xin Kong, Shuaishuai Liu
China Society for Industrial and Applied Mathematics (CSIAM), 2016. Online.
National Second Prize /
Paper /
Problems
Our team formulated the practical problem (Mooring System Design) posed by CSIAM as a mathematical
model. Through background research, reasonable assumptions, and optimization analysis, we obtained a solution to the problem.
|
 |
2016 The ABU Asia-Pacific Robot Contest (ABU Robocon)
Team: HITCRT. Jingyang Wu, Kuan Xu, Xin Kong, et al.
Asia-Pacific Broadcasting Union, 2016. Zoucheng, China.
National First Prize /
Certificate
I was an echelon member of the vision group, helping the official team members with Ubuntu
environment setup, camera calibration, and computer vision algorithm testing. Thanks to my seniors for their careful guidance!
|
 |
Automatic Dustbin Robot based on Kinect v2
Team: HITCRT. Xingguang Zhong, Xin Kong, Chen Yao, Yide Liu, et al.
National Innovation Training Program, 2016. Harbin, China.
Bronze Prize of University Zuguang Cup
Our team designed an automatic dustbin robot that can catch objects. I was in charge
of Kinect development, RGB-D camera calibration, moving object tracking, and trajectory prediction.
|
 |
Book Sterilizer based on Automatic Page Turning Device
Xin Kong, Dai Gao, Yiqiu Ding, Jiaming Cui, Jingda Du
College Training Program, 2015. Harbin, China.
National Invention Patent /
University-level First Prize
Our team designed and implemented an automatic book sterilizer that protects books
by removing bacteria and dust from their pages. Patent No. ZL 2015103334672.
|
|
Honors
May 2021, Sun Youxian (Academician of the Chinese Academy of Engineering) Scholarship.
Nov. 2018, Academic Scholarship - Zhejiang University.
May 2018, Outstanding Graduate - Harbin Institute of Technology.
May 2018, 3rd Prize of Innovation Scholarship - Ministry of Industry and Information Technology.
Nov. 2016, 8841 Impact Scholarship - Harbin Institute of Technology.
|
|
About Me
Skills: PyTorch/TensorFlow/JAX, TPU/GPU Training, Python/C++, Linux, ROS, OpenCV/PCL, Matlab
Languages: Chinese: Native. English: Professional Proficiency.
|
「Talk is cheap. Show me the code.」
Last update: 2024.02.06. Thanks.
|
|