Xin Kong (孔 昕)

I'm a PhD student (Oct. 2021 - 2025), working with Prof. Andrew Davison at the Dyson Robotics Lab at Imperial College London.

I obtained my Master's degree (Sep. 2018 - Jun. 2021) with the thesis "Deep Point Cloud Semantic Segmentation and its Application in Robotics" in the College of Control Science and Engineering at Zhejiang University, supervised by Prof. Yong Liu. Prior to ZJU, I obtained a B.Eng. (Sep. 2014 - Jun. 2018) from Harbin Institute of Technology.

I was a research intern (May 2020 - Sep. 2020) at YouTu Lab of Tencent (Shenzhen, China). During my undergraduate studies, I was a member of the computer vision group of the Harbin Institute of Technology Competition Robotics Team (HITCRT), named I Hiter (Harbin, China).

Email / Google Scholar / Github / LinkedIn / Twitter

Research

I'm interested in computer vision and robotics. Currently, I'm working on 3D generative diffusion models, neural SLAM and scalable 3D scene representations. I'm also interested in bringing large visual/language priors into 3D vision and robotics to achieve high-level intelligence. My research objective is to build intelligent robots capable of continuously learning and perceiving the real world as humans do, which is hard but worthwhile.

Publications

EscherNet: A Generative Model for Scalable View Synthesis
Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher,
Xiaojuan Qi, Andrew J. Davison
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024. Seattle WA, USA. Oral (0.78%)
paper / project / code / video / demo

EscherNet is a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with the camera positional encoding (CaPE), allowing continuous relative camera control between an arbitrary number of reference and target views.

vMAP: Vectorised Object Mapping for Neural Field SLAM
Xin Kong, Shikun Liu, Marwan Taher, Andrew J. Davison
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023. Vancouver, Canada.
paper / project / video / code

We present vMAP, an object-level real-time mapping system in which each object is represented by a separate MLP neural field model and all object models are optimised in parallel via vectorised training.

RINet: Efficient 3D Lidar-Based Place Recognition Using Rotation Invariant Neural Network
Lin Li, Xin Kong, Xiangrui Zhao, Tianxin Huang, Yong Liu, et al.
IEEE Robotics and Automation Letters (RA-L), presented at IEEE International Conference on Robotics and Automation (ICRA), 2022. Philadelphia (PA), USA.
paper / code / video

We propose a rotation invariant neural network structure that can detect reverse loop closures even when all the training data is in the same direction. Our network is lightweight and runs at more than 8000 FPS on an i7-9700 CPU.

SSC: Semantic Scan Context for Large-Scale Place Recognition
Lin Li, Xin Kong, Xiangrui Zhao, Tianxin Huang, Yong Liu, et al.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021. Prague, Czech Republic.
paper / code / video

We propose a novel semantic-based global descriptor for 3D point cloud place recognition. A two-step global semantic ICP is presented to obtain the 3D relative pose (x, y, yaw) and further improve the descriptor matching accuracy.

Efficient Pedestrian Following by Quadruped Robots
Guangyao Zhai, Zhen Zhang, Xin Kong, Yong Liu.
IEEE International Conference on Robotics and Automation (ICRA), Workshop on Legged Robots, 2021. Xi'an, China. (Best Extended Abstract Award Finalist)
paper / video / certificate

We use a quadruped robot to complete a pedestrian-following task in challenging scenarios. The whole system consists of a perception module and a planning module, and relies on the onboard sensors.

SA-LOAM: Semantic-aided LiDAR SLAM with Loop Closure
Lin Li, Xin Kong, Xiangrui Zhao, Yong Liu.
IEEE International Conference on Robotics and Automation (ICRA), 2021. Xi'an, China.
paper / video

We present a novel semantic-aided LiDAR SLAM with loop closure based on LOAM, named SA-LOAM, which leverages semantics in odometry as well as loop closure detection.

PocoNet: SLAM-oriented 3D LiDAR Point Cloud Online Compression Network
Jinhao Cui, Hao Zou, Xin Kong, Xuemeng Yang, et al.
IEEE International Conference on Robotics and Automation (ICRA), 2021. Xi'an, China.
paper / video

We present PocoNet (Point cloud Online COmpression NETwork) to address the task of SLAM-oriented compression, aiming to select a compact subset of points with high priority to maintain localization accuracy.

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation
Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, et al.
The 35th AAAI Conference on Artificial Intelligence (AAAI), 2021. Virtual.
paper / code

Based on theoretical and empirical evidence, we present HR-Depth, for high-resolution self-supervised monocular depth estimation.

Semantic Graph Based Place Recognition for 3D Point Clouds
Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Yong Liu, et al.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020. Las Vegas, USA.
paper / code / video / presentation

We propose a novel semantic graph based approach for large-scale place recognition in 3D point clouds. A novel semantic graph representation and a fast and effective graph similarity network are presented.

PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud
Xin Kong, Guangyao Zhai, Baoquan Zhong, Yong Liu.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019. Macau, China.
paper / video

We propose a framework to achieve point-wise semantic segmentation for 3D LiDAR point clouds.

Competitions & Projects

Zero123-hf: a diffusers implementation of zero123
Xin Kong
code

A Hugging Face Diffusers (merged) implementation of the original Zero-1-to-3. Zero-1-to-3 is a large-scale diffusion model that can control the camera perspective, enabling zero-shot novel view synthesis and 3D reconstruction from a single image.

Awesome Point Cloud Place Recognition
Xin Kong, Lin Li
code

A list of papers about point cloud based place recognition, also known as loop closure detection in SLAM.

3D Semantic Segmentation of Point Clouds in Autonomous Driving Scenes
Team: 试一下PointNet ("Try PointNet"). Xin Kong, Chang Zhou, Baoquan Zhong.
CCF Big Data & Computing Intelligence Contest (BDCI), 2018. Hangzhou, China.
Ranking: 9th/1408 / Certificate / Code

We split each 3D scene into multiple grid cells with a sliding 3D bounding box and use PointNet++ as the backbone to semantically segment the 3D point clouds.
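The grid-splitting step can be illustrated with a minimal sketch. This is not the competition code: the function name `sliding_box_split`, the cubic box shape, and the `box_size` and `stride` parameters are all assumptions made for illustration, and the per-box segmentation network (PointNet++ in the entry above) is not shown.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def sliding_box_split(
    points: Sequence[Point], box_size: float, stride: float
) -> Dict[Cell, List[int]]:
    """Group point indices by the axis-aligned cubic boxes that contain them.

    Hypothetical helper: box k along an axis covers the half-open interval
    [min + k * stride, min + k * stride + box_size). A stride smaller than
    box_size gives overlapping boxes, so a point may fall into several cells.
    """
    if not points:
        return {}
    # Scene origin: the minimum coordinate along each axis.
    mins = [min(p[axis] for p in points) for axis in range(3)]
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        per_axis = []
        for axis in range(3):
            d = p[axis] - mins[axis]
            k_max = math.floor(d / stride)
            k_min = max(0, math.floor((d - box_size) / stride) + 1)
            per_axis.append(range(k_min, k_max + 1))
        # Every combination of per-axis box indices is a box containing p.
        for cell in product(*per_axis):
            cells[cell].append(idx)
    return dict(cells)
```

Each returned cell holds the indices of the points inside one box, so the points of a cell could be passed to a segmentation network as one sample. With overlapping boxes, a point receives predictions from more than one box, and how those are merged is a separate design choice.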

ICRA 2018 DJI RoboMaster AI Challenge
Team: I Hiter. Xingguang Zhong, Xin Kong, Xiaoyang Lyu, Le Qi, Hao Huang, Linrui Tian, Songwei Li
IEEE International Conference on Robotics and Automation (ICRA), 2018. Brisbane, Australia.
Global Champion / Ranking: 1st/21 / Certificate / Video / Rules

Our team built two fully automatic robots, covering mechanical design, circuitry, control and algorithms. I was responsible for the robots' visual servoing, localization, navigation and decision-making.

2017 & 2018 RoboMaster Robotics Competition
Team: I Hiter. Wei Chen, Yufei Liu, Xin Kong, Xiaoyang Lyu, et al.
China University Robot Competition (全国大学生机器人大赛), 2017 & 2018. Shenzhen, China.
First Prize / Ranking: 4th/200+ / Certificate / Highlights

Our team built more than 10 complex automatic or semi-automatic robots. I was responsible for visual servoing, which involved computer vision, RGB-D camera calibration, machine learning, multithreaded programming, ballistic modeling, etc.

2017 The Mathematical Contest in Modeling (MCM)
Shengqi Li, Xin Kong, Shuaishuai Liu
The Consortium for Mathematics and Its Applications (COMAP), 2017. Online.
Meritorious Winner (Top 10%) / Paper / Problems

Our team formulated the practical problem posed by COMAP (Managing the Zambezi River) as a mathematical model. Through background research, reasonable assumptions and optimization analysis, we obtained a solution to the problem.

2016 The Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM)
Shengqi Li, Xin Kong, Shuaishuai Liu
China Society for Industrial and Applied Mathematics (CSIAM), 2016. Online.
National Second Prize / Paper / Problems

Our team formulated the practical problem posed by CSIAM (Mooring System Design) as a mathematical model. Through background research, reasonable assumptions and optimization analysis, we obtained a solution to the problem.

2016 The ABU Asia-Pacific Robot Contest (ABU Robocon)
Team: HITCRT. Jingyang Wu, Kuan Xu, Xin Kong, et al.
Asia-Pacific Broadcasting Union, 2016. Zoucheng, China.
National First Prize / Certificate

I was a reserve member of the vision group, helping the official team members with Ubuntu environment setup, camera calibration, and computer vision algorithm testing. Thanks to my seniors for their careful guidance!

Automatic Dustbin Robot based on Kinect v2
Team: HITCRT. Xingguang Zhong, Xin Kong, Chen Yao, Yide Liu, et al.
National Innovation Training Program, 2016. Harbin, China.
Bronze Prize of University Zuguang Cup

Our team designed an automatic dustbin robot that can catch objects. I was in charge of Kinect development, RGB-D camera calibration, moving object tracking, and trajectory prediction.

Book Sterilizer based on Automatic Page Turning Device
Xin Kong, Dai Gao, Yiqiu Ding, Jiaming Cui, Jingda Du
College Training Program, 2015. Harbin, China.
National Invention Patent / University-level First Prize

Our team designed and implemented an automatic book sterilizer that protects books by removing bacteria and dust from them. Patent No. ZL 2015103334672.

Honors

May 2021, Sun Youxian (Academician of the Chinese Academy of Engineering) Scholarship.

Nov. 2018, Academic Scholarship - Zhejiang University.

May 2018, Outstanding Graduate - Harbin Institute of Technology.

May 2018, 3rd Prize of Innovation Scholarship - Ministry of Industry and Information Technology.

Nov. 2016, 8841 Impact Scholarship - Harbin Institute of Technology.

About Me

Skills: Python / C / C++ / MATLAB, PyTorch / TensorFlow, Linux, ROS, OpenCV, PCL, Boost

Languages: Chinese (native), English (IELTS 7).


「Talk is cheap. Show me the code.」

Last update: 2024.02.06. Thanks.