
2D-3D scene understanding for autonomous driving

Abstract: In this thesis, we address the challenges of label scarcity and of fusing heterogeneous 3D point clouds and 2D images. We first adopt the strategy of end-to-end race driving, in which a neural network is trained to map sensor input (a camera image) directly to control output, making this strategy independent of annotations in the visual domain. We employ deep reinforcement learning, where the algorithm learns from reward through interaction with a realistic simulator, and propose new training strategies and reward functions for better driving and faster convergence. However, training times remain very long, which is why we focus on perception and study point cloud and image fusion in the remainder of this thesis. We propose two different methods for 2D-3D fusion. First, we project 3D LiDAR point clouds into 2D image space, yielding sparse depth maps, and propose a novel encoder-decoder architecture that fuses dense RGB with sparse depth for the task of depth completion, enhancing point cloud resolution to image level. Second, we fuse directly in 3D space to prevent the information loss incurred by projection: we compute image features with a 2D CNN for multiple views, lift them all into a global 3D point cloud, and apply a point-based network to predict 3D semantic labels. Building on this work, we introduce the novel, more difficult task of cross-modal unsupervised domain adaptation, in which one is provided with multi-modal data in a labeled source dataset and an unlabeled target dataset. We propose to perform 2D-3D cross-modal learning via mutual mimicking between the image and point cloud networks to address the source-target domain shift, and we further show that our method is complementary to the existing uni-modal technique of pseudo-labeling.
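The first fusion method above rests on projecting LiDAR points into image space to obtain a sparse depth map. As a minimal sketch of that step (not the thesis's actual pipeline), the following assumes points are already expressed in the camera coordinate frame and uses a standard pinhole projection with intrinsic matrix `K`; the function name and arguments are illustrative, and occlusions are resolved by keeping the nearest return per pixel:

```python
import numpy as np

def project_to_sparse_depth(points, K, img_h, img_w):
    """Project 3D points (N x 3, camera coordinates) into the image
    plane with intrinsics K, producing a sparse depth map in metres.
    Pixels with no LiDAR return remain 0."""
    depth = np.zeros((img_h, img_w), dtype=np.float32)
    # keep only points in front of the camera
    pts = points[points[:, 2] > 0]
    # pinhole projection: u = fx*x/z + cx, v = fy*y/z + cy
    uvw = (K @ pts.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    # discard points that fall outside the image
    valid = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    u, v, z = u[valid], v[valid], pts[valid, 2]
    # when several points hit the same pixel, the nearest one wins:
    # write far points first so near points overwrite them
    order = np.argsort(-z)
    depth[v[order], u[order]] = z[order]
    return depth
```

Because a LiDAR sweep covers only a small fraction of the image pixels, the resulting map is mostly zeros, which is exactly the sparsity that the depth-completion network described in the abstract is trained to fill in.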

Cited literature: 162 references
Contributor: ABES STAR
Submitted on: Tuesday, August 25, 2020 - 10:53:09 AM
Last modification on: Wednesday, June 8, 2022 - 12:50:05 PM
Long-term archiving on: Tuesday, December 1, 2020 - 7:03:09 AM


Version validated by the jury (STAR)


  • HAL Id: tel-02921424, version 1


Maximilian Jaritz. 2D-3D scene understanding for autonomous driving. Machine Learning [cs.LG]. Université Paris sciences et lettres, 2020. English. ⟨NNT : 2020UPSLM007⟩. ⟨tel-02921424⟩


