Author: Wang, Ting-Wei
Thesis Title: Multi-Modal Pedestrian Crossing Intention Prediction with Transformer-Based Model
Advisor: Lai, Shang-Hong
Oral Defense Committee: Hsu, Chiou-Ting
Chen, Hwann-Tzong
Chen, Yi-Ting
Keywords: pedestrian crossing intention prediction; pedestrian action prediction; pedestrian protection; Advanced Driver Assistance Systems; computer vision in automotive; deep learning in automotive
The popularity of autonomous driving and advanced driver assistance systems can potentially reduce thousands of car accidents and casualties. In particular, pedestrian prediction and protection is an urgent development priority for such systems. Prediction of pedestrians' intentions of crossing the road or their actions based on computer vision can help such systems to assess the risk of pedestrians in front of vehicles in advance. Relevant research has been continuously reported in recent years.
However, we believe that previous works have not fully exploited all the available information to make the prediction.
We propose a multi-modal pedestrian crossing intention prediction framework based on the Transformer model to address this limitation. Our method exploits the Transformer's strong sequential modeling ability and its parallelization advantage, enabling the model to perform stably and smoothly in this task. We also represent traffic environment information in a novel way, allowing such information to be fully exploited. Moreover, we use lifted 3D human pose data and 3D head orientation data for pedestrians, allowing the model to better understand pedestrian posture. Finally, our experimental results show that the proposed system achieves state-of-the-art accuracy on benchmark datasets.
1 Introduction
1.1 Motivation
1.2 Contributions
2 Related Work
2.1 Sequential Modeling
2.2 Exploration of Novel Inputs
2.3 The Rise of the Transformer Model
3 Proposed Method
3.1 Problem Statement
3.2 Module Architecture
3.2.1 Feature Pre-processing Module
3.2.2 Prediction Module
3.2.3 Loss Function
3.2.4 Metrics
4 Experiments
4.1 Experimental Setting
4.1.1 Datasets
4.1.2 Implementation Details
4.2 Evaluation of the Proposed Model
4.3 Ablation Study
4.3.1 Importance of Input Data
4.3.2 Comparison of Different Fusion Methods
4.3.3 Comparison of Traffic Awareness Feature Fusions
4.4 Qualitative Justification
4.4.1 Discussion of Failure Case and Future Directions
5 Conclusions
References
