
Detailed Record
Author (Chinese): 陳宇雯
Author (English): Chen, Yu-Wen
Title (Chinese): 基於單階段光流預測複數未來物體軌跡之方法
Title (English): S2F2: Single-Stage Flow Forecasting for Future Multiple Object Trajectories Prediction
Advisor (Chinese): 李濬屹
Advisor (English): Lee, Chun-Yi
Committee members (Chinese): 陳煥宗, 邱維辰
Committee members (English): Chen, Hwann-Tzong; Chiu, Wei-Chen
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 109062557
Publication year (ROC calendar): 111 (2022)
Graduation academic year: 110
Language: English
Number of pages: 29
Keywords (Chinese): optical flow forecasting, multiple future object trajectory prediction, machine learning, deep learning, object detection, object tracking
Keywords (English): multiple trajectory forecasting, optical flow estimation, single-stage forecasting framework
Abstract (Chinese):
In this thesis, we propose a single-stage method, named S2F2, for forecasting the future trajectories of multiple pedestrians. Taking raw video as input and requiring no additional information, S2F2 simultaneously performs multiple object detection, multiple object tracking, and multiple object forecasting, and outputs each object's position, its unique ID, and its future locations. The architecture of S2F2 consists of two main parts: (1) a context feature extractor, responsible for extracting useful information from the images, and (2) a forecasting module, responsible for aggregating information from past frames up to the present in order to predict future positions. The outputs of the two parts are then processed to generate the final predicted trajectories of the pedestrians. Most previous work on pedestrian trajectory forecasting follows a two-stage design; because of this two-stage nature, the required time and computation grow as the number of objects in the scene increases. By leveraging predicted optical flow, this thesis integrates the two stages into a single prediction, removing the need to run the predictor once per object when many objects are present, so the computational cost stays the same regardless of how many objects appear in the scene. To compare S2F2 fairly against other methods, we designed a StaticMOT dataset that excludes videos involving camera movement. The experimental results show that S2F2 outperforms two conventional trajectory forecasting algorithms and a recent two-stage deep learning model, while maintaining detection and tracking (MOT) accuracy.
In this thesis, we present a single-stage framework, named S2F2, for forecasting multiple human trajectories from raw video images by predicting future optical flows. S2F2 differs from the previous two-stage approaches in that it performs detection, Re-ID, and forecasting of multiple pedestrians at the same time. The architecture of S2F2 consists of two primary parts: (1) a context feature extractor responsible for extracting a shared latent feature embedding for performing detection and Re-ID, and (2) a forecasting module responsible for extracting a shared latent feature embedding for forecasting. The outputs of the two parts are then processed to generate the final predicted trajectories of pedestrians. Unlike previous approaches, the computational burden of S2F2 remains consistent even if the number of pedestrians grows. In order to fairly compare S2F2 against the other approaches, we designed a StaticMOT dataset that excludes video sequences involving egocentric motions. The experimental results demonstrate that S2F2 is able to outperform two conventional trajectory forecasting algorithms and a recent learning-based two-stage model, while maintaining tracking performance on par with the contemporary MOT models.
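The abstract's key efficiency claim is that one dense future-flow prediction serves every pedestrian at once, so the cost does not grow with the number of objects. The toy sketch below illustrates that idea only; the function names, the constant-velocity "forecasting module", and all shapes are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

# Illustrative sketch (not the thesis's code): a single dense future-flow
# field is predicted once per frame, and each detected object's future
# position is a cheap lookup into it, so per-frame cost is independent of
# the number of pedestrians.

H, W, HORIZON = 120, 160, 3

def predict_future_flows(past_frames, horizon=HORIZON):
    """Stand-in for the forecasting module (the thesis uses a GRU encoder
    and a future-flow decoder): extrapolate constant-velocity flow from the
    last two frames. Returns `horizon` dense (H, W, 2) flow fields."""
    motion = past_frames[-1] - past_frames[-2]           # (H, W, 2)
    return [motion * (t + 1) for t in range(horizon)]

def forecast_objects(centers, flows):
    """Advance each detected center (x, y) along the dense flow fields.
    Per object this is O(horizon) array lookups, regardless of how many
    objects share the scene."""
    trajectories = []
    for cx, cy in centers:
        x, y, traj = float(cx), float(cy), []
        for flow in flows:
            dx, dy = flow[int(round(y)) % H, int(round(x)) % W]
            x, y = x + dx, y + dy
            traj.append((x, y))
        trajectories.append(traj)
    return trajectories

# Toy input: two "frames" whose per-pixel motion is a uniform (1, 0),
# i.e. everything drifts one pixel to the right per step.
past = [np.zeros((H, W, 2)), np.tile(np.array([1.0, 0.0]), (H, W, 1))]
flows = predict_future_flows(past)
trajs = forecast_objects([(10, 20), (50, 60)], flows)
print(trajs[0])  # [(11.0, 20.0), (13.0, 20.0), (16.0, 20.0)]
```

The dense flow fields are computed once, before the per-object loop, which is what keeps the computation flat as the scene gets crowded.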
Abstract (Chinese) I
Abstract II
Acknowledgements III
Contents IV
List of Figures VI
List of Tables VIII
1 Introduction P.1
2 Related Work P.4
3 Methodology P.6
3.1 Problem Formulation P.6
3.2 Overview of the S2F2 Framework P.6
3.3 Context Feature Extractor P.7
3.4 Forecasting Module P.8
3.4.1 GRU Encoder Block P.9
3.4.2 Future Flow Decoder Block P.9
3.5 Online Association with Forecasting Refinement P.10
3.6 Training Objective P.11
4 Experimental Results P.12
4.1 Data Curation for Forecasting without Camera Movement P.12
4.2 Trajectory Forecasting Results P.13
4.2.1 Baselines P.13
4.2.2 Forecasting Metrics P.14
4.2.3 Quantitative Results P.14
4.2.4 Qualitative Results P.16
4.3 Multiple Object Tracking Results P.17
5 Ablation Studies P.19
5.1 Inference Speed P.19
5.2 GRU Encoder Optimization P.19
5.3 Effectiveness of the Forecasting Refinement for Online Association P.22
6 Conclusion P.24
7 Bibliography P.25
[1] Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, and Wenyu Liu. Fairmot: On the fairness of detection and re-identification in multiple object tracking. International Journal of Computer Vision, pages 1–19, 2021.
[2] Oliver Styles, Victor Sanchez, and Tanaya Guha. Multiple object forecasting: Predicting future object locations in diverse environments. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 690–699, 2020.
[3] Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social gan: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2255–2264, 2018.
[4] Junaid Ahmed Ansari and Brojeshwar Bhowmick. Simple means faster: Real-time human motion forecasting in monocular first person videos on cpu. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10319–10326. IEEE, 2020.
[5] Hai-Yan Yao, Wang-Gen Wan, and Xiang Li. End-to-end pedestrian trajectory forecasting with transformer network. ISPRS International Journal of Geo-Information, 11(1):44, 2022.
[6] Boris Ivanovic and Marco Pavone. The trajectron: Probabilistic multi-agent trajectory modeling with dynamic spatiotemporal graphs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2375–2384, 2019.
[7] Osama Makansi, Ozgun Cicek, Kevin Buchicchio, and Thomas Brox. Multimodal future localization and emergence prediction for objects in egocentric view with a reachability prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4354–4363, 2020.
[8] Zhongdao Wang, Liang Zheng, Yixuan Liu, Yali Li, and Shengjin Wang. Towards real-time multi-object tracking. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16, pages 107–122. Springer, 2020.
[9] Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl. Tracking objects as points. In European Conference on Computer Vision, pages 474–490. Springer, 2020.
[10] Pavel Tokmakov, Jie Li, Wolfram Burgard, and Adrien Gaidon. Learning to track with object permanence. arXiv preprint arXiv:2103.14258, 2021.
[11] Bing Shuai, Andrew G Berneshawi, Davide Modolo, and Joseph Tighe. Multi- object tracking with siamese track-rcnn. arXiv preprint arXiv:2004.07786, 2020.
[12] Yichao Yan, Jinpeng Li, Jie Qin, Song Bai, Shengcai Liao, Li Liu, Fan Zhu, and Ling Shao. Anchor-free person search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7690–7699, 2021.
[13] Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 961–971, 2016.
[14] Olly Styles, Arun Ross, and Victor Sanchez. Forecasting pedestrian trajectory with machine-annotated training data. In 2019 IEEE Intelligent Vehicles Symposium (IV), pages 716–721. IEEE, 2019.
[15] Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, and Yoichi Sato. Fu- ture person localization in first-person videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7593–7602, 2018.
[16] Amir Rasouli, Iuliia Kotseruba, and John K Tsotsos. Are they going to cross? a benchmark dataset and baseline for pedestrian crosswalk behavior. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 206–213, 2017.
[17] Alon Lerner, Yiorgos Chrysanthou, and Dani Lischinski. Crowds by example. In Computer graphics forum, volume 26, pages 655–664. Wiley Online Library, 2007.
[18] Alexandre Robicquet, Amir Sadeghian, Alexandre Alahi, and Silvio Savarese. Learning social etiquette: Human trajectory understanding in crowded scenes. In European conference on computer vision, pages 549–565. Springer, 2016.
[19] A. Ess, B. Leibe, K. Schindler, and L. van Gool. A mobile vision system for robust multi-person tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08). IEEE Press, June 2008.
[20] Yuan Liu, Ruoteng Li, Yu Cheng, Robby T Tan, and Xiubao Sui. Object tracking using spatio-temporal networks for future prediction location. In European Conference on Computer Vision, pages 1–17. Springer, 2020.
[21] Lukas Neumann and Andrea Vedaldi. Pedestrian and ego-vehicle trajectory prediction from monocular camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10204–10212, 2021.
[22] SHI Xingjian, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. Convolutional lstm network: A machine learning approach for precipitation nowcasting. In Advances in neural information processing systems, pages 802–810, 2015.
[23] Rico Jonschkowski, Austin Stone, Jonathan T Barron, Ariel Gordon, Kurt Konolige, and Anelia Angelova. What matters in unsupervised optical flow. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16, pages 557–572. Springer, 2020.
[24] Zhichao Yin and Jianping Shi. Geonet: Unsupervised learning of dense depth, optical flow and camera pose. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1983–1992, 2018.
[25] Anurag Ranjan, Varun Jampani, Lukas Balles, Kihwan Kim, Deqing Sun, Jonas Wulff, and Michael J Black. Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12240–12249, 2019.
[26] Yang Wang, Peng Wang, Zhenheng Yang, Chenxu Luo, Yi Yang, and Wei Xu. Unos: Unified unsupervised optical-flow and stereo-depth estimation by watching videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8071–8081, 2019.
[27] Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Zehuan Yuan, Ping Luo, Wenyu Liu, and Xinggang Wang. Bytetrack: Multi-object tracking by associating every detection box. arXiv preprint arXiv:2110.06864, 2021.
[28] Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018.
[29] A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler. MOT16: A benchmark for multi-object tracking. arXiv:1603.00831 [cs], March 2016.
[30] P. Dendorfer, H. Rezatofighi, A. Milan, J. Shi, D. Cremers, I. Reid, S. Roth, K. Schindler, and L. Leal-Taixé. Mot20: A benchmark for multi object tracking in crowded scenes. arXiv:2003.09003 [cs], March 2020.
[31] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960.