1. O. Mees, A. Eitel, and W. Burgard, "Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments," IROS, 2016.
2. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," ECCV, 2016.
3. J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object Detection via Region-Based Fully Convolutional Networks," arXiv preprint arXiv:1605.06409, 2016.
4. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," NIPS, 2015.
5. J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," CVPR, 2015.
6. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," arXiv preprint arXiv:1506.02640, 2015.
7. A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard, "Multimodal Deep Learning for Robust RGB-D Object Recognition," IROS, 2015.
8. S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, "Learning Rich Features from RGB-D Images for Object Detection and Segmentation," ECCV, 2014.
9. J. Schlosser, C. K. Chow, and Z. Kira, "Fusing LIDAR and Images for Pedestrian Detection Using Convolutional Neural Networks," ICRA, 2016.
10. J. Wagner, V. Fischer, M. Herman, and S. Behnke, "Multispectral Pedestrian Detection Using Deep Fusion Convolutional Neural Networks," ESANN, 2016.
11. X. Chen, H. Ma, J. Wang, B. Li, and T. Xia, "Multi-View 3D Object Detection Network for Autonomous Driving," CVPR, 2017.
12. A. Geiger, P. Lenz, and R. Urtasun, "Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite," CVPR, 2012.
13. S. D. Jain, B. Xiong, and K. Grauman, "FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos," CVPR, 2017.
14. H. Wang, Y. Wang, Q. Zhang, S. Xiang, and C. Pan, "Locality-Sensitive Deconvolution Networks with Gated Fusion for RGB-D Indoor Semantic Segmentation," CVPR, 2017.
15. S. J. Park, K. S. Hong, and S. Lee, "RDFNet: RGB-D Multi-level Residual Feature Fusion for Indoor Semantic Segmentation," ECCV, 2017.
16. A. Valada, J. Vertens, A. Dhall, and W. Burgard, "AdapNet: Adaptive Semantic Segmentation in Adverse Environmental Conditions," ICRA, 2017.
17. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," arXiv preprint arXiv:1512.03385, 2015.
18. T. Zahavy, S. Mannor, A. Magnani, and A. Krishnan, "Is a Picture Worth a Thousand Words? A Deep Multi-Modal Fusion Architecture for Product Classification in E-commerce," ICLR, 2017.
19. A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al., "DeViSE: A Deep Visual-Semantic Embedding Model," NIPS, 2013.
20. S. Poria, E. Cambria, N. Howard, G.-B. Huang, and A. Hussain, "Fusing Audio, Visual and Textual Clues for Sentiment Analysis from Multimodal Content," Neurocomputing, 174:50–59, 2016.
21. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, 2012.
22. J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler, "Efficient Object Localization Using Convolutional Networks," CVPR, 2015, pp. 648–656.
23. R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, "Adaptive Mixtures of Local Experts," Neural Computation, 1991.