
Detailed Record

Author (Chinese): 陳玉亭
Author (English): Chen, Yu-Ting
Title (Chinese): 基於影片中移動資訊改善跨域人體目標分割
Title (English): Leveraging Motion Priors in Videos for Improving Human Segmentation across Domains
Advisor (Chinese): 孫民
Advisor (English): Sun, Min
Committee members (Chinese): 陳煥宗、陳祝嵩
Committee members (English): Chen, Hwann-Tzong; Chen, Chu-Song
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 105061528
Year of publication (ROC calendar): 107 (2018)
Academic year of graduation: 106
Language: English
Pages: 38
Keywords (Chinese): 主動學習、域適應學習、人體目標分割
Keywords (English): Active Learning; Domain Adaptation; Human Segmentation
Usage statistics:
  • Recommendations: 0
  • Views: 685
  • Rating: *****
  • Downloads: 9
  • Bookmarks: 0
Abstract (Chinese, translated): Although deep-learning-based semantic segmentation has matured in recent years, a trained model often performs worse than expected on real-world test data, because the feature distribution of that data deviates from the training set. Domain adaptation and active learning methods have recently been proposed to address this problem, yet very few studies emphasize exploiting video information to remedy a model's poor cross-domain performance.
In this thesis, we propose a weakly-supervised active learning method that improves human segmentation by exploiting motion priors that are easy to obtain from videos. Under our stationary-camera setting, optical flow provides per-pixel motion information, which we convert into foreground/background segments; the foreground corresponds to human regions. We train a memory-network-based policy model with reinforcement learning to pick the better foreground segments. The selected segments usually have more accurate boundaries and are used directly as training targets to fine-tune the model's parameters. For evaluation, we collect a surveillance-camera dataset and also test on the publicly available UrbanStreet dataset. Our method improves cross-domain performance (across multiple scenes and camera modalities). Finally, our method can be combined with existing domain adaptation algorithms, and joint training achieves even better cross-domain performance.
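The first stage of the pipeline described above converts per-pixel optical flow into candidate foreground (human) masks. A minimal sketch under stated assumptions: the flow field is precomputed and dense, the camera is stationary, and the magnitude threshold and minimum-area filter are illustrative placeholders rather than the thesis's actual parameters or its connected-component handling.

```python
import numpy as np

def motion_prior_mask(flow, magnitude_threshold=1.0, min_area=50):
    """Convert a dense optical-flow field of shape (H, W, 2) into a
    binary foreground mask: pixels moving faster than the threshold
    become candidate human (foreground) regions."""
    magnitude = np.linalg.norm(flow, axis=-1)    # per-pixel speed
    mask = magnitude > magnitude_threshold       # moving pixels
    # Crude noise filter (placeholder): discard the mask entirely
    # when too few pixels move to plausibly contain a person.
    if mask.sum() < min_area:
        mask = np.zeros_like(mask)
    return mask.astype(np.uint8)

# Toy example: a synthetic flow field where a 20x20 block moves right.
flow = np.zeros((64, 64, 2), dtype=np.float32)
flow[20:40, 20:40, 0] = 3.0   # horizontal motion of 3 px/frame
mask = motion_prior_mask(flow)
print(int(mask.sum()))  # 400 moving pixels
```

In practice the flow would come from a dense optical-flow estimator rather than being synthesized as above.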
Abstract (English): Despite many advances in deep-learning-based semantic segmentation, a performance drop due to distribution mismatch is often encountered in the real world. Recently, a few domain adaptation and active learning approaches have been proposed to mitigate this drop. However, very little attention has been paid to leveraging the information in videos, which are naturally captured by most camera systems.
In this work, we propose to leverage the “motion prior” in videos to improve human segmentation in a weakly-supervised active learning setting. By extracting motion information from optical flow in videos, we obtain candidate foreground motion segments (referred to as motion priors) that potentially correspond to human segments. We learn a memory-network-based policy model through reinforcement learning to select strong candidate segments (referred to as strong motion priors). The selected segments have high precision and are used directly to fine-tune the model. On a newly collected surveillance-camera dataset and the publicly available UrbanStreet dataset, our method improves human segmentation across multiple scenes and modalities (i.e., RGB to infrared (IR)). Last but not least, our method is empirically complementary to existing domain adaptation approaches: combining our weakly-supervised active learning with domain adaptation yields additional performance gains.
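The segment-selection policy above is trained with reinforcement learning. The sketch below shows a plain REINFORCE update for a Bernoulli keep/drop policy over candidate segments; the linear feature scoring, learning rate, and per-segment reward values are illustrative assumptions, since the thesis's actual policy is memory-network-based and its reward design is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def keep_probability(theta, feat):
    """Bernoulli policy: probability of keeping a candidate segment,
    scored by a simple linear model (illustrative stand-in for the
    memory-network policy)."""
    return 1.0 / (1.0 + np.exp(-feat @ theta))

def reinforce_update(theta, feats, rewards, lr=0.1):
    """One REINFORCE step: sample a keep/drop action for each candidate
    segment and move theta toward actions that received high reward."""
    grad = np.zeros_like(theta)
    for feat, reward in zip(feats, rewards):
        p = keep_probability(theta, feat)
        action = float(rng.random() < p)       # 1 = keep, 0 = drop
        # Gradient of log-likelihood for a Bernoulli policy: (a - p) * x
        grad += reward * (action - p) * feat
    return theta + lr * grad / len(feats)

# Toy run: 4 candidate segments with 3-d features and quality rewards.
theta = np.zeros(3)
feats = rng.normal(size=(4, 3))
rewards = np.array([1.0, 0.2, 0.9, 0.0])   # e.g. segment-quality scores
theta = reinforce_update(theta, feats, rewards)
print(theta.shape)  # (3,)
```

Segments kept by the trained policy would then serve as pseudo-labels for fine-tuning the segmentation network.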
Abstract (Chinese) ii
Abstract (English) iii
Acknowledgements iv
1 Introduction 1
1.1 Motivation and Problem Description 1
1.2 Main Contribution 4
1.3 Thesis Structure 4
2 Related Work 5
2.1 Human Segmentation 5
2.2 Motion Segmentation 6
2.3 Active learning 6
2.4 Domain Adaptation 7
3 Preliminaries 8
3.1 U-Net for Semantic Segmentation 8
3.2 Adversarial Domain Adaptation 9
3.2.1 Global and Class-wise Domain Shift 10
3.2.2 Global Adversarial Domain Adaptation 10
3.2.3 Class-wise Adversarial Domain Adaptation 11
3.3 Policy Gradient 12
4 Surveillance Datasets 14
4.1 Cross-domain Settings 15
4.2 Data Collection Details 16
5 Policy-based Active Learning in Cross-domain Setting 17
5.1 Motion Priors from Video Frames 17
5.2 Motion Priors Selection 18
5.2.1 Network Architecture 19
5.2.2 Reinforcement Learning 20
5.2.3 Patch-based Selection 21
5.2.4 Inference on Target Domain 22
5.3 Combined with Adversarial Domain Adaptation 23
5.3.1 Fine-tuning in Both Domains 23
5.3.2 Full Optimization Problem 24
6 Experiments 25
6.1 Introduction 25
6.1.1 Additional Dataset 25
6.1.2 Motion Analysis 26
6.2 Implementation Details 26
6.3 Weakly-supervised Active Learning with Cross-Domain Setting 27
6.4 Combined with Adversarial Domain Adaptation 31
7 Conclusion 34
References 35