
Detailed Record

Author (Chinese): 邱煜淵
Author (English): Chiu, Yu-Yuan
Title (Chinese): 基於對抗式生成網路架構僅使用兩張影像合成人類動作影片
Title (English): GAN-Based Video Generation of Human Action by Using Two Frames
Advisor (Chinese): 林嘉文
Advisor (English): Lin, Chia-Wen
Committee Members (Chinese): 鄭旭詠
康立威
黃敬群
Committee Members (English): Cheng, Shu-Yuang
Kang, Li-Wei
Huang, Ching-Chun
Degree: Master
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 105061634
Publication Year (ROC): 108 (2019)
Graduating Academic Year: 107
Language: English
Pages: 32
Keywords (Chinese): 影片生成 (video generation); 影片擴充 (video augmentation)
Keywords (English): Video prediction; Video generation
Statistics:
  • Recommendations: 0
  • Views: 571
  • Rating: *****
  • Downloads: 15
  • Bookmarks: 0
Abstract (Chinese):
Video generation has long been an important topic in computer vision. The aim is to synthesize a complete, temporally continuous video from a small number of given frames or from noise, which can be applied to dataset augmentation or to lowering the frame rate (fps) required when transmitting video. Most previous methods add auxiliary information while training the network, such as semantic segmentations, optical flow, or depth, yet this information is relatively hard to collect in real life. Without such extra information, the moving objects in the generated video tend to grow increasingly blurred as the generated sequence lengthens, eventually dispersing and vanishing; we aim to alleviate this phenomenon.
In this thesis, we observe that traditional and modern methods each have their own strengths and weaknesses, so we combine the advantages of each to compensate for the other's shortcomings: we integrate the traditional method of interpolation with the recently popular generative adversarial network (GAN) to design our architecture, which can generate a video from only the first and last frames of a clip without adding any other information. The architecture decomposes into four steps: 1. extract heat maps of the person's joints; 2. generate heat maps for a subset of the joints by interpolation; 3. use an adversarial network, together with the result of step 2, to generate a more complete heat-map video; 4. paste the foreground and background of the video onto the generated result.
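To make the second step concrete, here is a minimal sketch of interpolating joint positions between the two given frames and rendering each pose as Gaussian heat maps. This is not the thesis's code; the joint count, shapes, and Gaussian rendering are illustrative assumptions.

import numpy as np

def interpolate_joints(first, last, num_frames):
    # Linearly interpolate (J, 2) joint coordinates across num_frames.
    alphas = np.linspace(0.0, 1.0, num_frames)  # 0 -> first pose, 1 -> last pose
    return np.stack([(1 - a) * first + a * last for a in alphas])  # (T, J, 2)

def render_heatmaps(joints, height, width, sigma=3.0):
    # Render one (J, 2) pose as J Gaussian heat maps of size (height, width).
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            for x, y in joints]
    return np.stack(maps)  # (J, height, width)

# Example: 15 joints, 16 frames on a 64x64 grid (all sizes illustrative).
first_pose = np.random.rand(15, 2) * 64
last_pose = np.random.rand(15, 2) * 64
poses = interpolate_joints(first_pose, last_pose, num_frames=16)
heatmap_video = np.stack([render_heatmaps(p, 64, 64) for p in poses])  # (16, 15, 64, 64)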
Experimental results show that, for video prediction without any added information, our architecture produces clearer results, with the moving object traveling farther from its starting point, than comparable methods under the same conditions.
Abstract (English):
Video generation is a very important topic in computer vision. Its goal is to generate a video from a few given frames or from noise, so the technique can be applied to data augmentation or to reducing the frame rate (fps) needed when transmitting videos. In the past, most methods required ground-truth annotations of auxiliary information (e.g., semantic segmentations, optical flow, or depth) at training time; if those annotations were omitted, the moving objects in the generated videos would blur and eventually disperse over time. Since such information is very difficult to obtain in the real world, the purpose of this thesis is to overcome this difficulty.
This thesis presents a new architecture that can generate a video from only its first and last frames. The architecture combines a traditional method, interpolation, with a deep learning method, the generative adversarial network (GAN). Our method can be separated into four stages. First, we use [1] to extract the joints of the person in the frames. Second, we use interpolation on some of the joints, not all, to generate a coarse heat-map video. Third, a GAN generates a more complete heat-map video from the result of the second stage. Last but not least, we employ [2] to generate the appearance of the object in the video. The details of each stage are explained in the following chapters.
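As a rough illustration of the third stage, the PyTorch sketch below pairs a small 3D-convolutional generator that refines the coarse interpolated heat-map video with a critic that scores sequences. Refiner, Critic, and every layer size are assumptions for illustration, not the network actually trained in the thesis.

import torch
import torch.nn as nn

class Refiner(nn.Module):
    # Generator: maps a coarse (B, J, T, H, W) heat-map video to a refined one.
    def __init__(self, joints=15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(joints, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(64, joints, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, coarse):
        return self.net(coarse)

class Critic(nn.Module):
    # Discriminator: scores a heat-map video as real or generated.
    def __init__(self, joints=15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(joints, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(64, 128, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, video):
        return self.net(video)

coarse = torch.rand(1, 15, 16, 64, 64)  # (batch, joints, frames, H, W)
refined = Refiner()(coarse)             # same shape as the input
score = Critic()(refined)               # (1, 1) real/fake score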
Experimental results show that, compared with prior work under the same conditions, the videos generated by our method without any additional information are more realistic, and the object in them can move farther from its original location.
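For context, frame-level quality in video prediction is commonly scored with per-frame PSNR and SSIM against ground truth. The generic helper below is an assumed sketch of that protocol; the thesis's actual evaluation is described in Chapter 4 and may differ.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_video(pred, gt):
    # pred, gt: (T, H, W) float arrays in [0, 1]; returns mean PSNR and SSIM.
    psnrs = [peak_signal_noise_ratio(g, p, data_range=1.0) for p, g in zip(pred, gt)]
    ssims = [structural_similarity(g, p, data_range=1.0) for p, g in zip(pred, gt)]
    return float(np.mean(psnrs)), float(np.mean(ssims))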
Table of Contents
Abstract (Chinese) ii
Abstract (English) iii
Chapter 1 Introduction 6
1.1 Research Background 6
1.2 Motivation 6
1.3 Thesis Organization 9
Chapter 2 Related Work 10
2.1 Using Other Information 10
2.2 Using Noise 11
2.3 Using One or a Few Frames 12
Chapter 3 Proposed Method 14
3.1 Overview 14
3.2 Extract Human Skeleton 14
3.3 Single Pose Completion 15
3.4 Sequence Completion and Interpolation 17
3.5 Paste Appearance 19
3.6 Implementation Details 22
Chapter 4 Experiments and Discussion 23
4.1 Dataset 23
4.2 Performance Evaluation 23
4.3 Visual Results 25
Chapter 5 Conclusion 29
References 30
References
[1] M. Mathieu, C. Couprie, and Y. LeCun. Deep multi-scale video prediction beyond mean square error. arXiv preprint, 2015.
[2] C. Vondrick, H. Pirsiavash, and A. Torralba. Generating videos with scene dynamics. In Neural Information Processing Systems (NIPS), 2016.
[3] J. Walker, C. Doersch, A. Gupta, and M. Hebert. An uncertain future: Forecasting from static images using variational autoencoders. In European Conference on Computer Vision (ECCV), 2016.
[4] Xiaodan Liang, Lisa Lee, Wei Dai, Eric P. Xing. Dual motion GAN for future-flow embedded video prediction. In IEEE International Conference on Computer Vision (ICCV), 2017.
[5] Katsunori Ohnishi, Shohei Yamamoto, Yoshitaka Ushiku, Tatsuya Harada. Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture. In Association for the Advancement of Artificial Intelligence (AAAI), 2018.
[6] Pauline Luc, Natalia Neverova, Camille Couprie, Jakob Verbeek, Yann LeCun. Predicting Deeper into the Future of Semantic Segmentation. In IEEE International Conference on Computer Vision (ICCV), 2017.
[7] Masaki Saito, Eiichi Matsumoto, Shunta Saito. Temporal Generative Adversarial Nets with Singular Value Clipping. In IEEE International Conference on Computer Vision (ICCV), 2017.
[8] Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang. Deep Video Generation, Prediction and Completion of Human Action Sequences. In European Conference on Computer Vision (ECCV), 2018.
[9] Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz. MoCoGAN: Decomposing Motion and Content for Video Generation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[10] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative Adversarial Nets. In Neural Information Processing Systems (NIPS), 2014.
[11] Xiaojie Jin, Huaxin Xiao, Xiaohui Shen, Jimei Yang, Zhe Lin, Yunpeng Chen, Zequn Jie, Jiashi Feng, Shuicheng Yan. Predicting Scene Parsing and Motion Dynamics in the Future. In Neural Information Processing Systems (NIPS), 2017.
[12] Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[13] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville. Improved Training of Wasserstein GANs. In Computing Research Repository (CoRR), 2017.
[14] Ceyuan Yang, Zhe Wang, Xinge Zhu, Chen Huang, Jianping Shi and Dahua Lin. Pose Guided Human Video Generation. In European Conference on Computer Vision (ECCV), 2018.
[15] Chelsea Finn, Ian Goodfellow, Sergey Levine. Unsupervised Learning for Physical Interaction through Video Prediction. In Neural Information Processing Systems (NIPS), 2016.
[16] Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy Campbell and Sergey Levine. Stochastic Variational Video Prediction. In International Conference on Learning Representations (ICLR), 2018.
[17] Emily Denton, Rob Fergus. Stochastic Video Generation with a Learned Prior. In International Conference on Machine Learning (ICML), 2018.
[18] Nevan Wichers, Ruben Villegas, Dumitru Erhan, Honglak Lee. Hierarchical Long-term Video Prediction without Supervision. In International Conference on Machine Learning (ICML), 2018.
[19] Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee. Learning to Generate Long-term Future via Hierarchical Prediction. In International Conference on Machine Learning (ICML), 2017.
[20] Catalin Ionescu, Dragos Papava, Vlad Olaru, Cristian Sminchisescu. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(7):1325-1339, July 2014.
[21] Liqian Ma, Xu Jia, Qianru Sun, Bernt Schiele, Tinne Tuytelaars, Luc Van Gool. Pose Guided Person Image Generation. In Neural Information Processing Systems (NIPS), 2017.
[22] Tran Minh Quan, David G. C. Hildebrand, and Won-Ki Jeong. FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics. arXiv preprint arXiv:1612.05360, 2016.
[23] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[24] Vinod Nair, Geoffrey E. Hinton. Rectified Linear Units Improve Restricted Boltzmann Machines. In International Conference on Machine Learning (ICML), 2010.