
Detailed Record

Author (Chinese): 張仕修
Author (English): Chang, Shih-Hsiu
Title (Chinese): 基於跨視角一致性最佳化之適配傳統卷積網路於全景影像技術
Title (English): Adapting Perspective-domain CNNs to 360 Panoramic Imagery via Cross-viewpoint Consistency Optimization
Advisor (Chinese): 朱宏國
Advisor (English): Chu, Hung-Kuo
Committee Members (Chinese): 姚智原, 胡敏君
Committee Members (English): Yao, Chih-Yuan; Hu, Min-Chun
Degree: Master's
University: National Tsing Hua University
Department: Computer Science
Student ID: 106062524
Publication Year (ROC calendar): 108 (2019)
Graduation Academic Year (ROC calendar): 107
Language: Chinese
Pages: 33
Keywords (Chinese): 360圖像感知、深層神經網路、自我監督學習、密集估測、領域自適應
Keywords (English): 360 perception; Deep neural network; Self-supervised learning; Dense estimation; Domain adaptation
Usage statistics:
  • Recommendations: 0
  • Views: 622
  • Rating: *****
  • Downloads: 1
  • Bookmarks: 0
Abstract (Chinese; translated):
With the rapid development of multimedia technology, 360 panoramic images and videos have shown great potential in advanced computer vision and graphics applications, particularly in robotics, autonomous driving, and VR/AR. However, although 360 panoramic images are easy to acquire, well-annotated 360 data remain extremely scarce, which limits the development on 360 imagery of deep learning and conventional convolutional networks that rely on large-scale labeled data. In contrast, thanks to the abundance of data, many well-trained deep convolutional networks are already available for conventional perspective images. Based on this observation, this thesis proposes a simple and effective method to quickly adapt a network model pre-trained on conventional perspective images to the 360 panoramic domain. The key novelty of the proposed method lies in optimizing over the consistency of cross-viewpoint overlapping regions within a 360 panorama, training the model on an unlabeled 360 dataset so that a network trained in the perspective-image domain is adapted to 360 panoramic imagery. During optimization, the proposed objective function consists of two parts: i) a consistency term computed over the overlapping regions of the model's predictions on multiple perspective views rendered from the 360 panorama at different viewing directions; and, to avoid the degenerate solution that naive consistency optimization alone would produce (all predictions collapsing to a single constant), ii) a regularization term that anchors training to the original pre-adaptation predictions. This thesis demonstrates and validates the effectiveness of the proposed cross-viewpoint consistency optimization on three indoor dense prediction tasks: (1) depth estimation, (2) surface normal estimation, and (3) semantic segmentation. The experimental results show that the proposed method provides significant accuracy improvements when combined with different panoramic domain adaptation approaches.
Abstract (English):
360 images and videos have shown great potential in a variety of advanced vision and graphics applications such as robotics, autonomous driving, and virtual/augmented reality. However, the lack of large-scale, diverse annotated 360 datasets severely hinders the development of general deep convolutional neural networks (CNNs), which are typically data-hungry. In contrast, there is a huge body of CNNs well trained on massive labeled perspective images. To bridge this gap, we present a simple yet effective way to quickly adapt a model (pre-trained on perspective images) to 360 imagery. The key technical novelty lies in a cross-viewpoint consistency optimization that re-trains the source-domain model using only unlabeled 360 images. The objective function is tailored to i) capture the consistency among the model predictions on multiple sampled views, while ii) being regularized by the original predictions to avoid numerical collapse during training. We demonstrate the effectiveness of our cross-viewpoint consistency framework on three dense prediction tasks: 1) depth prediction, 2) surface normal estimation, and 3) semantic segmentation of indoor scenes. The results indicate that our method provides consistent accuracy improvements across five different approaches for adapting models from perspective to 360 images.
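The two-part objective described in the abstract lends itself to a compact illustration. Below is a minimal PyTorch-style sketch of such a loss, not the thesis's actual implementation: sample_perspective_views and project_to_equirect are hypothetical helpers (for rendering perspective crops from a panorama and mapping dense predictions back to the equirectangular domain), and the view objects' image and valid_mask attributes are likewise assumed.

    import torch

    def cross_view_consistency_loss(model, source_model, pano, lam=0.1):
        # Hypothetical helper: renders perspective crops from the panorama,
        # returning objects carrying the crop image, its coverage mask, and
        # whatever is needed to re-project predictions onto the panorama.
        views = sample_perspective_views(pano)

        preds, refs, masks = [], [], []
        for v in views:
            # Dense prediction on this view, mapped back onto the
            # equirectangular domain (project_to_equirect is hypothetical).
            preds.append(project_to_equirect(model(v.image), v))
            masks.append(v.valid_mask)  # 1 where this view covers the panorama
            with torch.no_grad():
                # Pre-adaptation ("original") prediction from the frozen source model.
                refs.append(project_to_equirect(source_model(v.image), v))

        loss = pano.new_zeros(())
        n = len(views)
        for i in range(n):
            for j in range(i + 1, n):
                overlap = masks[i] * masks[j]
                # i) Consistency term: predictions from different viewpoints
                #    should agree wherever their views overlap.
                loss = loss + torch.mean(overlap * (preds[i] - preds[j]) ** 2)
        for p, r, m in zip(preds, refs, masks):
            # ii) Regularization term: stay close to the original predictions so
            #     the consistency term cannot collapse to a constant output.
            loss = loss + lam * torch.mean(m * (p - r) ** 2)
        return loss

The squared-error form and the lam weight are illustrative choices; the loss functions actually used in the thesis are described in Section 4.2 of the table of contents below.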
Table of Contents:
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 360 Panoramic Perception
2.2 Domain Adaptation and Knowledge Distillation
2.3 Consistency-based Self-supervised Learning
3 Cross-viewpoint Consistency
3.1 Equirectangular and Perspective Projections
3.2 Cross-viewpoint Consistency
4 Optimization
4.1 Method Overview
4.2 Loss Functions
4.3 Progressive Sampling
5 Experimental Results and Discussion
5.1 Adapting to the Panoramic Domain
5.2 Depth Estimation
5.3 Surface Normal Estimation
5.4 Semantic Segmentation
5.5 Comparative Experiments
6 Conclusion
Bibliography
(Full text available for internal access only.)