
Detailed Record

Author (Chinese): 許瀞予
Author (English): Hsu, Ching-Yu
Thesis Title (Chinese): 在360世界中行走:從單張全景圖合成全景視差
Thesis Title (English): Moving in a 360 World: Synthesizing Panoramic Parallaxes from a Single Panorama
Advisor (Chinese): 陳煥宗
Advisor (English): Chen, Hwann-Tzong
Committee Members (Chinese): 劉庭祿、邱維辰
Committee Members (English): Liu, Tyng-Luh; Chiu, Wei-Chen
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 108062513
Year of Publication (ROC calendar): 110 (2021)
Academic Year of Graduation: 109
Language: Chinese
Number of Pages: 34
Keywords (Chinese): 全景圖、360影像、新視野合成
Keywords (English): panorama, 360, novel view synthesis
We present Omnidirectional Neural Radiance Fields (OmniNeRF), the first novel-view-synthesis method that operates on panoramas. Recent work on novel view synthesis has focused mainly on perspective images, which are limited by a narrow field of view and require a sufficient number of images captured under specific conditions. In contrast, given only a single 360 image as training data, OmniNeRF can generate panoramas at unseen viewpoints. To this end, we propose a data-augmentation scheme that projects the single panorama from the 3D world onto 2D panoramic coordinates at different positions in the scene. In this way, we can optimize the Omnidirectional Neural Radiance Field using the pixels visible in all 360-degree directions from the fixed camera position, so as to estimate the scene observed from an arbitrary camera location. In summary, the proposed OmniNeRF renders convincing panoramas at novel viewpoints and reproduces the binocular parallax effect. We validate our results on both synthetic and real-world datasets.
We present Omnidirectional Neural Radiance Fields (OmniNeRF), the first method for parallax-enabled novel panoramic view synthesis. Recent works on novel view synthesis focus on perspective images with limited field-of-view and require sufficient pictures captured under specific conditions. Conversely, OmniNeRF can generate panorama images for unknown viewpoints given a single equirectangular image as training data. To this end, we propose to augment the single RGB-D panorama by projecting back and forth between the 3D world and 2D panoramic coordinates at different virtual camera positions. By doing so, we are able to optimize an Omnidirectional Neural Radiance Field with visible pixels collected from omnidirectional viewing angles at a fixed center, for the estimation of new viewing angles from varying camera positions. As a result, the proposed OmniNeRF achieves convincing renderings of novel panoramic views that exhibit the parallax effect. We showcase the effectiveness of each of our proposals on both synthetic and real-world datasets.
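The augmentation described in the abstract, projecting a single RGB-D panorama back and forth between the 3D world and panoramic coordinates at new virtual camera positions, can be pictured with a short sketch. The following is a minimal NumPy sketch, not the thesis implementation: it assumes a y-up, +z-forward equirectangular convention, and the function names (equirect_rays, reproject_panorama) and the nearest-neighbour splatting are illustrative choices. It lifts an RGB-D panorama to a 3D point cloud and forward-warps it to an equirectangular image at a translated virtual camera, which is the kind of extra training view the radiance field can be optimized on.

import numpy as np

def equirect_rays(h, w):
    # Unit viewing direction for every pixel of an h x w equirectangular image
    # (y-up convention, longitude measured from the +z axis).
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    lon = (u + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / h * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)

def reproject_panorama(rgb, depth, t_new):
    # Lift an RGB-D panorama to a 3D point cloud and forward-warp it to a
    # virtual camera translated by t_new; unobserved pixels remain zero (holes).
    h, w, _ = rgb.shape
    dirs = equirect_rays(h, w)
    pts = dirs * depth[..., None]                    # 3D points, original camera at the origin
    rel = pts - np.asarray(t_new, dtype=np.float64)  # the same points seen from the new center
    dist = np.linalg.norm(rel, axis=-1)
    d = rel / np.maximum(dist[..., None], 1e-8)
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    u = ((lon + np.pi) / (2.0 * np.pi) * w).astype(np.int64) % w
    v = np.clip(((np.pi / 2.0 - lat) / np.pi * h).astype(np.int64), 0, h - 1)
    order = np.argsort(-dist.ravel())                # splat far-to-near so nearer surfaces tend to win
    out = np.zeros_like(rgb)
    out[v.ravel()[order], u.ravel()[order]] = rgb.reshape(-1, 3)[order]
    return out

# Example: render the panorama as seen 20 cm to the right of the capture point.
h, w = 256, 512
rgb = np.random.rand(h, w, 3)        # placeholder image; a real RGB-D panorama would be loaded here
depth = np.full((h, w), 2.0)         # placeholder depth in meters
warped = reproject_panorama(rgb, depth, t_new=[0.2, 0.0, 0.0])

The warped image contains holes at disoccluded pixels; in the augmentation setting those pixels are simply not used as supervision, while the visible ones provide rays from the shifted camera position.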
List of Tables 4
List of Figures 5
摘要 8
Abstract 9
1 Introduction 10
2 Related work 14
2.1 Novel view synthesis 14
2.2 Neural 3D representation 15
3 Proposed Framework 16
3.1 Generating training samples 16
3.2 Visibility 18
3.3 Concatenating multiple panoramas 18
3.4 Regressing with gradient 19
3.5 Optimization 19
4 Experiments 21
4.1 Datasets 21
4.2 Implementation details 22
4.2.1 Training protocol 22
4.2.2 Evaluation protocol 22
4.3 Comparison with baselines 22
4.3.1 Single image training 22
4.3.2 Interpolation from layout 23
4.4 Ablation study 24
4.5 Comparison with ground truth nearby view 25
4.6 Qualitative results 25
4.7 Multi-modal layout prediction 27
4.8 Limitation 27
5 Conclusion 29
Bibliography 30