Detailed Record

Author (Chinese): 洪澤厚
Author (English): Hung, Tse-Hou
Thesis Title (Chinese): 運用深度學習方法最佳化沉浸式影片編碼設定
Thesis Title (English): Optimizing Immersive Video Coding Configurations Using Deep Learning Approaches
Advisor (Chinese): 徐正炘
Advisor (English): Hsu, Cheng-Hsin
Committee Members (Chinese): 胡敏君、彭文孝
Committee Members (English): Hu, Min-Chun; Peng, Wen-Hsiao
Degree: Master's
University: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 108065534
Publication Year (ROC): 110 (2021)
Graduation Academic Year: 110
Language: English
Number of Pages: 67
Keywords (Chinese): 虛擬實境、擴增實境、擴展實境、畫面合成、串流、六自由度
Keywords (English): virtual reality, augmented reality, extended reality, view synthesis, streaming, 6DoF
Abstract: Immersive video streaming technologies improve the Virtual Reality (VR) user experience by providing users with more intuitive ways to move in simulated worlds, e.g., with the six Degrees of Freedom (6DoF) interaction mode. A naive way to achieve 6DoF is to deploy cameras at the many positions and orientations that users' movements may require, which unfortunately is expensive, tedious, and inefficient. A better solution for realizing 6DoF interactions is to synthesize target views on the fly from a limited number of source views. While such view synthesis is enabled by the recent Test Model for Immersive Video (TMIV) codec, TMIV relies on manually composed configurations, which cannot exercise the tradeoff among video quality, decoding time, and bandwidth consumption. In this thesis, we study the limitations of TMIV and solve its configuration optimization problem by searching for the optimal configuration in a huge configuration space. We first identify the critical parameters of TMIV configurations. We then introduce two Neural Network (NN)-based algorithms that attack the problem from two heterogeneous angles: (i) a Convolutional Neural Network (CNN) algorithm that solves a regression problem and (ii) a Deep Reinforcement Learning (DRL) algorithm that solves a decision-making problem. We conduct both objective and subjective experiments to evaluate the CNN and DRL algorithms on two diverse datasets: a perspective and an equirectangular projection dataset. The objective evaluations reveal that both algorithms significantly outperform the default configurations: with the perspective (equirectangular) projection dataset, the proposed algorithms require only 23% (95%) of the decoding time, stream only 23% (79%) of the views, and improve the utility by 73% (6%) on average. The subjective evaluations confirm that the proposed algorithms consume fewer resources while achieving Quality of Experience (QoE) comparable to that of the default and the optimal TMIV configurations.
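
To make the configuration optimization problem concrete, the following minimal Python sketch illustrates the tradeoff the abstract describes: a utility that rewards video quality and penalizes decoding time and bandwidth, plus a brute-force search over a small configuration grid. The parameter names (num_views, texture_qp, depth_qp), the toy measurement model, and the utility weights are all illustrative assumptions, not the thesis's actual TMIV parameters or utility definition.

    from dataclasses import dataclass
    from itertools import product

    @dataclass(frozen=True)
    class TMIVConfig:
        num_views: int    # number of source views streamed (hypothetical knob)
        texture_qp: int   # quantization parameter for texture atlases (hypothetical knob)
        depth_qp: int     # quantization parameter for depth atlases (hypothetical knob)

    def measure(cfg: TMIVConfig) -> tuple[float, float, float]:
        # Stand-in for actually encoding/decoding with TMIV and measuring
        # quality (dB), decoding time (s), and bandwidth (Mbps); made-up numbers.
        quality = 50.0 - 0.4 * cfg.texture_qp - 0.2 * cfg.depth_qp + 0.5 * cfg.num_views
        decode_time = 0.3 * cfg.num_views
        bandwidth = cfg.num_views * (60.0 - cfg.texture_qp)
        return quality, decode_time, bandwidth

    def utility(quality: float, decode_time: float, bandwidth: float,
                w_q: float = 1.0, w_t: float = 0.5, w_b: float = 0.02) -> float:
        # Reward quality; penalize decoding time and bandwidth consumption.
        return w_q * quality - w_t * decode_time - w_b * bandwidth

    # Brute-force search over a tiny grid; the real configuration space is far
    # too large to enumerate, which is what motivates the learned optimizers.
    grid = [TMIVConfig(v, tq, dq)
            for v, tq, dq in product([2, 4, 8], [22, 32, 42], [22, 32, 42])]
    best = max(grid, key=lambda c: utility(*measure(c)))
    print("Best configuration under the toy utility:", best)

In the thesis's setting, each call to measure() would correspond to an actual TMIV encoding/decoding run, which is exactly why exhaustive search is infeasible and a learned predictor (the CNN) or policy (the DRL agent) is needed instead.
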
Table of Contents

Acknowledgements i
Abstract ii
Chinese Abstract iii
1 Introduction 1
1.1 Contribution 3
1.2 Limitation 4
1.3 Organization 4
2 Background 5
2.1 From 2D Video Toward 360° Video 5
2.2 3D Representation for Realizing 6DoF Interaction 6
2.3 View Synthesis 10
2.4 Recent Standard Activity 11
3 Related Work 13
3.1 Immersive Video Streaming 13
3.2 Machine Learning Algorithms for Optimizing Video Streaming 14
3.3 Machine Learning Algorithms for Optimizing Video Coding 15
4 Test Model of Immersive Video 16
4.1 Components 16
4.2 Workflow 18
4.3 Limitations 18
5 The Configuration Optimization Problem 21
5.1 Problem Statement 21
5.2 The Configuration Optimizer in the Immersive Video Codec 22
6 Machine-Learning-Based Configuration Optimizers 24
6.1 The Input Preprocessing Procedure 25
6.2 The Output Post-Processing Procedure 25
6.3 The Convolutional Neural Network (CNN) Algorithm 25
6.4 The Deep Reinforcement Learning (DRL) Algorithm 26
6.5 Training and Testing Datasets 29
6.6 Training Procedure 30
7 Objective Evaluations 32
7.1 Experiment Setup 32
7.2 Qualitative Evaluations 33
7.3 Quantitative Evaluations 34
7.4 Robustness Evaluation Results 39
8 Subjective Evaluations 41
8.1 Experiment Setup 41
8.2 Results and Analysis 42
9 Summary of Our Findings 46
10 Use Case: Real Estate Virtual Tour 48
10.1 Usage Scenario 48
10.2 System Overview 49
10.3 Data Collection from the Photo-Realistic Simulator 49
10.4 Experiment Setup 50
10.5 Results 52
11 Conclusion and Future Work 55
11.1 Conclusion 55
11.2 Future Work 55
Bibliography 59