Author (English): Fan, Ching-Ling
Thesis Title (English): Optimizing Immersive Video Streaming to Head-Mounted Virtual Reality
Advisor (English): Hsu, Cheng-Hsin
Oral Defense Committee (English): Sheu, Jang-Ping
Lee, Che-Rung
Huang, Chun-Ying
Chen, Chien
Pang, Ai-Chun
Keywords (English): Virtual Reality; Multimedia Streaming; Quality of Experience; Head-Mounted Display; 360-Degree Video
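With the rapid advance of technology, viewers are no longer satisfied with watching Full High Definition and Ultra-High Definition streaming videos on flat displays, and have begun to seek immersive viewing experiences. As a result, 360-degree videos, which offer such experiences, have become a trend; for example, well-known streaming platforms such as YouTube and Facebook already support 360-degree video streaming. Watching 360-degree videos with a Head-Mounted Display (HMD) gives users an even stronger sense of presence, because they can change the viewing angle naturally by turning their heads, as if they were physically in the video's virtual environment. However, streaming 360-degree videos to HMDs is not easy. First, to present realistic images in the HMD, 360-degree videos require extremely high resolutions, which leads to very large file sizes; this overloads the available bandwidth and causes additional delay and an unsatisfactory user experience. Moreover, because 360-degree videos must be projected onto 2D videos before they can be compressed, the resulting distortion makes it hard for existing video quality metrics, such as Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity Index (SSIM), to measure the viewing quality of 360-degree videos accurately, let alone account for the complex human visual system and users' diverse viewing behaviors. These difficulties hinder the development of user-experience-oriented optimization of 360-degree video streaming. To address these challenges, this thesis solves three core problems of streaming 360-degree videos to HMDs, which lie in three stages of streaming: delivery, compression and packaging, and display and viewing. First, we design and develop a neural network, trained with sensor data and video analysis, to predict users' future viewports. Our proposed prediction network effectively reduces the bandwidth required to deliver 360-degree videos while maintaining fairly good video quality. Next, we use video models, viewing probabilities, and the client bandwidth distribution to compute an optimized encoding ladder, which determines which video versions should be stored on a server with limited storage space, thereby optimizing...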
Immersive videos, a.k.a. 360° videos, have become increasingly popular. They deliver a more immersive viewing experience to end users because viewers are free to change their viewports. Streaming immersive videos to Head-Mounted Displays (HMDs) offers an even more immersive experience by allowing users to arbitrarily rotate their heads to change the viewports, as if they were physically in the virtual worlds. However, streaming high-quality 360° videos to HMDs is quite challenging. First, 360° videos contain much more information than conventional videos, and thus have much higher resolutions and larger sizes. This may introduce additional delay and degrade the user experience when network bandwidth is insufficient. Second, existing quality metrics are less applicable to 360° videos because of the complex human visual system and diverse viewing behaviors. This inhibits the development of Quality of Experience (QoE)-oriented optimization for 360° videos. To address these challenges, we study three core problems to optimize the (i) delivery, (ii) production, and (iii) consumption of immersive video content in emerging systems that stream to HMDs. First, we design a neural network that leverages sensor and content features to predict the future viewports of HMD viewers watching immersive tiled videos. Our proposed prediction network effectively reduces the bandwidth consumption while offering comparable video quality. Second, we develop a divide-and-conquer approach to optimize the encoding ladder of immersive tiled videos, considering the video models, viewing probabilities, and client distribution. Our proposed algorithm aims to maximize the overall viewing quality of clients under the limits of server storage and heterogeneous client bandwidths. Last, we design and conduct a user study to investigate and quantify the impacts of various QoE factors. We then use these factors to build QoE models for immersive videos. Together, the outcomes of these three studies lead to better-optimized systems for streaming immersive videos to HMDs. Our developed technologies and accumulated experience will be the cornerstone of upcoming Virtual Reality (VR), Mixed Reality (MR), and Augmented Reality (AR) applications, collectively referred to as Extended Reality (XR) applications.
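The encoding-ladder problem summarized above can be illustrated with a toy sketch. The code below is not the divide-and-conquer algorithm of the thesis; it is a brute-force search over a handful of hypothetical video versions, assuming each client simply streams the highest-quality stored version whose bitrate fits its bandwidth. It ignores tiles and viewing probabilities, and all names and numbers (`CANDIDATES`, `CLIENTS`, `best_ladder`, the quality scores) are made up for illustration.

```python
from itertools import combinations

# Hypothetical candidate versions: (name, bitrate in Mbps, quality score, storage cost in GB).
CANDIDATES = [
    ("low", 2.0, 30.0, 1.0),
    ("mid", 5.0, 36.0, 2.5),
    ("high", 10.0, 40.0, 5.0),
    ("ultra", 20.0, 43.0, 10.0),
]

# Hypothetical client bandwidth distribution: (bandwidth in Mbps, fraction of clients).
CLIENTS = [(3.0, 0.3), (6.0, 0.4), (12.0, 0.2), (25.0, 0.1)]


def expected_quality(ladder, clients):
    """Average quality when each client streams the best stored version that fits its bandwidth."""
    total = 0.0
    for bandwidth, share in clients:
        feasible = [quality for _, rate, quality, _ in ladder if rate <= bandwidth]
        # Assumption: a client that cannot stream any stored version contributes zero quality.
        total += share * (max(feasible) if feasible else 0.0)
    return total


def best_ladder(candidates, clients, storage_budget):
    """Exhaustively try every subset of versions that fits within the storage budget."""
    best, best_score = (), 0.0
    for size in range(1, len(candidates) + 1):
        for ladder in combinations(candidates, size):
            if sum(cost for *_, cost in ladder) > storage_budget:
                continue  # this subset does not fit on the server
            score = expected_quality(ladder, clients)
            if score > best_score:
                best, best_score = ladder, score
    return [name for name, *_ in best], best_score
```

With a storage budget of 8.5 GB, `best_ladder(CANDIDATES, CLIENTS, 8.5)` keeps the low, mid, and high versions. Exhaustive search is only workable for a few candidates, since the number of subsets doubles with each added version, which is one reason a decomposition-based approach is attractive at realistic scale.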
Acknowledgments
Abstract
1 Introduction
1.1 Delivery Optimization: Fixation Prediction
1.2 Production Optimization: Optimal Laddering
1.3 Consumption Optimization: QoE Modeling
1.4 Contributions
1.5 Thesis Organization
2 Background and Related Work
2.1 Background
2.1.1 Off-the-Shelf Hardware
2.1.2 Existing Systems
2.1.3 General 360° Video Streaming Framework
2.1.4 Tiled 360° Videos
2.2 Delivery: Fixation Prediction
2.2.1 2D Image/Video Saliency
2.2.2 360° Image/Video Saliency
2.2.3 Fixation/Head Movement Prediction in HMDs
2.3 Production: Optimal Laddering
2.3.1 Viewport-Adaptive Tiled Streaming
2.3.2 Adaptive BitRate (ABR) Algorithms
2.3.3 Bitrate Allocation and Optimal Laddering Algorithms
2.4 Consumption: QoE Modeling
2.4.1 QoE Measurements
2.4.2 QoE Factors
2.4.3 QoE Modeling
3 Delivery Optimization: Fixation Prediction
3.1 Overview
3.1.1 360° Video Streaming Systems
3.1.2 Viewport and Modeling
3.2 Fixation Prediction Networks
3.2.1 Overview
3.2.2 Orientation-Based Network
3.2.3 Tile-Based Network
3.2.4 Future-Aware Network
3.3 Datasets and Network Implementations
3.3.1 Dataset
3.3.2 Network Implementations
3.4 Overlapping Virtual Viewports
3.4.1 Projection Models
3.4.2 Overlapping Virtual Viewport (OVV)
3.4.3 Validations with Real Computer Vision Algorithms
3.4.4 Fixation Prediction with OVV
3.4.5 Validation with Additional Videos/Viewers
3.5 Evaluations
3.5.1 Implementations
3.5.2 Setup
3.5.3 Results
3.5.4 A Small-Scale User Study
3.6 Conclusion
4 Production Optimization: Optimal Laddering
4.1 System Overview
4.2 Optimal Laddering Problem
4.2.1 Problem Statement
4.2.2 Video Models
4.2.3 Problem Formulation
4.3 Problem Decomposition
4.4 Per-Class Optimization
4.4.1 Per-Class Formulation
4.4.2 Lagrangian-Based Algorithm: PC-LBA
4.4.3 Greedy-Based Algorithm: PC-GBA
4.5 Global Optimization for the Optimal Ladders
4.6 Evaluations
4.6.1 Implementations
4.6.2 Setup
4.6.3 Per-Class Optimization Results
4.6.4 Optimal Laddering Results
4.6.5 Summary of the Key Findings
4.7 Discussions
4.7.1 Comparisons with the Optimal Solution
4.7.2 Fairness
4.8 Proofs of Lemmas
4.9 Conclusion
5 Consumption Optimization: QoE Modeling
5.1 QoE of 360° Tiled Videos Streamed to HMDs
5.1.1 QoE Features
5.1.2 QoE Factors
5.2 A User Study
5.2.1 Testbed
5.2.2 Dataset and Subjects
5.2.3 Procedure
5.2.4 Viewing Behaviors and Video Classification
5.2.5 The Overall QoE and QoE Features
5.2.6 Diverse QoE Models
5.3 MOS Modeling
5.3.1 Regressor Selection
5.3.2 Derived Model Performance
5.4 IS Modeling
5.5 Conclusions
6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
6.2.1 Live 360° Video Streaming
6.2.2 6DoF Content Streaming
6.2.3 VR Gaming with Multiple Observers
6.2.4 Movie Creation for XR Content
Bibliography
