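Author (English): Wong, Yu Shiang
Thesis title (Chinese): SMARTANNOTATOR: An Interactive Annotation System for Indoor RGBD Scenes
Thesis title (English): SMARTANNOTATOR: An Interactive Tool for Annotating Indoor RGBD Images
Advisor (English): Chu, Hung-Kuo
Committee members (English): Yao, Chih-Yuan
Lee, Ruen-Rone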
Keywords (English): Computer Vision; Computer Graphics; Scene Understanding; Annotation
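In scene understanding and image manipulation, RGBD databases with high-level semantic annotations are very useful, because prior knowledge can be extracted from them. With the spread of depth sensors, collecting RGBD data has become easy, but annotating high-level semantics remains tedious. In this work, we design SmartAnnotator, an interactive system for annotating RGBD data. The system automatically infers the names of the objects in a scene, their abstract geometric representations (cuboids), and the structural relationships between objects. The user can quickly confirm annotations from a list of names suggested by the system. During annotation, the system automatically corrects and improves the geometric representation and the scene structure according to the user's input. Moreover, as more data is annotated, the system's inferences become increasingly accurate. We design four experiments to analyze the system's performance: annotation efficiency on a large amount of data, a comparison with a naive method, the effect of different object segmentations, and an analysis of the system's computation speed. The results show that the system effectively improves the efficiency of conventional RGBD annotation and produces a high-quality annotated RGBD database.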
RGBD images with high-quality annotations, both in the form of geometric (i.e., segmentation) and structural (i.e., how the segments relate to one another in 3D) information, provide valuable priors for a diverse range of applications in scene understanding and image manipulation. While it is now simple to acquire RGBD images, annotating them, automatically or manually, remains challenging. We present SmartAnnotator, an interactive system to facilitate annotating raw RGBD images. The system performs the tedious tasks of grouping pixels, creating potential abstracted cuboids, and inferring object interactions in 3D, and then generates an ordered list of hypotheses. The user simply has to flip through the suggestions for segment labels and finalize a selection, after which the system updates the remaining hypotheses. As annotations are finalized, the process becomes simpler, with fewer ambiguities to resolve. Moreover, as more scenes are annotated, the system makes better suggestions based on the structural and geometric priors learned from previous annotation sessions. We test the system on a large number of indoor scenes across different users and experimental settings, validate the results on existing benchmark datasets, and report significant improvements over low-level annotation alternatives.
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 Image annotation
2.2 Incremental learning
2.3 Indoor scene understanding
3 Overview
3.1 Learning Phase
3.2 Annotating Phase
4 Algorithm
4.1 Modeling the 3D Structure of Scene
4.1.1 3D Abstraction
4.1.2 Structure Graph
4.2 Learning Phase
4.2.1 Learning Probability Models
4.3 Annotating Phase
4.3.1 Label Prediction
4.3.2 User Session
4.3.3 Structure Graph Refinement
5 Experiment and Evaluation
5.1 Design of the Evaluation
5.1.1 Dataset and Ground Truth
5.1.2 Evaluation Metrics
5.2 Performance of Learning and Labeling
5.2.1 User experiences
5.2.2 Performance of probability models
5.3 Comparing with Naive Annotation Tool
5.4 Sensitivity to Object Segmentation
5.5 Performance
5.5.1 User scribbling
5.5.2 Timings
6 Conclusion
6.1 Limitations and future work
A Generate Over-segmentation
B Reconstruct Object Geometry
Bibliography
