
Detailed Record

Author (Chinese): 翁郁翔
Author (English): Wong, Yu Shiang
Title (Chinese): SMARTANNOTATOR: 互動式室內 RGBD 場景標註系統
Title (English): SMARTANNOTATOR: An Interactive Tool for Annotating Indoor RGBD Images
Advisor (Chinese): 朱宏國
Advisor (English): Chu, Hung-Kuo
Committee Members (Chinese): 姚智原, 李潤容
Committee Members (English): Yao, Chih-Yuan; Lee, Ruen-Rone
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 101062524
Year of Publication (ROC calendar): 104 (2015)
Graduation Academic Year: 103
Language: English
Number of Pages: 40
Keywords (Chinese): 電腦視覺, 電腦圖學, 場景認知, 標註
Keywords (English): Computer Vision, Computer Graphics, Scene Understanding, Annotation
Statistics:
  • Recommendations: 0
  • Views: 383
  • Rating: *****
  • Downloads: 5
  • Bookmarks: 0
Abstract (Chinese): In scene understanding and image manipulation, RGBD databases with high-level semantic annotations are extremely useful, because prior knowledge can be extracted from them. With the popularization of depth sensors, collecting RGBD data has become easy, yet annotating high-level semantics remains tedious. In this work, we design SmartAnnotator, an interactive RGBD annotation system. The system automatically infers the names of objects in a scene, their abstracted geometric representations (cuboids), and the structural relations between objects. From a system-generated list of suggested names, the user can quickly confirm annotations. During annotation, the system automatically corrects and refines the geometric representations and scene structure according to the user's input; moreover, as more data are annotated, the system's predictions become increasingly accurate. We design four experiments to analyze the system's performance: annotation efficiency on a large dataset, a comparison with a naive method, sensitivity to different object segmentations, and an analysis of running time. The results show that the system substantially improves the efficiency of conventional RGBD annotation and yields a high-quality annotated RGBD database.
Abstract (English): RGBD images with high-quality annotations, both in the form of geometric (i.e., segmentation) and structural (i.e., how the segments mutually relate in 3D) information, provide valuable priors for a diverse range of applications in scene understanding and image manipulation. While it is now simple to acquire RGBD images, annotating them, automatically or manually, remains challenging. We present SmartAnnotator, an interactive system to facilitate annotating raw RGBD images. The system performs the tedious tasks of grouping pixels, creating potential abstracted cuboids, inferring object interactions in 3D, and generating an ordered list of hypotheses. The user simply has to flip through the suggestions for segment labels, finalize a selection, and the system updates the remaining hypotheses. As annotations are finalized, the process becomes simpler with fewer ambiguities to resolve. Moreover, as more scenes are annotated, the system makes better suggestions based on the structural and geometric priors learned from previous annotation sessions. We test the system on a large number of indoor scenes across different users and experimental settings, validate the results on existing benchmark datasets, and report significant improvements over low-level annotation alternatives.
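The suggest-confirm-update loop described in the abstract can be sketched as follows. This is a minimal illustration only: the class and method names are hypothetical, and the simple frequency prior stands in for the thesis's actual learned structural and geometric probability models.

```python
# Hypothetical sketch of an annotation session: the system ranks label
# hypotheses by a prior learned from previously confirmed annotations,
# the user confirms one, and the prior is updated for later sessions.
# Names and the frequency prior are illustrative assumptions.
from collections import Counter

class LabelSuggester:
    """Ranks candidate labels for a segment by how often each label
    was confirmed in earlier sessions (a simple frequency prior)."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)
        self.counts = Counter()  # label -> number of confirmations

    def suggest(self, top_k=3):
        # Order hypotheses by learned frequency; unseen labels keep
        # their original vocabulary order (sorted() is stable).
        ranked = sorted(self.vocabulary, key=lambda lbl: -self.counts[lbl])
        return ranked[:top_k]

    def confirm(self, label):
        # The user finalizes a selection; updating the prior makes
        # later suggestions better, as the abstract describes.
        self.counts[label] += 1

suggester = LabelSuggester(["bed", "table", "chair", "monitor"])
for chosen in ["chair", "chair", "table"]:  # simulated user sessions
    _ = suggester.suggest()                 # user flips through these
    suggester.confirm(chosen)

print(suggester.suggest(top_k=2))  # frequently confirmed labels rank first
```

In the actual system the ranking additionally conditions on geometry and 3D object interactions (e.g., support relations), not just label frequency, which is what allows the remaining hypotheses to be re-ranked after each confirmation.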
Chinese Abstract i
Abstract ii
Contents iii
List of Figures v
List of Tables vi
1 Introduction 1
2 Related Work 5
2.1 Image annotation 5
2.2 Incremental learning 6
2.3 Indoor scene understanding 6
3 Overview 9
3.1 Learning Phase 9
3.2 Annotating Phase 10
4 Algorithm 12
4.1 Modeling the 3D Structure of Scene 12
4.1.1 3D Abstraction 12
4.1.2 Structure Graph 13
4.2 Learning Phase 15
4.2.1 Learning Probability Models 15
4.3 Annotating Phase 17
4.3.1 Label Prediction 17
4.3.2 User Session 19
4.3.3 Structure Graph Refinement 20
5 Experiment and Evaluation 24
5.1 Design of the Evaluation 24
5.1.1 Dataset and Ground Truth 24
5.1.2 Evaluation Metrics 25
5.2 Performance of Learning and Labeling 26
5.2.1 User experiences 28
5.2.2 Performance of probability models 29
5.3 Comparing with Naive Annotation Tool 29
5.4 Sensitivity to Object Segmentation 30
5.5 Performance 32
5.5.1 User scribbling 32
5.5.2 Timings 32
6 Conclusion 33
6.1 Limitations and future work 33
A Generate Over-segmentation 35
B Reconstruct Object Geometry 36
Bibliography 38