
Detailed Record

Author (Chinese): 黃家琬
Author (English): Huang, Chia Wan
Title (Chinese): 藉由深度學習預測商品類別再結合細部外觀以及局部組態之商品辨識
Title (English): Mobile Product Recognition with Deep Learning of Category Selection, Fine-grained Appearance and Part Configuration
Advisor (Chinese): 許秋婷
Advisor (English): Hsu, Chiou Ting
Committee Members: 王聖智, 孫民
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 103062543
Publication Year (ROC): 105 (2016)
Graduation Academic Year: 104
Language: English
Number of Pages: 52
Keywords (Chinese): 商品辨識; 卷積類神經網路; 聚類; 細部外觀描述
Keywords (English): Mobile product recognition; Convolutional neural network; Clustering; Fine-grained appearance
Abstract (translated from Chinese):
In mobile product recognition, because product categories are numerous and updated rapidly, most methods recognize a product snapshot by retrieving similar reference images from a database. The main challenges in product recognition are inter-product similarity and intra-product diversity. To address these two issues, this thesis first introduces a multi-stage method that matches product images using three convolutional neural networks and our proposed similarity measurement. Because the multi-stage method must spend considerable time estimating similarity against every database image, we further propose a two-stage method to speed up recognition. Our method first partitions all products into categories by clustering; at test time, we predict the category of the test image and compare similarity only with the products under that category. For the offline clustering step, directly applying traditional methods yields unsatisfactory results, because those methods perform feature extraction and clustering separately and are sensitive to random initialization. To overcome these problems, our two-stage method includes two schemes that, within a convolutional neural network framework, iteratively refine image features and cluster assignments. In addition, to achieve a more efficient recognition pipeline, the two-stage method merges the otherwise repeated part-configuration comparison and feature extraction into a single pass. We also collect a public dataset, PRODUCT-100, containing 100 products captured under realistic conditions. Experiments show that the proposed two-stage method recognizes products more efficiently on SHORT and PRODUCT-100, and achieves higher accuracy than existing methods.
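The offline clustering step above alternates between refining image features and refining cluster assignments. As a rough illustration of that alternation only, the toy sketch below uses plain k-means-style updates in pure Python; the thesis instead fine-tunes the last layers of Faster R-CNN, so the centroid update and function names here are illustrative stand-ins, not the actual method.

```python
import random

def iterative_cluster_refine(features, k, n_iters=10, seed=0):
    """Toy alternation between cluster assignment and per-cluster
    representative refinement (plain k-means). A stand-in for the
    thesis's scheme, which fine-tunes CNN layers toward consistent
    features and category assignments instead of averaging."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]

    def dist2(a, b):
        # squared Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = [0] * len(features)
    for _ in range(n_iters):
        # assignment step: file each product image under its nearest category
        labels = [min(range(k), key=lambda c: dist2(f, centroids[c]))
                  for f in features]
        # refinement step: update each category's representative
        for c in range(k):
            members = [f for f, lbl in zip(features, labels) if lbl == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids
```

With two well-separated groups of toy feature vectors, the alternation settles into one category per group within a few iterations.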
The goal of product recognition is to retrieve database images similar to the query image and then identify the product in the query based on the retrieved images. Two issues limit recognition performance: inter-product similarity and intra-product diversity. To address them, we first introduce an intuitive multi-stage method consisting of three convolutional neural networks (CNNs) and a proposed similarity measurement. Since estimating similarity against every image in the database is time-consuming, we further propose a two-stage method that estimates similarity only against the images under the predicted category of the query. This scenario requires clustering products into categories offline. However, because traditional clustering methods extract features and cluster images separately, without coordination, they often end up with improper cluster assignments. To overcome this weakness, we propose two clustering schemes that iteratively refine the last few layers of Faster R-CNN toward the best category assignment and feature representation. Next, to make the recognition process efficient, the two-stage method performs the otherwise repetitive steps of part-configuration comparison and feature extraction only once. For validation, we conduct experiments on SHORT and on our own collected dataset, PRODUCT-100, captured under various imaging conditions. The experimental results show that the proposed two-stage method achieves promising performance in both accuracy and efficiency.
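The two-stage flow described above (first predict the query's category, then estimate similarity only within that category) can be sketched as the following control flow. Note that `predict_category`, `similarity`, and the dictionary layout are hypothetical placeholders for the trained Faster-R-CNN-based classifier, the proposed similarity measurement, and the clustered reference database; none of them are specified by the abstract.

```python
def recognize(query_feat, category_db, predict_category, similarity):
    """Sketch of the two-stage recognition flow.

    category_db maps a category name to a list of reference entries,
    each a dict with hypothetical keys "product_id" and "feat".
    predict_category and similarity are caller-supplied stand-ins
    for the trained classifier and the similarity measurement."""
    # Stage 1: predict which offline-clustered category the query belongs to.
    category = predict_category(query_feat)
    candidates = category_db.get(category, [])
    if not candidates:
        return None
    # Stage 2: rank only the references filed under the predicted category,
    # instead of comparing against every image in the database.
    best = max(candidates, key=lambda ref: similarity(query_feat, ref["feat"]))
    return best["product_id"]
```

The saving comes from stage 2 touching only one category's references; with a toy database and negative squared distance as the similarity, a query near one reference retrieves that reference's product ID.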
Chinese Abstract
Abstract
1. Introduction
2. Related Work
2.1 Object Detection
2.2 Fine-grained Classification
2.3 Product Recognition
2.4 Deep Learning for Clustering
3. Proposed Methods
3.1 Multi-stage Method for Product Recognition (co-work)
3.1.1 Product Localization
3.1.2 Rotation Alignment
3.1.3 Similarity Estimation
3.2 Two-stage Method for Product Recognition
3.2.1 The Training Process of Faster RCNN (individual work)
3.2.2 Target Box Selection (individual work)
3.2.3 Feature Extraction (co-work)
3.2.4 Similarity Estimation (co-work)
4. Experimental Results
4.1 Implementation Details
4.2 Datasets
4.3 Performance Measurement
4.4 Evaluation of the Proposed Method
4.4.1 Evaluation of Different Clustering Schemes
4.4.2 Evaluation of Rotation-invariant Similarity Learning
4.4.3 Quantitative Evaluation of Different Functions in the Proposed Method
4.4.4 Qualitative Evaluation of Different Functions in the Proposed Method
4.5 Comparison with Existing Methods
5. Limitation
6. Conclusions
7. References