
Detailed Record

Author (Chinese): 李豪韋
Author (English): Lee, Hao Wei
Title (Chinese): 自動樂譜辨識
Title (English): Automatic Music Score Recognition
Advisor (Chinese): 劉奕汶
Advisor (English): Liu, Yi Wen
Committee members (Chinese): 孫民
趙煦
張智星
劉奕汶
Committee members (English): Sun, Min
Chao, Shiuh
Jang, Jyh Shing
Liu, Yi Wen
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 103061527
Publication Year (ROC): 105 (2016)
Graduation Academic Year: 104 (2015–2016)
Language: English
Number of Pages: 62
Keywords (Chinese): 模型識別, 圖像處理, 音樂科技, 光學音樂辨識
Keywords (English): Pattern Recognition, Image Processing, Music Technology, Optical Music Recognition
Statistics:
  • Recommendations: 0
  • Views: 351
  • Rating: *****
  • Downloads: 32
  • Favorites: 0
In musical applications, scores were originally invented as a convenient visual way to record the information in a piece of music. Optical music recognition aims to design an algorithmic pipeline that lets a computer automatically read scores that were designed for human readers. Scores are usually stored electronically as image files, so the goal of optical music recognition is to extract musical information from an image. This dissertation focuses on two aspects: preprocessing of the score and the recognition algorithm for a single staff. A score first goes through preprocessing, which divides it into smaller units that can be processed independently and handles noise and defects introduced by printing, so that the subsequent recognition stage receives the best possible input image. Recognition is the core of this dissertation; the recognition algorithm is implemented with template matching and a support vector machine, and it performs well on real score images. In addition, the algorithm design differs from previous work in two respects. First, the preprocessing uses random sample consensus, which introduces randomness: each run on the same image yields a different result, so repeated execution becomes meaningful. The results of different runs can be combined into a better overall result, giving symbols that a deterministic algorithm would miss a chance to be recognized. Second, the algorithm follows the divide-and-conquer paradigm: the subproblems it produces are almost completely independent, which makes the implementation well suited to parallel processing for faster computation.
The purpose of optical music recognition is to develop a computer program that can understand a musical score, a notation invented for human beings to record melody. A score is usually stored as an image, so a recognition system must retrieve musical information from a set of pixels. This dissertation deals with two major issues: preprocessing and recognition. Preprocessing divides the input image into several slices that can be processed independently and handles defects introduced in the printing step; its goal is to simplify the subsequent recognition stage. Recognition on a staff image is the core of this dissertation. The implementation is based on template matching and the support vector machine, and the algorithm works well on real score images. The design also brings a different perspective to optical music recognition. First, the preprocessing uses random sample consensus (RANSAC) as part of staff detection. This randomness makes it meaningful to repeat the same operation; by comparing results across iterations, consensus-based correction provides the possibility of finding symbols that existing deterministic algorithms cannot find. Second, the algorithm is based on the divide-and-conquer concept, which means the subtasks have little correlation, and hence the algorithm can be readily parallelized.
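The RANSAC-based staff detection described in the abstract can be illustrated with a minimal sketch (a hypothetical illustration, not the dissertation's actual implementation): fit a near-horizontal line to candidate staff-pixel coordinates by repeatedly sampling two points and keeping the candidate with the most inliers.

```python
import random

def ransac_line(points, iters=200, tol=1.0, seed=None):
    """Fit y = m*x + b to (x, y) points by random sample consensus:
    repeatedly pick two points, form a candidate line, and keep the
    candidate with the most inliers (points within `tol` of the line)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # skip vertical pairs; staff lines are near-horizontal
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers
```

Because the sampling is random, two runs on the same image can return different lines, which is precisely what makes the repeated execution and consensus-based correction described in the abstract meaningful.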
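Symbol recognition by template matching, one of the two classifiers mentioned above, reduces to sliding a small symbol template over the staff image and scoring each position. A toy sketch on binary images (an assumed formulation, not taken from the dissertation):

```python
def match_template(image, template):
    """Slide a binary `template` (list of 0/1 rows) over a binary `image`
    and return (row, col, score) of the best placement, where score is
    the fraction of template pixels that agree with the image."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best = (0, 0, -1.0)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            agree = sum(image[r + i][c + j] == template[i][j]
                        for i in range(h) for j in range(w))
            score = agree / (h * w)
            if score > best[2]:
                best = (r, c, score)
    return best
```

In practice a score threshold decides whether a symbol is present at all, and because each staff slice can be scanned independently, this step parallelizes naturally, in line with the divide-and-conquer design.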
Abstract (in Chinese) v
Abstract vii
1 Introduction 1
1.1 Motivation................................. 1
1.2 Goal.................................... 1
1.3 Divide and Conquer............................ 2
1.3.1 Definition............................. 2
1.3.2 Main Contribution of This Dissertation . . . . . . . . . . . . . 2
1.4 Organization of Dissertation ....................... 3
2 Overview of OMR 5
2.1 Binarization................................ 5
2.2 Staff Detection and Removal....................... 6
2.3 Segmentation ............................... 8
2.4 Symbol Recognition ........................... 8
2.5 Human in the Loop ............................ 9
3 Technical Background 11
3.1 Run-Length Encoding (RLE)....................... 11
3.2 Projection................................. 12
3.2.1 Projection onto a Line on a 2D Plane . . . . . . . . . . . . . . 12
3.2.2 Horizontal and Vertical Projection................ 12
3.3 Features.................................. 13
3.3.1 Character Profile ......................... 13
3.3.2 Horizontal/Vertical Centroid................... 13
3.3.3 Tiny Image ............................ 14
3.4 Random Sample Consensus (RANSAC)................. 16
3.5 Detection and Segmentation ....................... 17
3.5.1 Sliding Window ......................... 17
3.5.2 Onset Region Guided Detection ................. 17
3.6 Classification ............................... 18
3.6.1 Support Vector Machine (SVM) ................. 18
3.6.2 Template Matching........................ 19
4 Implementation Details 21
4.1 Overview of Recognition Procedure ................... 21
4.2 Staff Profile................................ 22
4.3 Staff Detection .............................. 22
4.3.1 Staffsegment Detection...................... 22
4.3.2 Staffsegment Grouping...................... 24
4.3.3 Strategy Choosing Parameters of RANSAC . . . . . . . . . . . 27
4.4 Staff Segmentation ............................ 29
4.5 Staff Removal............................... 29
4.6 Performance of Staff Removal ...................... 31
4.7 Barline Detection ............................. 32
4.7.1 Barline Detection Preprocessing................. 33
4.8 Pitch.................................... 34
4.8.1 Pitch Number........................... 34
4.8.2 Pitch-related Cropping ...................... 35
4.9 Notehead Detection............................ 36
4.9.1 Notehead-or-not Classification.................. 36
4.9.2 Notehead Classification ..................... 36
4.9.3 Detection with Onset Curve ................... 37
4.9.4 Stem-guided Detection...................... 38
4.9.5 Confusion Matrix......................... 39
4.10 Accidental Detection ........................... 40
4.10.1 Detection on Pitch Images .................... 40
4.11 Clef Detection............................... 41
4.11.1 Clef-or-not Classification..................... 42
4.11.2 Clef Classification ........................ 42
4.12 Rest Detection............................... 43
4.12.1 Segmentation........................... 43
4.12.2 Detection on Segments...................... 44
4.13 Stem Detection .............................. 44
4.13.1 Vertical Line Detection...................... 45
4.13.2 Stem-or-not Determination.................... 46
4.14 Beam Grouping.............................. 46
4.14.1 Line Detection by Integral Along The Line . . . . . . . . . . . 47
4.14.2 Algorithm of Beam Grouping by Detected Vertical Lines . . . . 47
4.15 Summary of Detection Algorithms.................... 49
4.16 Result on Real Staff Images ....................... 50
5 Discussion 53
5.1 Algorithm of the Preprocessing Stage .................. 53
5.2 Consensus-based Correction of The Results . . . . . . . . . . . . . . . 54
6 Conclusion 57
6.1 Summary of Achievement ........................ 57
6.2 Future Work................................ 59
References 61
[1] T. Pinto, A. Rebelo, G. Giraldi, and J. S. Cardoso, “Music score binarization based on domain knowledge,” Pattern Recognition and Image Analysis - 5th Iberian Conf. (IbPRIA), pp. 700–708, 2011.
[2] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Systems, Man and Cybernetics, vol. 9, pp. 62–66, 1979.
[3] Q. Chen, Q.-s. Sun, P. A. Heng, and D.-s. Xia, “A double-threshold image binarization method based on edge detector,” Pattern Recognition, vol. 41, pp. 1254–1267, 2008.
[4]  L.-K. Huang and M.-J. J. Wang, “Image thresholding by minimizing the measures of fuzziness,” Pattern Recognition, vol. 28, pp. 41–51, 1995.
[5] D.-M. Tsai, “A fast thresholding selection procedure for multimodal and unimodal histograms,” Pattern Recognition Letters, vol. 16, pp. 653–666, 1995.
[6]  J. Bernsen, “Dynamic thresholding of grey-level images,” in Proc. the 8th. Int. IEEE Conf. CAD Systems in Microelectronics (CADSM), pp. 1254–1267, 2005.
[7] R. Randriamahefa, J. P. Cocquerez, F. Fluhr, C. Pepin, and S. Philipp, “Printed music recognition,” in Proc. the Second Int. Conf. on Document Analysis and Recognition, pp. 898–901, 1993.
[8] K. T. Reed and J. Parker, “Automatic computer recognition of printed music,” in Proc. the 13th Int. Conf. on Pattern Recognition, vol. 3, pp. 803–807, 1996.
[9]  P. Bellini, I. Bruno, and P. Nesi, “Optical music sheet segmentation,” in Proc. the First Int. Conf. on Web Delivering of Music, pp. 183–190, 2001.
[10]  H. Miyao, “Stave extraction for printed music scores,” Intelligent Data Engineering and Automated Learning-IDEAL 2002, H. Yin, N. Allinson, R. Freeman, J. Keane, and S. Hubbard, Eds. Springer, pp. 621–634, 2002.
[11]  F. Rossant and I. Bloch, “Robust and adaptive OMR system including fuzzy modeling, fusion of musical rules, and possible error detection,” EURASIP J. on Advances in Signal Processing, vol. 1, 2007.
[12]  C. Dalitz, M. Droettboom, B. Pranzas, and I. Fujinaga, “A comparative study of staff removal algorithms,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, pp. 753–766, 2008.
[13] J. d. S. Cardoso, A. Capela, A. Rebelo, C. Guedes, and J. Pinto da Costa, “Staff detection with stable paths,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, pp. 1134–1139, 2009.
[14] A. Dutta, U. Pal, A. Fornés, and J. Lladós, “An efficient staff removal approach from printed musical documents,” in Proc. the 20th Int. Conf. on Pattern Recognition, pp. 1965–1968, IEEE Computer Society, 2010.
[15]  A. Rebelo, G. Capela, and J. S. Cardoso, “Optical recognition of music symbols: a comparative study,” Int. J. Document Analysis Recognition, vol. 13, pp. 19–31, 2010.
[16] I. Leplumey, J. Camillerapp, and G. Lorette, “A robust detector for music staves,” in Proc. the Int. Conf. on Document Analysis and Recognition, pp. 203–210, 1993.
[17]  Q.-A. Arshad, W. Z. Khan, and Z. Ihsan, “Overview of algorithms and techniques for optical music recognition.”
[18]  I. Fujinaga, Adaptive Optical Music Recognition. PhD thesis, Faculty of Music, McGill University, Montréal, Canada, 1996.
[19] A. Fornés, A. Dutta, A. Gordo, and J. Lladós, “CVC-MUSCIMA: A ground-truth of handwritten music score images for writer identification and staff removal,” Int. J. on Document Analysis and Recognition, vol. 15, pp. 243–251, 2012.
[20]  T. Kanungo, R. M. Haralick, H. S. Baird, W. Stuezle, and D. Madigan, “A statistical, nonparametric methodology for document degradation model validation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 1209–1223, 2000.
[21] L. Pugin, “Optical music recognition of early typographic prints using hidden Markov models,” in Proc. the Int. Society for Music Information Retrieval, pp. 53–56, 2006.
[22] A. Fornés, S. Escalera, J. Lladós, G. Sánchez, P. Radeva, and O. Pujol, “Handwritten symbol recognition by a boosted blurred shape model with error correction,” in Proc. the 3rd Iberian Conf. on Pattern Recognition and Image Analysis, Part I. Springer, Berlin, pp. 13–21, 2007.
[23]  H. Miyao and Y. Nakano, “Head and stem extraction from printed music scores using a neural network approach,” in Proc. Third Int. Conf. on Document Analysis and Recognition, pp. 1074–1079, IEEE Computer Society, 1995.
[24]  L. Chen and C. Raphael, “Human-directed optical music recognition,” in Electronic Imaging, Document Recognition and Retrieval XXIII, pp. 1–9, Society for Imaging Science and Technology, 2016.
[25] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” in Proc. Image Understanding Workshop, pp. 71–88, 1980.
[26]  C.-W. Hsu, C.-C. Chang, and C.-J. Lin, A Practical Guide to Support Vector Classification. National Taiwan University, Taipei 106, Taiwan, 2016.