
Detailed Record

Author (Chinese): 黃彥學
Author (English): Huang, Ian Shiue
Thesis Title (Chinese): 自動樂器家族分類
Thesis Title (English): Music Instrument Family Classification
Advisor (Chinese): 劉奕汶
Advisor (English): Liu, Yi Wen
Committee Members (Chinese): 李祈均, 陳新, 陳志強
Committee Members (English): Lee, Chi Chun; Chen, Hsin; Chan, Chi Keung
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 103061519
Year of Publication (ROC): 105 (2016)
Graduation Academic Year: 105
Language: English
Number of Pages: 59
Keywords (Chinese): 音樂訊號處理, 機器學習, 音色分類
Keywords (English): music signal processing, machine learning, timbre classification
A typical band includes five musicians: a vocalist, an electric guitarist, an electric bassist, a drummer, and a keyboardist. A common problem for the keyboardist is that keyboard sheet music is scarce, so the keyboardist must consult other musicians' scores to follow the progression of a song; these scores usually lack instrumentation information, however, and the player cannot tell which instrument should be emulated on the keyboard at any given moment. To address this problem, we used pre-recorded audio from thirty different instruments to build one-second clips for six instrument families, and mixed the six families systematically to produce fifteen duo-timbre and twenty trio-timbre classes. For each of the resulting forty-one classes of one-second clips, time-domain and frequency-domain signals were extracted and stacked into feature vectors, and several machine learning algorithms were applied so that the system could classify the instruments automatically. The k-nearest neighbors method achieved the best accuracy, 71.1% in validation and 65.2% in testing.

In addition, we conducted a ten-question hearing test consisting of nine questions on two-second clips plus one trap question. Each of the nine multiple-selection questions counted as correct only if every option was answered correctly, and a participant's responses were treated as a valid sample only if the trap question was answered correctly. The test was designed to examine whether our algorithms surpass human ability. Of the 498 participants, only 301 were valid samples; these were divided into three levels by musical ability. The highest-level group did outperform the system, but on average the machine outperformed humans.
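The forty-one classes follow combinatorially from the six instrument families: 6 single-family classes, C(6,2) = 15 duo-timbre classes, and C(6,3) = 20 trio-timbre classes. A minimal Python sketch of this enumeration, using hypothetical family names (the record does not list the actual six families defined in the thesis):

from itertools import combinations

# Placeholder family names; the actual six families are defined in the thesis.
families = ["strings", "brass", "woodwind", "percussion", "keyboard", "voice"]

labels = ["+".join(c)                 # e.g. "strings+brass"
          for k in (1, 2, 3)          # solo, duo-timbre, trio-timbre
          for c in combinations(families, k)]

print(len(labels))  # 6 + 15 + 20 = 41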
A typical music band is composed of a vocalist, an electric guitarist, an electric bassist, a drummer, and a keyboardist. The keyboardist's task is to employ the instrument sounds plugged into the keyboard appropriately. Nevertheless, keyboard sheet music is hard to obtain, so a beginning keyboardist usually practices from guitar tabs, and the instrumentation information is lost. In this thesis, we have built a classification system in an attempt to solve this problem. Each instrument-family class is composed of one-second clips at various pitches. Duo-timbre and trio-timbre mixtures are also generated and serve as additional labels. The feature vectors are composed of a low-pass filtered power spectrogram, a high-pass filtered power spectrogram, a chromagram, and the time-domain waveform. Several machine learning methods were applied, though not all of them performed well; the k-nearest neighbors method gave the most accurate results in both validation (71.1%) and testing (65.2%). We also carried out a hearing test to determine whether human classification ability can compete with the computer's. On average, human accuracy turned out to be lower than the computer's.
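As a rough sketch of the feature pipeline described above, the following Python stacks the four named feature groups and classifies with k-nearest neighbors. It assumes librosa, SciPy, and scikit-learn are available; the cutoff frequency, filter order, mean-pooling over time, and the value of k are illustrative assumptions, not the settings used in the thesis.

import numpy as np
import librosa
from scipy.signal import butter, sosfilt
from sklearn.neighbors import KNeighborsClassifier

def feature_vector(y, sr=22050, fc=1000.0):
    # Low- and high-pass the 1-second clip (cutoff fc is an assumption).
    lo = sosfilt(butter(4, fc, btype="low", fs=sr, output="sos"), y)
    hi = sosfilt(butter(4, fc, btype="high", fs=sr, output="sos"), y)
    pow_lo = np.abs(librosa.stft(lo)) ** 2            # low-passed power spectrogram
    pow_hi = np.abs(librosa.stft(hi)) ** 2            # high-passed power spectrogram
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # chromagram
    # Mean-pool the spectral features over time and append the raw waveform.
    return np.concatenate([pow_lo.mean(axis=1), pow_hi.mean(axis=1),
                           chroma.mean(axis=1), y])

# Usage sketch, given training clips and their 41-way labels:
# X = np.stack([feature_vector(clip) for clip in train_clips])
# knn = KNeighborsClassifier(n_neighbors=5).fit(X, train_labels)
# pred = knn.predict([feature_vector(test_clip)])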
Abstract (Chinese) i
Abstract ii
1. Introduction 1
1.1 Music instruments 2
1.2 Timbres 2
1.3 Music instrument families 5
1.4 Literature review 6
1.5 Motivation 7
2. Methods 10
2.1 Training databases 10
2.2 Feature extraction 11
2.2.1 Time domain features 13
2.2.2 Frequency domain features 13
2.2.3 Pooling 17
2.2.4 Summary 19
2.3 Machine learning algorithms 19
2.3.1 k-nearest neighbors 20
2.3.2 Support vector machines 21
2.3.3 Neural networks 22
2.3.4 Nearest neighbor of sparse coding 27
2.3.5 Principal components analysis 30
2.4 Testing database 30
2.5 Block diagrams 32
2.6 Hearing test 34
3. Results 36
3.1 Cross-validation 36
3.2 Testing 42
3.3 Hearing tests and overall accuracy 44
3.4 Summary 45
4. Discussion 47
4.1 Classification distribution 47
4.2 kNN versus NNSC 49
4.3 Hearing tests 51
5. Conclusion and future work 53
References 55
Appendix 57