
Detailed Record

Author (Chinese): 江衍霖
Author (English): Chiang, Yen-Lin
Title (Chinese): 基於音框層級自動標註預測之繪圖式音樂檢索系統
Title (English): Sketch-based Music Retrieval Based on Frame-level Auto-tagging Predictions
Advisors (Chinese): 楊奕軒; 許秋婷
Advisors (English): Yang, Yi-Hsuan; Hsu, Chiou-Ting
Committee Members (Chinese): 王浩全; 陳宜欣
Committee Members (English): Wang, Hao-Chuan; Chen, Yi-Shin
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 104062581
Publication Year (ROC era): 106 (2017)
Graduation Academic Year: 105 (2016-17)
Language: English
Number of Pages: 71
Keywords (Chinese): 音樂檢索、繪圖式檢索、人機互動、自動標註
Keywords (English): music retrieval, sketch-based retrieval, human-computer interaction, auto-tagging
Abstract (Chinese): This thesis proposes a novel, intuitive music retrieval system in which users can express complex tag-location conditions merely by drawing simple sketches, making it easier to find exactly the music they want. For example, a user can precisely search for a “classical” piece that first contains a “violin” segment and then another segment featuring both “guitar” and “slow”. To build a database of nearly ten thousand songs for the system, this work takes the predictions of the deep-learning-based frame-level auto-tagging model proposed by Liu and Yang and, through a preprocessing mechanism additionally proposed in this thesis, generates a segment-level database. A user study with a questionnaire and a demo website evaluates which users and use cases the proposed method suits. The results show that, compared with non-sketch-based baseline systems, the proposed sketch-based music retrieval system scores significantly higher in “novelty” and “user-experience satisfaction”, and that its search technique will especially benefit practitioners in the multimedia content creation industry.
Abstract (English): We propose a novel and intuitive music retrieval interface that allows users to precisely search for music containing multiple localized social tags using only simple sketches. For example, one may search for a “classical” music clip that includes a segment with “violin”, followed by another segment that simultaneously includes “slow” and “guitar”; such complex conditions can be expressed simply and precisely in the query. We also propose a segment-level database of thousands of songs, together with the preprocessing algorithms that build it for our retrieval method, which leverages the predictions of Liu and Yang’s deep-learning-based frame-level auto-tagging model. To assess how users perceive the system, we conducted a user study with a questionnaire and a demo website. Experimental results show that: i) the proposed sketch-based system outperforms the two non-sketch-based baselines we implemented in “interestingness” and “satisfaction with the user experience”; and ii) the proposed method is especially beneficial to multimedia content creators.
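
The record does not spell out the retrieval backend, but the two core steps the abstracts name (merging frame-level tag predictions into tagged segments, then matching an ordered sketch query against those segments) can be illustrated with a minimal Python sketch. Everything below is a hypothetical illustration under assumed simplifications, not the thesis's actual algorithms: the function names, the 0.5 activation threshold, and the naive subset-and-order matcher are all assumptions.

    # Minimal sketch (hypothetical): turn one tag's frame-level probabilities
    # into (start, end) segments, then test whether a song satisfies an
    # ordered list of tag conditions such as [{"violin"}, {"slow", "guitar"}].

    def frames_to_segments(tag_probs, threshold=0.5):
        """Merge consecutive frames with probability >= threshold into
        (start_frame, end_frame) intervals."""
        segments, start = [], None
        for i, p in enumerate(tag_probs):
            if p >= threshold and start is None:
                start = i                      # a segment opens here
            elif p < threshold and start is not None:
                segments.append((start, i))    # the segment just closed
                start = None
        if start is not None:                  # segment runs to the last frame
            segments.append((start, len(tag_probs)))
        return segments

    def matches_query(song_segments, query):
        """song_segments: list of (start, end, tag_set), sorted by start.
        query: ordered list of tag sets; each step must be satisfied by a
        segment that starts no earlier than the previous match ended."""
        earliest = 0
        for required in query:
            hit = next((end for start, end, tags in song_segments
                        if start >= earliest and required <= tags), None)
            if hit is None:
                return False                   # this step is unsatisfiable
            earliest = hit                     # next step must come later
        return True

    # The example query from the abstract: a "violin" segment followed by a
    # segment that is simultaneously "slow" and "guitar".
    song = [(0, 40, {"classical", "violin"}),
            (40, 90, {"classical", "slow", "guitar"})]
    print(frames_to_segments([0.1, 0.7, 0.8, 0.2, 0.9]))          # [(1, 3), (4, 5)]
    print(matches_query(song, [{"violin"}, {"slow", "guitar"}]))  # True

In the actual system, matching yields ranked scores rather than a boolean (Sections 4.4.1-4.4.3 of the table of contents below), and the segments come from a preprocessing stage over the predictions of Liu and Yang's frame-level model [1]; the sketch above only fixes the intuition of "localized tags matched in order".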
Table of Contents

1 Introduction
1.1 Motivation
1.2 Challenges
1.3 Thesis Organization
2 Background
2.1 Multimedia Information Retrieval
2.1.1 Tag-based Music Retrieval
2.1.2 Content-based Music Retrieval
2.1.3 Sketch-based Image Retrieval
2.1.4 A Novel Music Retrieval Idea
2.2 Music Auto-tagging
3 The Sketch-based Frontend Interface
3.1 Overview
3.1.1 The Query Sketching Panel
3.2 Sketching Behaviors
3.3 Keyword Suggestion Mechanisms
3.3.1 The Tag Name Interpreter
3.3.2 Symbolic Icons
3.4 Sketching Multiple Tag Rows
3.5 The Colors of Tag Rows
4 The Retrieval Backend
4.1 The Main Framework
4.2 Database Preprocessing Stage
4.2.1 Obtaining the Auto-tagging Prediction Results
4.2.2 Segmenting the Frame-level Sequences
4.2.3 Data Structure of the Database
4.3 User Query Simplification Stage
4.3.1 User Queries vs. Database Segment Representations
4.3.2 User Sketch Simplification
4.4 Retrieval Stage
4.4.1 Matching Algorithm and Score Calculation
4.4.2 Obtaining Secondary Scores
4.4.3 Obtaining the Ranking List
5 Visualizing Search Results
5.1 Timeline Warping Algorithm
6 Experimental User Study
6.1 User Recruitment and the Questionnaire
6.2 Text-based Baselines
6.3 Results and Analyses
7 Conclusion
7.1 Future Work
Appendix: The User Study Questionnaire
References
[1] J.-Y. Liu and Y.-H. Yang. Event Localization in Music Auto-tagging. In Proc. ACM Multimedia, 2016. [Online] https://github.com/ciaua/clip2frame.

[2] P. Knees. Searching for Audio by Sketching Mental Images of Sound: A Brave New Idea for Audio Retrieval in Creative Music Production. In Proc. ICMR, 2016.

[3] Y.-L. Chiang, Y.-S. Lee, W.-C. Hsieh, and J.-C. Wang. Efficient and Portable Content-based Music Retrieval System. In Proc. IEEE International Conference on Orange Technologies (ICOT), 2014.

[4] Y.-S. Lee, Y.-L. Chiang, P.-R. Lin, C.-H. Lin, and T.-C. Tai. Robust and Efficient Content-based Music Retrieval System. APSIPA Transactions on Signal and Information Processing, 2016.

[5] C. Poynton. Digital Video and HDTV: Algorithms and Interfaces. Morgan Kaufmann, 2003.

[6] J. Maller. RGB and YUV Color. 2016. [Online] http://joemaller.com/fcp/fxscript_yuv_color.shtml.

[7] P. Gaonkar, S. Varma, and R. Nikhare. A Survey on Content-Based Audio Retrieval Using Chord Progression. International Journal of Innovative Research in Computer and Communication Engineering, vol. 4, issue 1, Jan. 2016.

[8] Y.-H. Yang. Towards Real-time Music Auto-tagging Using Sparse Features. In Proc. ICME, 2013.

[9] D. Tingle, Y. E. Kim, and D. Turnbull. Exploring Automatic Music Annotation with Acoustically Objective Tags. In Proc. ACM MIR, 2010, pp. 55-62. [Online] http://cosmal.ucsd.edu/cal/projects/AnnRet/.

[10] E. Law, K. West, M. I. Mandel, M. Bay, and J. S. Downie. Evaluation of Algorithms Using Games: The Case of Music Tagging. In Proc. ISMIR, 2009. [Online] http://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset.

[11] C. L. Zitnick and S. B. Kang. Stereo for Image-Based Rendering Using Image Over-Segmentation. International Journal of Computer Vision (IJCV), 2007.

[12] C. Pal, A. Chakrabarti, and R. Ghosh. A Brief Survey of Recent Edge-Preserving Smoothing Algorithms on Digital Images. arXiv:1503.07297 [cs.CV], Mar. 2015.

[13] P. Eggleston. Understanding Oversegmentation and Region Merging. Vision Systems Design, Dec. 1, 1998.

[14] Z. Lu, Z. Fu, T. Xiang, P. Han, L. Wang, and X. Gao. Learning from Weak and Noisy Labels for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.

[15] P. Vellachu and S. Abburu. Tag Based Audio Search Engine. International Journal of Computer Science Issues (IJCSI), vol. 9, issue 2, no. 3, Mar. 2012.

[16] Google Inc. (2017) “AutoDraw - A.I. Experiments”. [Online] https://aiexperiments.withgoogle.com/autodraw.

[17] M. A. Casey et al. Content-Based Music Information Retrieval: Current Directions and Future Challenges. Proceedings of the IEEE, vol. 96, no. 4, Apr. 2008.

[18] N. Borjian. A Survey on Query-by-Example based Music Information Retrieval. International Journal of Computer Applications, vol. 158, no. 8, Jan. 2017.

[19] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. arXiv:1703.06870 [cs.CV], Mar. 2017.

[20] R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello. MedleyDB: A Multitrack Dataset for Annotation-intensive MIR Research. In Proc. ISMIR, pp. 155-160, 2014. [Online] http://medleydb.weebly.com.

[21] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv:1506.01497v3 [cs.CV], Jan. 2016.

[22] S. Oramas and L. Espinosa. Tutorial: Natural Language Processing for Music Information Retrieval. Poblenou Campus, UPF, Jan. 2017. [Online] https://www.upf.edu/web/mdm-dtic/tutorial-natural-language-processing-for-music-information-retrieval.

[23] B. Schuller, G. Rigoll, and M. Lang. HMM-based Music Retrieval Using Stereophonic Feature Information and Framelength Adaptation. In Proc. ICME, 2003.

[24] H.-M. Wang and L.-S. Lee. Tone Recognition for Continuous Mandarin Speech with Limited Training Data Using Selected Context-dependent Hidden Markov Models. Journal of the Chinese Institute of Engineers, vol. 17, no. 6, pp. 775-784, 1994.

[25] H. Li. (2016) Deep Learning for Information Retrieval. [Online] http://www.hangli-hl.com/uploads/3/4/4/6/34465961/deep_learning_for_information_retrieval.pdf.

[26] R. Typke. Music Retrieval Based on Melodic Similarity. Ph.D. dissertation, Utrecht University, Netherlands, 2007.

[27] Hooktheory, LLC. “Songs with the same chords - Theorytab”. [Online] https://www.hooktheory.com/trends.

[28] R. Typke, F. Wiering, and R. C. Veltkamp. A Survey of Music Information Retrieval Systems. In Proc. ISMIR, pp. 153-160, 2005.

[29] BlogPress. “What Is The Difference Between Tags And Keywords?” [Online] https://theblogpress.com/blog/what-is-the-difference-between-tags-and-keywords/.

[30] A. Garcia. (Nov. 12, 2015) “The Importance of Visual Consistency in UI Design”. [Online] http://www.uxpassion.com/blog/the-importance-of-visual-consistency-in-ui-design/.

[31] D. D. McCracken and E. D. Reilly. Backus-Naur Form (BNF). Encyclopedia of Computer Science, 4th edition, pp. 129-131, John Wiley and Sons Ltd., Chichester, UK, 2003. ISBN: 0-470-86412-5.

[32] L. Li, H. Zhao, W. Zhang, and W. Wang. An Action-Stack Based Selective-Undo Method in Feature Model Customization. In Proc. International Conference on Software Reuse (ICSR), 2013.