Author (English): Liao, Pei-Yu
Thesis title (English): Improving Audio Fingerprinting for Music Retrieval
Keywords (English): music retrieval, audio fingerprinting, landmark, SVM, confidence measure, segmental music query
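In this thesis, we improve on existing audio fingerprinting (AFP) techniques. Audio fingerprinting is a fast music retrieval method: a user records a segment of music that is being played, possibly in a noisy environment, and uses it as the query; the AFP system then finds the song that best matches it.
To raise the recognition rate of our system, we divide query segments into two classes, those for which the correct answer is easy to find and those for which it is difficult, and build a classification mechanism: before recognition, an SVM classifies the query segment; depending on the result, feature extraction is performed four or eight times, and matching then proceeds. In our experiments this method achieves a recognition rate of 84.18%, close to the 84.28% obtained with eight feature extractions, and takes 2% less recognition time than eight feature extractions.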

The goal of this research is to improve current audio fingerprinting techniques. Audio fingerprinting is a fast and convenient music retrieval method that lets a user retrieve an intended song and its related information by recording a portion of the song, even in a noisy environment.
To improve the recognition rate of our system, we classify each query segment into one of two classes: easy or difficult to match to the intended song. The recognition mechanism is as follows. Before a query segment is recognized, an SVM classifier assigns it to a class. Depending on the class, we run landmark finding 4 or 8 times on the query and then perform the matching step as usual. Our method achieves a recognition rate of 84.18%, close to the 84.28% obtained by always running landmark finding 8 times, while reducing the matching time by 2% relative to that 8-pass setting.
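The class-dependent flow described above can be sketched as follows. This is only an illustration under stated assumptions: the names `choose_passes` and `recognize` and the three callables passed in (classifier, landmark extractor, matcher) are hypothetical placeholders, not the thesis implementation, and what a single landmark-finding pass does is not specified in the abstract.

```python
from typing import Any, Callable, List, Sequence

EASY_PASSES = 4  # landmark-finding passes for queries predicted "easy" (from the abstract)
HARD_PASSES = 8  # landmark-finding passes for queries predicted "difficult" (from the abstract)


def choose_passes(is_difficult: bool) -> int:
    """Map the classifier decision to the number of landmark-finding passes."""
    return HARD_PASSES if is_difficult else EASY_PASSES


def recognize(
    query: Sequence[float],
    is_difficult: Callable[[Sequence[float]], bool],          # hypothetical stand-in for the SVM
    extract_landmarks: Callable[[Sequence[float], int], List[Any]],  # hypothetical extractor
    match: Callable[[List[Any]], Any],                         # hypothetical database matcher
) -> Any:
    """Classify the query first, then extract landmarks with 4 or 8 passes and match."""
    passes = choose_passes(is_difficult(query))
    landmarks = extract_landmarks(query, passes)
    return match(landmarks)
```

Under this sketch, only queries predicted to be difficult pay for the 8-pass extraction, which is consistent with the reported trade-off between recognition rate and matching time.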
In addition, we add a verification mechanism based on a confidence measure to determine whether the query is in our database. If the confidence value is lower than a given threshold, the system rejects the query. When the threshold on the matched landmark count per second is set to 1.5, the system filters out about 86% of query segments that are not in our database.
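A minimal sketch of such a rejection test, assuming the confidence measure is simply the matched landmark count divided by the query duration in seconds. The function names are hypothetical, and the exact confidence formula used in the thesis may differ.

```python
def matched_landmarks_per_second(matched_count: int, query_seconds: float) -> float:
    """Confidence value: matched landmarks normalized by query duration (assumed definition)."""
    if query_seconds <= 0:
        raise ValueError("query_seconds must be positive")
    return matched_count / query_seconds


def accept_query(matched_count: int, query_seconds: float, threshold: float = 1.5) -> bool:
    """Reject the query (return False) when the confidence is lower than the threshold."""
    return matched_landmarks_per_second(matched_count, query_seconds) >= threshold
```

For example, a 10-second query with 20 matched landmarks has a rate of 2.0 and is accepted, while the same query with 10 matched landmarks has a rate of 1.0 and is rejected as probably not in the database.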
Finally, if the matched landmark count for the user-defined duration of the query segment is greater than the confidence threshold, the system returns the result directly. Otherwise, the system extends the duration of the query segment and continues searching and matching. To evaluate this, we divide each query into two parts of equal length. To avoid missing landmarks at the boundary between the two parts, the second part is extended 15 frames backward so that it overlaps the end of the first part. We also search for landmarks in the forward direction only, which avoids re-finding landmarks of the first part, as would happen with bidirectional search. Compared with the original method, this method achieves a 21% reduction in response time and a 2% improvement in recognition rate.
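The two-part query with a 15-frame overlap and early return can be sketched as below. This is an illustration under assumptions: `split_with_overlap`, `segmental_query`, and the `match` callable (which takes a list of query parts and returns a song identifier with its matched landmark count) are hypothetical, and how the thesis combines landmarks from the two parts is not stated in the abstract.

```python
from typing import Callable, List, Sequence, Tuple

OVERLAP_FRAMES = 15  # frames by which the second part reaches back into the first (from the abstract)


def split_with_overlap(
    frames: Sequence[float], overlap: int = OVERLAP_FRAMES
) -> Tuple[List[float], List[float]]:
    """Split at the midpoint; the second part starts `overlap` frames before the midpoint."""
    mid = len(frames) // 2
    second_start = max(0, mid - overlap)
    return list(frames[:mid]), list(frames[second_start:])


def segmental_query(
    frames: Sequence[float],
    match: Callable[[List[List[float]]], Tuple[str, int]],  # hypothetical matcher
    threshold: int,
) -> Tuple[str, int, bool]:
    """Return (song, matched_count, extended); `extended` tells whether the second part was needed."""
    first, second = split_with_overlap(frames)
    song, count = match([first])
    if count > threshold:
        return song, count, False  # confident enough: answer from the first part alone
    song, count = match([first, second])  # otherwise extend the query and match again
    return song, count, True
```

With 100 frames, the first part covers frames 0 to 49 and the second part covers frames 35 to 99, so landmarks that straddle the midpoint remain discoverable in the second part.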

Keywords: music retrieval, audio fingerprinting, landmark, SVM, confidence measure, segmental music query
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Chapter Overview
Chapter 2 Related Work
2.1 Ke’s Method
2.1.1 Representing Audio as Images
2.1.2 Filter Selection and Modeling
2.1.3 Retrieval
2.2 Baluja’s Method
2.2.1 Fingerprint creation
2.2.2 Min-Hash-based sub-fingerprints
2.2.3 Database-creation process
2.2.4 Retrieval process
2.3 Wang’s Method
2.3.1 Robust Constellations
2.3.2 Fast Combinatorial Hashing
2.3.3 Searching and Scoring
Chapter 3 Implementation
3.1 Overview and Recognition Pipeline
3.2 Finding Landmarks
3.2.1 Finding the Threshold and Salient Peaks
3.2.2 Forming Landmarks
3.3 Landmark Storage
3.3.1 Generating Hash Keys and Hash Values
3.3.2 Storing Landmarks
3.4 Matching Against the Database
3.4.1 Retrieving the Hash Values for the Corresponding Hash Keys from the Database
3.4.2 Computing Time Offsets
3.4.3 Recovering Song IDs and Time Offsets
3.4.4 Ranking the Results
Chapter 4 Improvements and Analysis of Experimental Results
4.1 Experimental Setup
4.2 Improvement 1: Query Segment Classification
4.2.1 Purpose
4.2.2 Method
4.2.3 Database and Corpus
4.2.4 Experimental Results and Analysis
4.3 Improvement 2: Confidence Measure
4.3.1 Purpose
4.3.2 Method
4.3.3 Database and Test Corpus
4.3.4 Experimental Results and Analysis
4.4 Improvement 3: Effect of Segmental Queries on Recognition
4.4.1 Purpose
4.4.2 Method
4.4.3 Database and Test Corpus
4.4.4 Experimental Results and Analysis
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
