Author: Shih, Yen-Ting
Thesis title: Combining Empirical Mode Decomposition with Constant Q Cepstral Coefficients and Mel-Frequency Cepstral Coefficients on Automatic Speaker Verification System
Advisor: King, Chung-Ta
Oral defense committee: Pan, Shing-Tai; Liu, Yi-Wen
Keywords: Replay spoofing; Automatic speaker verification; Constant Q Cepstral Coefficients (CQCC); Mel-Frequency Cepstral Coefficients (MFCC); Empirical mode decomposition (EMD); ASVspoof 2019
Replay spoofing attacks have threatened Automatic Speaker Verification (ASV) systems in recent years. Relatively high-frequency regions contain more information for differentiating genuine from spoofed speech signals, which makes Empirical Mode Decomposition (EMD), a method that decomposes a signal into several Intrinsic Mode Functions (IMFs), an effective tool for analyzing speech. We propose an EMD-based method for detecting replayed speech. The main idea is to decompose the signal with EMD and then extract Constant Q Cepstral Coefficients (CQCC) or Mel-Frequency Cepstral Coefficients (MFCC) from different combinations of IMFs. In experiments on the ASVspoof 2019 database, we find that each IMF provides some information, and that combining a subset of the IMFs detects spoofed speech better than using the original signal. The proposed approach attains an accuracy of 92.04% and an Equal Error Rate (EER) of about 0.0693. We also discuss possible reasons for these results, including why EMD combines well with CQCC but not with MFCC.
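The Equal Error Rate quoted above is the operating point at which the false acceptance rate (spoofed trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of how such a value can be computed from detector scores. It assumes that a higher score means "more likely genuine"; the function name `equal_error_rate` and the toy scores are illustrative and are not taken from the thesis.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) for a detector where higher = more genuine.

    Assumption: a trial is accepted when its score is >= the threshold.
    The EER is approximated at the candidate threshold where the false
    acceptance and false rejection rates are closest to each other.
    """
    candidates = sorted(set(genuine_scores) | set(spoof_scores))
    best = None  # (gap, eer_estimate, threshold)
    for t in candidates:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical, for illustration only).
genuine = [0.9, 0.8, 0.7, 0.3]
spoofed = [0.6, 0.4, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, spoofed)
print(f"EER = {eer:.4f} at threshold {threshold}")
```

The function returns the EER as a fraction between 0 and 1. With few trials the two error curves may not cross exactly, which is why the sketch averages the two rates at the closest point instead of interpolating.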
1 Introduction
2 Related Work
2.1 Empirical Mode Decomposition (EMD)
2.2 Mel-Frequency Cepstral Coefficients (MFCC)
2.3 Constant-Q Cepstral Coefficients (CQCC)
2.4 Automatic Speaker Verification (ASV) systems
3 Method
3.1 Motivation
3.2 Proposed Approach
4 Experiments
4.1 Database
4.2 Evaluation Method
4.3 Experiment Details
5 Results and Discussion
5.1 Results
5.2 Discussion
6 Conclusion and Future Work
