
Detailed Record

Author (Chinese): 施彥廷
Author (English): Shih, Yen-Ting
Title (Chinese): 結合經驗模態分解法以及常數Q倒頻譜係數和梅爾倒頻譜係數於自動語者驗證系統之研究
Title (English): Combining Empirical Mode Decomposition with Constant Q Cepstral Coefficients and Mel-Frequency Cepstral Coefficients on Automatic Speaker Verification System
Advisor (Chinese): 金仲達
Advisor (English): King, Chung-Ta
Committee members (Chinese): 潘欣泰、劉奕汶
Committee members (English): Pan, Shing-Tai; Liu, Yi-Wen
Degree: Master's
Institution: National Tsing Hua University
Department: Computer Science (資訊工程學系)
Student ID: 107062701
Year of publication (ROC calendar): 110 (2021)
Academic year of graduation: 109
Language: English
Pages: 33
Keywords (Chinese): 回放攻擊、自動語者驗證、常數Q倒頻譜係數 (CQCC)、梅爾倒頻譜係數 (MFCC)、經驗模態分解 (EMD)、ASVspoof 2019
Keywords (English): Replay spoofing; Automatic speaker verification; Constant Q Cepstral Coefficients (CQCC); Mel-Frequency Cepstral Coefficients (MFCC); Empirical mode decomposition (EMD); ASVspoof 2019
Abstract: Replay spoofing attacks have threatened Automatic Speaker Verification (ASV) systems in recent years. Empirical Mode Decomposition (EMD) is an effective method for analyzing speech signals: a signal is decomposed into several Intrinsic Mode Functions (IMFs), and the relatively high-frequency components carry more information for differentiating genuine from spoofed speech. We propose an EMD-based method for detecting replay attacks. The main idea is to decompose the signal with EMD and then extract Constant Q Cepstral Coefficients (CQCC) or Mel-Frequency Cepstral Coefficients (MFCC) from different combinations of IMFs. In experiments on the ASVspoof 2019 database, each IMF provides discriminative information to some degree, and combining a subset of the IMFs detects spoofed speech better than using the original signal. The proposed approach attains an accuracy of 92.04% and an Equal Error Rate (EER) of about 0.0693. We also discuss possible reasons for these results, e.g., why EMD combines well with CQCC but fails to combine with MFCC.
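The pipeline described in the abstract (sift the signal into IMFs with EMD, then extract cepstral features from a chosen IMF combination) can be sketched in NumPy. This is an illustrative simplification, not the thesis implementation: envelopes use linear rather than cubic-spline interpolation, a from-scratch MFCC stands in for the CQCC/MFCC front ends, and all parameter values (frame size, filter count, tolerances) are placeholders.

```python
import numpy as np

def find_extrema(x):
    """Indices of strict local maxima and minima of a 1-D signal."""
    d = np.diff(x)
    left = np.hstack([0.0, d])    # x[i] - x[i-1] (0 at the first sample)
    right = np.hstack([d, 0.0])   # x[i+1] - x[i] (0 at the last sample)
    maxima = np.where((left > 0) & (right < 0))[0]
    minima = np.where((left < 0) & (right > 0))[0]
    return maxima, minima

def emd(x, max_imfs=4, max_sift=50, tol=0.05):
    """Simplified EMD: repeatedly sift out IMFs until the residue is monotonic."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    imfs, residue = [], x.copy()
    for _ in range(max_imfs):
        if min(map(len, find_extrema(residue))) < 2:
            break  # residue has (almost) no oscillation left
        h = residue.copy()
        for _ in range(max_sift):
            maxima, minima = find_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            mean = 0.5 * (np.interp(t, maxima, h[maxima]) +
                          np.interp(t, minima, h[minima]))
            done = np.sum(mean ** 2) / (np.sum(h ** 2) + 1e-12) < tol
            h = h - mean  # subtract the envelope mean (one sifting step)
            if done:
                break
        imfs.append(h)
        residue = residue - h
    return imfs, residue

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filters over the one-sided power spectrum."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz = mel2hz(np.linspace(0.0, hz2mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, r = bins[i - 1], bins[i], bins[i + 1]
        c, r = max(c, lo + 1), max(r, c + 1)  # avoid zero-width triangles
        fb[i - 1, lo:c] = (np.arange(lo, c) - lo) / (c - lo)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def mfcc(x, sr, n_fft=512, hop=256, n_filters=26, n_ceps=13):
    """Frame, window, power spectrum, mel filterbank, log, DCT-II."""
    win = np.hanning(n_fft)
    frames = np.array([x[s:s + n_fft] * win
                       for s in range(0, len(x) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    logmel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(logmel.shape[1])
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * len(n)))
    return logmel @ basis.T

# Toy two-tone signal: EMD separates the fast and slow oscillations,
# then features are taken from an IMF combination (here: IMF 1 alone).
sr = 2000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 30 * t)
imfs, residue = emd(signal)
feats = mfcc(imfs[0], sr)
```

In the thesis's setting, `mfcc(...)` would be applied to sums of selected IMFs (e.g. `imfs[0] + imfs[1]`) and the resulting features fed to the spoofing classifier; the IMF subset is the quantity being searched over.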
Acknowledgements
Abstract
1 Introduction..........................................1
2 Related Work..........................................5
2.1 Empirical Mode Decomposition (EMD)....................5
2.2 Mel-Frequency Cepstral Coefficients (MFCC)............9
2.3 Constant-Q Cepstral Coefficients (CQCC)..............10
2.4 Automatic Speaker Verification (ASV) systems.........11
3 Method...............................................13
3.1 Motivation...........................................13
3.2 Proposed Approach....................................15
4 Experiments..........................................21
4.1 Database.............................................21
4.2 Evaluation Method....................................22
4.3 Experiment Details...................................23
5 Results and Discussion...............................25
5.1 Results..............................................25
5.2 Discussion...........................................28
6 Conclusion and Future Work...........................29
References..................................................31