
Detail Display

Author (Chinese): 黃博浩
Author (English): Huang, Po-Hao
Thesis Title (Chinese): 以梯度近似和自動編碼器產生黑盒音頻對抗例
Thesis Title (English): Generation of Black-box Audio Adversarial Examples Based on Gradient Approximation and Autoencoders
Advisor (Chinese): 王廷基
Advisor (English): Wang, Ting-Chi
Committee Members (Chinese): 何宗易、黃世旭
Committee Members (English): Ho, Tsung-Yi; Huang, Shih-Hsu
Degree: Master's
University: National Tsing Hua University
Department: Computer Science
Student ID: 107062637
Year of Publication (ROC calendar): 109 (2020)
Graduation Academic Year: 108
Language: English
Number of Pages: 39
Keywords (Chinese): 對抗式攻擊、對抗式防禦、梯度近似、自動編碼器、深度神經網路
Keywords (English): Adversarial attack; Adversarial defense; Gradient approximation; Autoencoder; Deep neural network
Abstract (Chinese): As deep learning techniques have advanced, the security of their applications has become increasingly important, and research has shown that automatic speech recognition systems are vulnerable to adversarial examples. Existing attacks mainly formulate adversarial example generation as an optimization problem solved iteratively; although these attacks attain high success rates, they still need a large amount of time to generate adversarial examples, which makes them hard to apply in the real world. In this thesis, we propose an attack algorithm that can generate adversarial examples in real time: it uses gradient approximation to train an autoencoder that produces audio adversarial examples. Experimental results show that the generated adversarial examples can easily attack a black-box speech command recognition system, and only a single inference is needed to produce an adversarial example. Compared with previous attacks, ours can generate an adversarial example in under 0.004 seconds while achieving a higher attack success rate. We also extend the attack in two ways: the first uses an ensemble attack to target multiple speech command recognition systems simultaneously, and the second attacks models equipped with existing defenses. Experimental results also fully support the effectiveness of these extensions.
Abstract (English): Deep Neural Networks (DNNs) have been increasingly successful in various security-critical fields. However, recent research shows that DNN-based Automatic Speech Recognition (ASR) systems are vulnerable to adversarial attacks. Specifically, existing attacks mainly formulate adversarial example generation as an iterative, optimization-based process. Although these attacks have made significant progress, they still require a long time to produce each adversarial example, which makes them extremely difficult to launch in real-world scenarios. In this thesis, we propose a real-time attack framework that uses a neural network, trained with a gradient approximation method, to generate adversarial examples against Keyword Spotting (KWS) systems. The experimental results show that the generated adversarial examples can easily fool a black-box KWS system into producing incorrect results with only one inference. In comparison to previous works, our attack achieves a higher success rate while taking less than 0.004 seconds per example.
We also extend our work in two directions. The first extension is an ensemble attack against multiple KWS systems; the second attacks a KWS system equipped with existing defense mechanisms. The efficacy of each extension is well supported by promising experimental results.
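The abstracts describe the method only at a high level: an autoencoder is trained with approximated gradients, because the black-box KWS model returns scores but exposes no gradients, and attack-time generation then costs a single forward pass. The Python sketch below illustrates only that idea; everything in it, from the toy query_kws stand-in for the black box to the linear autoencoder, the hinge loss, the single training clip, and every hyperparameter, is a hypothetical simplification rather than the thesis's actual design.

import numpy as np

rng = np.random.default_rng(0)
DIM, LATENT, N_CLASSES = 64, 16, 10       # toy sizes, not the thesis's settings
EPS = 0.05                                # perturbation budget (assumed)

# Hypothetical stand-in for the black-box KWS model: the attacker can only
# query output scores, never inspect weights or gradients.
_W_bb = rng.normal(size=(N_CLASSES, DIM))
def query_kws(x):
    logits = _W_bb @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def generate(x, enc, dec):
    # Toy autoencoder: emit a bounded additive perturbation for input x.
    return x + EPS * np.tanh(dec @ np.tanh(enc @ x))

def attack_loss(x, y_true, enc, dec):
    # Untargeted hinge loss: drive the true class below the best other class.
    p = query_kws(generate(x, enc, dec))
    runner_up = np.max(np.delete(p, y_true))
    return max(p[y_true] - runner_up, 0.0)

def approx_grads(x, y_true, enc, dec, sigma=1e-3, n_dirs=8):
    # Gradient approximation: symmetric finite differences along random
    # directions in parameter space -- score queries only, no backpropagation
    # through the black box.
    g_enc, g_dec = np.zeros_like(enc), np.zeros_like(dec)
    for _ in range(n_dirs):
        u_e, u_d = rng.normal(size=enc.shape), rng.normal(size=dec.shape)
        lp = attack_loss(x, y_true, enc + sigma * u_e, dec + sigma * u_d)
        lm = attack_loss(x, y_true, enc - sigma * u_e, dec - sigma * u_d)
        coeff = (lp - lm) / (2.0 * sigma * n_dirs)
        g_enc += coeff * u_e
        g_dec += coeff * u_d
    return g_enc, g_dec

# Offline training loop: all the expensive black-box queries happen here, once.
enc = rng.normal(scale=0.1, size=(LATENT, DIM))
dec = rng.normal(scale=0.1, size=(DIM, LATENT))
x_benign, y_true = rng.normal(size=DIM), 3
for _ in range(200):
    g_enc, g_dec = approx_grads(x_benign, y_true, enc, dec)
    enc -= 0.5 * g_enc
    dec -= 0.5 * g_dec

# At attack time, one forward pass suffices (the "real-time" property).
x_adv = generate(x_benign, enc, dec)
print("true-class score, benign  :", query_kws(x_benign)[y_true])
print("true-class score, attacked:", query_kws(x_adv)[y_true])

Under the same assumptions, the ensemble extension could be approximated by summing attack_loss over several black-box models, so that a single generator learns perturbations effective against all of them.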
Table of Contents:
1 Introduction 1
2 Related Works 5
3 Methodology 9
4 Experimental Results 22
5 Conclusion 34
References 36