Detailed Record
Author (Chinese): 陳昶志
Author (English): Chen, Chang-Zhih
Title (Chinese): 基於指標的關鍵字偵測早期退出架構
Title (English): Metric-Based Early-Exit Architecture for Keyword Spotting
Advisor (Chinese): 張世杰
Advisor (English): Chang, Shih-Chieh
Committee members (Chinese): 陳添福, 林泰吉
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 108062543
Year of publication (ROC): 111 (2022)
Graduating academic year: 110
Language: English
Pages: 27
Keywords (Chinese): 關鍵字偵測, 早期退出架構
Keywords (English): Keyword Spotting, Early-Exit Architecture
Abstract: Keyword spotting (KWS) has been gaining attention in recent years because it plays a crucial role in voice interfaces. Most voice interfaces depend on KWS to start interactions. However, KWS models cannot be computationally expensive because of hardware constraints. Early-exit architectures try to run models in an energy-efficient way by allowing the prediction results for some samples to exit the model early via early-exit branches. In this thesis, we propose a metric-based early-exit threshold that improves the performance of early-exit architectures on KWS. We also show the flexibility of our method by allowing early-exit thresholds to be set for specific classes, so that our model can meet the requirements of different KWS applications and save more computation. Extensive experiments show that our method achieves a better accuracy-computation trade-off than the baseline model.
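The fast-inference idea the abstract describes can be sketched in a few lines: each early-exit branch produces class probabilities, and a sample leaves the model as soon as its top class clears that class's threshold. This is a minimal illustration of the general technique, not the thesis's actual implementation; the function name, the dummy branches, and the per-class threshold values below are all hypothetical.

```python
def early_exit_predict(x, exit_branches, thresholds):
    """Run exit branches in order and return (predicted_class, exit_index).

    exit_branches: callables, each mapping an input to class probabilities,
                   ordered from shallowest (cheapest) to the final classifier.
    thresholds:    per-class confidence thresholds; a lower threshold for a
                   class lets more of its samples exit early.
    """
    for i, branch in enumerate(exit_branches[:-1]):
        probs = branch(x)
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] >= thresholds[top]:
            return top, i  # confident enough: skip the remaining layers
    # The final exit always produces a prediction, threshold or not.
    probs = exit_branches[-1](x)
    return max(range(len(probs)), key=probs.__getitem__), len(exit_branches) - 1
```

Setting a lower threshold for, say, a frequently spoken keyword class lets most of those samples exit at the first branch, which is where the per-application computation savings come from.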
1 Introduction 1
2 Related Works 6
3 Methods 8
3.1 Model Architecture 8
3.2 Training process 10
3.3 Fast inference process 11
4 Experiments 15
4.1 Dataset & Evaluation Metric 15
4.2 Experimental Setups 16
4.3 Experimental Results 16
5 Conclusions 23
6 References 24
[1] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, and C. Xu. Ghostnet: More features from cheap operations. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1577–1586, 2020.
[2] S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural network. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015.
[3] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[4] Y. He, X. Dong, G. Kang, Y. Fu, C. Yan, and Y. Yang. Asymptotic soft filter pruning for deep convolutional neural networks. IEEE Transactions on Cybernetics, 50(8):3594–3604, 2020.
[5] G. E. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. ArXiv, abs/1503.02531, 2015.
[6] B. Kim, S. Chang, J. Lee, and D. Sung. Broadcasted Residual Learning for Efficient Keyword Spotting. arXiv e-prints, page arXiv:2106.04140, June 2021.
[7] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2015.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.
[9] H. Li, H. Zhang, X. Qi, R. Yang, and G. Huang. Improved techniques for training adaptive deep networks. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 1891–1900, 2019.
[10] M. Phuong and C. Lampert. Distillation-based training for multi-exit architectures. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 1355–1364, 2019.
[11] T. N. Sainath and C. Parada. Convolutional neural networks for small-footprint keyword spotting. In INTERSPEECH, 2015.
[12] M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
[13] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition, 2014.
[14] S. Teerapittayanon, B. McDanel, and H. Kung. Distributed deep neural networks over the cloud, the edge and end devices. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 328–339, 2017.
[15] S. Teerapittayanon, B. McDanel, and H. T. Kung. Branchynet: Fast inference via early exiting from deep neural networks. 2016 23rd International Conference on Pattern Recognition (ICPR), pages 2464–2469, 2016.
[16] R. Vygon and N. Mikhaylovskiy. Learning efficient representations for keyword spotting with triplet loss. ArXiv, abs/2101.04792, 2021.
[17] P. Warden. Speech commands: A dataset for limited-vocabulary speech recognition. ArXiv, abs/1804.03209, 2018.
[18] L. Zhang, J. Song, A. Gao, J. Chen, C. Bao, and K. Ma. Be your own teacher: Improve the performance of convolutional neural networks via self distillation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3712–3721, 2019.