Keyword spotting (KWS) has gained attention in recent years because it plays a crucial role in voice interfaces, most of which depend on KWS to start interactions. However, KWS models cannot be computationally expensive because of hardware constraints. Early-exit architectures aim to run models in an energy-efficient way by allowing predictions for some samples to leave the model through early-exit branches. In this paper, we propose a metric-based early-exit threshold that improves the performance of early-exit architectures on KWS. We also demonstrate the flexibility of our method: early-exit thresholds can be set for specific classes, so the model can meet different KWS requirements and save more computation. Extensive experiments show that our method achieves a better accuracy-computation trade-off than the baseline model.
1 Introduction
2 Related Works
3 Methods
3.1 Model Architecture
3.2 Training Process
3.3 Fast Inference Process
4 Experiments
4.1 Dataset & Evaluation Metric
4.2 Experimental Setups
4.3 Experimental Results
5 Conclusions
6 References
