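Author (Chinese): 陳冠達
Author (English): Chen, Kuan-Dar
Thesis title (Chinese): 以複合音訊特徵及叢集分群實現非監督式異音檢測系統
Thesis title (English): Unsupervised Anomaly Sound Detection through Composite Audio Features and their Clustering
Advisor (Chinese): 劉奕汶
Advisor (English): Liu, Yi-Wen
Oral defense committee (Chinese): 白明憲; 吳炤民; 賴穎暉
Degree: Master's
Institution: National Tsing Hua University
Department: Industrial Master's Program in AI Smart Manufacturing and Industrial IoT
Student ID: 110137504
Year of publication (ROC calendar): 112
Academic year of graduation: 112
Language: Chinese
Number of pages: 48
Keywords (Chinese): 異音檢測 (anomaly sound detection); 非監督式學習 (unsupervised learning); 叢集分群 (clustering)
Keywords (English): anomaly sound detection; unsupervised learning; clustering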
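Abstract (translated from the Chinese): Anomaly sound detection is the inspection process on a production line that judges whether actuating components such as fans and motors are defective from the sound they emit. Because defective samples are scarce and hard to collect, previous studies have mostly used spectrograms with autoencoders for unsupervised learning. This study targets the Delta fan sound dataset. It uses a convolutional autoencoder as the model architecture with complex wavelet structural similarity (CW-SSIM) as the loss function, experimentally compares how different audio feature transformations and their combinations affect detection, and proposes composite audio features to improve detection performance. In addition, the experiments revealed that the normal samples in the dataset vary over time, which causes the model to underfit. The study therefore proposes clustering the normal samples with a clustering algorithm, training one model per cluster, and then combining the models to improve detection performance. The study also proposes determining the number of clusters and the defect decision threshold in an unsupervised manner, so that no manual intervention is needed from sample collection to final decision, yielding a complete unsupervised anomaly sound detection solution. With these steps, the detection system reaches an AUC (Area Under Curve) of 95.95% on the Delta fan sound dataset, 34.96% higher than the baseline system, confirming that the proposed method effectively improves detection performance.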
Abstract (English): Anomaly sound detection is the procedure of determining whether components such as fans or motors are normal or anomalous from the sounds they emit. Because defective samples are rare and difficult to collect, previous research has mostly employed spectrograms with autoencoders for unsupervised learning. This thesis focuses on the Delta fan sound dataset, using a convolutional autoencoder as the model architecture with Complex Wavelet Structural Similarity (CW-SSIM) as the loss function. We compare different audio feature transformations and their combinations to investigate their effect on detection, and on that basis propose a composite audio feature method to enhance detection performance. During the experiments we also observed that the normal samples in the dataset vary over time, which leads to model under-fitting. To address this, we propose clustering the normal samples into groups, training a separate model for each cluster, and then combining the models to improve detection performance. Additionally, we develop a method for choosing the number of clusters and the anomaly decision threshold in an unsupervised manner, so that no manual intervention is needed from sample collection through to the final decision, which establishes a completely unsupervised anomaly detection solution. With the proposed method, the anomaly detection system achieves an AUC (Area Under Curve) of 95.95% on the Delta fan sound dataset, a 34.96% improvement over the baseline system. This demonstrates the effectiveness of the proposed method in enhancing defect detection performance.
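The pipeline described in the abstract (cluster the normal samples, fit one model per cluster, combine the sub-models, then pick a threshold without labels) can be illustrated with a small, self-contained sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the 2-D toy points stand in for audio features, the silhouette coefficient is used as one common criterion for choosing the number of clusters, distance to a cluster centroid stands in for the reconstruction error of a per-cluster convolutional autoencoder, the sub-models are combined by taking the minimum score, and the threshold is set to the mean plus three standard deviations of the training scores.

```python
import math
import statistics

def farthest_point_init(points, k):
    # Deterministic seeding: start at points[0], then keep adding the point
    # that is farthest from every centre chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=20):
    # Plain Lloyd iterations; returns (centres, labels).
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, labels

def silhouette(points, labels):
    # Mean silhouette coefficient; points in singleton clusters contribute 0.
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[labels[i]]
        b = min(v for c, v in mean_dist.items() if c != labels[i])
        if a is not None and max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_max):
    # Unsupervised choice of cluster count: the k with the best silhouette.
    scores = {k: silhouette(points, kmeans(points, k)[1]) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get)

def anomaly_score(x, centers):
    # Stand-in for "one sub-model per cluster, then combine": each centroid
    # scores the sample, and the best-matching (minimum) score is kept.
    return min(math.dist(x, c) for c in centers)

def auc(normal_scores, anomalous_scores):
    # Probability that a random anomalous sample outscores a random normal one.
    wins = sum((a > n) + 0.5 * (a == n) for a in anomalous_scores for n in normal_scores)
    return wins / (len(anomalous_scores) * len(normal_scores))

# Toy "normal" data with three operating modes (hypothetical values).
offsets = [(-0.2, -0.2), (-0.2, 0.2), (0.2, -0.2), (0.2, 0.2), (0.0, 0.0)]
modes = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
normal_train = [(mx + dx, my + dy) for mx, my in modes for dx, dy in offsets]

best_k = choose_k(normal_train, k_max=5)
centers, _ = kmeans(normal_train, best_k)

# Threshold from normal training scores only (no defect labels needed).
train_scores = [anomaly_score(p, centers) for p in normal_train]
threshold = statistics.mean(train_scores) + 3 * statistics.pstdev(train_scores)

normal_test = [(0.1, 0.1), (10.1, -0.1), (-0.1, 9.9)]
anomalous_test = [(5.0, 5.0), (3.0, 0.0), (10.0, 6.0)]
normal_scores = [anomaly_score(p, centers) for p in normal_test]
anomalous_scores = [anomaly_score(p, centers) for p in anomalous_test]

print("chosen k:", best_k)
print("threshold:", round(threshold, 4))
print("AUC:", auc(normal_scores, anomalous_scores))
```

The minimum rule reflects the idea that a sample is normal if at least one sub-model explains it well, which is why a single model fitted to all modes at once tends to score drifting normal samples poorly. Other combination and threshold rules are equally plausible, and the thesis may use different ones.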
1 Introduction
1.1 Problem Description
1.2 Objectives
1.3 Literature Review
1.3.1 Audio Feature Extraction
1.3.2 Clustering and Ensemble Learning
1.3.3 Unsupervised Anomalous Sound Detection
1.4 Contributions of This Thesis
1.5 Chapter Organization
2 System Design and Training
2.1 Delta Fan Sound Dataset
2.2 Audio Preprocessing
2.3 Data Clustering
2.3.1 Feature Extraction
2.3.2 Determining the Number of Clusters
2.3.3 Clustering
2.3.4 Train-Validation Split
2.4 Model Training
2.4.1 Model Architecture
2.4.2 Defect Index
2.4.3 Training Procedure
2.5 Model Testing
2.5.1 Test Defect Index Calibration
2.5.2 Combining Sub-models
2.5.3 Decision Threshold
2.5.4 Evaluation Metrics
3 Experimental Results and Discussion
3.1 Experimental Environment
3.2 Baseline System
3.3 Experiment 1: System without Clustering
3.4 Experiment 2: Analysis of the Optimal Number of Clusters
3.5 Experiment 3: System with Clustering
3.6 Experiment 4: Effect of the Threshold on Recall
3.7 Best Configuration
4 Conclusion and Future Work
References
Appendix
A.1 t-SNE Analysis under Each Configuration
A.2 Suggestions from the Oral Defense Committee