Author: 陳沿廷
Author (English): Yan-Ting Chen
Thesis title: 用於工控系統非均衡網路流量資料之降噪自動編碼器極限梯度提升異常的偵測與分類
Thesis title (English): Anomaly Detection and Classification Based on Denoising Autoencoder and XGBoost for Imbalanced Network Traffic Data in Industrial Control Systems
Advisor: 江振瑞
Advisor (English): Jehn-Ruey Jiang
Degree: Master's
Institution: National Central University (國立中央大學)
Department: Department of Computer Science and Information Engineering
Student ID: 108522110
Year of publication: 110 (ROC calendar, 2021)
Graduation academic year: 109 (ROC calendar)
Language: Chinese
Number of pages: 66
Keywords: Anomaly Classification; Anomaly Detection; Autoencoder; Data Imbalance; F1-score; Industrial Control System; Precision; Recall; XGBoost
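Abstract (translated from Chinese):
Industrial control systems (ICS), which integrate information technology (IT) and operational technology (OT), have become a popular research topic in industry in recent years. ICS are widely used to control and manage critical machines and equipment connected through networks. If an ICS is hit by network attacks of unknown origin, equipment may malfunction, causing heavy economic losses and even endangering personnel. Research on ICS network security is therefore critical and necessary.
This thesis proposes an anomaly detection and classification method for ICS network security. The method detects whether network traffic data carried over the industrial protocols Modbus and S7 Comm (S7 Communication) is anomalous and classifies the anomalous data. It consists of three main steps intended to maximize detection and classification performance. First, a denoising autoencoder (DAE) removes potential noise from the data. Second, to handle imbalanced data containing anomalous behavior, oversampling and undersampling are combined through SMOTE (Synthetic Minority Oversampling Technique) and Tomek links (T-Link), which makes the features of particular samples more representative. Finally, extreme gradient boosting (eXtreme Gradient Boosting, XGBoost) is used to build the anomaly detection and classification models.
The thesis uses Electra, a dataset from a real railway ICS, to evaluate the proposed method and compare it with other related methods. The experimental results show that the proposed method achieves better precision, recall, and F1-score than the other anomaly detection methods.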
Abstract (English):
The industrial control system (ICS), which integrates information technology (IT) and operational technology (OT), has been a popular research topic in the industrial field in recent years. ICS are widely used to control and manage important machines and devices connected through networks. If an ICS suffers network attacks, machines and devices may operate abnormally, causing huge economic losses and even endangering personnel. Research on ICS network security is therefore critical and necessary.
This thesis proposes an anomaly detection and classification method for ICS network security. It detects and classifies anomalies in network traffic data of industrial field protocols such as Modbus and S7 Communication (S7 Comm). The proposed method consists of three major steps. First, a denoising autoencoder (DAE) removes potential noise from the data. Second, to cope with imbalanced data containing anomalies, the synthetic minority oversampling technique (SMOTE) and the Tomek link (T-Link) mechanism are used to oversample and undersample the data, making particular samples more representative. Finally, extreme gradient boosting (XGBoost) is used to build the anomaly detection and classification models.
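The resampling step described above can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the toy points, the function names (`nearest_indices`, `smote`, `tomek_links`), the choice of `k`, and the decision to drop both members of each Tomek link are assumptions made for illustration only.

```python
import math
import random


def nearest_indices(points, i, candidates, k):
    """Indices of the k candidates closest to points[i], excluding i itself."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    all_idx = range(len(minority))
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_indices(minority, i, all_idx, k))
        u = rng.random()  # position along the segment from sample i to sample j
        synthetic.append(
            tuple(a + u * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def tomek_links(points, labels):
    """Pairs (i, j), i < j, of opposite-class points that are each other's
    nearest neighbor."""
    idx = range(len(points))
    nn = [nearest_indices(points, i, idx, 1)[0] for i in idx]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if nn[j] == i and i < j and labels[i] != labels[j]
    ]


# Hypothetical 2-D toy data: label 0 = normal traffic, label 1 = attack.
normal = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.3), (1.0, 1.0)]
attack = [(1.1, 1.0), (2.0, 2.0), (2.1, 2.2)]

# Oversample the minority class until both classes have the same size.
synthetic = smote(attack, n_new=len(normal) - len(attack))
points = normal + attack + synthetic
labels = [0] * len(normal) + [1] * (len(attack) + len(synthetic))

# Undersample by removing both members of every Tomek link (an assumption).
links = tomek_links(points, labels)
drop = {i for pair in links for i in pair}
X = [p for i, p in enumerate(points) if i not in drop]
y = [c for i, c in enumerate(labels) if i not in drop]
```

In practice a library implementation would normally replace this sketch; the point here is only that SMOTE adds interpolated minority samples while Tomek link removal cleans up pairs that straddle the class boundary.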
The method is evaluated on Electra, a dataset collected from a real-life railway ICS, and the results are compared with those of other related methods. The proposed method is shown to achieve better precision, recall, and F1-score than the compared methods in both anomaly detection and anomaly classification.
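For readers unfamiliar with the three metrics named above, the following sketch computes them from true and predicted labels. The label vectors are invented for illustration and are not results from the thesis; the function name `precision_recall_f1` is likewise hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1-score for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Invented, imbalanced toy labels: 8 normal records (0) and 2 anomalies (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

On this toy input the accuracy is 0.8 even though only one of the two anomalies is found, while precision, recall, and F1-score are all 0.5. This is one reason these metrics are commonly preferred over accuracy for imbalanced data.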
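Table of contents:
Chinese Abstract
Abstract
Acknowledgements
List of Figures
List of Tables
1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives and Methods
1.3 Thesis Organization
2. Background Knowledge
2.1 Anomaly Detection
2.2 Machine Learning
2.2.1 Introduction to Machine Learning
2.2.2 Supervised Learning
2.2.3 Unsupervised Learning
2.2.4 Semi-supervised Learning
2.2.5 Reinforcement Learning
2.3 Deep Learning
2.3.1 Introduction to Deep Learning
2.3.2 Multilayer Perceptron
2.3.3 Activation Functions
2.3.4 Backpropagation Algorithm
2.4 Autoencoders
2.4.1 Introduction to Autoencoders
2.4.2 Regularized Autoencoders
2.5 Oversampling and Undersampling
2.5.1 Imbalanced Data
2.5.2 Random Oversampling and Random Undersampling
2.5.3 Synthetic Minority Oversampling Technique
2.5.4 Tomek Links
2.6 Ensemble Learning
2.6.1 Introduction to Ensemble Learning
2.6.2 Bootstrap Aggregating (Bagging)
2.6.3 Adaptive Boosting
2.6.4 Gradient Boosting
2.7 Related Work
3. Problem Definition
3.1 Problem Definition
3.2 Label Definition
4. Methodology
4.1 Data Preprocessing
4.2 Model Architecture
4.3 Evaluation Criteria
5. Experiments and Analysis
5.1 Experimental Environment
5.2 Experimental Results and Analysis
5.2.1 Performance Comparison of Anomaly Detection on the Electra Modbus Dataset
5.2.2 Performance Comparison of Anomaly Classification on the Electra Modbus Dataset
5.2.3 Performance Comparison of Anomaly Detection on the Electra S7Comm Dataset
5.2.4 Performance Comparison of Anomaly Classification on the Electra S7Comm Dataset
6. Conclusions and Future Work
References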
Full-text files:
1. Electronic full text (3321.797K), open access
Print copy authorization: open from 2023/9/1