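Author (Chinese): 周啓松
Author (English): Chou, Chi Sung
Thesis title (Chinese): 基於統計特徵之應用程式辨識系統研製
Thesis title (English): Realization of Application Identification System Based on Statistical Signatures
Advisor (Chinese): 黃能富
Advisor (English): Huang, Nen Fu
Oral defense committee (Chinese): 陳俊良; 石維寬
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Communications Engineering
Student ID: 102064542
Year of publication (ROC calendar): 104 (2015 CE)
Academic year of graduation (ROC calendar): 103
Language: English; Chinese
Number of pages: 54
Keywords (Chinese): 應用程式辨識 (application identification); 機器學習 (machine learning); 流量辨識 (traffic identification)
Keywords (English): Application identification; Machine learning algorithm; Traffic classification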
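Abstract (translated from the Chinese): Traffic classification plays an important role in network management. Because encrypted traffic cannot be parsed to obtain its content, traditional deep packet inspection cannot identify such connections. For this reason, statistical classification techniques that do not rely on packet content began to develop; however, the locational issue also makes these techniques hard to realize in the real world. In many machine learning studies, the ground truth for the training set can only be obtained through deep packet inspection, an approach that contradicts the encrypted traffic problem itself. In addition, as traffic from smart devices multiplies year after year, the identification of smart device applications can no longer be ignored. Unfortunately, these unresolved problems mean that statistical traffic identification has so far been implemented only in research.
To solve these problems and improve accuracy, this thesis proposes an application identification system based on statistical signatures. The system uses the "application-layer round" algorithm together with statistical methods to analyze traffic behavior, including that of encrypted traffic. All statistical information, along with the corresponding application names, is sent to a cloud platform, where multiple machine learning algorithms build classification models. For the locational issue, this thesis demonstrates the severity of the problem with several experimental results and designs a multi-stage architecture that clusters the training set and builds multiple classification models; at classification time, the best model is selected based on the network environment. This thesis also adds a smart device training process, giving the system the ability to identify smart device applications. Finally, the system is deployed on a cloud platform, where its scalable architecture lets it withstand a large number of classification requests and increases the likelihood that it can operate in the real world.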
Abstract (English): Traffic classification plays an important role in network management. Traditional deep packet inspection (DPI) cannot analyze encrypted traffic unless the key pairs have been captured early in the communication flow. Statistical traffic classification was developed to analyze traffic without relying on packet content; however, the locational issue makes statistical classification hard to apply in the real world. Moreover, most machine learning studies generate the ground truth for their training data via DPI, which contradicts the goal of handling encrypted traffic. Furthermore, as traffic from smart devices continues its meteoric rise year after year, the classification of apps can no longer be ignored. Unfortunately, these unresolved issues have confined statistical traffic classification to academic research.
This thesis proposes an application identification system based on statistical signatures to solve these problems and accelerate classification. The system uses the application round technique and statistical methods to handle encrypted traffic and analyze flow behaviors. All of the statistical information is sent to servers and used to train classification models with multiple machine learning algorithms. To address the locational issue, a multi-stage architecture is designed to partition the training data and build multiple models; it also selects the best model based on the network environment when identifying apps. The system includes a smart device training architecture that enables app classification. Finally, the system is deployed in the cloud, and its scalable architecture allows it to handle large volumes of classification requests, which makes deployment in real networks feasible.
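The multi-stage idea in the abstract (partition the training data, build one model per partition, then pick a model according to the network environment) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the environment names, the two-value feature vector, the sample numbers, and the nearest-centroid stand-in classifier are all hypothetical, chosen only to keep the example self-contained.

```python
from collections import defaultdict
from statistics import fmean


def flow_features(packet_sizes, inter_arrival_ms):
    """Payload-free statistics for one flow (hypothetical feature set)."""
    return (fmean(packet_sizes), fmean(inter_arrival_ms))


class CentroidModel:
    """Stand-in classifier: predicts the label with the nearest mean feature vector."""

    def __init__(self, samples):
        by_label = defaultdict(list)
        for features, label in samples:
            by_label[label].append(features)
        # One centroid per label: the column-wise mean of its feature vectors.
        self.centroids = {
            label: tuple(fmean(column) for column in zip(*rows))
            for label, rows in by_label.items()
        }

    def predict(self, features):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda lbl: squared_distance(self.centroids[lbl]))


class MultiStageClassifier:
    """Stage 1: pick a model by network environment. Stage 2: classify the flow."""

    def __init__(self, training_records):
        records = list(training_records)  # (environment, features, label) triples
        grouped = defaultdict(list)
        for environment, features, label in records:
            grouped[environment].append((features, label))
        self.models = {env: CentroidModel(samples) for env, samples in grouped.items()}
        # Assumed fallback: a single model over all data for unseen environments.
        self.fallback = CentroidModel([(f, lbl) for _, f, lbl in records])

    def classify(self, environment, features):
        model = self.models.get(environment, self.fallback)
        return model.predict(features)


# Hypothetical training data: (mean packet size in bytes, mean inter-arrival time in ms).
RECORDS = [
    ("campus", (1200.0, 5.0), "video"),
    ("campus", (1100.0, 6.0), "video"),
    ("campus", (300.0, 200.0), "chat"),
    ("campus", (320.0, 220.0), "chat"),
    ("mobile", (500.0, 40.0), "video"),
    ("mobile", (520.0, 45.0), "video"),
    ("mobile", (100.0, 400.0), "chat"),
    ("mobile", (120.0, 420.0), "chat"),
]

if __name__ == "__main__":
    classifier = MultiStageClassifier(RECORDS)
    query = (450.0, 60.0)
    print("mobile model:", classifier.classify("mobile", query))
    print("fallback model:", classifier.classify("unknown", query))
```

In this made-up data the same application has different statistics in each environment, so the per-environment model and the pooled fallback model can disagree on the same flow. That is one way to picture why a single model trained in one place might not transfer to another.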
Chapter 1 Introduction
Chapter 2 Related Works
2.1 Traffic Classification Techniques
2.2 Frameworks of Application Identification
Chapter 3 System Core Design
3.1 Machine Learning Algorithms
3.1.1 Selection of Machine Learning Algorithms
3.1.2 Definition of Attributes
3.2 Multi-stage Architecture
3.2.1 Discussion of Locational Issue
3.2.2 Multi-stage Architecture Design
Chapter 4 System Implementation
4.1.1 Management Server
4.1.2 Rule Server
4.1.3 Trainer and Classifier
4.1.4 DNS Module
4.1.5 Train Phone and AppMeta Server
4.2 System Overview
4.2.1 Training Service
4.2.2 Smart Device Training Service
4.2.3 Classifying Service
Chapter 5 Experimentation
5.1 Experimental Architecture and Data Sets
5.2 Experimentation of Classification Delay
5.3 Experimentation of Locational Issue
Chapter 6 Conclusion and Future Works
(Full text is restricted to internal access.)