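Detecting Phishing Based on Consistency Checking Imitating Behavior - National Tsing Hua University Theses and Dissertations Full-Text Image System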


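Author (Chinese):	游譯萱
Author (English):	Yu, Yi-Hsuan
Thesis title (Chinese):	釣魚網站偵測：基於網頁模仿行為一致性
Thesis title (English):	Detecting Phishing Based on Consistency Checking Imitating Behavior
Advisor (Chinese):	陳宜欣
Advisor (English):	Chen, Yi-Shin
Committee members (Chinese):	高榮駿、張嘉惠
Committee members (English):	Jung-Chun Kao, Chia-Hui Chang
Degree:	Master's
Institution:	National Tsing Hua University
Department:	Institute of Information Systems and Applications
Student ID:	101065504
Year of publication (ROC calendar):	103 (2014)
Academic year of graduation:	102
Language:	English
Number of pages:	39
Keywords (Chinese):	釣魚網站 (phishing websites), 分類 (classification), 一致性 (consistency)
Keywords (English):	Phishing, Classification, Consistency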

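Phishing is one of the most rampant forms of cybercrime on the Internet today. Attackers use phishing to steal users' personal information, which can cause users financial loss. The most common form of attack is to build a phishing website that imitates the page content or the URL of a well-known website, so that users, unable to tell the real site from the fake one, fall for the attack. However, by analyzing the intent behind building such a site, we can still check the consistency between the page's content and its URL. In this thesis, we propose a content-analysis method for detecting phishing websites. According to our experimental results, the method reaches 93% accuracy while keeping time complexity low.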

A phishing attack is a cybercrime in which attackers try to steal users' personal information for their own benefit. It can cause users serious losses, especially financial ones. One way to lure users is to mimic a legitimate web page's content or URL, which makes the two websites difficult to tell apart. However, phishing websites usually contain inconsistent patterns, so we can still check the consistency between the URL and the web content, or with the intention behind building such a website. In this thesis, we propose a content-based approach to detecting phishing websites. Our proposed method achieves 93% accuracy with low time complexity.
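As a rough illustration of what a URL-to-content consistency check might look like, the sketch below measures how many tokens of a URL's host name also appear as words in the page text. This is only an assumption-laden example: the helper names `domain_tokens` and `found_token_ratio`, the minimum token length, and the overlap ratio itself are hypothetical and are not taken from the thesis, whose actual feature definitions are not given in this record.

```python
import re
from urllib.parse import urlparse


def domain_tokens(url: str) -> list[str]:
    """Split the host name of `url` into lowercase alphanumeric tokens.

    Hypothetical helper: drops 'www' and tokens shorter than 3 characters.
    """
    host = urlparse(url).hostname or ""
    tokens = re.split(r"[^a-z0-9]+", host.lower())
    return [t for t in tokens if len(t) >= 3 and t != "www"]


def found_token_ratio(url: str, page_text: str) -> float:
    """Fraction of host-name tokens that also occur as words in the page text.

    Illustrative heuristic only; a low value suggests the URL and the
    visible content do not refer to the same names.
    """
    tokens = domain_tokens(url)
    if not tokens:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    found = sum(1 for t in tokens if t in words)
    return found / len(tokens)


if __name__ == "__main__":
    print(found_token_ratio("https://www.paypal.com/", "PayPal: send money"))
```

A score like this would be only one signal; in a classification setting it would typically be combined with other features rather than used as a verdict on its own.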

1 INTRODUCTION
2 RELATED WORK
2.1 Non-content based approaches
2.2 Content based approaches
3 CONTENT CONSISTENCY
3.1 Randomness of the URL (RU)
3.2 Position of the domain token (CPos)
3.3 Ratio of the found domain token (RDT)
3.4 Conceptual similarity (CSim)
4 METHODOLOGY
4.1 Pre-Filtering Phase
4.2 Web Page Classification Phase
4.2.1 Redirecting web sites (Re)
4.2.2 Popup windows (Pop)
4.2.3 Ratio of existing keywords (RKey)
4.2.4 Webpage authorization (WA)
4.2.5 Conceptual consistency
4.3 Imitated URL identification
5 EXPERIMENTS
5.1 Experimental dataset
5.1.1 Experimental setup
5.1.2 Experimental results
6 CONCLUSIONS


(The full text of this thesis has not been authorized for public access)