
Detailed Record

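Author (Chinese): 陳宗毅
Author (English): Chen, Zong-Yi
Title (Chinese): 使用基於影像具空間金字塔池化層的卷積神經網路之軟體缺陷預測方法
Title (English): Image-Based Approach for Software Defect Prediction by using CNN with Spatial Pyramid Pooling Layer
Advisor (Chinese): 黃慶育
Advisor (English): Huang, Chin-Yu
Oral defense committee (Chinese): 林振緯, 林其誼, 蘇銓清
Oral defense committee (English): Lin, Jenn-Wei; Lin, Chi-Yi; Sue, Chuan-Ching
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 109062636
Year of publication (ROC calendar): 111 (2022)
Graduating academic year (ROC calendar): 110 (2021-2022)
Language: English
Number of pages: 82
Keywords: Software reliability; Software defect prediction; Deep learning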
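Abstract (Chinese):
Software defect prediction is a technique for identifying potential defects in a project. In the software development life cycle, software defect prediction is indispensable for ensuring software quality and reducing development cost. In general, software defect prediction consists of two main steps: extracting features from source code and building a classification model with machine learning methods. However, software defect prediction still has some limitations. For example, machine learning models require a fixed input size, whereas the size of almost every program differs. Another limitation is that the amount of training data is too small for machine learning, and augmenting and balancing the dataset is very difficult.
In this study, we propose a method named SPP-DP (Spatial Pyramid Pooling for Defect Prediction). First, we convert source code into images, with each program rendered at several different widths and heights to produce multiple images of different sizes; this addresses the limitations of dataset balancing and augmentation. Second, we feed these images into a convolutional neural network (CNN) to train a classification model that predicts software defects. To allow the network to accept input images of different sizes, we introduce a spatial pyramid pooling layer (SPP-Layer) architecture. Compared with different deep learning-based software defect prediction techniques on five datasets from the PROMISE repository, the proposed SPP-DP method improves F-measure by 7%, AUC by 6.8%, and MCC (Matthews correlation coefficient) by 24%. The experimental results show that the proposed SPP-DP is an effective software defect prediction tool that balances the dataset while also providing better defect identification ability.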
Abstract (English):
Software defect prediction is a technique for identifying potential defects in software projects. During the software development life cycle, software defect prediction techniques are indispensable for ensuring software quality and reducing development cost and effort. Generally, software defect prediction consists of two main procedures: extracting features from source code and building a classification model with machine learning methods. However, software defect prediction still has limitations. For example, machine learning models require a fixed input size, whereas program sizes vary widely. Another limitation is that the amount of training data may be too small for machine learning, and handling class imbalance and expanding the dataset are extremely difficult.
In this study, we propose a method called spatial pyramid pooling for defect prediction (SPP-DP). First, SPP-DP converts every source file into images: each file generates multiple images with different widths and heights, which addresses the limitations of class imbalance handling and data augmentation. Second, these images are fed into a convolutional neural network (CNN) to build a classifier that predicts software defects. We add a spatial pyramid pooling layer (SPP-Layer) to the CNN so that it can accept input images of different sizes, relaxing the fixed input size constraint. Compared with different deep learning-based techniques on five PROMISE datasets, the proposed SPP-DP method improves F-measure by 7%, AUC by 6.8%, and the Matthews correlation coefficient (MCC) by 24%. The experimental results show that SPP-DP is effective, as it can balance the dataset and provide better software defect identification ability.
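The key architectural point above is that an SPP-Layer turns a feature map of any height and width into a vector of fixed length, so the fully connected layers after it always receive the same input size. The dependency-free sketch below illustrates only that pooling step. It assumes max pooling over 1×1, 2×2 and 4×4 grids of bins (one common configuration, not necessarily the one used in this thesis) and computes bin boundaries adaptively from each map's own size; in a real CNN the same operation would run inside a deep learning framework on the last convolutional layer's output.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # one H x W feature map (a single channel)


def spp_single_channel(fmap: Matrix, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool one feature map into sum(n * n for n in levels) values.

    The pyramid levels (1, 2, 4) are an assumption for illustration.
    """
    h, w = len(fmap), len(fmap[0])
    out: List[float] = []
    for n in levels:
        for i in range(n):
            # Rows of bin i: [floor(i*h/n), ceil((i+1)*h/n)); never empty for h >= 1.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                # Keep the maximum activation inside this bin.
                out.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out


def spp(feature_maps: List[Matrix], levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Apply pyramid pooling to every channel and concatenate the results."""
    out: List[float] = []
    for fmap in feature_maps:
        out.extend(spp_single_channel(fmap, levels))
    return out


if __name__ == "__main__":
    small = [[1.0, 2.0], [3.0, 4.0]]                                   # 2 x 2 map
    wide = [[float(r * 10 + c) for c in range(7)] for r in range(3)]   # 3 x 7 map
    print(len(spp_single_channel(small)), len(spp_single_channel(wide)))  # prints "21 21"
```

Because each bin's boundaries are derived from the map's own height and width, a 2×2 map and a 3×7 map both yield 21 values per channel. This is what allows images of different aspect ratios to share one classifier head.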
Table of Contents:
Abstract
Abstract in Chinese
Acknowledgement
List of Figures
List of Tables
List of Acronyms
List of Notation
Chapter 1 Introduction
Chapter 2 Background and Related Works
2.1 Traditional Software Defect Prediction
2.2 Deep Learning-based Software Defect Prediction
Chapter 3 CNN with Spatial Pyramid Pooling for SDP
3.1 Convert Source Code into Token Vector
3.2 Converting Token Vector into Pixel Vector
3.3 Generate Different Aspect Ratios of RGBA Images
3.4 Modeling CNN with a Spatial Pyramid Pooling Layer
Chapter 4 Experiments and Analysis
4.1 Dataset Description
4.2 Baseline Methods
4.3 Comparison Criteria
4.4 Experimental Results
4.4.1 Within-Project Defect Prediction
4.4.2 Cross-Project Defect Prediction
Chapter 5 Further Discussion of Image-based SDP Methods
5.1 Performance of Other Color Spaces
5.2 Performance of Different Color Orders
5.2.1 Different Channel Orders of RGB
5.2.2 Different Channel Order of RGBA
Chapter 6 Research Questions and Threat to Validity
6.1 Research Questions
6.2 Threats to Validity
Chapter 7 Conclusion and Future Work
References
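The conversion steps named in Chapter 3 (source code to token vector, token vector to pixel vector, pixel vector to RGBA images of different aspect ratios) can be illustrated with a small sketch. Every concrete choice here is a hypothetical stand-in, not the thesis's actual procedure: the regular-expression tokenizer replaces a real Java parser, the token-to-pixel mapping takes the first four bytes of a SHA-256 digest as R, G, B, A, the widths (4, 8, 16) are arbitrary, and padding uses a fully transparent pixel.

```python
import hashlib
import math
import re
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # (R, G, B, A), each 0-255
Image = List[List[Pixel]]          # rows of pixels

PAD: Pixel = (0, 0, 0, 0)  # hypothetical padding pixel: fully transparent black


def tokenize(source: str) -> List[str]:
    # Hypothetical stand-in lexer: identifiers/keywords, integer literals,
    # or any other single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def token_to_pixel(token: str) -> Pixel:
    # Hypothetical mapping: first 4 bytes of the token's SHA-256 digest.
    # The same token always maps to the same RGBA pixel.
    d = hashlib.sha256(token.encode("utf-8")).digest()
    return (d[0], d[1], d[2], d[3])


def to_image(pixels: Sequence[Pixel], width: int) -> Image:
    # Lay the pixel vector out row-major at the given width; pad the last row.
    height = max(1, math.ceil(len(pixels) / width))
    padded = list(pixels) + [PAD] * (height * width - len(pixels))
    return [padded[r * width:(r + 1) * width] for r in range(height)]


def source_to_images(source: str, widths: Sequence[int] = (4, 8, 16)) -> Dict[int, Image]:
    # One source file -> one image per width, i.e. several aspect ratios.
    pixels = [token_to_pixel(t) for t in tokenize(source)]
    return {w: to_image(pixels, w) for w in widths}


if __name__ == "__main__":
    for w, img in source_to_images("int x = a + 1;").items():
        print(f"width={w}: {len(img)} row(s) x {len(img[0])} pixels")
```

Generating several images per file multiplies the training samples; one possible way to use this for class balancing, also an assumption, is to request more widths for files in the minority (defective) class than for clean files. Images of differing sizes can then be handled by an SPP-Layer instead of being resized to one fixed shape.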