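Enhancing Software Maintainability by Using Multi-Pattern Clustering Algorithm Considering Both Understandability and Quality Driven Modularization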

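Author (Chinese):	陳怡婷
Author (English):	Chen, Yi-Ting
Title (Chinese):	運用考量可理解性與品質驅動模組化多特徵叢集演算法以增強軟體維護度
Title (English):	Enhancing Software Maintainability by Using Multi-Pattern Clustering Algorithm Considering Both Understandability and Quality Driven Modularization
Advisor (Chinese):	黃慶育
Advisor (English):	Huang, Chin-Yu
Committee members (Chinese):	蘇銓清、林振緯
Committee members (English):	Sue, Chuan-Ching; Lin, Jenn-Wei
Degree:	Master's
University:	National Tsing Hua University
Department:	Institute of Information Systems and Applications
Student ID:	104065533
Year of publication (ROC calendar):	106 (2017)
Graduation academic year (ROC calendar):	105 (2016-2017)
Language:	English
Number of pages:	73
Keywords (Chinese):	軟體架構回復、軟體叢集、軟體模組化、叢集分析、軟體維護度、可理解性、軟體品質
Keywords (English):	Software architecture recovery, Software clustering, Software modularization, Cluster analysis, Software maintainability, Understandability, Software quality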

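Over the past decade, software has become an absolute necessity in our lives. The software development lifecycle (SDLC) is the process software engineers follow to produce software. Among its phases, software maintenance accounts for most of the cost. Moreover, after long periods of maintenance the software architecture tends to become distorted, the size and complexity of the software grow, and its quality degrades. One solution to these problems is software modularization, which is an effective way to reduce time and cost and to improve quality. Clustering is an intuitive way to implement modularization by grouping code into smaller units. However, traditional clustering algorithms have drawbacks. For example, the number of modules is undefined and the modularization results have low understandability. For this reason, much research in this area has tried different features, different similarity measures, or specialized algorithms to improve performance.
In this study, we propose the multi-pattern clustering (MPC) algorithm for software architecture recovery. The algorithm has five main steps:
- preprocessing
- file labeling
- collection of chain dependencies
- a hierarchical agglomerative algorithm
- modification of the clustering result

In our experiments, we used three open-source systems and one closed-source system with differing complexity and purposes. We compared the MPC algorithm against traditional software clustering tools: the single linkage, complete linkage, unweighted linkage, weighted linkage, combined, and comprehension-driven clustering algorithms. The evaluation shows that the modularization quality of MPC is roughly 1.6 times that of the expert decomposition. In addition, compared with the other algorithms, MPC improves similarity to human judgment by 13% on average. This indicates that MPC is a more effective clustering tool for human comprehension and that it also produces better module quality than these algorithms.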
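The abstract compares modularization quality (MQ) across decompositions. The record does not define the "Weighted Modularization Quality" used in the thesis. The sketch below therefore assumes a TurboMQ-style definition, which is common in the software clustering literature. Each cluster scores CF = 2μ / (2μ + ε), where μ is the weight of dependencies inside the cluster and ε is the weight crossing its boundary in either direction; MQ is the sum of the CF scores. The function name `turbo_mq` and the input shapes are hypothetical.

```python
from collections import defaultdict

def turbo_mq(edges, clusters):
    """TurboMQ-style modularization quality (assumed definition, not the thesis's own).

    edges:    dict mapping (src_file, dst_file) -> dependency weight
    clusters: dict mapping file -> cluster id
    """
    intra = defaultdict(float)  # mu_i: weight of edges with both ends in cluster i
    inter = defaultdict(float)  # eps_i: weight of edges entering or leaving cluster i
    for (src, dst), weight in edges.items():
        a, b = clusters[src], clusters[dst]
        if a == b:
            intra[a] += weight
        else:
            # A cross-cluster edge counts against both clusters it touches.
            inter[a] += weight
            inter[b] += weight
    mq = 0.0
    for c in set(clusters.values()):
        if intra[c] > 0:  # a cluster with no internal edges contributes 0
            mq += 2 * intra[c] / (2 * intra[c] + inter[c])
    return mq

edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1}
print(turbo_mq(edges, {"a": 0, "b": 0, "c": 1, "d": 1}))  # ≈ 1.333
```

A ratio of two such scores (candidate versus expert decomposition) is one way to express a comparison like the 1.6-times figure. Whether the thesis's weighted variant reduces to this formula is not stated in the record.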

In recent decades, software has become indispensable in daily life. The software development lifecycle (SDLC) is the process software engineers follow to produce software. Among the phases of the SDLC, software maintenance accounts for most of the total cost of software. Moreover, a system's structure deteriorates after extended maintenance: its size and complexity increase and its quality usually degrades. One solution is software modularization, which helps reduce time and cost and improve quality. Clustering is an intuitive way to perform modularization by grouping code into smaller units. However, traditional clustering algorithms have drawbacks, such as an undefined number of modules and low understandability of the modularization results. Much research in this field therefore uses different features, different similarity measures, or specialized algorithms to improve performance.
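The paragraph above notes that software clustering research varies both the features describing each file and the way similarity is computed. The sketch below shows one widely used similarity measure, the Jaccard coefficient. It assumes each file is described by a set of features, such as the names of functions or types it references; the record does not say which features MPC uses. The helpers `jaccard` and `similarity_matrix` are hypothetical.

```python
def jaccard(features_a, features_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two files' feature sets."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0  # convention: two featureless files are treated as unrelated
    return len(a & b) / len(a | b)

def similarity_matrix(file_features):
    """Return (sorted file names, pairwise Jaccard similarity matrix)."""
    names = sorted(file_features)
    matrix = [[jaccard(file_features[x], file_features[y]) for y in names] for x in names]
    return names, matrix

# Hypothetical files described by the identifiers they reference.
files = {
    "parser.c": {"token", "ast", "error"},
    "lexer.c": {"token", "error", "buffer"},
}
names, sim = similarity_matrix(files)
print(names, sim[0][1])  # ['lexer.c', 'parser.c'] 0.5
```

Swapping in a different feature extractor or coefficient changes only these two helpers, which reflects the design space the abstract describes.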
In this study, we propose a multi-pattern clustering (MPC) algorithm for software architecture recovery. The MPC algorithm consists of five main steps: preprocessing, file labeling, collection of chain dependencies, a hierarchical agglomerative algorithm, and modification of the clustering result. In our experiments, we compare the MPC algorithm with several traditional software clustering techniques: the single linkage, complete linkage, unweighted linkage, weighted linkage, and combined algorithms, and the Algorithm for Comprehension-Driven Clustering (ACDC). The comparison uses three open-source systems and one closed-source system of different sizes and purposes. The assessment results show that the modularization quality of the MPC algorithm is nearly 1.6 times that of the expert decomposition. In addition, compared with the other algorithms, the MPC algorithm produces results that are, on average, 13% more similar to human judgment. These results indicate that the MPC algorithm is better suited to human comprehension than the compared algorithms and produces better module quality.
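The baselines named above (single, complete, unweighted and weighted linkage) are variants of hierarchical agglomerative clustering (HAC), and step 4 of MPC is also an agglomerative algorithm. The sketch below is a generic HAC over a similarity matrix, not the MPC algorithm. It assumes "unweighted" and "weighted" linkage mean the usual size-weighted average (UPGMA) and plain average (WPGMA) update rules. It stops at a caller-chosen number of clusters `k`, which is the "undefined number of modules" choice the abstract criticizes. The function `hac` is hypothetical.

```python
def hac(sim, k, linkage="single"):
    """Agglomerative clustering on a symmetric similarity matrix (list of lists).

    Repeatedly merges the two most similar clusters until k clusters remain.
    """
    n = len(sim)
    clusters = {i: [i] for i in range(n)}
    # s[(a, b)] with a < b: current similarity between live clusters a and b
    s = {(a, b): sim[a][b] for a in range(n) for b in range(a + 1, n)}
    while len(clusters) > max(k, 1):
        (a, b), _ = max(s.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        for c in clusters:
            if c in (a, b):
                continue
            sa = s[(min(a, c), max(a, c))]
            sb = s[(min(b, c), max(b, c))]
            if linkage == "single":        # most similar pair of members
                new = max(sa, sb)
            elif linkage == "complete":    # least similar pair of members
                new = min(sa, sb)
            elif linkage == "unweighted":  # UPGMA: average weighted by cluster size
                new = (na * sa + nb * sb) / (na + nb)
            elif linkage == "weighted":    # WPGMA: plain mean of the two clusters
                new = (sa + sb) / 2
            else:
                raise ValueError(f"unknown linkage: {linkage}")
            s[(min(a, c), max(a, c))] = new
        # Cluster a absorbs cluster b; forget every pair that mentions b.
        clusters[a].extend(clusters.pop(b))
        s = {pair: v for pair, v in s.items() if b not in pair}
    return sorted(sorted(members) for members in clusters.values())

sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
]
print(hac(sim, 2, "single"))  # [[0, 1], [2, 3]]
```

The linkages differ only in how the similarity from a merged cluster to the others is updated. On small, well-separated inputs like this one, all four agree. Their results diverge on real dependency graphs, which is why they serve as separate baselines.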

Abstract
Abstract in Chinese
Acknowledgement
Contents
List of Tables
List of Figures
List of Symbols
Acronyms and Abbreviations
Notation
Chapter 1 Introduction
Chapter 2 Background and Literature Review
2.1 Basic Concepts of Hierarchical Agglomerative Clustering
2.2 Review of Software Modularization Techniques
Chapter 3 Modularization with Multi-Pattern Clustering
3.1 Step 1: Preprocessing
3.2 Step 2: File Labeling
3.3 Step 3: Collection of Chain Dependency
3.4 Step 4: Hierarchical Agglomerative Algorithm
3.5 Step 5: Modification of Clustering Result
Chapter 4 Experimental Results and Analysis
4.1 Test Systems
4.2 Comparison Criteria
4.2.1 Weighted Modularization Quality
4.2.2 Metrics: Accuracy, Precision, Recall and F1-Measure
4.2.3 MoJoFM
4.3 Expert Decomposition Data
4.4 Experimental Results and Discussions
4.5 Observations and Suggestions for Software Development and Maintenance
4.6 Visualization Tool for Software Modularization - ModulePlot
4.7 Threats to Validity
Chapter 5 Conclusions and Future Work
References
Appendix
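Section 4.2.2 of the outline lists accuracy, precision, recall and F1-measure as criteria for comparing clustering results against expert decompositions (Section 4.3). The record does not give the thesis's exact definitions. The sketch below therefore uses the pairwise formulation common in software clustering evaluation. Every pair of files is classified as "same module" or "different modules" in both the candidate and expert decompositions, and the two labelings are compared. The function `pairwise_scores` is hypothetical.

```python
from itertools import combinations

def pairwise_scores(candidate, expert):
    """Pairwise accuracy, precision, recall and F1 of `candidate` against `expert`.

    Both arguments map file name -> module label over the same set of files.
    A pair counts as positive when its two files are placed in the same module.
    """
    tp = fp = fn = tn = 0
    for x, y in combinations(sorted(expert), 2):
        same_candidate = candidate[x] == candidate[y]
        same_expert = expert[x] == expert[y]
        if same_candidate and same_expert:
            tp += 1
        elif same_candidate:
            fp += 1
        elif same_expert:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

expert = {"a": 1, "b": 1, "c": 2, "d": 2}
candidate = {"a": "x", "b": "x", "c": "x", "d": "y"}
print(pairwise_scores(candidate, expert))
# accuracy 0.5, precision ≈ 0.333, recall 0.5, f1 ≈ 0.4
```

Module labels only need to agree within each decomposition, so the candidate and expert may use unrelated label names. MoJoFM (Section 4.2.3) instead counts move and join operations between decompositions and is not covered by this sketch.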


(The full text of this thesis is not authorized for public release.)