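Author (Chinese): 陳立展
Author (English): Chen, Li Chan
Thesis title (Chinese): 預測社群自主管理的程式模組是否將被停止維護
Thesis title (English): Lost Gems: Predicting Module Abandonment on Community Hosted Open-Source Repositories
Advisor (Chinese): 雷松亞
Advisor (English): Soumya Ray
Committee members (Chinese): 徐茉莉; 許裴舫
Committee members (English): Galit Shmueli; Pei-Fang Hsu
Degree: Master's
University: National Tsing Hua University
Department: Institute of Service Science
Student ID: 103078508
Year of publication (ROC calendar): 105 (2016)
Academic year of graduation: 104
Language: English
Number of pages: 74
Keywords (Chinese): 模組停止維護; GitHub; RubyGems; 資料探勘; 決策樹; 邏輯式迴歸; 隨機森林
Keywords (English): Module Abandonment; RubyGems; GitHub; Data Mining; Decision Tree; Logistic Regression; Random Forest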
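In open-source software engineering, a module is a packaged body of code that performs a specific task; because it is open source, any other developer can use it in their own programs. Since most modules depend on many other modules, whether a module's maintainers have stopped maintaining it becomes a consideration for developers before they adopt it. This study has two objectives. First, we aim to provide meaningful information that developers can use as a reference when assessing whether a module has been abandoned by its maintainers. Second, we aim to help communities that host their own module repositories examine the value of data that is costly to collect. We take the Ruby programming language as the subject of our research and analysis, and we seek to predict whether a Ruby module will be abandoned by its maintainers within the coming year. We believe that GitHub and RubyGems together provide comprehensive information about Ruby modules, so we use these two platforms as data sources and build classification models using data mining methods. We find classification models that can effectively predict whether a module will be abandoned, along with factors that may be associated with abandonment. We also learn that spending substantial resources to store expensive usage and download data is not necessary. Furthermore, we build an online dashboard for the Ruby community that visualizes information about Ruby modules from GitHub and RubyGems and presents the abandonment indicators produced by our classification models.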
Open source modules, which are software libraries, are often abandoned by their maintainers. This abandonment is a serious concern for developers and for software projects at large. This study has two aims. First, we want to help developers recognize module abandonment in a data-driven manner. Second, we want to help the sites that host modules examine the value of preserving expensive usage data. We examine the modules of the Ruby programming language, known as gems, and try to predict whether gems will be abandoned in the near future by analyzing data from GitHub and RubyGems using data mining techniques. We discover effective models for predicting abandonment and factors associated with it. Moreover, our results suggest that independent module hosting communities may not need to collect expensive usage data in order to predict module abandonment. Using these results, we build an online dashboard to help Ruby developers recognize potential abandonment of modules. This study helps us combine theoretical methods with practical use.
Chapter 1 Introduction
Chapter 2 Modules and Gems
2.1 How are gems created by developers?
2.2 How are gems used by other developers?
Chapter 3 Module Ecosystems
3.1 Gem Dependencies
3.2 Gem Abandonment
3.3 Different Module Ecosystems
3.3.1 GitHub Hosted Module Ecosystem
3.3.2 Community Hosted Module Repository
3.4 Recognizing Abandonment as a Practitioner
Chapter 4 Research on Abandonment
4.1 Previous Abandonment Research
4.2 Research Gaps
Chapter 5 Data Collection and Preprocessing
5.1 How to Collect the Data
5.2 Data Schema
5.3 Data Preprocessing
5.3.1 Data Conversion and Labeling
5.3.2 Difficulty of Removing Contaminated Data
5.3.3 Multiple Datasets
5.4 Challenges
Chapter 6 Data Exploration and Analysis
6.1 Data Exploration
6.2 Data Analysis
6.2.1 GitHub-Only Dataset
6.2.2 RubyGems-Only Dataset
6.2.3 Combined GitHub + RubyGems Dataset
Chapter 7 Findings and Discussion
7.1 Discussion of Research Questions
7.1.1 How Can We Effectively Predict Abandonment of Modules?
7.1.2 Which Factors Might Correlate with Abandonment of Modules?
7.1.3 Is GitHub Data Enough for Predicting Abandonment of Modules? Or Should Independent Module Repositories Collect Usage Data?
7.2 Discussion of Objectives
7.2.1 Helping Developers to Recognize Gem Abandonment
7.2.2 Considering the Storage of Expensive Data
Chapter 8 Limitations and Future Work
Chapter 9 Conclusion
References
Appendix A
Appendix B

 
 
 
 