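Author (Chinese): 吳思葦
Author (English): Wu, Sz-Wei
Title (Chinese): 釐清 Gains chart 與 Lift chart 之混淆以增進實務中的有效應用
Title (English): Clarifying Confusions about Gains and Lift Charts to Improve Their Current Underuse in Practice
Advisor (Chinese): 徐茉莉
Advisor (English): Shmueli, Galit
Oral examination committee (Chinese): 林福仁, 李曉慧
Oral examination committee (English): Lin, Fu-Ren; Lee, Hsiao-Hui
Degree: Master's
University: National Tsing Hua University
Department: Institute of Service Science
Student ID: 106078507
Year of publication (ROC calendar): 108 (2019)
Academic year of graduation (ROC calendar): 107 (2018-2019)
Language: English
Number of pages: 57
Keywords (Chinese): gains chart, cumulative gains chart, classification, ranking, data mining
Keywords (English): gains chart, lift chart, cumulative gains, cumulative lift, ranking, classification, performance evaluation, data mining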
Gains charts and lift charts are two useful data mining performance measures for evaluating ranking problems. Both charts are built by ranking the data by predicted class probability, which then helps choose a threshold for targeting a subset of top-ranked records. Although these charts are deployed in some application areas and mentioned in textbooks and papers, confusion about their terminology and definitions leads to difficulty in using them or to incorrect interpretations. In this research, we clarify these confusions to address the current underuse of gains and lift charts in practice.
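The ranking-and-threshold idea above can be made concrete with a short sketch. The Python function below is a minimal illustration, assuming a binary target coded 1 for the class of interest and a model score such as a predicted probability. It uses one common convention: the cumulative gain at cutoff k is the share of all positives captured in the top k records, and the cumulative lift is the response rate in the top k divided by the overall response rate. The name `cumulative_gains_lift` is hypothetical and is not part of the gainslift R package.

```python
from typing import List, Sequence, Tuple


def cumulative_gains_lift(
    y_true: Sequence[int], scores: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (fraction targeted, cumulative gain, cumulative lift) for each
    cutoff k = 1..n after ranking records by descending score.

    Hypothetical helper; ties keep their input order because sorted() is stable.
    """
    n = len(y_true)
    total_pos = sum(y_true)
    if n == 0 or total_pos == 0:
        raise ValueError("need at least one record and at least one positive")
    base_rate = total_pos / n  # overall response rate (the no-model baseline)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)

    points = []
    captured = 0  # positives found among the top k records
    for k, i in enumerate(order, start=1):
        captured += y_true[i]
        gain = captured / total_pos        # share of all positives captured
        lift = (captured / k) / base_rate  # top-k response rate vs baseline
        points.append((k / n, gain, lift))
    return points


if __name__ == "__main__":
    y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    for frac, gain, lift in cumulative_gains_lift(y, s):
        print(f"{frac:.1f}  gain={gain:.3f}  lift={lift:.3f}")
```

Plotting the gain against the fraction targeted gives a cumulative gains curve, and plotting the lift gives a cumulative lift curve. Under this convention the gain always reaches 1 and the lift always returns to 1 once every record is targeted.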
Through a literature search, we bring this underuse to light by showing the dominance of other classification evaluation criteria, such as accuracy, the ROC (Receiver Operating Characteristic) curve, sensitivity, and specificity. Our survey also shows that the names and definitions of gains and lift charts are often mixed up in both publications and data mining software. We organize the disparate terminology, computation approaches, and perspectives on gains and lift charts in a clear, methodical way to clarify their uses and improve reproducibility. We then introduce decile, profit, and non-cumulative charts based on gains and lift values. To integrate this work, we created the gainslift R package, which produces consistent and clear gains and lift charts. Finally, we propose three uses of the charts for comparing the performance of data mining algorithms under different circumstances, and illustrate them with a practical case from the Kaggle platform. Example gains and lift charts produced with our package are also provided for this case.
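Decile and profit charts summarize the same ranking in coarser groups. The sketch below makes several assumptions. It splits the ranked records into ten roughly equal groups using integer boundaries. It reports both the non-cumulative (per-decile) lift and the cumulative lift. It also adds a simple cumulative profit under an assumed payoff of `revenue_per_positive` for each positive reached and `cost_per_contact` for each record targeted. The function and parameter names are hypothetical, and the thesis's own profit gains and profit lift definitions may differ from this simple payoff.

```python
from typing import Dict, List, Sequence


def decile_table(
    y_true: Sequence[int],
    scores: Sequence[float],
    revenue_per_positive: float = 0.0,
    cost_per_contact: float = 0.0,
) -> List[Dict[str, float]]:
    """Per-decile and cumulative lift, cumulative gain, and an assumed profit.

    Hypothetical helper: records are ranked by descending score and cut into
    ten groups at integer boundaries d * n // 10.
    """
    n = len(y_true)
    total_pos = sum(y_true)
    if n < 10 or total_pos == 0:
        raise ValueError("need at least 10 records and at least one positive")
    base_rate = total_pos / n
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    ranked = [y_true[i] for i in order]

    rows = []
    start, cum_pos = 0, 0
    for d in range(1, 11):
        end = d * n // 10  # number of records targeted through decile d
        group = ranked[start:end]
        pos = sum(group)
        cum_pos += pos
        rows.append({
            "decile": d,
            # non-cumulative: response rate inside this decile only
            "decile_lift": (pos / len(group)) / base_rate,
            # cumulative: response rate across deciles 1..d
            "cumulative_lift": (cum_pos / end) / base_rate,
            "cumulative_gain": cum_pos / total_pos,
            # assumed payoff: revenue per positive reached minus cost per contact
            "cumulative_profit": revenue_per_positive * cum_pos - cost_per_contact * end,
        })
        start = end
    return rows
```

With equal-sized deciles, the ten non-cumulative lifts average to 1, and the cumulative lift of the last decile equals the baseline of 1. Under the assumed payoff, the decile with the largest cumulative profit is one candidate cutoff.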
1. Introduction
2. Classification and Ranking Performance Evaluation
2.1 Measures of Classifier Performance
2.1.1 Confusion Matrix Measures
2.1.2 Receiver Operating Characteristic curve (ROC curve)
2.2 Measures of Ranking Performance
2.2.1 Area under ROC curve (AUC)
2.2.2 Gains Chart
2.2.2.1 Definition of a Gains Chart
2.2.2.2 Constructing a (Cumulative) Gains Chart
2.2.2.3 Computing the Gains Chart using a Confusion Matrix
2.2.3 Lift Chart
2.2.3.1 Definition of a Lift chart
2.2.3.2 Constructing a Lift chart
2.2.3.3 Computing the Lift Chart using a Confusion Matrix
2.3 Comparing the performance measures
3. Terminology Confusions and Underuse of Gains and Lift Charts
3.1 Literature Search
3.2 Data Mining Software
4. Extended Charts with Gains and Lift
4.1 Decile Charts
4.2 Non-cumulative Decile Charts
4.3 Profit Charts
4.3.1 Profit Gains Chart
4.3.2 (Additive) Profit Lift Chart
4.4 gainslift - Package Built in R Language
5. Illustrating the Benefit of using Gains and Lift Charts
5.1 Case Descriptions: Predicting Red Hat Business Value
5.2 Evaluating the Performance using Gains and Lift Charts
6. Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Appendix A
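Sections 2.2.2.3 and 2.2.3.3 relate the charts to a confusion matrix. Under the common convention that the top k ranked records are "predicted positive", two identities hold. The cumulative gain at that cutoff equals sensitivity (recall), TP / (TP + FN). The cumulative lift equals precision divided by the overall positive rate. The sketch below assumes that convention, and the class and function names are hypothetical and may not match the thesis's notation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionMatrix:
    """Counts at one cutoff; 'targeted' means ranked in the top k."""
    tp: int  # targeted and actually positive
    fp: int  # targeted but actually negative
    fn: int  # not targeted but actually positive
    tn: int  # not targeted and actually negative

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def confusion_at_top_k(
    y_true: Sequence[int], scores: Sequence[float], k: int
) -> ConfusionMatrix:
    """Treat the k highest-scoring records as predicted positive (k >= 1)."""
    order = sorted(range(len(y_true)), key=lambda i: scores[i], reverse=True)
    targeted = order[:k]
    tp = sum(1 for i in targeted if y_true[i] == 1)
    fp = len(targeted) - tp
    fn = sum(y_true) - tp
    tn = len(y_true) - tp - fp - fn
    return ConfusionMatrix(tp, fp, fn, tn)


def gain_from_confusion(cm: ConfusionMatrix) -> float:
    """Cumulative gain at the cutoff = sensitivity (recall)."""
    return cm.tp / (cm.tp + cm.fn)


def lift_from_confusion(cm: ConfusionMatrix) -> float:
    """Cumulative lift at the cutoff = precision / overall positive rate."""
    precision = cm.tp / (cm.tp + cm.fp)
    prevalence = (cm.tp + cm.fn) / cm.n
    return precision / prevalence
```

For example, with TP = 30, FP = 70, FN = 20 and TN = 880, the gain is 30/50 = 0.6 and the lift is (30/100)/(50/1000) = 6. Sweeping k over all cutoffs reproduces the cumulative gains and lift curves point by point.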
