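Author (Chinese): 廖承哲
Author (English): Liao, Cheng-Che
Thesis title (Chinese): 增量模型於不均衡實驗組之表現
Thesis title (English): Uplift Model Performance With Imbalanced Treatment Groups
Advisor (Chinese): 徐茉莉
Advisor (English): Shmueli, Galit
Committee members (Chinese): 林福仁; 李曉惠
Committee members (English): Lin, Fu-Ren; Li, Hsiao-Hui
Degree: Master's
University: National Tsing Hua University
Department: Institute of Service Science
Student ID: 106078503
Year of publication (ROC calendar): 108 (2019)
Academic year of graduation: 107
Language: English
Number of pages: 78
Keywords (Chinese): 精準行銷; 顧客分群; 增量模型; 隨機實驗; A/B測試; 不均衡實驗與對照組比率
Keywords (English): direct marketing; customer segmentation; uplift model; randomized experiment; A/B testing; imbalanced treatment-control ratio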
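Because corporate resources are limited, it is important to identify the customers who can be converted, and reducing over-marketing also lessens the nuisance to customers.

An uplift model divides customers into four groups: Persuadables, Sure Things, Lost Causes, and Do-Not-Disturbs. Uplift modeling combines a randomized experiment (A/B test) with a predictive model to identify the Persuadables, the customers most likely to be converted, and thereby achieve precise targeting (Radcliffe 2007). Researchers run an A/B test, record the customers' responses, obtain uplift scores from the predictive model, and divide the customers into the four groups. The literature notes that predictive performance depends heavily on the nature of the data and on the application. In addition, the experiment requires treatment and control groups of equal size (Devriendt et al. 2018). In practice, however, the treatment and control groups are often imbalanced so that more customers, especially VIP customers, receive the "good" treatment. Several papers therefore list imbalanced data distribution as a topic for future research (Radcliffe 2007; Devriendt et al. 2018; Diemert et al. 2018).

In this study we examine the problems that imbalanced treatment and control groups cause for uplift models, in particular when the treatment group contains more responders, a setting that is closer to practice. We use real data and manipulate it according to (1) the treatment-control ratio and (2) the response rates in the treatment and control groups, and we use the uplift chart and the Qini curve as evaluation metrics (Radcliffe and Surry 2011). The study aims to answer the following questions:

(a) Does assigning more customers with Ypre = 1 to the treatment group worsen or improve model performance?
(b) Are there other factors that also affect model performance?
(c) Does including the pre-treatment outcome (Ypre) in the model improve predictive performance?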
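
The treatment-control ratio manipulation mentioned above can be illustrated with a small resampling routine. This is only a minimal sketch under stated assumptions, not the procedure used in the thesis: the function name `subsample_to_ratio`, the record layout with a `"treated"` key, and the choice to downsample rather than upsample are all hypothetical.

```python
import random

def subsample_to_ratio(records, treated_parts, control_parts, seed=0):
    """Downsample records so treated:control equals treated_parts:control_parts.

    `records` is a list of dicts with a hypothetical "treated" key (1 or 0).
    Only downsampling is used, so no record is duplicated.
    """
    if treated_parts <= 0 or control_parts <= 0:
        raise ValueError("ratio parts must be positive integers")
    rng = random.Random(seed)
    treated = [r for r in records if r["treated"] == 1]
    control = [r for r in records if r["treated"] == 0]
    # Number of whole "ratio units" that both arms can supply.
    units = min(len(treated) // treated_parts, len(control) // control_parts)
    sample = rng.sample(treated, units * treated_parts)
    sample += rng.sample(control, units * control_parts)
    rng.shuffle(sample)
    return sample

# Toy data: 100 treated and 100 control customers, resampled to a 4:1 ratio.
customers = [{"id": i, "treated": i % 2} for i in range(200)]
imbalanced = subsample_to_ratio(customers, treated_parts=4, control_parts=1)
n_treated = sum(r["treated"] for r in imbalanced)
print(n_treated, len(imbalanced) - n_treated)
```

Passing the ratio as two integers avoids floating-point rounding when computing group sizes. Controlling the responder rate within each arm would need a second stratification step, which this sketch leaves out.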
In applications such as direct (precision) marketing, resources are limited, so it is important to identify the customers most likely to convert if targeted. Accurate targeting also reduces the burden on customers by cutting the number of irrelevant solicitations. When a predictive model is used to decide whom to contact, customers fall into four types: Persuadable, Sure Thing, Lost Cause, and Do-Not-Disturb. Uplift modeling aims to identify the Persuadables, the customers most likely to respond positively because they were targeted (Radcliffe 2007).

Uplift modeling combines a randomized experiment (A/B test) with a predictive model to find the Persuadables. Researchers conduct an A/B test and record the customers' reactions; applying the predictive model then yields uplift scores, which are used to classify customers into the four segments. Previous work has shown that the performance of uplift models depends heavily on the nature of the data set and the application. In addition, equal treatment and control group sizes are required (Devriendt et al. 2018), and several papers list imbalanced distribution as a topic for future work (Radcliffe 2007; Devriendt et al. 2018; Diemert et al. 2018). In practice, however, companies often prefer unequal treatment and control groups so that more customers receive the "good" treatment. Moreover, they want to give their VIP customers a greater chance of receiving it.
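
To make the idea of an uplift score concrete, the sketch below follows the two-model pattern: fit one response model on the treatment group, another on the control group, and score each customer by the difference in predicted response probability. It is an illustration under simplifying assumptions, not the thesis's implementation. In particular, a per-segment response rate stands in for the predictive model, and the segment labels `"new"` and `"loyal"` and the function names are hypothetical.

```python
from collections import defaultdict

def fit_response_rates(rows):
    """Return {segment: observed response rate} for one experiment arm.

    `rows` is a list of (segment, responded) pairs, with responded in {0, 1}.
    This frequency table is a stand-in for a real classifier.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [responders, total]
    for segment, responded in rows:
        counts[segment][0] += responded
        counts[segment][1] += 1
    return {seg: resp / total for seg, (resp, total) in counts.items()}

def uplift_scores(treated_rows, control_rows):
    """Uplift per segment: P(respond | treated) - P(respond | control)."""
    p_treated = fit_response_rates(treated_rows)
    p_control = fit_response_rates(control_rows)
    # Score only segments that were observed in both arms.
    shared = p_treated.keys() & p_control.keys()
    return {seg: p_treated[seg] - p_control[seg] for seg in shared}

# Toy A/B test results: (segment, responded).
treated_rows = [("new", 1), ("new", 1), ("new", 1), ("new", 0),
                ("loyal", 1), ("loyal", 1), ("loyal", 0), ("loyal", 0)]
control_rows = [("new", 1), ("new", 0), ("new", 0), ("new", 0),
                ("loyal", 1), ("loyal", 1), ("loyal", 0), ("loyal", 0)]

scores = uplift_scores(treated_rows, control_rows)
print(scores)
```

In this toy data the "new" segment responds far more often when treated, so it receives a positive uplift score, while the "loyal" segment responds at the same rate either way and scores zero. A score near zero cannot by itself separate Sure Things from Lost Causes; that distinction also needs the baseline response rate.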

In this research we study the problems that arise from imbalanced treatment and control groups in uplift modeling, specifically when the company assigns more responders to the treatment group than to the control group. We use real data sets and manipulate them in terms of (1) the treatment-control ratio and (2) the rate of responders in the treatment group and in the control group. We use the uplift chart and the Qini curve, both commonly used to evaluate uplift models, as the evaluation metrics (Radcliffe and Surry 2011). This research aims to answer the following questions: (a) Does allocating more customers with Ypre = 1 to the treatment group degrade or improve uplift modeling? (b) What factors influence performance? (c) Does including the pre-treatment response (Ypre) in the model improve performance?
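
As an illustration of the Qini curve named above, the sketch below uses one common formulation: rank customers by predicted uplift and, for each top-k cutoff, take the number of treated responders minus the number of control responders rescaled to the treated group's size. Whether the thesis uses exactly this variant is an assumption, and the function and variable names are hypothetical.

```python
def qini_curve(scores, treated, responded):
    """Cumulative incremental responders after targeting the top-k customers.

    scores    : predicted uplift per customer (higher means target first)
    treated   : 1 if the customer was in the treatment group, else 0
    responded : 1 if the customer responded, else 0
    Returns a list whose k-th entry is the Qini value for the top k customers.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = r_t = r_c = 0
    curve = [0.0]
    for i in order:
        if treated[i]:
            n_t += 1
            r_t += responded[i]
        else:
            n_c += 1
            r_c += responded[i]
        # Rescale control responders to the treated count so that arms of
        # different sizes stay comparable; the term is zero until a control
        # customer has been seen.
        scaled_control = r_c * n_t / n_c if n_c else 0.0
        curve.append(r_t - scaled_control)
    return curve

# Toy example: four customers already ordered by predicted uplift.
curve = qini_curve(
    scores=[0.9, 0.8, 0.7, 0.6],
    treated=[1, 0, 1, 0],
    responded=[1, 0, 1, 1],
)
print(curve)
```

The rescaling factor `n_t / n_c` is the part that matters most under an imbalanced treatment-control ratio: without it, a larger treatment group would inflate the curve simply because it contains more customers.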
Abstract
Abstract (Chinese)
Acknowledgement
List of Tables
List of Figures
1. Introduction
2. Effect of Pretreatment Outcome in Direct Marketing
2.1. Uplift with no pretreatment outcome information (excluding Ypre)
2.2. Uplift with pretreatment outcome information (including Ypre)
3. Literature Review
3.1. Data Processing Approach
3.1.1. One Model Approach
3.1.2. Two Model Approach
3.2. Models
3.2.1. Tree-Based Method
3.2.2. Distance-Based Method
3.2.3. Regression-Based Method
3.2.4. Logistic Regression
3.2.5. Lasso
3.3. Evaluation Method
3.3.1. Notation
3.3.2. Decile Table
3.3.3. Qini Curve
3.4. Measurement of treatment effect in econometrics: Difference-in-Difference
4. Experiment Design
4.1. Data sets
4.1.1. Voter-Persuasion
4.1.2. Kevin Hillstrom MineThatData E-mail Analytics
4.2. Data Manipulation
4.2.1. Pre-treatment outcome (Ypre) Setup
4.2.2. Responders/Non-responders Ratio (Ypost distribution)
4.2.3. Treatment/Control Ratio (T distribution)
4.2.4. Classification Algorithms
4.2.5. Approaches
4.2.5.1. Two Model Approach
4.2.5.2. One Model Approach
4.2.5.3. Uplift Special Approach
4.2.6. Combination of Different Settings
5. Experimental Result
5.1. Comparing Treatment and Control Ratios
5.2. Comparing Other Factors
5.2.1. Responder and Non-responder Ratios: Conclusion
5.2.2. Classification Algorithms
5.2.3. Modelling Approaches
5.3. Comparing Pre-Treatment Outcome (Ypre) Setups
6. Conclusion
7. References
8. Appendix A
Introduction of KL divergence and squared Euclidean distance
Kullback-Leibler Divergence
Squared Euclidean Distance
9. Appendix B
Uplift comparison graph

[1] Alemi, F., Erdman, H., Griva, I., & Evans, C. H. (2009). Improved statistical methods are needed to advance personalized medicine. The Open Translational Medicine Journal, 1, 16.
[2] Barry, T. E., & Howard, D. J. (1990). A review and critique of the hierarchy of effects in advertising. International Journal of Advertising, 9(2), 121-135.
[3] Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1), 249-275.
[4] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
[5] Devriendt, F., Moldovan, D., & Verbeke, W. (2018). A literature survey and experimental evaluation of the state-of-the-art in uplift modeling: A stepping stone toward the development of prescriptive analytics. Big Data, 6(1), 13-41.
[6] Díaz-Uriarte, R., & De Andres, S. A. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1), 3.
[7] Diemert, E., Betlei, A., Renaudin, C., & Amini, M. R. (2018). A Large Scale Benchmark for Uplift Modeling.
[8] Gallo, A. (2014). The value of keeping the right customers. Harvard Business Review, 29.
[9] Guelman, L., Guillén, M., & Pérez-Marín, A. M. (2014). Optimal personalized treatment rules for marketing interventions: A review of methods, a new proposal, and an insurance case study. UB Riskcenter Working Paper Series, 2014/06.
[10] Guelman, L., Guillén, M., & Pérez-Marín, A. M. (2015). Uplift random forests. Cybernetics and Systems, 46(3-4), 230-248.
[11] Gutierrez, P., & Gérardy, J. Y. (2017, July). Causal inference and uplift modelling: A review of the literature. In International Conference on Predictive Applications and APIs (pp. 1-13).
[12] Han, T. S., & Kobayashi, K. (2002). Mathematics of Information and Coding (Translations of Mathematical Monographs). American Mathematical Society.
[13] Hansotia, B., & Rukstales, B. (2002). Incremental value modeling. Journal of Interactive Marketing, 16(3), 35.
[14] Hillstrom, K. (2008). The MineThatData e-mail analytics and data mining challenge. MineThatData blog.
[15] Jaroszewicz, S., Ivantysynova, L., & Scheffer, T. (2008). Schema matching on streams with accuracy guarantees. Intelligent Data Analysis, 12(3), 253-270.
[16] Kane, K., Lo, V. S., & Zheng, J. (2014). Mining for the truly responsive customers and prospects using true-lift modeling: Comparison of new and existing methods. Journal of Marketing Analytics, 2(4), 218-238.
[17] Lewis, E. S. (1898). AIDA sales funnel.
[18] Naranjo, O. M. (2012). Testing a New Metric for Uplift Models.
[19] Radcliffe, N. J. (2007). Using control groups to target on predicted lift: Building and assessing uplift models. Direct Market J Direct Market Assoc Anal Council, 1, 14-21.
[20] Radcliffe, N. J., & Simpson, R. (2008). Identifying who can be saved and who will be driven away by retention activity. Journal of Telecommunications Management, 1(2).
[21] Reichheld, F. (2001). Prescription for cutting costs. Bain & Company. Boston: Harvard Business School Publishing.
[22] Rzepakowski, P., & Jaroszewicz, S. (2010, December). Decision trees for uplift modeling. In 2010 IEEE International Conference on Data Mining (pp. 441-450). IEEE.
[23] Rzepakowski, P., & Jaroszewicz, S. (2012). Uplift modeling in direct marketing. Journal of Telecommunications and Information Technology, 43-50.
[24] Shmueli, G., Bruce, P. C., Yahav, I., Patel, N. R., & Lichtendahl Jr, K. C. (2017). Data mining for business analytics: concepts, techniques, and applications in R. John Wiley & Sons.
[25] Su, X., Kang, J., Fan, J., Levine, R. A., & Yan, X. (2012). Facilitating score and causal inference trees for large observational studies. Journal of Machine Learning Research, 13(Oct), 2955-2994.
[26] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.