Author: Yu, Jin-Sian
Thesis title: Generalizing Essence Coding for Categorical Response and Continuous Response with Heterogeneous and Correlated Errors
Advisor: Cheng, Shao-Wei
Keywords: Full factorial design; Generalized linear model; Logistic regression; Maximum likelihood estimation; Non-parametric model
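Under a full factorial design, Huang (2020) defined essence coding and proposed a method for estimating it when the responses are normally distributed, mutually independent, and homoscedastic. Responses in real data, however, may not fully satisfy these conditions. This thesis relaxes them in two respects: first, continuous responses may have heterogeneous variances or non-zero covariances; second, the response may be categorical and follow a binomial distribution. For each case, we study how to generalize and apply essence coding. In the first extension, we find that the definition and estimation of essence coding do not change when the variance assumptions on the response are relaxed. In the second extension, we define essence coding through a generalized linear model, estimate it by maximum likelihood, and present an analytical solution for the essence coding estimates under the generalized linear model. Finally, we use simulated data to verify our derivations for the first extension; for the second extension, we fit nonparametric models to two real binary datasets, treat them as true models to generate simulated data, and use these data to assess the accuracy of the estimation method.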
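One standard linear-model fact is consistent with the first finding that the estimation formula need not change under heteroscedastic or correlated errors. The sketch below is a minimal illustration under our own assumptions, not the thesis's derivation: we assume an unreplicated 2^3 full factorial fitted with a saturated model in ±1 effect coding, so the model matrix X is square and invertible. In that case the generalized least squares (GLS) estimator (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} y reduces to X^{-1} y, which is also the ordinary least squares (OLS) estimator, for any positive-definite Σ. The helper names (`full_factorial_contrasts`, `ols`, `gls`), the ±1 coding, and the random Σ are hypothetical.

```python
import itertools

import numpy as np


def full_factorial_contrasts(n_factors):
    """Runs and saturated model matrix of a 2^k full factorial (+/-1 coding).

    Hypothetical helper: columns are the intercept, every main effect and
    every interaction (elementwise products of the +/-1 factor columns).
    """
    runs = np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))
    cols = [np.ones(len(runs))]
    for order in range(1, n_factors + 1):
        for subset in itertools.combinations(range(n_factors), order):
            cols.append(np.prod(runs[:, list(subset)], axis=1))
    return runs, np.column_stack(cols)


def ols(X, y):
    """Ordinary least squares: (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def gls(X, y, sigma):
    """Generalized least squares: (X' S^{-1} X)^{-1} X' S^{-1} y."""
    s_inv_x = np.linalg.solve(sigma, X)  # S^{-1} X (S is symmetric)
    return np.linalg.solve(X.T @ s_inv_x, s_inv_x.T @ y)


rng = np.random.default_rng(0)
_, X = full_factorial_contrasts(3)  # 8 runs x 8 effects: square, X'X = 8I
y = rng.normal(size=8)

# Arbitrary positive-definite covariance: unequal variances, nonzero covariances.
A = rng.normal(size=(8, 8))
sigma = A @ A.T + 8.0 * np.eye(8)

b_ols = ols(X, y)
b_gls = gls(X, y, sigma)
print(np.allclose(b_ols, b_gls))  # True: both equal X^{-1} y
```

With replicated runs X is no longer square, and OLS and GLS coincide only when the column space of ΣX is contained in that of X (Kruskal's condition); the thesis may reach its conclusion by a different route.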
Huang (2020) proposed methods for defining and estimating essence codings that capture the maximum explained variation under a full factorial design. The model considered by Huang (2020) assumes that the responses are normally distributed, independent, and homoscedastic. In real-world data, however, the responses may not fully satisfy these conditions. This thesis relaxes these assumptions in two directions. First, we consider a continuous response with heteroscedastic variances or non-zero covariances. Second, we consider a categorical response that follows a binomial distribution. Under each extension, we explore how to estimate and apply essence codings. In the first extension, we find that the definition and estimation of essence coding need not change when the variance and/or covariance assumptions on the response are relaxed; the definition and estimation formulas remain those of Huang (2020). In the second extension, we model the response with a generalized linear model (GLM), define essence coding within the GLM framework, estimate it by maximum likelihood, and obtain an analytical solution for the essence coding estimates under the GLM. Finally, we conduct numerical simulations to validate our derivations for the first extension. In addition, we fit nonparametric models to two real binary datasets, treat the fitted models as true models, and generate simulated data from them to evaluate the accuracy of the proposed estimation method in the second extension.
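The essence-coding formulas of the second extension are not given on this page, so the sketch below does not reproduce them. It only illustrates, under our own assumptions, why a binomial GLM on a full factorial can admit an analytical maximum likelihood solution: if the logistic model is saturated (one coefficient per treatment combination, binomial counts per cell), the fitted probabilities equal the observed proportions, so the MLE is X^{-1} logit(p_hat), and iteratively reweighted least squares (IRLS) converges to the same value. It then mimics the described simulation protocol: a hypothetical "true" probability surface stands in for the fitted nonparametric model (the table of contents lists classification trees and neural networks), binomial data are generated, and bias and RMSE are recorded. All helper names, the ±1 coding, the 200 trials per cell, and `true_p` are illustrative assumptions.

```python
import itertools

import numpy as np


def full_factorial_contrasts(n_factors):
    """Runs and saturated +/-1 model matrix of a 2^k full factorial (hypothetical helper)."""
    runs = np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))
    cols = [np.ones(len(runs))]
    for order in range(1, n_factors + 1):
        for subset in itertools.combinations(range(n_factors), order):
            cols.append(np.prod(runs[:, list(subset)], axis=1))
    return runs, np.column_stack(cols)


def logit(p):
    return np.log(p / (1.0 - p))


def fit_logistic_irls(X, successes, trials, n_iter=50, tol=1e-10):
    """Binomial-logit maximum likelihood via Newton-Raphson / IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = trials * p * (1.0 - p)              # binomial variance weights
        score = X.T @ (successes - trials * p)  # gradient of the log-likelihood
        info = X.T @ (w[:, None] * X)           # Fisher information X'WX
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def closed_form_saturated(X, successes, trials):
    """Saturated model: fitted p = observed proportion, so beta = X^{-1} logit(p_hat)."""
    return np.linalg.solve(X, logit(successes / trials))


rng = np.random.default_rng(2024)
runs, X = full_factorial_contrasts(3)
trials = np.full(len(runs), 200)

# Hypothetical "true" probability surface standing in for a fitted
# nonparametric model; it is NOT taken from the thesis.
true_p = 0.5 + 0.3 * np.tanh(0.8 * runs[:, 0] - 0.5 * runs[:, 1] * runs[:, 2])
true_beta = np.linalg.solve(X, logit(true_p))

# One simulated data set: the iterative MLE matches the closed form.
successes = rng.binomial(trials, true_p)
print(np.allclose(fit_logistic_irls(X, successes, trials),
                  closed_form_saturated(X, successes, trials)))  # True

# Monte Carlo accuracy of the estimates (simulate from the "true" model, re-estimate).
errors = np.array([
    closed_form_saturated(X, rng.binomial(trials, true_p), trials) - true_beta
    for _ in range(500)
])
print("bias:", errors.mean(axis=0).round(3))
print("RMSE:", np.sqrt((errors ** 2).mean(axis=0)).round(3))
```

The closed form depends on saturation and on every cell having 0 < successes < trials; for an unsaturated GLM an iterative fit such as IRLS is generally required.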
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1. Introduction
2. Literature Review
2.1 Estimation Criterion for Essence Coding and Essence Effects under Homoscedasticity
2.2 Generalized Least Squares
2.3 Logistic Regression
3. Research Methods
3.1 Heteroscedasticity
3.2 Binary Response Variables
4. Data Analysis
4.1 Comparison of Essence Coding Estimates under Heteroscedastic Variance Structures
4.2 Classification Tree Models
4.3 Neural Network Models
5. Conclusions and Discussion
References
References:

Detrano, R. C., Jánosi, A., Steinbrunn, W., Pfisterer, M. E., Schmid, J.-J., Sandhu, S., Guppy, K., Lee, S., and Froelicher, V. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. The American Journal of Cardiology, 64(5):304–310.

Faraway, J. (2014). Linear Models with R, 2nd edition. CRC Press.

Faraway, J. (2016). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, 2nd edition. CRC Press.

Huang, J.-M. (2020). Identifying essence codings and effects for multiple factors under full factorial designs. Master’s thesis, National Tsing Hua University, Hsinchu, Taiwan.

l3LlFF (2024). Banana Quality. Version 1. Retrieved March 6, 2024 from https://www.kaggle.com/datasets/l3llff/banana.

Seber, G. A. (1977). Linear Regression Analysis, 1st edition. Wiley, New York.

Styan, G. P. (1973). Hadamard products and multivariate statistical analysis. Linear Algebra and its Applications, 6:217–240.