|
王文中 (2004). Rasch 測量理論與其在教育和心理之應用 [Rasch measurement theory and its applications in education and psychology]. 教育與心理研究, 27(4), 637–694.
王文中、陳承德 (Trans.) (2008). 心理測驗 [Psychological testing] (Original authors: K. R. Murphy & C. O. Davidshofer). Taipei: 雙葉書廊. (Original work published 2001)
余民寧 (2009). 試題反應理論IRT及其應用 [Item response theory (IRT) and its applications]. Taipei: 心理.
余民寧、謝進昌 (2006). 國中基本學力測驗之DIF的實徵分析:以91年度兩次測驗為例 [An empirical DIF analysis of the Basic Competence Test for Junior High School Students: The two administrations of ROC year 91 as an example]. 教育學刊, 26, 241–276.
郭伯臣 (2010). 測驗等化 [Test equating]. In 譚克平 et al. (Eds.), 測驗及評量專論文集-題庫建置與測驗編制 [Collected papers on testing and assessment: Item bank construction and test development] (1st ed., pp. 102–134). Taipei County: 國家教育研究院籌備處.
臺灣PISA國家研究中心 (2010, July). 計畫概述 [Project overview]. Retrieved January 28, 2016, from the 臺灣PISA國家研究中心 website: http://pisa.nutn.edu.tw/pisa_tw.htm
Agresti, A. (1996). An introduction to categorical data analysis. New York: Wiley.
Albano, A. D. (2014). equate: An R Package for Observed-Score Linking and Equating. R package version, 2.
Allen, N. L., & Donoghue, J. R. (1996). Applying the Mantel-Haenszel Procedure to Complex Samples of Items. Journal of Educational Measurement, 33(2), 231–251.
Bradley, D. R., Bradley, T. D., McGrath, S. G., & Cutcomb, S. D. (1979). Type I error rate of the chi-square test of independence in R × C tables that have small expected frequencies. Psychological Bulletin, 86(6), 1290–1297.
Chen, J.-H., Chen, C.-T., & Shih, C.-L. (2014). Improving the Control of Type I Error Rate in Assessing Differential Item Functioning for Hierarchical Generalized Linear Model When Impact Is Presented. Applied Psychological Measurement, 38(1), 18–36.
Cheng, Y., Chen, P., Qian, J., & Chang, H.-H. (2013). Equated Pooled Booklet Method in DIF Testing. Applied Psychological Measurement, 37(4), 276–288.
DeMars, C. E. (2010). Type I error inflation for detecting DIF in the presence of impact. Educational and Psychological Measurement, 70(6), 961–972.
Donoghue, J. R., Holland, P. W., & Thayer, D. T. (1993). A Monte Carlo study of factors that affect the Mantel-Haenszel and standardization measures of differential item functioning. Differential Item Functioning, 137–166.
Dorans, N. J. (1990). Equating Methods and Sampling Designs. Applied Measurement in Education, 3(1), 3–17.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland and H. Wainer (Eds.), Differential item functioning (pp. 35–66). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dorans, N. J., & Holland, P. W. (2000). Population Invariance and the Equatability of Tests: Basic Theory and the Linear Case. Journal of Educational Measurement, 281–306.
Dorans, N. J., Liu, J., & Hammond, S. (2008). Anchor test type and population invariance: An exploration across subpopulations and test administrations. Applied Psychological Measurement, 32(1), 81–97.
Fidalgo, A. M., & Madeira, J. M. (2008). Generalized Mantel-Haenszel methods for differential item functioning detection. Educational and Psychological Measurement, 68(6), 940–958.
Fidalgo, A. M., Mellenbergh, G. J., & Muñiz, J. (2000). Effects of amount of DIF, test length, and purification type on robustness and power of Mantel-Haenszel procedures. Methods of Psychological Research Online, 5(3), 43–53.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278–295.
Frey, A., Hartig, J., & Rupp, A. A. (2009). An NCME Instructional Module on Booklet Designs in Large-Scale Assessments of Student Achievement: Theory and Practice. Educational Measurement: Issues and Practice, 28(3), 39–53.
Goodman, J. T., Willse, J. T., Allen, N. L., & Klaric, J. S. (2011). Identification of differential item functioning in assessment booklet designs with structurally missing data. Educational and Psychological Measurement, 71(1), 80–94.
Hidalgo, M. D. (2004). Differential Item Functioning Detection and Effect Size: A Comparison between Logistic Regression and Mantel-Haenszel Procedures. Educational and Psychological Measurement, 64(6), 903–915.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer and H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Lawrence Erlbaum Associates.
Hu, B. S., & Chen, C. T. (2015, March). Applying Double Purification Procedure for Differential Item Functioning on Large Scale Assessments. Paper session presented at The Fifth Asian Conference on Psychology & Behavioral Sciences, Osaka, Japan.
Kolen, M. J., & Brennan, R. L. (1987). Linear equating models for the common-item nonequivalent-populations design. Applied Psychological Measurement, 11(3), 263–277.
Kopf, J., Zeileis, A., & Strobl, C. (2015). Anchor Selection Strategies for DIF Analysis: Review, Assessment, and New Approaches. Educational and Psychological Measurement, 75(1), 22–56.
Le, L. T. (2009). Investigating gender differential item functioning across countries and test languages for PISA science items. International Journal of Testing, 9(2), 122–133.
Lee, H., & Geisinger, K. F. (2015). The Matching Criterion Purification for Differential Item Functioning Analyses in a Large-Scale Assessment. Educational and Psychological Measurement, 76(1), 141–163.
Li, Z. (2015). A Power Formula for the Mantel-Haenszel Test for Differential Item Functioning. Applied Psychological Measurement, 39(5), 373–388.
Little, R. J., & Rubin, D. B. (1989). The analysis of social science data with missing values. Sociological Methods & Research, 18(2-3), 292–326.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Magis, D., & De Boeck, P. (2014). Type I Error Inflation in DIF Identification With Mantel-Haenszel: An Explanation and a Solution. Educational and Psychological Measurement, 74(4), 713–728.
Mazor, K. M. (1994). Identification of Nonuniform Differential Item Functioning Using a Variation of the Mantel-Haenszel Procedure. Educational and Psychological Measurement, 54(2), 284–291.
Organization for Economic Cooperation and Development (OECD). (2014). PISA 2012 Technical Report. Paris: Author.
Parshall, C. G., & Miller, T. R. (1995). Exact Versus Asymptotic Mantel-Haenszel DIF Statistics: A Comparison of Performance Under Small-Sample Conditions. Journal of Educational Measurement, 32(3), 302–316.
Preece, D. A. (1990). Fifty years of Youden squares: a review. Bulletin of the Institute of Mathematics and its Applications, 26(4), 65–75.
Revelle, W. (2015). Using the psych package to generate and test structural models. Retrieved from http://bioconductor.statistik.tu-dortmund.de/cran/web/packages/psych/vignettes/psych_for_sem.pdf
Sandilands, D. A. (2014). Accuracy of differential item functioning detection methods in structurally missing data due to booklet design (Unpublished doctoral dissertation). The University of British Columbia, Vancouver, Canada.
Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159–194.
Shih, C.-L., & Wang, W.-C. (2009). Differential item functioning detection using the multiple indicators, multiple causes method with a pure short anchor. Applied Psychological Measurement, 33(3), 184–199.
Su, Y.-H., & Wang, W.-C. (2005). Efficiency of the Mantel, Generalized Mantel–Haenszel, and logistic discriminant function analysis methods in detecting differential item functioning for polytomous items. Applied Measurement in Education, 18(4), 313–350.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147–170). Hillsdale, NJ: Lawrence Erlbaum Associates.
Wald, A. (1943). Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large. Transactions of the American Mathematical Society, 54(3), 426–482.
Wang, W.-C., Shih, C.-L., & Sun, G.-W. (2012). The DIF-free-then-DIF strategy for the assessment of differential item functioning. Educational and Psychological Measurement, 72(4), 687–708.
Wang, W.-C., & Su, Y.-H. (2004). Effects of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17(2), 113–144.
Wang, W.-C., & Yeh, Y.-L. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27(6), 479–498.
Woods, C. M. (2008). Empirical Selection of Anchors for Tests of Differential Item Functioning. Applied Psychological Measurement, 33(1), 42–57.
Youden, W. J. (1937). Use of incomplete block replications in estimating tobacco-mosaic virus. Contributions from Boyce Thompson Institute, 9, 41–48.
Youden, W. J. (1940). Experimental designs to increase accuracy of greenhouse studies. Contributions from Boyce Thompson Institute, 11, 219–228.
|