There has been great interest among EFL instructors in how to improve students' English writing skills and how to assess their products fairly. In EFL argumentative writing, students are required to state their opinions following logical discourse patterns that differ from those of their L1 (e.g., Connor, 1996; Kaplan, 1966; Oi, 1986; 1997; 1999). This suggests the importance of assessing not only language use but also text organization and content. To assess the degree to which the instructional goals of argumentative writing are achieved, the presenter has developed a modified scoring instrument that measures content and organization together, specifically in terms of argument, based on the framework of the Empirically derived, Binary-choice, Boundary-definition (EBB) approach (Koizumi & Hirai, 2008; 2013; Turner & Upshur, 1995; 1996; 2002). The study investigates the sources of measurement error in the content and organization ratings made with the EBB scale and in the language use ratings made with a checklist based on the Empirically-derived Descriptor-based Diagnostic (EDD) checklist (Kim, 2011). To this end, the study examines the dependability of the rating scales using generalizability theory (G-theory; Brennan, 1983; Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991), which can estimate the relative effects of multiple sources of variance in test scores, such as tasks and rater judgments (e.g., Bachman, 2004; Cumming, 1990; McNamara, 1990).
In this study, 25 Japanese high school students wrote argumentative paragraphs on two prompts twice, before and after instruction on the Toulmin model, which comprises claims, data, counterarguments, and rebuttals (Toulmin, 1958; 2003). This yielded a total of 100 writing samples. Each sample was rated by two raters randomly selected from four trained raters holding master's degrees in English language education. Each rater rated 50 responses, 25 each from the pre-test and the post-test, on one of the two prompts.
The dependability of the obtained ratings is currently being examined through G-theory analyses that employ several study designs to model the raters, the tasks, and the rating scales.
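The variance-components logic underlying such G-theory analyses can be sketched for the simplest fully crossed design, persons × raters (p × r), where variance components are solved from two-way ANOVA mean squares and combined into a dependability (phi) coefficient for absolute decisions. The data, function name, and one-facet design below are illustrative assumptions for exposition only; they are not the study's actual data, design, or software.

```python
# Hedged sketch: variance components and a dependability (phi) coefficient
# for a persons x raters (p x r) crossed G-study, estimated from two-way
# ANOVA mean squares. Toy data only; not the study's ratings.

def g_study_p_x_r(ratings):
    """ratings: one row per person (writer), one column per rater."""
    n_p = len(ratings)
    n_r = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in ratings]
    r_means = [sum(row[j] for row in ratings) / n_p for j in range(n_r)]

    # Mean squares for persons, raters, and the residual (interaction + error).
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum(
        (ratings[i][j] - p_means[i] - r_means[j] + grand) ** 2
        for i in range(n_p) for j in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions; negative estimates are truncated to 0.
    var_res = ms_res                          # sigma^2(pr,e): residual
    var_p = max((ms_p - ms_res) / n_r, 0.0)   # sigma^2(p): persons (object of measurement)
    var_r = max((ms_r - ms_res) / n_p, 0.0)   # sigma^2(r): raters
    # Phi: dependability for absolute decisions, averaging over n_r raters.
    phi = var_p / (var_p + (var_r + var_res) / n_r)
    return var_p, var_r, var_res, phi

# Toy example: 4 writers, each rated by the same 2 raters.
scores = [[3, 4], [2, 2], [5, 4], [3, 3]]
var_p, var_r, var_res, phi = g_study_p_x_r(scores)
print(round(phi, 3))  # → 0.846
```

In the study's fuller designs, tasks and rating scales would enter as additional facets, so the error term would be split into more components (e.g., person-by-task and person-by-rater interactions), but the estimation logic is the same.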
In this session, the presenter will discuss the key results of the G-theory analyses, focusing on the dependability of the scales for argumentative writing, and will draw pedagogical implications for practical classroom use.