The current trend toward machine-scoring of student work, Ericsson and Haswell argue, has created an emerging issue with implications for higher education across the disciplines, but with particular importance for those in English departments and in administration. The academic community has been silent on the issue—some would say excluded from it—while the commercial entities who develop essay-scoring software have been very active.
Machine Scoring of Student Essays is the first volume to seriously consider the educational mechanisms and consequences of this trend, and it offers important discussions from some of the leading scholars in writing assessment.
Reading and evaluating student writing is a time-consuming process, yet it is a vital part of both student placement and coursework at post-secondary institutions. In recent years, commercial computer-evaluation programs have been developed to score student essays in both of these contexts. Two-year colleges have been especially drawn to these programs, but four-year institutions are moving to them as well, because of the cost-savings they promise. Unfortunately, to a large extent, the programs have been written, and institutions are installing them, without attention to their instructional validity or adequacy.
Since the education software companies are moving so rapidly into what they perceive as a promising new market, a wider discussion of machine-scoring is vital if scholars hope to influence development and/or implementation of the programs being created. What is needed, then, is a critical resource to help teachers and administrators evaluate programs they might be considering, and to more fully envision the instructional consequences of adopting them. And this is the resource that Ericsson and Haswell are providing here.
MACHINE SCORING OF STUDENT ESSAYS
Truth and Consequences
UTAH STATE UNIVERSITY PRESS
Copyright © 2006 Utah State University Press
All rights reserved.
ISBN: 978-0-87421-632-5
Contents
Introduction / Patricia Freitag Ericsson and Richard H. Haswell
1 Interested Complicities: The Dialectic of Computer-Assisted Writing Assessment / Ken S. McAllister and Edward M. White
2 The Meaning of Meaning: Is a Paragraph More than an Equation? / Patricia Freitag Ericsson
3 Can't Touch This: Reflections on the Servitude of Computers as Readers / Chris M. Anson
4 Automatons and Automated Scoring: Drudges, Black Boxes, and Dei Ex Machina / Richard H. Haswell
5 Taking a Spin on the Intelligent Essay Assessor / Tim McGee
6 ACCUPLACER's Essay-Scoring Technology: When Reliability Does Not Equal Validity / Edmund Jones
7 WritePlacer Plus in Place: An Exploratory Case Study / Anne Herrington and Charles Moran
8 E-Write as a Means for Placement into Three Composition Courses: A Pilot Study / Richard N. Matzen Jr. and Colleen Sorensen
9 Computerized Writing Assessment: Community College Faculty Find Reasons to Say "Not Yet" / William W. Ziegler
10 Piloting the COMPASS E-Write Software at Jackson State Community College / Teri T. Maddox
11 The Role of the Writing Coordinator in a Culture of Placement by ACCUPLACER / Gail S. Corso
12 Always Already: Automated Essay Scoring and Grammar-Checkers in College Writing Courses / Carl Whithaus
13 Automated Essay Grading in the Sociology Classroom: Finding Common Ground / Edward Brent and Martha Townsend
14 Automated Writing Instruction: Computer-Assisted or Computer-Driven Pedagogies? / Beth Ann Rothermel
15 Why Less Is Not More: What We Lose by Letting a Computer Score Writing Samples / William Condon
16 More Work for Teacher? Possible Futures of Teaching Writing in the Age of Computerized Writing Assessment / Bob Broad
17 A Bibliography of Machine Scoring of Student Writing, 1962-2005 / Richard H. Haswell
Glossary
Notes
References
Index
Chapter One
INTERESTED COMPLICITIES
The Dialectic of Computer-Assisted Writing Assessment
Ken S. McAllister and Edward M. White
She knew how difficult creating something new had proved. And she certainly had learned the hard way that there were no easy shortcuts to success. In particular, she remembered with embarrassment how she had tried to crash through the gates of success with a little piece on a young author struggling to succeed, and she still squirmed when she remembered how Evaluator, the Agency of Culture's gateway computer, had responded to her first Submission with an extreme boredom and superior knowledge born of long experience, "Ah, yes, Ms. Austen, a story on a young author, another one. Let's see, that's the eighth today: one from North America, one from Europe, two from Asia, and the rest from Africa, where that seems a popular discovery of this month. Your ending, like your concentration on classroom action and late night discussion among would-be authors, makes this a clear example of Kunstlerroman type 4A.31. Record this number and check the library, which at the last network census has 4,245 examples, three of which are canonical, 103 Serious Fiction, and the remainder ephemera. (Landow 1992, 193-194)
This excerpt from George Landow's tongue-in-cheek short story about "Apprentice Author Austen" and her attempts to publish a story on the international computer network, thereby ensuring her promotion to "Author," suggests a frightful future for writing and its assessment. The notion that a computer can deliver aesthetic judgments based on quantifiable linguistic determinants is abhorrent to many contemporary writing teachers, who usually treasure such CPU-halting literary features as ambiguity, punning, metaphor, and veiled reference. But Landow's "Evaluator" may only be a few generations ahead of extant technologies like the Educational Testing Service's e-rater, and recent developments in the fields of linguistic theory, natural language processing, psychometrics, and software design have already made computers indispensable in the analysis, if not the assessment, of the written word. In this chapter, we approach the history of computer-assisted writing assessment using a broad perspective that takes into account the roles of computational and linguistics research, the entrepreneurialism that turns such research into branded commodities, the adoption and rejection of these technologies among teachers and administrators, and the reception of computer-assisted writing assessment by the students whose work these technologies process.
Such a broad treatment cannot hope to be comprehensive, of course. Fortunately, the field of computer-assisted writing assessment is sufficiently well established that there exist numerous retrospectives devoted to each of the roles noted above (research, marketing, adoption, and use), many of which are listed in the bibliography at the end of this book. Our purpose here in this first chapter of an entire volume dedicated to computer-assisted writing assessment is to offer readers a broad perspective on how computer-assisted writing assessment has reached the point it occupies today, a point at which the balance of funding is slowly shifting from the research side to the commercial side, and where there is, despite the protestations of many teachers and writers, an increasing acceptance of the idea that computers can prove useful in assessing writing. This objective cannot be reached by examining the disembodied parts of computer-assisted writing assessment's historical composition; instead, such assessment must be treated as an extended site of inquiry in which all its components are seen as articulated elements of a historical process. This complex process has evolved in particular ways and taken particular forms in the past half century due to a variety of social and economic relations that have elevated and devalued different interests along the way.
In the following sections we trace this web of relations and suggest that theoretically informed practice in particular circumstances (what we will be calling "praxis") rather than uncritical approbation or pessimistic denunciation ought to guide future deliberations on the place of computer-assisted writing assessment in educational institutions. Our hope is that by surveying for readers the technological, ideological, and institutional landscape that computer-assisted writing assessment has traversed over the years, we will help them (everyone from the greenest of writing program administrators to the most savvy of traditional assessment gurus) develop some historical and critical perspective on this technology's development, as well as on its adoption or rejection in particular contexts. Such perspectives, we believe, make the always difficult process of deciding how to allocate scarce resources (not to mention the equally dizzying process of simply distinguishing hype from reality) considerably more straightforward than trying to do so without some knowledge of the field's history, technology, and "interested complicities."
INTERESTED COMPLICITIES
The process of designing computers to read human texts is usually called natural language processing, and when these techniques are applied to written texts and specifically connected to software that draws conclusions from natural language processing, it becomes a form of writing assessment. Raymond Kurzweil (1999), an artificial intelligence guru specializing in speech recognition technologies, has a grim view of natural language processing, asserting as recently as the end of the last century that "understanding human language in a relatively unrestricted domain remains too difficult for today's computers" (306). In other words, it is impossible, for now at least, for computers to discern the complex and manifold meanings of such things as brainstorming sessions in the boardroom, chitchat at a dinner party and, yes, student essays.
The disjunction between the desire for natural language processing and the current state of technology has created a territory for debate over computer-assisted writing assessment that is dynamic and occasionally volatile. It is possible, of course, to freeze this debate and claim that it is divided into this or that camp, but such an assertion would be difficult to maintain for long. To say, for instance, that there are those who are for and those who are against computer-assisted writing assessment might be true enough if one examines its history only from the perspective of its reception among certain articulate groups of writing teachers. Such a perspective doesn't take into consideration, however, the fact that there are a fair number of teachers (and perhaps even some readers of this book) who are undecided about computer-assisted writing assessment; such people, in fact, might well like there to be a technology that delivers what computer-assisted writing assessment companies say it can, but are ultimately skeptical. Nor does it consider the fact that natural language processing researchers frequently occupy a position that may be termed "informed hopefulness." Such a position neither denies the current limitations and failings of computer-assisted writing assessment nor rejects the possibility that high-quality (i.e., humanlike) computer-assisted writing assessment is achievable.
Another way the debate could be misleadingly characterized is as a misunderstanding between researchers and end users. Almost without exception, the researchers developing systems that "read" texts acknowledge that the computers don't really "understand" what they're seeing, but only recognize patterns and probabilities. Of course, the process of reading among humans (and virtually every other sign-reading creature) also depends on pattern recognition and probabilistic reasoning, but the human brain adds to this a wealth of other types of interpretive skills (sensory perception, associative thinking, and advanced contextual analysis, for example) that makes a vast difference between how computers and humans read. Nonetheless, end users see the fruits of natural language processing research, which is often very compelling from certain angles, and declare such computer-assisted writing assessment systems either a welcome pedagogical innovation or a homogenizing and potentially dangerous pedagogical crutch. This misunderstanding is often exacerbated by the people who commodify the work of researchers and turn it into products for end users. The marketing of computer-assisted writing assessment algorithms and the computer applications built around them is an exercise in subtlety (when done well) or in hucksterism (when done dishonestly). The challenge for marketers dealing with computer-assisted writing assessment is that they must find a way around the straightforward and largely uncontested fact that, as Kurzweil (1999) said, computers can't read and understand human language in unrestricted domains, precisely the type of writing found in school writing assignments.
Rather than trying to tell the story of the history of computer-assisted writing assessment as a tale of good and evil (where good and evil could be played interchangeably by computers and humans), we prefer to tell the history more dialectically, that is, as a history of interested complicities. The evolution of computer-assisted writing assessment involves many perspectives, and each perspective has a particular stake in the technology's success or failure. Some people have pursued computer-assisted writing assessment for fame and profit, while others have done it for the sake of curiosity and the advancement of learning (which is itself often fueled by the pressure of the promotion and tenure process). Some have pursued computer-assisted writing assessment for the advantages that novelty brings to the classroom, while others have embraced it as a labor-saving innovation. And some people have rejected computer-assisted writing assessment for its paltry return on the investments that have been made in it, its disappointing performance in practical situations, and the message its adoption, even in its most disappointing form, seems to send to the world: computers can teach and respond to student writing as well as humans. In its way, each of these perspectives is justifiable, and for this reason we believe it is important to step back and ask what kind of conditions would be necessary to sustain such a variety of views and to attempt to ascertain what the most responsible stance to take to such a tangle of interests might be in the first decade of the twenty-first century.
The development of computer-assisted writing assessment is a complex evolution driven by the dialectic among researchers, entrepreneurs, and teachers. The former two groups have long been working to extend the limits of machine cognition as well as exploit for profit the technologies that the researchers have developed. Teachers, too, have been driven to shape the development of computer-assisted writing assessment, mainly by their understandable desires to lighten their workloads, serve their students, and protect their jobs and sense of professional importance. All of these people have motives for their perspectives, and some have more power than others to press their interests forward. As a dynamic system (as a dialectic), each accommodation of one of those interests causes changes throughout the system, perhaps steeling the resolve of certain opponents while eliminating others and redirecting the course of research elsewhere. In general, all of the participants in this dialectic are aware of the interests at stake (their own and those of others) and have tended to accept certain broad disciplinary shifts (from computer-assisted writing assessment as research to computer-assisted writing assessment as commodity, for example) while fighting for particular community-based stakes that seem fairly easy to maintain (like having a human spot-check the computer's assessments). It is for this reason that we see computer-assisted writing assessment as being a dialectic characterized by interested complicities: each group (researchers, marketers, adopters, and users) has interests in the technology that have become complicit with, but are different from, those of all the others.
The remainder of this chapter briefly narrates this dialectic beginning in the English department. It is there that the analysis of texts has been a staple of scholarly activity since long before the advent of the computer and where, despite its reputation for textual conservatism, innovative academics have more consistently acted as the hub of activity for the inherently interdisciplinary work of computer-assisted writing analysis and assessment than anyplace else on campus. Additionally, many readers of this book will be members of English departments seeking to engage their colleagues in discussion about the meaning and implications of computer-assisted writing assessment. Such readers will be more able to talk with their colleagues, almost all of whom have a background in literature, if they are aware of the literary theories (theories of reading, as others may call them) that underlie response to and assessment of all texts.
NOTES FROM THE ENGLISH DEPARTMENT
When Lionel Trilling criticized V. L. Parrington in his 1948 essay "Reality in America," he did so in language that to proponents of computer-assisted writing assessment must now seem simultaneously validating and dismissive. Trilling notes cuttingly that Parrington's work is "notable for its generosity and enthusiasm but certainly not for its accuracy or originality" (1950, 15). To illustrate this criticism, Trilling complains that Parrington uses the word romantic "more frequently than one can count, and seldom with the same meaning, seldom with the sense that the word ... is still full of complicated but not wholly pointless ideas, that it involves many contrary but definable things" (17). In this barrage of barbs, Trilling implies that accuracy, accountability, and stability are crucial characteristics of all good writing.
Further, Trilling here, as elsewhere, articulates the formalism that had come to dominate American literary criticism in the late 1940s and 1950s. Though based on older models of European formalism, this innovation in literary analysis was optimistically termed by American critics "the new criticism" because it eschewed such impressionistic matters as morality, biography, and reader emotion for intense study of texts as objects containing meanings to be discerned through detailed examination and close reading. Such reading, with particular attention to metaphor, irony, ambiguity, and structure, would reveal the deep meanings within the text and allow the critic to announce those meanings with a certain scientific accuracy based wholly on the words in the work of literature. The few opponents of this approach complained that this dispassionate analysis was altogether too aesthetic and removed from the real and passionate world of literature and life, and that it rendered students passive before the all-knowing teacher who would unfold the meaning of a poem or a play as if solving a complicated puzzle that only initiates could work through. The charge of mere aestheticism, made fervently by Marxist and other critics with social concerns about the effects of literature, rings with particular irony now, as we look back to the new criticism as providing a kind of theoretical ground for computer assessment, an "explication de texte" also based on the belief that meaning, or at least value, resides wholly in the words and structure of a piece of writing.
(Continues...)
Excerpted from MACHINE SCORING OF STUDENT ESSAYS Copyright © 2006 by Utah State University Press. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.