Evaluation

"We're lab rats, and we're going to take over the world by writing stories!" - Subject on the first day of the study

Description of the Study

The study was conducted for three reasons. First, the study aimed to determine, using methods of the learning sciences, whether or not a system using a conversational computer character can help to improve children's story writing. The very small sample size would likely preclude conclusive results. Still, significant results from the study could indicate whether this research area is a fruitful one and could focus additional development of educational computer characters to help writing. Second, the study was meant to aid in software development, to provide feedback on the usability of the system and the way in which it would actually be used by children. The usability results could enable short-term improvement of the programs so stable, polished versions of them could be released to educators and students. Third, it was hoped that all three pieces of software would work well enough to afford some educational benefit to the students who wrote with them. The combination of this educational goal with the scientific and software development goals makes the quotation above particularly apt.

Accomplishing the first goal involves empirical evaluation of the writing improvement made by children using EddieEdit and of the improvement made by children using StoryStages and E-Write. StoryStages was developed to be a control, and it has the same tutorial information about story writing as does EddieEdit. It simply lacks the conversational character interface. Since the particular planning and revising prompts used may not themselves be effective, E-Write provides an additional control. Because the study is to evaluate the cognitive effects of, not with, the software, the stories evaluated to determine improvement were written before and after the main sessions in which the software was used.

The study was conducted at a public elementary school in Cambridge, Massachusetts. The study ran from March 18 to April 3, 1998. It took place in the school's computer lab, where students used Power Macintosh 5200/75 LC computers with 12 MB of RAM, running System 7.5.3.

Eighteen students from one grade 2/3 classroom participated in the study from start to finish. All of the stories written by children were saved for analysis. To preserve the anonymity of the subjects, the names of subjects were removed from stories and replaced with a code before anyone other than the experimenter examined them.

The complete procedure was as follows:

1. The parents or guardians of subjects read a letter that describes the study and includes a consent declaration.
2. The study was described to subjects, who were then given instructions and asked to fill out the pre-writing questionnaire.
3. Subjects each wrote a story during a class session, using a familiar school word processor, ClarisWorks. They were given forty minutes.
4. Subjects were assigned to three groups of six students each based on a rough estimate of their writing ability. This estimate was based on the one story they had each written, and was done to avoid extreme clustering of low- or high-ability students in one of the three groups.
5. At the next class session, the groups were assigned colors corresponding to E-Write (Red), EddieEdit (Blue), and StoryStages (Green). Each student wrote for the next eight classes using the same assigned system. The experimenter offered no assistance whatsoever in using the software, although he did offer to intervene if the system needed to be restarted or if there was a chance of data loss. Subjects had to rely on the introductory and help screens, and on assistance from the teacher and staff. These types of help were uniform for all participants. The groups were as physically separated as possible within the computer lab, each occupying a row.
6. During the final class session, subjects again each wrote a story, using the same familiar school word processor, ClarisWorks. They were given forty minutes, the same amount of time to write as in step 3.
7. Subjects were thanked and asked to fill out the post-writing questionnaire.
8. The investigator explained to subjects what the goals and procedures of the study were, in detail.
9. After analysis of data, the results are also described to the teacher, parents, and subjects. Copies of all the stories written by a particular subject are given to that student on disk, along with improved versions of EddieEdit and StoryStages.

The author conducted the experiment. He did not provide help, either on use of the software or on writing, during the course of the study. However, it was necessary to interact with students during the writing time in several cases. When a student closed the writing program, which was on floppy disk, it could not be restarted by the student. The experimenter had to enter a code on that computer to disable a security feature, unlocking the floppy disk and allowing the execution of programs on floppy. This allowed the program to be run again. In one case a student erased his story unintentionally and the experimenter restored it from a backup. To not intervene in that situation would have interfered unacceptably with the third purpose of the study, to be an educational experience.

Students were not personally implored by the experimenter to write, but if they were distracting others or out of their seats they were told to return to their computers. While the row using E-Write was alone, those using StoryStages and EddieEdit sat in two rows back to back. Students would sometimes turn around to see what someone in the other group was doing or to talk to them. The experimenter tried to deal with this by walking in between a distracted student and that student's object of attention. The experimenter received frequent requests for help using the software and for help spelling words, but he directed both of these sorts of queries to others, saying the student would have to ask someone else.

The teacher, an intern who assisted the teacher, and the school's staff member in charge of the computer lab all were present during most of the study and provided this sort of help. The experimenter encouraged these educators to help students in the ways they normally would. Since the software is for classroom use and meant to be used in this context, it is consistent with all the purposes of the study to have such adults providing help.

The last day planned for the study fell on a Friday. After two weeks of writing every day, several of the children were uninterested in writing stories. During what was to be the final evaluation of their story writing ability, some children started to create newspapers, drew pictures, and began typing everything using a symbol font. The experimenter believed the result was not representative of their writing ability, and the teacher and intern agreed. Another writing session was scheduled and after a day off the children were more focused on story writing, although many of the students resented the additional session. The stories from this final session, rather than from the last scheduled session, were the ones used in the analysis.

Composition of the Groups

The 18 children who participated in the study from start to finish included both second and third graders. All were in the same elementary school class. The students were of varied races. The student sample consisted of 11 boys and 7 girls.

Answers on the questionnaires indicated a high level of previous computer experience. Specifically, all but three of the students wrote that they had a computer at home. Students were asked "Have you used computers before?" and could answer from 1 (no prior computer use) to 5 (constant computer use). All students answered either 3, 4, or 5. The mean value for students' answers was 4.3 with a standard deviation of .8. Students were asked about the specific activities they had done on computers, and given a list of six activities as well as the option to write additional ones. All students indicated at least three activities. All of them wrote that they had played computer games. Thirteen of them wrote that they had previously used the computer to write stories. In the "other" category, activities listed ranged from homework to "hacking." In reply to "Do you like to use computers?" all students answered either 3, 4, or 5 on a similar scale. The mean answer was 4.7 with a standard deviation of .6. On the first day of the study, the students used ClarisWorks, a word processor designed for adults. The ease with which they used the functions of this software was consistent with their reports of familiarity with computers.

Students reported that, on average, they had written "some" stories before - more than "a few" but less than "many." One student indicated he had not written stories before, while three reported that they had written "very, very many." The mean answer on a 1 to 5 scale was 3.0 with a standard deviation of 1.3. The group enjoyed story writing for the most part, although not as much as computer use. On a 1 to 5 scale the mean answer was 4.2 and the standard deviation .9. In answer to the question "How do you write them?" only eight indicated that they type them on a computer. Fourteen of the students said they write stories with a pen or pencil. This set of students includes some who also use the computer to write stories, as some students indicated more than one method.

The students were divided into the three groups (Red, Green, and Blue, to use E-Write, StoryStages, and EddieEdit) so that the distribution of grade level and gender was similar in each group. The Red and Green groups each had four boys and two girls, while the Blue group had three of each. The Red and Blue groups each had equal numbers of third and second graders, while the Green group had four second grade students and two from third grade.

Students were also assigned to groups so that the groups would be roughly equal in terms of average writing ability and similar in terms of the distribution of writing ability. This was done by means of a brief qualitative evaluation of their initial writing samples, done by the experimenter. This avoided extreme clustering of high- or low-ability students in a single group. The distribution across groups was clearly not identical, but this was not problematic. The important attribute that should be evenly distributed among the three groups is ability to improve, not ability to write. It is improvement in writing quality, not absolute writing quality, that is the variable under consideration. Of course, very high-ability students who know how to plan, write, and revise already, who are comfortable with writing at length, and who can envision an audience would benefit little from the educational interventions in this study. However, it is safe to assume at this age that most students can improve their writing at least a little when helped with these things.

Students were first placed in three unlabeled groups as indicated above. The three groups were then randomly assigned as Red, Green, and Blue and associated with the three pieces of software.
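
The two-stage assignment described above (form unlabeled groups that spread out ability, then randomly attach the software conditions) can be illustrated with a short sketch. This is not the procedure the experimenter actually followed, which was a qualitative judgment made by hand and which also took grade level and gender into account. The numeric ability estimates, the student codes, the function names, and the "snake" dealing order are all assumptions introduced for illustration.

```python
import random


def balanced_groups(students, n_groups=3):
    """Deal students into n_groups so that ability levels are spread out.

    `students` is a list of (student_code, ability_estimate) pairs. The
    numeric estimate is hypothetical; the study used a qualitative rating.
    """
    ranked = sorted(students, key=lambda s: s[1], reverse=True)
    groups = [[] for _ in range(n_groups)]
    for i, (code, _) in enumerate(ranked):
        round_no, pos = divmod(i, n_groups)
        # Snake order: reverse direction on every other round so that no
        # single group always receives the strongest remaining writer.
        idx = pos if round_no % 2 == 0 else n_groups - 1 - pos
        groups[idx].append(code)
    return groups


def assign_conditions(groups, conditions, rng=None):
    """Randomly pair each unlabeled group with exactly one condition."""
    rng = rng or random.Random()
    labels = list(conditions)
    rng.shuffle(labels)
    return dict(zip(labels, groups))


# Placeholder data only: these are NOT scores from the study.
placeholder = [(f"S{i:02d}", (i * 7) % 5 + 1) for i in range(18)]
unlabeled = balanced_groups(placeholder)
assignment = assign_conditions(
    unlabeled,
    ["Red (E-Write)", "Green (StoryStages)", "Blue (EddieEdit)"],
)
```

Keeping the two stages separate mirrors the text: the grouping step never sees the condition labels, so the choice of which group uses which program cannot be influenced by the ability estimates.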

Use and Usability of the Software

Observation of the students as they used the software highlighted some particular problems and successes. The difficulties they had provide guidance for making both high-level changes in design and minor refinements. The way in which they used the software also revealed how well the software's model of work flow was able to match up with and assist the cognitive process of the writer.

Children did at times converse with Eddie as they planned their stories. Children would bypass questions they didn't wish to answer but might answer others later. Sometimes, for instance, the questions about plot would all be answered "I don't know," but other replies would include description of elements which later appeared in the child's story. The two questions that ask the child to first brainstorm and then pick one or a few specific ideas usually resulted in puzzlement when they were first encountered. Children never wrote more than one story idea to begin with, so Eddie's request that the user pick one from the list was unnecessary.

At one point Eddie asks "Now would you like to talk about the characters, the place, the time, or the things that happen?" Eddie then begins asking about the specific story element the writer chooses. Eddie can understand replies such as "plot," "the things," "what happens," "the place," "the people," "the characters." One child, who was carefully reading all of the directions and thinking about each of his replies, typed in response to this question "yes." This is not a response Eddie understands in this context, so Eddie simply indicated his lack of understanding and asked the question again. The question is probably phrased poorly and this experience indicates that Eddie's handling of a response like "yes" is not adequate.
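
The kind of reply matching described above, and one possible way to treat a bare "yes," can be sketched as follows. This is a hypothetical illustration, not EddieEdit's actual implementation, which is not shown in this section. The keyword table, the list of affirmative words, and the function name are assumptions; only the example replies ("plot," "the things," "what happens," "the place," "the people," "the characters") come from the text.

```python
# Hypothetical keyword table. The keyword for "time" is an assumption,
# since the text gives no example replies for that story element.
ELEMENT_KEYWORDS = {
    "characters": ("character", "people"),
    "place": ("place",),
    "time": ("time",),
    "plot": ("plot", "thing", "happen"),
}

# Hypothetical set of replies treated as agreement without a choice.
AFFIRMATIVES = {"yes", "yeah", "ok", "okay", "sure"}


def interpret_reply(reply):
    """Classify a reply to the "which part of the story?" question.

    Returns ("element", name) when a story element is recognized,
    ("affirmative", None) when the child agrees without choosing, and
    ("unknown", None) otherwise.
    """
    text = reply.strip().lower()
    for element, keywords in ELEMENT_KEYWORDS.items():
        # Simple substring matching; a real system would need more care.
        if any(keyword in text for keyword in keywords):
            return ("element", element)
    if text.strip(".!") in AFFIRMATIVES:
        return ("affirmative", None)
    return ("unknown", None)


kind, element = interpret_reply("YES")
```

Distinguishing an affirmative reply from an unrecognized one would let the program follow up with a narrower question instead of repeating the original one unchanged.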

EDDIE: OK. Now we can talk about parts of the story. Now do you want to talk about the characters, the place, the time, or the things that happen?

USER: YES