
Evaluation


"We're lab rats, and we're going to take over the world by writing stories!" - Subject on the first day of the study

Description of the Study

The study was conducted for three reasons. First, the study aimed to determine, using methods of the learning sciences, whether or not a system using a conversational computer character can help to improve children's story writing. The very small sample size would likely preclude conclusive results. Still, significant results from the study could indicate whether this research area is a fruitful one and could focus additional development of educational computer characters to help writing. Second, the study was meant to aid in software development, to provide feedback on the usability of the system and the way in which it would actually be used by children. The usability results could enable short-term improvement of the programs so stable, polished versions of them could be released to educators and students. Third, it was hoped that all three pieces of software would work well enough to afford some educational benefit to the students who wrote with them. The combination of this educational goal with the scientific and software development goals makes the quotation above particularly apt.

Accomplishing the first goal involves empirical evaluation of the writing improvement made by children using EddieEdit, compared with the improvement made by children using StoryStages and E-Write. StoryStages was developed to be a control; it has the same tutorial information about story writing as EddieEdit but lacks the conversational character interface. Since the particular planning and revising prompts used may not themselves be effective, E-Write provides an additional control. Because the study is to evaluate the cognitive effects of the software, not the effects with it, the stories evaluated to determine improvement were written before and after the main sessions in which the software was used.

The study was conducted at a public elementary school in Cambridge, Massachusetts, and ran from March 18 to April 3, 1998. It took place in the school's computer lab, where students used Power Macintosh 5200/75 LC computers with 12 MB of RAM, running System 7.5.3.

Eighteen students from one grade 2/3 classroom participated in the study from start to finish. All of the stories written by children were saved for analysis. To preserve the anonymity of the subjects, the names of subjects were removed from stories and replaced with a code before anyone other than the experimenter examined them.

The complete procedure was as follows:

  1. The parents or guardians of subjects read a letter that describes the study and includes a consent declaration.
  2. The study was described to subjects, who were then given instructions and asked to fill out the pre-writing questionnaire.
  3. Subjects each wrote a story during a class session, using a familiar school word processor, ClarisWorks. They were given forty minutes.
  4. Subjects were assigned to three groups of six students each based on a rough estimate of their writing ability. This estimate was based on the one story they had each written, and was done to avoid extreme clustering of low- or high-ability students in one of the three groups.
  5. At the next class session, the groups were assigned colors corresponding to E-Write (Red), EddieEdit (Blue), and StoryStages (Green). Each student wrote for the next eight classes using the same assigned system. The experimenter offered no assistance whatsoever in using the software, although he did offer to intervene if the system needed to be restarted or if there was a chance of data loss. Subjects had to rely on the introductory and help screens, and on assistance from the teacher and staff. These types of help were uniform for all participants. The groups were as physically separated as possible within the computer lab, each occupying a row.
  6. During the final class session, subjects again each wrote a story, using the same familiar school word processor, ClarisWorks. They were given forty minutes, the same amount of time to write as in step 3.
  7. Subjects were thanked and asked to fill out the post-writing questionnaire.
  8. The investigator explained to subjects what the goals and procedures of the study were, in detail.
  9. After analysis of the data, the results are described to the teacher, parents, and subjects. Copies of all the stories written by a particular subject are given to that student on disk, along with improved versions of EddieEdit and StoryStages.

The author conducted the experiment. He did not provide help, either on use of the software or on writing, during the course of the study. However, it was necessary to interact with students during the writing time in several cases. When a student closed the writing program, which ran from a floppy disk, the student could not restart it. The experimenter had to enter a code on that computer to disable a security feature, unlocking the floppy disk and allowing the execution of programs on it. This allowed the program to be run again. In one case a student erased his story unintentionally and the experimenter restored it from a backup. Not intervening in that situation would have interfered unacceptably with the third purpose of the study, to be an educational experience.

Students were not personally implored by the experimenter to write, but if they were distracting others or out of their seats they were told to return to their computers. While the row using E-Write was alone, those using StoryStages and EddieEdit sat in two rows back to back. Students would sometimes turn around to see what someone in the other group was doing or to talk to them. The experimenter tried to deal with this by walking in between a distracted student and that student's object of attention. The experimenter received frequent requests for help using the software and for help spelling words, but he directed both of these sorts of queries to others, saying the student would have to ask someone else.

The teacher, an intern who assisted the teacher, and the school's staff member in charge of the computer lab all were present during most of the study and provided this sort of help. The experimenter encouraged these educators to help students in the ways they normally would. Since the software is for classroom use and meant to be used in this context, it is consistent with all the purposes of the study to have such adults providing help.

The last day planned for the study fell on a Friday. After two weeks of writing every day, several of the children were uninterested in writing stories. During what was to be the final evaluation of their story writing ability, some children started to create newspapers, drew pictures, and began typing everything using a symbol font. The experimenter believed the result was not representative of their writing ability, and the teacher and intern agreed. Another writing session was scheduled and after a day off the children were more focused on story writing, although many of the students resented the additional session. The stories from this final session, rather than from the last scheduled session, were the ones used in the analysis.

Composition of the Groups

The 18 children who participated in the study from start to finish included both second and third graders. All were in the same elementary school class. The students were of varied races. The student sample consisted of 11 boys and 7 girls.

Answers on the questionnaires indicated a high level of previous computer experience. Specifically, all but three of the students wrote that they had a computer at home. Students were asked "Have you used computers before?" and could answer from 1 (no prior computer use) to 5 (constant computer use). All students answered either 3, 4, or 5. The mean value for students' answers was 4.3 with a standard deviation of .8. Students were asked about the specific activities they had done on computers, and given a list of six activities as well as the option to write additional ones. All students indicated at least three activities. All of them wrote that they had played computer games. Thirteen of them wrote that they had previously used the computer to write stories. In the "other" category, activities listed ranged from homework to "hacking." In reply to "Do you like to use computers?" all students answered either 3, 4, or 5 on a similar scale. The mean answer was 4.7 with a standard deviation of .6. On the first day of the study, the students used ClarisWorks, a word processor designed for adults. The ease with which they used the functions of this software was consistent with their reports of familiarity with computers.

Students reported that, on average, they had written "some" stories before - more than "a few" but less than "many." One student indicated he had not written stories before, while three reported that they had written "very, very many." The mean answer on a 1 to 5 scale was 3.0 with a standard deviation of 1.3. The group enjoyed story writing for the most part, although not as much as computer use. On a 1 to 5 scale the mean answer was 4.2 and the standard deviation .9. In answer to the question "How do you write them?" only eight indicated that they type them on a computer. Fourteen of the students said they write stories with a pen or pencil. This set of students includes some who also use the computer to write stories, as some students indicated more than one method.

The students were divided into the three groups (Red, Green, and Blue, to use E-Write, StoryStages, and EddieEdit) so that the distribution of grade level and gender was similar in each group. The Red and Green groups each had four boys and two girls, while the Blue group had three of each. The Red and Blue groups each had equal numbers of third and second graders, while the Green group had four second grade students and two from third grade.

Students were also assigned to groups so that the groups would be roughly equal in terms of average writing ability and similar in terms of the distribution of writing ability. This was done by means of a brief qualitative evaluation of their initial writing samples, performed by the experimenter. This avoided extreme clustering of high- or low-ability students in a single group. The distribution across groups was clearly not identical, but this was not problematic. The important attribute that should be evenly distributed among the three groups is ability to improve, not ability to write. It is improvement in writing quality, not absolute writing quality, that is the variable under consideration. Of course, very high-ability students who already know how to plan, write, and revise, who are comfortable with writing at length, and who can envision an audience would benefit little from the educational interventions in this study. However, it is safe to assume at this age that most students can improve their writing at least a little when helped with these things.

Students were first placed in three unlabeled groups as indicated above. The three groups were then randomly assigned as Red, Green, and Blue and associated with the three pieces of software.

Use and Usability of the Software

Observation of the students as they used the software highlighted some particular problems and successes. The difficulties they had provide guidance for making both high-level changes in design and minor refinements. The way in which they used the software also revealed how well the software's model of work flow was able to match up with and assist the cognitive process of the writer.

Children did at times converse with Eddie as they planned their stories. Children would bypass questions they didn't wish to answer but might answer others later. Sometimes, for instance, the questions about plot would all be answered "I don't know," but other replies would include description of elements which later appeared in the child's story. The two questions that ask the child to first brainstorm and then pick one or a few specific ideas usually resulted in puzzlement when they were first encountered. Children never wrote more than one story idea to begin with, so Eddie's request that the user pick one from the list was unnecessary.

At one point Eddie asks "Now would you like to talk about the characters, the place, the time, or the things that happen?" Eddie then begins asking about the specific story element the writer chooses. Eddie can understand replies such as "plot," "the things," "what happens," "the place," "the people," "the characters." One child, who was carefully reading all of the directions and thinking about each of his replies, typed in response to this question "yes." This is not a response Eddie understands in this context, so Eddie simply indicated his lack of understanding and asked the question again. The question is probably phrased poorly and this experience indicates that Eddie's handling of a response like "yes" is not adequate.


EDDIE: OK. Now we can talk about parts of the story. Now do you want to talk about the characters, the place, the time, or the things that happen?

USER: YES

EDDIE: Sorry, I didn't understand. Now do you want to talk about the characters, the place, the time, or the things that happen?

USER: yes i do

EDDIE: Sorry, I didn't understand. Now do you want to talk about the characters, the place, the time, or the things that happen?

USER: Y

EDDIE: Sorry, I didn't understand. Now do you want to talk about the characters, the place, the time, or the things that happen?

USER: Yes

EDDIE: Sorry, I didn't understand. Now do you want to talk about the characters, the place, the time, or the things that happen?

USER: ok

EDDIE: Sorry, I didn't understand. Now do you want to talk about the characters, the place, the time, or the things that happen?

USER: characters

EDDIE: Who is going to be the main person in the story - the main character?


FIGURE 6. Eddie's limited conversational abilities are not enough to deal with this appropriate but unanticipated response from the student, who finally figured out how to respond with the help of the computer lab manager.
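The failure recorded in Figure 6 is characteristic of simple keyword matching. As a rough illustration - not EddieEdit's actual code; the function names and keyword lists below are invented - a matcher of this kind might look like the following Python sketch, in which bare affirmatives such as "yes" fall through to the catch-all reply unless they are handled explicitly:

    # Hypothetical sketch of keyword-style reply matching, illustrating
    # the failure shown in Figure 6. EddieEdit's actual implementation
    # is not shown here; the names and keyword lists are invented.

    TOPIC_KEYWORDS = {
        "characters": ("character", "people", "person"),
        "place":      ("place", "where"),
        "time":       ("time", "when"),
        "plot":       ("plot", "things", "happen"),
    }

    def match_topic(reply):
        """Return the first topic whose keywords appear in the reply, or None."""
        text = reply.lower()
        for topic, keywords in TOPIC_KEYWORDS.items():
            if any(word in text for word in keywords):
                return topic
        return None  # "yes", "ok", and "y" all reach this point

    AFFIRMATIVES = {"yes", "y", "yeah", "ok", "yes i do"}

    def respond(reply):
        topic = match_topic(reply)
        if topic is not None:
            return f"(Eddie begins asking about the {topic}.)"
        if reply.strip().lower() in AFFIRMATIVES:
            # One possible repair: restate the choices rather than
            # reporting a misunderstanding, as the original Eddie does.
            return ("Great! Which one: the characters, the place, "
                    "the time, or the things that happen?")
        return "Sorry, I didn't understand."

    print(respond("YES"))         # triggers the repair branch
    print(respond("characters"))  # (Eddie begins asking about the characters.)

The repair branch sketched here restates the choices instead of reporting a misunderstanding, which is one way Eddie's handling of a response like "yes" could be made adequate.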

Although the on-screen directions indicated that the user should press return to finish typing a reply to Eddie, some children had difficulty with that. A "Tell Eddie" button may have helped to alleviate this problem. In general, the on-screen directions were too lengthy for children to read entirely. Yet the absence of graphics did not prevent children from finding commands on the menu bar or using the checklist or conversation boxes. It did not seem necessary to have graphical elements in the interface to make the software easily usable by children.

Many children skipped conversing with Eddie and began writing immediately. This was also the case with the writing prompts in StoryStages, although more writers used these initially. Few children used the revision prompts of Eddie during their first writing session. During the first few days, many closed the program while still in the writing phase and began a new story. They saved what they had done but never moved on to the revision phase. The process of moving from phase to phase and confirming that the move is desired was not particularly smooth and led to some hesitation at first. Still, children learned fairly quickly how to advance to the next phase.

Students did not complain about having to close the program and open it again to write a new story, although this process was somewhat cumbersome. It was worsened because the experimenter had to unlock the desktop each time the program had to be restarted, since a security program would otherwise have prevented programs from being run from the floppy disk.

The lack of font control and of more extensive control over styles led some students to complain during the study. Some students wanted to know whether their story was longer than a page, and not being able to see the pagination bothered them. To achieve additional effects, some students typed while holding down the option key to produce symbol characters. Other students pressed the function keys. This created squares at the insertion point on their screen. These squares were not printable characters, however, and did not show up on the printout. This surprised some students who had created patterns with the squares in their stories. One E-Write user had a more severe problem with the display. He reached an impasse when he became troubled by being unable to see the ascender on his lowercase d. The d was italic and at the end of the text, and Macintosh TextEdit clips a portion of italic characters when displaying them. The student called for help from the teacher, erased and re-typed the d several times, and did not do additional writing for several minutes while he tried to deal with the perceived problem. Good control over formatting and accurate WYSIWYG screen display were clearly important issues, perhaps even the dominant issues, for this group.

Writing Done in the Three Groups

EddieEdit and StoryStages were designed to be useful tools for writing stories, but their main purpose is educational. Even if a child does not use the software when writing a story, the experience of having used it should provide a lasting benefit. To discern whether such a benefit was seen, it is necessary to differentiate the cognitive effects with the software from the cognitive effects of the software (Salomon 1990). The stories written during the study shed light mainly on the cognitive effects with the software. Yet it is still useful to examine the stories written during the study. Stories are automatically saved in different sections of the story file as students enter a new phase. This means some interesting indicators of the writing processes of users may be captured in the files from EddieEdit and StoryStages. Saved along with the text of the stories are children's conversations with Eddie and their responses to StoryStages planning prompts. Also visible are the state of the stories at each phase of the writing process. So it is possible to see whether the StoryStages or EddieEdit students revised more during the revision phase as they used their programs. While this information only reflects how children write while using the tool, it can also provide clues about how they might be changing the overall way they think about writing.
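As a rough sketch of how such phase-by-phase snapshots could be compared, the following Python fragment estimates how much a story changed between the writing and revising phases. The saved file format is not documented here, and difflib's similarity ratio is an assumption standing in for whatever comparison the analysis actually used:

    # Minimal sketch: compare the story text saved at the end of the
    # writing phase with the text saved at the end of the revising
    # phase. The snapshot strings are placeholders; the actual story
    # file layout is not documented in this chapter.
    import difflib

    def revision_amount(writing_snapshot, revising_snapshot):
        """Fraction of the text that changed between the two phases:
        0.0 means identical, 1.0 means entirely different."""
        matcher = difflib.SequenceMatcher(None, writing_snapshot, revising_snapshot)
        return 1.0 - matcher.ratio()

    print(revision_amount("The dog ran home.", "The big dog ran home quickly."))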

Although responding to the prompts was not required, almost all the StoryStages and EddieEdit users did use the prompts to plan at least one story. None of the StoryStages users typed any answers to the revision prompts, but EddieEdit users did talk to Eddie about revision. Although they read everything Eddie had to say, none appeared to actually revise their stories based on his suggestions.


EDDIE: Hi again! I guess you wrote your story. I have a few more questions that might help you. Have you ever made changes to make one of your stories better?

USER: no

EDDIE: Really? Even great stories can be made better. I change mine a lot. Let me ask about the beginning of your story. Do you think your first sentence is really good?

USER: Yeah

EDDIE: Great! If you make it better it will be really super. If you want to, try to write one that is even better.

USER: no

EDDIE: OK, now let's talk about the end. That's what the reader will remember the most! Do you think the ending is really good?

USER: no

EDDIE: I bet you wrote a good ending. But if you don't like it that much, the best thing to do is write a better ending. Go ahead, think about how you could make it better. Then, write a better ending.

USER: ok

EDDIE: OK, now let's talk about characters - and how they feel and think. I'm sure you wrote about what the characters do. Did you also write about how they feel, and what they are thinking?

USER: Yes

EDDIE: That's very good! That's what makes characters seem real. If you want to, you can write more about how they think and feel.

USER: no

EDDIE: OK. Now, read your story. Is everything that happens very clear? Sometimes it helps to add some sentences to explain. That can make the story easier to understand. What could you add?

USER: Nothing

EDDIE: Well, that's about all I can tell you. Go over to the story window now and make all the changes you want to make. Then, come back and tell me: How many changes did you make?

USER: 54,000,000

EDDIE: Any changes you make can make the story better. I have to go now. I hope I helped you a little. Bye!


FIGURE 7. A typical conversation with Eddie about revision. At least the student's responses indicate he is reading Eddie's text.

The main result obtained from the saved StoryStages and EddieEdit data is that the sequential process model, in which students must start in planning mode, move to writing mode, and finish in revising mode without ever backtracking, is inadequate. Many users wrote large amounts even after they had "finished" their story - that is, completed the revision phase. Others began planning, progressed to the writing phase, and found that they wished to continue planning but could not. They had to begin another story and re-start their planning. In the StoryStages group, most quit the program without ever going to the revision phase, simply quitting when they felt done and starting another story. Because children were able to circumvent the unworkable model of sequential writing phases, sections of the saved stories could not be simply compared to see how much revision was done on each story. Students did not do revision while in the revision mode, although some made changes after writing. These stories do provide an important record, since they indicate that some of the children often wished to return to a writing process from an earlier phase. As children discovered the irreversible progression and the nature of the different phases, the EddieEdit and StoryStages users who used the planning feature started to do all of their planning before moving on. They stopped abandoning stories to start their planning process over. But some students skipped the planning phase after initial frustration with it and, since they were unable to go back, never learned to take advantage of it.

Stories Written at the Beginning and End

In this study there was one initial and one final writing session in which the children all used a familiar word processor, ClarisWorks. The hope was that by comparing the two sets of stories written during these sessions, it might be possible to get a rough idea of how much each group improved. Several studies which assess improvement in writing ability employ this method (Daiute 1985b, Keetley 1995). These two studies did span months rather than weeks. In two weeks, improvement may not be discernible. These studies also had the children hand-write stories at the beginning and end. Using a word processor for the initial and final sessions appeared to be a better option than asking children to write with pencil or pen. The students were all computer-literate, and the control group here was doing word processing throughout the study.

This method of gauging writing improvement is problematic when there is only one initial sample and one final sample, as there were in the studies mentioned above and in this study. The quality of writing during a particular session may vary greatly from day to day because students have a different expectation, mix of activities, and ability to concentrate each day. If students are placed in a very formal testing atmosphere in both situations, these variations can be diminished. However, such an evaluation may still do a poor job of measuring creative writing ability, since the quality of a creative writing sample on a given day can depend on many factors extrinsic to ability.

In an attempt to make the initial and final writing situations similar and fairly informal, students were led to the computer lab on both days and simply instructed to "write stories." They were additionally told that it was OK not to finish, but that they should concentrate on writing during the whole session. The initial evaluation went fairly well. The first attempt at a final writing evaluation occurred after two weeks of daily story writing using the study's three programs. By this time students were bored with writing and easily distracted. Many of them played with ClarisWorks features that were absent from the study's three programs and did not write stories as instructed. So another session was held to attempt a better assessment. At this point many students were very resistant to writing more stories. One said "this project is over!" and others said they would refuse to write stories. None of the three groups exhibited this reluctance more than the others, and in the end, they did all write stories during this second attempt at a final session.

This final writing session had some additional important differences from the initial session. The teacher provided help to students, typing for at least three students in the E-Write group and one student in the StoryStages group. So four of the students dictated portions of their stories rather than typing them. Students in the EddieEdit group received no typing help from the teacher.

The stories written at the beginning and end of the study were analyzed in two ways. The experimenter examined both quantitative features (number of words, number of characters, and average word length) and the evaluations that two story experts made of each story. Additionally, the presence or absence of 11 story elements, related to the 15 planning prompts, was noted for each story. The number of story elements each child used in the initial and final stories was compared.

The two story evaluators were Kevin Brooks and Marina Umaschi Bers. Brooks holds a Master of Arts in communication. He tells stories professionally in the Boston/Cambridge area and has led workshops on storytelling. Bers holds a Master of Education in Media and Technology, and a Master of Arts in Media Arts and Sciences. Both develop interactive story systems and are doctoral candidates at the MIT Media Lab.

There were no criteria provided to the two story experts for evaluating the stories. They were given an instruction sheet that asked them not to consider how much they personally liked or disliked the subject matter of the stories. The sheet indicated that they should formulate their own criteria for ranking and that they would be asked afterwards to describe how they each ranked the stories. Neither of the evaluators had seen the prompts that StoryStages and EddieEdit used when they evaluated the stories.

The two evaluators formulated criteria that differed in details but were similar overall. Bers wrote in an email that she checked to see "if the stories had an introduction, a middle part, and an ending." She also considered the description of characters, "the building of a conflict and its resolution and ... the internal coherence of the story." She wrote that length was not one of her criteria. She added that some stories which were unfinished were nevertheless evaluated by her with a high score because she could see that they met her criteria. Brooks wrote that he used three main criteria:

1) Temporal coherency - do the events in the story have a clear temporal order.
2) Object detail - were there objects and places in the story that were described in more detail than barely necessary. ...
3) Motivation and causal connections - did the actions of the characters in the story make sense.

He added that he also considered whether or not the story was complete and came to "a sensible end," but that he didn't consider this as much as the three main criteria. Brooks wrote that spelling, punctuation, and length were not considerations.

In numerous studies Bereiter and Scardamalia "have always found number of words to correlate substantially with any indicators of quality or maturity applied to writing ... it seems to be a robust empirical generalization about school-age children doing school-type writing tasks" (Bereiter and Scardamalia 1982). This generalization did not apply strongly here, perhaps because the students' story writing was not very similar to a usual "school-type writing task" despite taking place during school. Length in words and Brooks's quality rankings were correlated with a coefficient of .63. Length in words was correlated with Bers's quality rankings only with a coefficient of .42. In contrast, the correlation coefficient of the two sets of quality rankings was .75, even though the evaluators worked independently using different criteria. Since length in words did not appear to be a good measure of writing quality, the change in story length from initial to final writing sample was not analyzed.
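For reference, a coefficient of this kind is the Pearson correlation between the paired lists of lengths and rankings. The sketch below shows the computation with NumPy; the two lists are placeholders, since the individual word counts and rankings are not reproduced in this chapter:

    # Pearson correlation between story length and a quality ranking.
    # The lists are hypothetical placeholders, not the study's data.
    import numpy as np

    word_counts  = [120, 85, 240, 60, 150, 95]   # length of each story in words
    quality_rank = [6, 3, 9, 2, 7, 4]            # evaluator's 1-10 ranking

    r = np.corrcoef(word_counts, quality_rank)[0, 1]
    print(f"correlation coefficient: {r:.2f}")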

Over the long term, increases in average word length may provide a clue to the underlying linguistic development of elementary school writers. Increases in average word length were extremely slight for the two week period (.22 characters for all groups) and unrelated to the quality rankings by Bers and Brooks, with a correlation coefficient below .07. So although quantitative properties of the stories were examined carefully, none were worthy of statistical analysis.

                Improvement (10-point scale)   Average Improvement   Variance in Improvement

Rated by Brooks
E-Write          6   3   2   0  -2   3          2.0 ± 1.0              7.6
StoryStages      1   0   0   1   1   3          1.0 ± 1.0              1.2
EddieEdit        0  -2   1   1   2   0          0.3 ± 1.0              1.9

Rated by Bers
E-Write          6   6   0   3  -1   3          2.8 ± 1.0              8.6
StoryStages     -4   1  -3   2   7   1          0.7 ± 1.0             15.5
EddieEdit       -1  -2   1   0   1  -4         -0.8 ± 1.0              3.7

FIGURE 8. Based on Brooks and Bers rankings, the E-Write group improved, the StoryStages group stayed about the same or improved slightly, and the EddieEdit group stayed about the same. The improvement differences were not significant (p < .33, p < .15).

The evaluators' rankings, being moderately correlated with a correlation coefficient of .75, did seem to merit further analysis. The two evaluators are highly qualified. These two independent evaluators, having different criteria for story quality, agreed to a great extent, strengthening the claim that their rankings are good indications of an underlying feature, story quality. Brooks and Bers both ranked the E-Write group as having the greatest improvement and the EddieEdit group as having the least, though variance in improvement was very high. The 1-10 ranking scale had a granularity of 1, leading to an uncertainty of .5 for each ranking. So the scores for improvement, the differences in rankings, had an uncertainty of 1. Considering only the uncertainty, the results show approximately no improvement in the StoryStages and EddieEdit groups and improvement of about 2 or 3 gradations in the E-Write group. However, the variance within the three groups was almost as great as that between groups. A single-factor analysis of variance in Bers's improvement rankings, using experimental group as the factor, was not significant (p < .15). For Brooks's rankings there was also no significance (p < .33).
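The analysis of variance can be reproduced from the per-student improvement scores in Figure 8. The following sketch uses SciPy, which is certainly not the software used in the original 1998 analysis, but a single-factor ANOVA over these numbers yields the significance levels reported above:

    # Single-factor ANOVA on the improvement scores from Figure 8,
    # with experimental group as the factor. SciPy is used here only
    # for illustration; it is not the original analysis tool.
    from scipy.stats import f_oneway

    brooks = {
        "E-Write":     [6, 3, 2, 0, -2, 3],
        "StoryStages": [1, 0, 0, 1, 1, 3],
        "EddieEdit":   [0, -2, 1, 1, 2, 0],
    }
    bers = {
        "E-Write":     [6, 6, 0, 3, -1, 3],
        "StoryStages": [-4, 1, -3, 2, 7, 1],
        "EddieEdit":   [-1, -2, 1, 0, 1, -4],
    }

    for evaluator, scores in (("Brooks", brooks), ("Bers", bers)):
        f, p = f_oneway(*scores.values())
        print(f"{evaluator}: F = {f:.2f}, p = {p:.2f}")
    # Prints p of about .33 for Brooks and about .15 for Bers,
    # matching the significance levels reported in the text.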

To determine whether one group began to include more of the story elements described in the prompts, the stories written at the beginning and end were examined by the experimenter. For each story, the presence or absence of each of 11 story elements was noted. These elements, based on the 15 prompts in StoryStages and EddieEdit, were:

  1. Main Character
  2. Additional Characters
  3. Setting
  4. Description of Setting - for example, anything that would distinguish the "house" or "woods" mentioned from any other house or woods
  5. Time - anything to hint at when the story took place; "once" does not count, but "a long time ago" does
  6. Beginning - one event is mentioned in the story
  7. Complication or additional event - something else happens; it need not be a problem
  8. Resolution - characters do something to solve the problem or fail to solve it
  9. Ending - the story is wrapped up rather than stopping in mid-sentence
  10. Motivation - something suggests why a character takes action
  11. Reference to Feelings - something mentions or refers to how a character feels or what a character thinks

No story had all eleven elements, although one had 10 of them. One piece of writing did not describe any events or characters or mention a place or time, so it had none of these elements. After all the stories were examined for the presence or absence of these elements, the total number of elements present was figured for each story. For each student, the total for the initial story was subtracted from the total for the final story to yield the change in number of story elements. This was positive for students who included more story elements in their final story than in their initial story.
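The tally just described amounts to set membership and a subtraction. A minimal sketch follows, with element names abbreviated from the list above; the judgments of presence or absence were of course made by hand, not by code:

    # Change in story-element count between a student's initial and
    # final stories. Element names are abbreviated from the list above;
    # presence or absence was judged by the experimenter, not by code.
    STORY_ELEMENTS = {
        "main character", "additional characters", "setting",
        "setting description", "time", "beginning", "complication",
        "resolution", "ending", "motivation", "feelings",
    }

    def element_change(initial, final):
        """Positive when the final story includes more of the 11 elements."""
        return len(final & STORY_ELEMENTS) - len(initial & STORY_ELEMENTS)

    # Example: a final story that adds an ending and character motivation
    initial = {"main character", "setting", "beginning"}
    final   = {"main character", "setting", "beginning", "ending", "motivation"}
    print(element_change(initial, final))  # prints 2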


FIGURE 9. (Figure not available in HTML version) EddieEdit and StoryStages users did not, on average, include more story elements in their final stories than they did in their initial stories. The six students in each group performed differently. There was no discernible trend.

This measure was not meant to be an indicator of story quality, but students who learned about story elements from StoryStages and EddieEdit should have been more likely to include those story elements when writing their final stories. Considering the total number of story elements is somewhat problematic because students were not told or encouraged to write complete stories. Students who did not complete their story never had an ending and often also lacked a resolution. Additionally, stories with a single character, or stories that do not occur at a particular mentioned time, can be very good stories. Still, if StoryStages or EddieEdit were particularly effective in enabling students to learn about story elements and encouraging them to incorporate those elements in their stories, there should be a visible difference in how many elements were included in the initial and final stories.

There was no such difference. On average, the EddieEdit group included .8 elements fewer at the end than at the beginning. StoryStages users included, on average, .3 more elements, and E-Write users .5 more elements. The variance within groups was very high, however, as Figure 9 shows. The EddieEdit group's overall decline in number of elements was due to a single student who said he could not think of anything to write during the final session, and did very little writing. On the other hand, only one student in the whole EddieEdit group included more story elements (4 more) at the end than at the beginning. As a result of the high within-group variance, there was no significance to the slight variation between the three groups of users (p < .73).


FIGURE 10. (Figures not available in HTML version) There were no clear trends showing that software affected high- and low-ability students differently, on average.

There may have been little average improvement because the software helped high-ability students but didn't improve, or perhaps even worsened, the writing of low-ability students, or vice versa. Although the small sample would preclude significance, to determine whether any such general trends were present, the mean scores of high- and low-ability students in each experimental group were plotted. Ability level was determined based on the rankings by Bers and Brooks of students' initial stories. Within each group, all of the students improved as ranked by Brooks. Some improved and some worsened, as ranked by Bers, with the most dramatic difference being between high- and low-ability StoryStages users. Given the high variance even within high- and low-ability groups, and the lack of a similar trend in Brooks's rankings, it did not seem that the software affected high- and low-ability students differently.

The EddieEdit and StoryStages users, as a group, probably did produce stories that showed slightly less improvement in the final story writing session. The variation in all measures of quality, and in the use of story elements, is extremely high within the three groups. It is almost equally likely that the differences in improvement between the EddieEdit, StoryStages, and E-Write groups are due to chance. Even if the EddieEdit and StoryStages groups did write stories showing less improvement, this does not mean they improved less. The teacher provided typing assistance to several E-Write users, provided this assistance to one StoryStages user, and did not provide this assistance to EddieEdit users. The insignificant differences in improvement that were observed probably reflect only this distribution of typing assistance.

The question remains as to why StoryStages and EddieEdit users did not use more of the story elements they read about to improve their stories. A further issue is why EddieEdit users did not show more improvement than students in the control groups, if indeed the conversational character interface does have particular benefits for young writers. In the former case, better-designed software could teach about story elements more effectively. In the latter, the subtle assistance that a conversational character may provide is likely too slight to measure over a two-week period.

The software used in this study may have been inadequate to teach students how to employ story elements in their writing. It was not designed primarily to teach about story elements. Instead, story elements were chosen to give Eddie something to talk about and to provide StoryStages with comparison information that it could present. Additional simple features could have more effectively communicated the importance of using story elements in writing. For instance, students were not encouraged to check through their plan after writing their story to make sure they included all of the elements. The use of story elements, although suggested, was not reinforced. It would also help to show students, either through the software or as part of a classroom activity, how the stories they enjoy use these elements effectively. The software should do more to encourage the use of story elements, and other activities should also help students learn about how to use these elements in their writing.

StoryStages and EddieEdit users also encountered difficulty with the unfamiliar features of the programs they used. Some features, particularly those relating to the progress through writing stages, were not well-designed in the original versions of the programs. These results serve as a warning that the educational interventions implemented in software, even if based on effective techniques, may be overwhelmed by other factors, such as problems with usability.

EddieEdit's particular interventions - whether sound or unsound - probably could not have had a measurable effect in two weeks. A longer-term study is needed to determine whether a conversational computer character can improve writing quality. An educational intervention that provides gradual assistance in acclimating a young writer to the particular nature of writing (as distinct from conversation) may simply not have an observable short-term effect on writing quality. Moving from command of oral language to the ability to write at length is a process that takes years, and looking at the changes that occur over two weeks provides little insight into whether a particular intervention is of long-term benefit. Whether a child has learned about particular story elements over a two-week period can reasonably be determined, but it would be hard to similarly assess a conversational character's subtle effects. If improvement in writing quality is to be the main metric for gauging the effectiveness of a program like EddieEdit, it must be kept in mind that the changes could be important but very gradual. Longer-term experience and more thorough initial and final evaluation is important.

Results From The Post-Study Questionnaire

Nineteen post-study questionnaires were distributed to students. An additional student was included in the Blue group after the first day of the study. Her initial absence meant that she provided no initial writing sample and did not fill out a pre-study questionnaire. She did use EddieEdit throughout the rest of the study and provided useful usability and scientific data by responding to the final questionnaire.

All questionnaires asked about how the student liked the program used during the study and what could be done to improve it. The questionnaires distributed to EddieEdit and StoryStages users included two questions not on the E-Write questionnaire: "What does 'revision' mean?" and "What are two things you think about when you plan a story?" The EddieEdit questionnaire also included the additional question "Who is Eddie?"


"There weren't enough fonts like on ClarisWorks."
-StoryStages User

"it doesn't show the pages"
-StoryStages User

"no fonts not enough sizes not enough tools"
-EddieEdit User


FIGURE 11. Answers to "What did you really NOT like?" most frequently mentioned the lack of formatting ability.

The complaints on the questionnaires were mostly about the lack of formatting ability, echoing the complaints made verbally during the study. One EddieEdit user indicated he did not like planning. Except for this comment, and comments about things extrinsic to the software, all other complaints dealt with the lack of font control, pagination, and formatting features. In addition to those who wrote specific complaints, one StoryStages user who simply wrote that he did not like "everything" indicated that the program could be improved with "More fonts, tools, sizes, and features." So his dislike of the software probably also stemmed from the lack of formatting control.

Half of the students in the StoryStages group indicated that their favorite feature was the checklist box and its prompts. Two responded to "What was your favorite thing about it [StoryStages]?" with "the questions" and one wrote "It gave me lots of ideas." No StoryStages users specifically complained about the planning and revision prompts. One student in the EddieEdit group indicated he did not like planning, but another wrote that planning was her favorite feature because "if your writing a story you will all ready have everything planned out."

Students were asked if they liked the program they used and could answer from 1 ("No! It was lame.") to 5 ("Yes! It was really great!"). Students were also asked if they would use the program at home and could answer from 1 ("No way!") to 5 ("Yes! I'd use it a lot!"). In each of the three groups there was one student who gave the most negative reply to both questions. None of the E-Write users gave strong positive replies to both questions. In the StoryStages group, one student answered with the strongest affirmative to both questions. In the EddieEdit group, three of seven students answered with the strongest affirmative to both questions. So while each program had a strong detractor, EddieEdit had the largest proportion of enthusiasts.

There was certainly no consensus about the programs, and responses were very widely distributed. Even the difference between the like/dislike rankings of the two most differentiated groups, E-Write and EddieEdit, is not significant (p < .35). On average, though, EddieEdit and StoryStages were slightly preferred over E-Write. Students were ambivalent about E-Write: the mean like/dislike rating of E-Write was 3. StoryStages and EddieEdit were both better liked on average, with mean like/dislike ratings of 3.7. This .7 difference on a 1-5 scale is not statistically significant and does not demonstrate that the students preferred EddieEdit and StoryStages. Still, it is heartening to note that EddieEdit and StoryStages - which had unfamiliar features, asked children to do something besides the usual writing, and had several usability problems - were not ranked significantly lower than the simple word processor.

Students did not suggest any improvements related to EddieEdit's or StoryStages's planning and revision features. They did indicate that features like spell checking, text formatting control, and the ability to draw pictures should be added.

None of the students answered that it was hard to write with any of the programs. One EddieEdit user changed one of the answers to indicate that it was hard for her to write with any computer. The others all either indicated that the program they used made it easier to write or indicated that it was already easy for them to write stories.

Most students did not mention any story elements when asked about what they think of when they plan. The four StoryStages users who listed things wrote "What it will be about," "the beginning & the end," "killing and mystery," and "Mysteries and scary stories." The four EddieEdit users who listed things indicated "The ti[t]le and the words," "whats the story going to be like," "the place and the time" and "The beginning, middle, and end."


What does "revision" mean?

StoryStages Users

"I have no idea!"

"I dont no"

"I don't know"

"I don't know"

"I don't know"

"Beats me"

EddieEdit Users

"looking back and improving"

"Going over the story"

"I don't know"

"To see again."

"going over the story"

"I do Not know"

"When you change it to make it better"


FIGURE 12. Several EddieEdit users were able to define revision at the end of the study, while no StoryStages users could.

Simply asking for the definition of "revision" produced the most striking result. EddieEdit and StoryStages had identical information about revision. StoryStages presented all of this information at once to a writer upon entering the revision phase. EddieEdit offered to begin a conversation about revision upon entering the revision phase. Only if the writer typed back to Eddie would the additional information about revision be displayed in the form of conversation. Yet at the end of the study, none of the six StoryStages users were able to define the term "revision," while five of the seven EddieEdit users offered good definitions.

Why did EddieEdit users learn something about this part of the writing process, while StoryStages users did not? The conversational character interface, although not resulting in improvement in writing quality, did seem to better engage EddieEdit users with the information about writing process that was provided. StoryStages users were presented with the whole list of revision prompts at once, but often just glanced at these and quit the program. EddieEdit users were invited to read this information a bit at a time, in conversational format, by typing replies to Eddie who would then respond with additional revision information.

None of the EddieEdit users got a good idea of who Eddie is, however. They noticed that someone called Eddie was addressing them. One of them - who consistently skipped the planning phase and did not converse with Eddie at all during the study - asked "Who is Eddie?" on the second day. Although they all noticed Eddie's presence, four of the seven wrote at the end of the study that they did not know who Eddie was. The others replied, "a character that's supposed to help you," "the story helper," and "the guy who asks your name in EddieEdit before y[ou] begin your story." None of the students mentioned that Eddie is a kid or an editor. It would have improved the interpretation of these answers if a comparison question like "Who is Bugs Bunny?" had been included. This way, it would be easy to see what type of reply a given child would give when asked to describe a known, well-developed character. Still, it seems clear even without a comparison question that EddieEdit users did not get a good impression of Eddie the character.

Users' inability to describe Eddie could be due to inadequate development of Eddie's character. However, it was probably instead a result of EddieEdit's failure to adequately express Eddie's nature. If the things Eddie says varied more and included mention of Eddie's profession and references to Eddie's age, users could get a better impression of Eddie the character. Eddie could use some additional quirks, since those he did display were noticed and commented upon by the children using EddieEdit. Still, these quirks should be carefully selected to allow for more identification with Eddie without distracting from the writing task. Eddie does not need hobbies or sports interests in order to be a well-defined character, but additional references to those aspects of his character that are defined would improve how he is portrayed. A simple graphical representation, even one that is not animated, could also call attention to who Eddie is while still letting users focus on the textual conversation, where they would learn more about him.

