The Workshop on Computational Approaches to Linguistic Creativity has just concluded. I posted about the morning; here are my notes on the afternoon talks.
The first item for the afternoon was my invited talk, “Curveship: An Interactive Fiction System for Interactive Narrating.” I spent a while on the paper to accompany my talk, trying to introduce IF, explain the basics of narrative variation, and get into at least some of the technical details of my system, including the string-with-slots representation, which I’ve been working on a great deal recently. I also tried to include handy references and pointers. Incidentally, I’ve been meaning to post more about Curveship, and I’d love to hear any questions you have about it at this point, even before I’ve properly introduced the system on this blog.
After my talk, we had more time for poster presentations; one poster dealt with author and character goals for story generation.
The “From Morphology to Pragmatics to Text” session concluded the day:
Andrew Goldberg presented work by three others on an ML algorithm to assess the creativity of sentences, treating creative sentences as outliers that are still meaningful. The Wisconsin Creative Writing dataset was assembled and used. Using language modeling, word norms, and WordNet, they partially predicted creativity scores. (Pointed out in the Q&A: All the non-creative sentences were much shorter, so you could just use one feature – length!)
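The Q&A point is easy to check on any labeled set of sentences: if one class is systematically longer, a single-feature threshold rule already separates it. Here is a minimal Python sketch of that length-only baseline. The function name `best_length_threshold` is hypothetical, and the sentences are invented toy data, not drawn from the Wisconsin Creative Writing dataset.

```python
# Hypothetical helper: does sentence length alone separate the two classes?
def best_length_threshold(sentences, labels):
    """Return (threshold, accuracy) for the rule 'creative iff word count >= threshold'."""
    lengths = [len(s.split()) for s in sentences]
    best_t, best_acc = None, -1.0
    # Try every observed length as a cut-off and keep the most accurate one.
    for t in sorted(set(lengths)):
        preds = [n >= t for n in lengths]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Invented toy data, NOT from the Wisconsin Creative Writing dataset.
sentences = [
    "The cat sat.",
    "I like tea.",
    "Rain fell today.",
    "The moon chewed the velvet silence of the sleeping harbor town.",
    "My umbrella quietly negotiated a truce with the furious afternoon sky.",
    "Her laughter was a staircase that led nowhere but somewhere better.",
]
labels = [False, False, False, True, True, True]  # True = rated creative

print(best_length_threshold(sentences, labels))  # (11, 1.0)
```

Perfect accuracy from word count alone is the warning sign: on data like this, a classifier with richer features may simply be rediscovering length.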
Stefano Vegnaduzzo presented state-of-the-art work on complex adjectives, ones made of at least two words separated by a hyphen. These are frequent, as corpus analysis of Wikipedia and the Web shows. The focus was on two-word complex adjectives, identified with a part-of-speech tagger. Productive morphological processes allow the unintentional, unlimited, regular creation of words; building complex adjectives is one such process. Checking for hapax legomena (words that occur only once in a corpus) gives a measure of productivity within morphological categories: “non-X” was tops in both corpora. Realized and potential productivity were measured and found to be similar across the two corpora.
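Hapax-based productivity is usually computed with Baayen's measures: realized productivity is the number of distinct types V in a category, and potential productivity is P = n1/N, the share of the category's N tokens that are hapaxes (n1). I'm assuming these standard definitions here; the talk's exact formulas may differ. The Python sketch below uses hypothetical function names, assumes Penn Treebank-style adjective tags, and runs on invented toy input rather than real corpus data.

```python
from collections import Counter


def complex_adjectives(tagged):
    """Keep two-word hyphenated adjectives such as 'non-linear'.

    Assumes Penn Treebank-style tags (JJ, JJR, JJS) from some POS tagger.
    """
    return [
        word.lower()
        for word, tag in tagged
        if tag.startswith("JJ") and word.count("-") == 1 and all(word.split("-"))
    ]


def productivity(tokens, first_element):
    """Baayen-style measures for the category 'first_element-X' (e.g. 'non-X')."""
    category = [t for t in tokens if t.split("-")[0] == first_element]
    freqs = Counter(category)
    n_tokens = len(category)                                 # N: tokens in category
    realized = len(freqs)                                    # V: distinct types
    hapaxes = sum(1 for c in freqs.values() if c == 1)       # n1: types seen once
    potential = hapaxes / n_tokens if n_tokens else 0.0      # P = n1 / N
    return realized, hapaxes, potential


# Invented toy input, NOT real Wikipedia or Web data.
tagged = [
    ("non-linear", "JJ"), ("non-linear", "JJ"), ("non-profit", "JJ"),
    ("non-trivial", "JJ"), ("well-known", "JJ"), ("well-known", "JJ"),
    ("well-known", "JJ"), ("well-meaning", "JJ"),
    ("e-mail", "NN"),               # noun: filtered out
    ("state-of-the-art", "JJ"),     # more than two words: filtered out
]
adjs = complex_adjectives(tagged)
print(productivity(adjs, "non"))   # (3, 2, 0.5)
print(productivity(adjs, "well"))  # (2, 1, 0.25)
```

A category full of one-off coinages (many hapaxes relative to its tokens) scores high on P, which is the sense in which “non-X” can come out on top.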
Allan Ramsay presented work on how the same words can have different meanings in different contexts. One example was the sentence “I’m sorry I missed your talk,” along with “I’m sorry, Dave, I can’t do that.” The difference in meaning doesn’t arise because “sorry” is ambiguous. “Sorry” expresses a relationship between an individual and a state of affairs (which the individual wishes were not the case). There’s no first-order representation of this relationship. The representation is extremely elaborate, but not too complex. Appropriate background knowledge is essential. One conclusion: A system that takes part in conversations will have to build meaning representations and carry out inference. (In Q&A, I learned that there’s more in the paper about being mistaken, lying, and using irony and sarcasm.)
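The point about “sorry” is easier to see as a data structure: its second argument is a whole state of affairs (a formula), not an individual, so what gets inferred depends on what is embedded. This is a toy Python sketch of that idea only, not Ramsay's actual representation. The class names, predicate names, and the assumption that “sorry” is factive (the regretted state actually holds) are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Not:
    body: "Formula"


@dataclass(frozen=True)
class Wishes:
    agent: str
    state: "Formula"


@dataclass(frozen=True)
class Sorry:
    agent: str
    state: "Formula"  # a whole formula, not an individual: not first-order


Formula = Union[Atom, Not, Wishes, Sorry]


def consequences(f):
    """Toy inference rule: 'x is sorry that P' yields P and 'x wishes not-P'.

    Treating 'sorry' as factive (P actually holds) is an assumption.
    """
    if isinstance(f, Sorry):
        return {f.state, Wishes(f.agent, Not(f.state))}
    return set()


missed = Sorry("speaker", Atom("missed", ("speaker", "talk")))
refusal = Sorry("HAL", Atom("cannot_do", ("HAL", "that")))
for utterance in (missed, refusal):
    print(consequences(utterance))
```

The same `Sorry` constructor is used both times; the different consequences come entirely from the embedded state of affairs, which is why no word-sense ambiguity is needed. A real system would also need the background knowledge the talk stressed to go further than this single rule.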
One way to get at the papers from this workshop is by seeing the title and author information on the CALC-09 site and then using your favorite search engine to locate them online – I assume all, or at least almost all, have been placed online by their authors. ACL also offers past workshop proceedings for purchase. Maybe the CALC-09 proceedings will be available that way, too?
The proceedings will be available online soon. You’ll be able to find the CALC articles at http://www.aclweb.org/anthology/
The proceedings of CALC-09 are online in their entirety now, and available for free. The ACL is awesome like that – thanks for the contribution to the accessible scholarly record and for the good example.
Video of most of my talk on Curveship is also online now. Here’s the first part. I find it surprisingly intelligible even without the slides … but I suppose I’m rather biased.