The Workshop on Computational Approaches to Linguistic Creativity (CALC-09) is taking place now at the University of Colorado at Boulder.
In the first session on metaphors and eggcorns, researchers reported on using natural language understanding techniques in innovative ways:
Beata Beigman Klebanov presented on the use of a topic model (LDA, latent Dirichlet allocation) to detect the most obvious or deliberate types of metaphor, that is, discussions of one domain in the terms of another; the metaphors were annotated by people in this experiment. For different values of k, metaphorical uses were found to be less frequent among the k most topical words in the discourse overall.
Steven Bethard presented work dealing with sentence-level conceptual metaphors from a psycholinguistic standpoint. In earlier work, metaphors were used as stimuli and subjects’ N400 brain waves, which are associated with anomaly, were recorded. This suggests that it is important to know about metaphorical frequency: how often words are used in a metaphorical way. A support vector machine classifier was trained on an annotated corpus, and LDA, with and without categories, was used to disambiguate metaphors and to determine whether they are abstract or concrete.
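The classification step could look something like the following minimal sketch. The example sentences, labels, and TF-IDF features are assumptions for illustration, not the features or data from the talk:

```python
# Minimal sketch (assumed setup): an SVM trained to label sentences as
# metaphorical (1) or literal (0).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

sentences = [
    "he attacked every weak point in my argument",  # metaphorical
    "prices are climbing again this quarter",       # metaphorical
    "the soldiers attacked the fort at dawn",       # literal
    "the hikers climbed the steep trail",           # literal
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["the committee attacked the proposal"]))
```

On real data the feature set would be richer than bag-of-words TF-IDF; this only shows the train/predict shape of the approach.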
Sravana Reddy presented “Understanding Eggcorns,” about linguistic errors caused by semantic reanalysis: entrée -> ontray, as first named on Language Log in 2003. Eggcorns are more closely related to folk etymology and puns than to malapropisms, and they have received little study. Can the path of transformation be discerned? Applications include error detection and humor generation. Using the Eggcorn Database and WordNet, a semantic network was built; context information was then added, along with other augmentations. Based on the results, a typology with five categories was developed.
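The idea of tracing a transformation path through a semantic network can be illustrated with a toy graph. The paper used the Eggcorn Database and WordNet; the hand-coded edges below are an illustrative stand-in:

```python
# Toy sketch: find a semantic path linking an eggcorn to its original.
# The edges are hypothetical; the real network came from WordNet.
from collections import deque

edges = [("entree", "dish"), ("ontray", "tray"), ("tray", "dish")]
graph = {}
for a, b in edges:  # build an undirected adjacency map
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def path(start, goal):
    """Breadth-first search for a shortest path between two words."""
    queue, seen = deque([[start]]), {start}
    while queue:
        p = queue.popleft()
        if p[-1] == goal:
            return p
        for nxt in graph.get(p[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(p + [nxt])
    return None

print(path("ontray", "entree"))  # -> ['ontray', 'tray', 'dish', 'entree']
```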
Session 2 was on generating creative texts:
Ethel Ong presented work on pun generation using a pronouncing dictionary, WordNet, and (more effectively) ConceptNet. A system called TPEG extracted word relationships to build templates for pun generation, keeping the syntactic relationship but modeling semantic and phonetic word relationships as described in Kim Binsted’s work. Variables in the template model parts of speech, sound, and compound words.
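The core template move can be shown with a toy example. This is not TPEG; the template and the homophone pairs are made up to illustrate filling template variables with phonetically similar, semantically different words:

```python
# Illustrative sketch only (not TPEG): fill a fixed template with pairs of
# words that sound alike but belong to different semantic domains.
homophones = {  # hypothetical pairs: pun word -> (sounds-like, related concept)
    "flour": ("flower", "baking"),
    "knight": ("night", "armor"),
}

template = "What do you call a {concept} {soundalike}? A {pun}!"
for pun, (soundalike, concept) in homophones.items():
    print(template.format(concept=concept, soundalike=soundalike, pun=pun))
```

A real system constrains the slots further, e.g. by part of speech and by phonetic distance from a pronouncing dictionary.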
Yael Netzer presented the Gaiku system for haiku generation. The authors constructed a haiku corpus and a system to build templates from it. A first attempt generated grammatical output but lacked a good “story,” that is, a sequence of concepts such as butterfly, spring, flower. Word association information, not found in WordNet, was added. An analysis was done to see whether haiku text is more associative than news text. The final generated haiku were evaluated in a “Turing test.”
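Template-based haiku generation typically enforces the 5-7-5 syllable form. A crude syllable check (a vowel-cluster heuristic, not the pronouncing-dictionary approach a real system would use) can sketch that constraint:

```python
# Sketch of a 5-7-5 syllable constraint. The vowel-cluster count is a crude
# approximation; real systems would use a pronouncing dictionary instead.
import re

def syllables(word):
    """Approximate syllables as runs of vowels (y counted as a vowel)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def is_haiku(lines):
    counts = [sum(syllables(w) for w in line.split()) for line in lines]
    return counts == [5, 7, 5]

print(is_haiku([
    "butterfly in spring",
    "the flower opens slowly",
    "the garden is still",
]))  # -> True (under this crude counter)
```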
The poster session presentations covered lyric generation in Tamil and syntactic constructions.
Note that paper titles and the full list of author names can be found on the CALC page.