Tuesday, January 10, 2012

A Proposed Examination of Co-Speech Gesture in American Sign Language to English Interpreting


Ten years ago I went on a trip to the promotion and production studio of a major television network.  At the time I was considering putting together an audition reel for commercial voiceover work.  While at the studio I had the opportunity to watch several voiceover actors recording promos for upcoming programming.  What struck me about their performances was their conspicuous use of co-speech gesture (CSG), presumably to help guide their inflection and intonation.  If the actor wanted to hold inflection steady during a line he would move his hand along a plane extending from just below eye level.  If the line called for him to inflect down and drive a sentence he would then thrust this hand down while making a fist.  If the script called for an upward inflection another  actor might send her hand upward like a Roman or Bellamy salute.
A brief review of training literature for voice actors turned up only one published source (Alburger, 2011) that explicitly suggests using gesture to enhance performance while working; however, interviews I conducted with voice actors confirmed that co-speech gesture in voice acting is an important aspect of the performance that is both intentional and largely spontaneous.  Alburger, as well as the actors interviewed, notes that for naturalistic copy, such as scripts that portray normal conversational speech, CSG usually mimics normal conversational CSG.  The kind of CSG I witnessed at the television studio is used for “sports promos and hard hitting ‘news-y’ stuff” (Rodd, personal communication, 2011).  “It [CSG] does seem to help push the copy along and honor the punctuation” (Rodd, personal communication, 2011).  Another actor offered a comment that also touches on similarities between voice actors and interpreters: “Sometimes when I really get into a read, especially a longer narrative read, my hands almost work like a conductor leading an orchestra. They help me keep the timing, and when I need to go big, my movements are bigger. When I need to go small my movements are smaller” (Hutchinson, personal communication, 2011).

These observations were consigned to memory until recently when, on an interpreting assignment, a consumer’s comment made me conscious of my own use of co-speech gesture while interpreting.  The consumer noted that I gestured more when interpreting a particularly dense concept from American Sign Language (ASL) into English.  This comment caused me to recall the voiceover actors I had seen in the past and to wonder if interpreters and actors were using similar, largely unconscious, techniques in order to aid their inflection and intonation.  The link between gesture, cohesion, prosody, and emotive production is discussed by Eidsvik (2006) in examining mirror neurons and their link to language production.  Eidsvik notes that many cognitive language processes, including gesture, are knitted together in the compact part of the brain known as Broca’s Area, indicating that the link between language production and gesture is inherent.  Eidsvik goes on to say that mirror neurons respond to viewing a gesture by triggering an empathic response that mimics that of the person making the gesture.  It then stands to reason that when trying to convey the affect found in a script one would feel compelled to use gestures that are, in memory, associated with that message.  Like voiceover work, interpreting involves a kind of performance.  In both cases the person speaking is conveying someone else’s words and is tasked with doing so while conveying tone and inflection consistent with the original intent.  I wondered if interpreters’ use of CSG served a cohesive function similar to the one it appeared to serve for actors.  Like voice actors, interpreters are not specifically trained in the use of particular co-speech gestures to elicit particular results.  Instead they are left to do or use what may come naturally.
Based on subsequent observation of interpreters at work, I believe that co-speech gesture may be consciously produced and may serve three functions: 1) to help with “punctuation” and timing in the target language output, 2) to facilitate the retrieval of lexical items, and 3) to elicit feedback from communication participants.
This research proposal aims to examine the production of co-speech gesture in ASL-English interpreting and to examine its function in the rendering of a target message.  In this study, I will follow work done by Casey and Emmorey (2009) in examining whether the co-speech gestures used by interpreters working from American Sign Language to English pattern like the gestures that emerged in the spontaneous English language production of bimodal bilinguals (people who are fluent in both a manual and a spoken language) as they conversed in a non-interpreting context.  If the CSG patterns found in the English target language production of bimodal bilingual interpreters differ from those found by Casey and Emmorey, it is possible that fluency, or comfort level, has an effect on the inhibition of gesture.  There may also be evidence that the highly demanding task of interpretation results in different patterns of co-speech gesture than those that emerge during spontaneous discourse.

Literature Review

            To date no research has been published regarding the use of co-speech gesture during interpretation; however, there is ample research on the function of co-speech gesture in cognition (Cassel, 1998; Wesp et al, 2001; Feyereisen, 2006; Casey and Emmorey, 2009).  Cassel (1998) lays a foundation for the study of CSG saying,

A growing body of evidence shows that people unwittingly produce gestures along with speech in many different communicative situations.  These gestures have been shown to elaborate upon and enhance the content of accompanying speech… Gestures have also been shown to identify underlying reasoning processes that the speaker did not or could not articulate (Cassel, 1998: 191). 

It is this identification of unarticulated cognitive processes that is of interest here; specifically, I would like to determine whether the use of CSG by interpreters is quantitatively different from use by general language users in terms of either amount or function.  Since interpreting is a cognitively different task than spontaneous language production (Christoffels & de Groot, 2005; Grosjean, 2011), we may expect to see differences in the use of CSG by interpreters.  Of further interest is whether the amount of CSG produced during interpretation is correlated with the cohesiveness of the target language output.  That CSG could impact quality, or give insight into the cognitive act of interpreting, is supported by Casey et al (forthcoming), who note that “gesture creation…affects both language production and comprehension” (3).  If the use of CSG impacts speech production and comprehension, it is worthy of study for what it may reveal about non-linguistic aspects of successful interpretation.  The understanding of the role of gesture during interpretation could inform both the practice and training of interpreters.  Conversely, inhibition of CSG may lead to disfluencies and/or interfere with lexical search (Wesp et al, 2001).  If this is true, research on CSG use by interpreters may indicate that student and novice interpreters could benefit from information on the role of CSG while interpreting.
Co-speech gesture has also been shown to occur during specific cognitive tasks, including lexical search, recall of concepts, supporting the rhythm of language production, engagement, and, as mentioned above, comprehension and production of language (Wesp et al, 2001).  All of these individual cognitive tasks are involved in the larger task of interpretation.  Further, Nagpal et al (2011) note that research shows CSG is primarily used to help speakers access language to aid production and is produced even when there is no audience able to see the gestures.  They further posit that people will gesture more when producing their L2, since it is harder to produce the language in which the speaker is less proficient.  This concept of gesture as related to effort of formulation and production also implies that interpreters’ use of CSG could relate to the richness of their interpretations.
Wesp et al (2001) note that “spatial imagery serves a short term memory function during lexical search…gesture keeps the pre-lexical concept in memory while lexical search is happening” (p. 591).  Streeck (2009) describes how CSG is used to engage interlocutors via two phases of lexical search gesture: one in which the searching party discourages participation of other parties, often accompanied by a shift in eye gaze, and a second phase in which “collaboration is sought” (Streeck, 2009: 108).  Interpreters rarely shift their gaze from the signer during ASL-English interpreting, so one might predict that the two phases described by Streeck could involve smaller and then larger gestures, with the seeking of collaboration being directed at either the signer or the team interpreter.  Lexical search behavior may also take the form of deictics, which can reference the immediate physical space, an unseen real space, or a conceptual space (Streeck, 2009).  Feyereisen (2006) describes two types of co-speech gestures.  The first type, representational or iconic gestures, represent "visual or dynamic features of the referent" and evoke mental activation of images, both visual and motor.  Representational gestures depict an image when produced.  The second type, nonrepresentational gestures, do not depict a particular referent; rather, they are produced with a single form regardless of the content of the message.  Nonrepresentational gestures are sometimes called beats because they are tied to the rhythm and stress that occur during speech production.  These nonrepresentational beat gestures are represented, in an extreme form, by the actions of the voice actors discussed earlier in this paper.  These are also the gestures I predict will serve a cohesive function for interpreters.
Casey & Emmorey (2009) hypothesize that representational gesture use by balanced bimodal bilinguals is influenced by activation of the right parietal cortex, which may be involved in processing spatial information.  The authors support this by citing research showing that in bimodal bilinguals this area of the brain is activated when producing spatial prepositions in English, whereas monolingual English speakers did not activate the right parietal cortex when using the same prepositions.  The authors also hypothesize that bimodal bilinguals may produce more deictic gestures than monolinguals when discussing route, mapping, or other spatial information.
Casey et al (forthcoming) further note that use of CSG helps with recall and spatial cognition.  The authors note research that found adults’ use of CSG while describing events helps with their recall of those events in both the short and long term.  The authors also posit that gesture rates related to learning a manual language may improve cognitive abilities by adding a manual component to the encoding of events in memory.  These hypotheses suggest that interpreters who are receiving manual language input and producing spoken language output may also use representational and deictic gestures as cohesive aids when discussing spatial information.  If this is the case, then research on co-speech gesture as it relates to working memory during simultaneous interpretation could be another viable topic that builds on the research proposed here.
Studies describe a difference between representational and nonrepresentational gestures in terms of the concepts these gestures help language users produce.  Representational gestures are more often used to recall and describe spatial concepts, as in Wesp et al’s (2001) examination of participants describing a painting to another person.  In this study the authors suggest that CSG increases when the speaker is attempting to describe something visual but that “…in search for synonyms or lexical search for nonspatial concepts, the need for gesturing is reduced” (Wesp et al, 2001: 593).  Feyereisen (2006) reinforces this concept, saying that, since they lack content, nonrepresentational gestures likely aid memory by emphasizing the sentence they co-occur with, adding a visual and motor component to the spoken language.  Feyereisen’s (2006) study examined how a speaker’s use of CSG impacted sentence recall by an observer.  Feyereisen found that, compared to recall without gesture, sentence recall was enhanced most when sentences were presented with representational gestures, but was also improved when they were presented with nonrepresentational gestures.  Feyereisen highlights a concept especially germane to ASL-English interpreting:
 It is now well established that sentences that refer to actions like shaking a bottle or stirring a cup of coffee  are recalled and recognised in higher proportion if the subjects perform the action during verbal presentation (subject-performed tasks or SPTs), by comparison with merely reading or listening to the sentences (so-called verbal tasks or VTs). Sentences are also better recalled if subjects only see the experimenter performing the action (experimenter-performed tasks or EPTs). (198)

This relates to cross modality interpreting in that the way ASL presents many of the types of concepts Feyereisen describes is through the use of iconic signs, which may be considered representational gestures or “classifier constructions” (Casey and Emmorey, 2009).  It is possible that comprehension of an ASL source message is easier due to the representational gesture and the semantic/lexical information coexisting as one item.
Two of the studies discussed above, Casey and Emmorey (2009) and Casey et al (forthcoming), form the primary foundation of the study proposed below.  Casey and Emmorey (2009) found that the co-speech gesture rates among native bimodal bilinguals, people who grew up with both a manual language (ASL in this case) and a spoken language (English), were statistically similar to the CSG rates of English monolinguals.  They also found that the bimodal bilinguals used more iconic gestures, more character viewpoint gestures, and a greater variety of handshapes.  The two groups were equal in their use of deictics and of two-handed gestures where each hand represented a different entity (e.g. two people juxtaposed in space).  The monolingual control group used more beat gestures.
            Casey and Emmorey (2009) posit that the equal gesture rates suggest that ASL signs occur in the place of co-speech gestures, rather than in addition to them.  Further, they suggest that the use of ASL signs when speaking to a monolingual non-user of ASL is not part of the normal CSG system but rather an inability to suppress ASL, the participant’s L1, while speaking English.  The authors note,

…our findings are consistent with Emmorey et al.’s hypothesis that the locus of lexical selection for all bilinguals is relatively late in language production. If the architecture of the bilingual language production system required that a single lexical representation be selected at the preverbal message level or at the lemma level, we would expect no ASL signs to be produced when bilinguals talk with non-signing English speakers (Casey & Emmorey, 2009).

If this is true for all bilinguals, second language learners of ASL could exhibit behaviors similar to those of the experimental group analyzed by Casey and Emmorey.
In the discussion of their findings, Casey and Emmorey (2009) suggest that “late acquisition of a second language may affect co-speech gesture in ways that differ from simultaneous acquisition of two native languages” (304).  Casey et al (forthcoming) also report that the co-speech gesture rate increases for second language learners of ASL after one year of ASL instruction.  Along with this increase in overall use of CSG, the forthcoming paper specifies that these new ASL users increased their use of representational gestures and used at least one ASL sign during their language production.  The authors suggest that the reason for these findings involves cognitive processes associated with learning a manual L2 and an inability to suppress their L2.  Though they do not mention the possibility of these findings being due simply to exuberance exhibited by L2 ASL users after one year of instruction, they do acknowledge the following:
Another possibility is that sign production while speaking does not reflect a failure to suppress ASL, but rather an increase in the repertoire of conventional gestures. Under this hypothesis, ASL learners have not begun to acquire a new lexicon (as have the Romance language learners), but instead have learned new emblematic gestures (akin to “quiet” (finger to lips), “stop” (palm outstretched), “good luck” (fingers crossed), or “thumbs up”) (Casey et al, forthcoming: 19).

Research comparing early L2 ASL learners to monolingual English speaking controls is in its early stages.  I believe that comparing all of the groups discussed so far (native bimodal bilinguals, early L2 learners of ASL, and monolingual English speakers) as well as bimodal bilinguals who acquired L2 ASL as adults would be descriptive of CSG use by bimodal bilinguals.  Examining the CSG rates and types used by the bilingual adult L2 acquisition group would make a good lead-in to the study I am proposing and would fill the gap between the groups discussed when Casey et al say, “It appears that both life-long and short-term exposure to ASL increases the use of meaningful, representational gestures when speaking” (16).
Finally, Casey et al (forthcoming) close by noting that learning a manual language may stimulate a stronger link between language and gesture.  This, along with the research on the functions and cognitive effects of CSG on recall, cohesion, and affect (tone and inflection), suggests that a study of CSG use by interpreters could provide insight into how the cognitive process of interpreting manifests in CSG.

Data from samples of two groups (students and experienced interpreters) will be examined under two conditions.  The first participant group is interpreting students, specifically senior undergraduates and second-year master’s students in interpreting programs.  Interpreting students are unique in the field in terms of their ability to film themselves during live interpreting scenarios.  Interpreting programs offer students opportunities for interpreting practice with live participants, which lends a measure of ecological validity to the data.  Metzger (1999) suggests that mock interpreting scenarios with live participants provide viable data for the study of interpreters.  Another advantage of using students lies in their experience with being recorded.  Students in many interpreting programs regularly record themselves, which may help mitigate the observer’s paradox.  Finally, students present a sample for two possible comparisons.  The first is the comparison of students’ use of CSG with that of experienced interpreters.  The second, which becomes relevant if the CSG rates of interpreters compare favorably with the rates found in spontaneous language production, is a comparison of advanced (near graduation) interpreting students’ use of CSG with Casey et al’s (forthcoming) finding that one year of ASL instruction increased the rate of CSG in students.
The second group of participants will consist of experienced interpreters.  All participants in this sample would be nationally certified and have at least five years of professional experience.  These characteristics are consistent with the generally accepted notion of “experienced” in the North American ASL-English interpreting community.  This sample can present some challenges in terms of data collection.  For one, the confidential nature of interpreting often precludes recording of actual job situations.  It is possible that permission to record interpreters at some public events may be obtained.  Additionally, there may be reluctance on the part of the interpreters to be publicly videotaped.
Both the student and experienced groups will be composed of second language learners of ASL.  The reason for selecting this group is to remove the possible confounding factor of L1 suppression.  Casey and Emmorey (2009), looking at the use of CSG by bimodal bilinguals for whom ASL is a first language, posit that ASL signs produced by bimodal bilinguals during spoken language production are separate from the cognitive process that produces co-speech gesture and are instead intrusions caused by a failure to suppress the speaker’s manual language.  Using only interpreters for whom ASL is their L2 would avoid the L1 language intrusion factor.  It is possible, however, that with less experience suppressing the manual modality language, L2 ASL users may show more lexical items in their CSG than L1 ASL users do during spontaneous language production.

In the proposed study, I will examine interpreters’ use of gestures under two types of interpreting conditions: authentic interpreting situations and interpretations produced using a recorded source text in a lab.  While looking at various types of texts with interpreters working both in teams and alone would be interesting and provide the most realistic data, for this study I propose looking at interpreters working in teams of two, interpreting monologues by deaf ASL users.  This condition best replicates the conditions under which interpreters normally work.  As such, it would be the easiest condition for which to obtain real world recorded data, and it could also be replicated under lab conditions.  Recordings of interpreters working with live source texts would provide ecological validity to the data as well as an opportunity to examine interpreters’ use of CSG in authentic settings.  Use of a recorded source text would allow for control of the source message and a reduction in variables, standardizing test conditions across participants.  Use of a recorded source text in lab conditions may also make data collection easier, given the difficulty of scheduling multiple real world parties and obtaining their consent to be recorded.
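The two groups and two conditions described above can be laid out as cells for which a gesture rate is computed.  The following is a minimal sketch in Python rather than a settled analysis plan.  It assumes that rate is measured as gestures per minute of target language output, which is my assumption for illustration and not a measure taken from the studies cited; the labels, the function name `gesture_rates`, and all numbers are likewise hypothetical.

```python
from collections import defaultdict

# Labels for the two participant groups and two recording conditions.
GROUPS = ("student", "experienced")
CONDITIONS = ("authentic", "lab")


def gesture_rates(samples):
    """Return gestures per minute for each (group, condition) cell.

    `samples` is an iterable of (group, condition, gesture_count, duration_s).
    Gestures per minute is an assumed unit, chosen only for illustration.
    """
    totals = defaultdict(lambda: [0, 0.0])  # cell -> [gestures, seconds]
    for group, condition, n_gestures, duration_s in samples:
        if group not in GROUPS or condition not in CONDITIONS:
            raise ValueError(f"unknown cell: {group!r}, {condition!r}")
        totals[(group, condition)][0] += n_gestures
        totals[(group, condition)][1] += duration_s
    # Pool all recordings in a cell before dividing, so long samples
    # weigh more than short ones.
    return {
        cell: 60.0 * n / seconds
        for cell, (n, seconds) in totals.items()
        if seconds > 0
    }


# Made-up numbers, for illustration only.
rates = gesture_rates([
    ("student", "lab", 30, 600.0),
    ("student", "lab", 10, 600.0),
    ("experienced", "authentic", 45, 900.0),
])
```

Pooling counts and durations within a cell is one possible choice; averaging per-participant rates instead would weight each interpreter equally regardless of sample length.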

Gesture coding will be based on the coding scheme devised by Casey and Emmorey (2009).  Casey and Emmorey coded for “…ASL signs versus non-sign gestures; iconic, deictic, and beat types; character and observer viewpoints; and handshape form” (2009: 296).  Casey and Emmorey (2009) define ASL signs as “identifiable lexical signs (e.g., CAT or BIRD) or classifier constructions that a non-signer would be unlikely to produce. For example, a bent V handshape is used in ASL as a classifier for animals” (297).  Iconic gestures are any gestures that look like what they represent, e.g. mimicking driving a car or tracing the outline of an object.  Deictic gestures point to a referent that is either present in the physical space where the language production is taking place or conceptual, for example, pointing down when referring to China.  Character viewpoint gestures are “produced from the perspective of a character, i.e., produced as if the gesturer were the character. For example, moving two fists outward to describe Sylvester swinging on a rope” (Casey and Emmorey, 2009: 297).  Handshape form examines whether the gesture is made with a handshape that is typically associated with ASL (e.g. the “ILY” handshape) or with one that is common among North American English speakers (e.g. the “rock on” gesture).  As Casey and Emmorey (2009) note, these categories are not mutually exclusive and one gesture may fall into more than one category.
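Because the categories are not mutually exclusive, each coded gesture needs to carry a set of labels rather than a single one.  The following is a minimal sketch in Python of how such a coding record might be stored and tallied.  The class names, field names, and timestamps are hypothetical and chosen for illustration only; they are not taken from Casey and Emmorey’s materials.

```python
from dataclasses import dataclass
from enum import Flag, auto


class GestureCode(Flag):
    """Coding categories named in the text; a gesture may carry several."""
    ASL_SIGN = auto()
    ICONIC = auto()
    DEICTIC = auto()
    BEAT = auto()
    CHARACTER_VIEWPOINT = auto()
    OBSERVER_VIEWPOINT = auto()


@dataclass
class CodedGesture:
    start_s: float       # hypothetical field: onset time in the recording
    codes: GestureCode   # one or more categories combined with |
    handshape: str = ""  # hypothetical free-text handshape label
    speech: str = ""     # co-occurring speech, if transcribed


def count_by_code(gestures):
    """Count gestures per category; totals can exceed len(gestures)."""
    return {
        code.name: sum(1 for g in gestures if code in g.codes)
        for code in GestureCode
    }


# Made-up records, for illustration only.
sample = [
    CodedGesture(12.4, GestureCode.ICONIC | GestureCode.BEAT,
                 speech="and takes his um tin cup"),
    CodedGesture(30.1, GestureCode.DEICTIC),
    CodedGesture(41.7, GestureCode.ASL_SIGN, handshape="bent V"),
]
counts = count_by_code(sample)
```

A flag-style record lets a single gesture count toward both the iconic and the beat tallies, which is why the category totals may add up to more than the number of gestures coded.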
Beat gestures are commonly found in spoken language at points of emphasis, to provide or illustrate some rhythmic feature of the discourse, and in word search behaviors (Casey and Emmorey, 2009; Wesp et al, 2001; Feyereisen, 2006; Streeck, 2009).  A preliminary analysis I conducted on one interpreting sample showed the use of deictic gestures, character viewpoint gestures, beat gestures, and ASL signs and fingerspelling.  The most common types were beat gestures and ASL signs and fingerspelling.  The ASL signs and fingerspelling seemed to be used exclusively between members of the interpreting team as a means of getting clarification or confirmation of the signed source message.  I believe the beat gestures could be a candidate for further analysis.  The interpreting sample I examined suggests four subtypes: simple beat gestures, emphatic beat gestures, lexical search gestures, and cohesive beat gestures.  Simple beat gestures are the kind of simple rhythm gestures associated with everyday spoken language production.  Emphatic beat gestures seem to serve a function similar to that of the gestures used by voiceover actors.  Lexical search gestures accompany parts of the interpretation where the interpreter is clearly searching for a word or phrase to match a source language (SL) concept that they comprehend but are struggling to reformulate.  Cohesive beat gestures appear to help guide the interpreter through transitions and relationships between propositions.  An example of a cohesive beat gesture would be an interpreter flipping their palm orientation while moving their hand from one side of neutral space to the other while interpreting an SL message about contrasting concepts.
Casey and Emmorey (2009) and Casey et al (forthcoming) decline to speculate as to the impetus for the CSG found in the bimodal bilinguals they studied, even when their data might suggest that a beat gesture is associated with something like a lexical search behavior, as in this example of an iconic-beat gesture: “one participant held out his hand imitating Sylvester holding out a tin cup and bounced his hand with the accompanying speech ‘and takes his um tin cup’” (Casey & Emmorey, 2009: 297).  Such an analysis may prove to be outside the scope of the proposed project as well.  However, I believe it presents fertile ground for research and represents the next step in this branch of study.

References

Casey, S., & Emmorey, K. (2009). Co-speech gesture in bimodal bilinguals. Language and Cognitive Processes, 24(2), 290-312.

Casey, S., Emmorey, K., & Larrabee, H. (Forthcoming). The effects of learning American Sign Language on co-speech gesture.

Cassel, J. (1998). A framework for gesture generation and interpretation. In Computer Vision for Human-Machine Interaction. Cambridge University Press.

Eidsvik, C. (2006). Voice and gesture within the context of mirror neuron research. The Journal of Moving Image Studies, 5, pages here.

Feyereisen, P. (2006). Further investigation on the mnemonic effect of gestures: Their meaning matters. European Journal of Cognitive Psychology, 18(2), 185-205.

Grosjean, F. (2011, September 14). Those incredible interpreters. Psychology Today.

Nagpal, J., Nicoladis, E., & Marentette, P. (2011). Predicting individual differences in L2 speakers' gestures [Electronic version]. International Journal of Bilingualism.

Streeck, J. (2009). Gesturecraft: The manu-facture of meaning. John Benjamins Publishing Company.

Wesp, R., Hesse, J., Keutmann, D., & Wheaton, K. (2001). Gestures maintain spatial imagery. The American Journal of Psychology, 114(4), 591.