The Development of Gaze Following in Bonobos: The Social Construction of Space

Chris Johnson & Kate Branson, Dec 2006

Goals of Study:

To trace the ontogeny of gaze following in bonobo infants, from 0 to 18 months, in videotaped dyadic interactions with other bonobos. Proficiency at “gaze following” involves responding to an observed change in the head orientation of another by next looking directly at the target of that animal’s attention. To track the emergence of this proficiency, we will micro-analyze both sequential and simultaneous examples of shared attention to available targets throughout the 18 months. In all samples, attention will be tracked in 4 modalities - body, mouth, hand, and head/eyes - for both animals. We will also track the changing views the infant has of the other individual orienting, and operating, along various trajectories through the shared space. Bonobo infants are frequent mimics, apparently highly motivated to reproduce the attentional access they observe in others, giving us a rich source of data on shared attention. We aim to document the ways in which socially-distributed patterns in the configuration of active attentional modalities constrain the opportunities available to the infant for learning about shared attention. We also hope to learn about the emergence of the infant’s strategic use of attention-getting gestures. By studying the practices that eventually enable the infant to guide its search behavior based only on observed changes in the head orientation of others, we hope to model how the space that these animals share is organized, changes, and is describable through such communal practices.

Methods:

Our plan is to take two basic types of data. One is to collect a great many, very short samples of attention-sharing events, more-or-less randomly across the 18 month period. The other is to select certain “representative” samples, involving longer and more complex interactions, and generate multi-modal, micro-analytic descriptions of how they unfold. In both cases, we will be analyzing video of dyadic interactions in which both parties attend to the same Target, and the infant has at least one modality’s access to that fact. These “Same Target” events may be simultaneous, or sequential (with the infant as either model or mimic), and the infant’s access to the other’s attention may occur at its outset or after-the-fact. For a given “event”, we aim to record all attentional moves or states pertinent to the (ultimately) shared Target.

Four Attentional Modalities – Body, Mouth, Hand, and Head/Eyes – will be tracked through every interaction, for both animals, to determine which modalities come into play, in what order and configuration, as the infant matures. On our datasheet (currently in Excel – see attached), each Modality is a column in which we enter any changes in either Target of attention, or type of attentional Access (head turn, reach, approach, etc.).

TARGETS can be other animals (divided into parts like A’s Face [front], A’s Head [back], A’s Body, A’s Mouth, A’s Hand), inanimate features of the environment (Food, detachable Object, and Substrate), or the same point Offscreen.

ACCESS is divided into Moves and States. Moves include: To X, From X, and Shift On X (where X is a specified Target)[1]. All such Moves are presumed salient (attention-getting) to any animal with Access to them. States include At X (attention on X), Open To X (positioned within reach or facing toward the target, without actual physical or gaze contact) and Closed to X (positioned without access to the target). States are not presumed to be salient (except, perhaps, in the case of Eye Contact), although we expect to show that they do, in time, become so.

Currently, any change in Access or Target generates a new line in the database, which is marked with its location on the video (in minutes:seconds:frames). An additional column covers the infant’s visual Modality in more detail, recording its View of the other’s Head and Body orientations. We also rate the level of Arousal (low, medium, high) of the interaction, in part presuming that high-arousal events might be particularly memorable. Finally, we track the participants’ Inter-Animal Distance, which plays a role, for instance, in determining whether a Body, Hand or Mouth has access to a social Target.
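To make the datasheet layout concrete, here is a minimal sketch of one database line as a Python record. It is not our actual Excel layout: the field names, the `Row` class, and the 30 frames-per-second rate are assumptions made for illustration, and the frame rate in particular should be checked against the footage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

FPS = 30  # ASSUMED frame rate; adjust to match the actual video


class Modality(Enum):
    BODY = "Body"
    MOUTH = "Mouth"
    HAND = "Hand"
    HEAD_EYES = "Head/Eyes"


class Access(Enum):
    # Moves (presumed salient)
    TO = "To"
    FROM = "From"
    SHIFT_ON = "Shift On"
    # States (not presumed salient)
    AT = "At"
    OPEN_TO = "Open To"
    CLOSED_TO = "Closed To"


MOVES = {Access.TO, Access.FROM, Access.SHIFT_ON}


def timecode_to_frames(timecode: str, fps: int = FPS) -> int:
    """Convert a 'minutes:seconds:frames' string to an absolute frame count."""
    minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= seconds < 60 or not 0 <= frames < fps:
        raise ValueError(f"invalid timecode: {timecode!r}")
    return (minutes * 60 + seconds) * fps + frames


@dataclass(frozen=True)
class Row:
    """One datasheet line: a change in Access or Target (hypothetical fields)."""
    timecode: str                   # minutes:seconds:frames
    animal: str                     # e.g. "infant" or "other"
    modality: Modality
    access: Access
    target: str                     # e.g. "Food", "Object", "A's Face"
    arousal: str = "low"            # low, medium, or high
    distance: Optional[str] = None  # Inter-Animal Distance, coding left open

    @property
    def frame(self) -> int:
        return timecode_to_frames(self.timecode)

    @property
    def is_move(self) -> bool:
        return self.access in MOVES


row = Row("12:03:15", "infant", Modality.HEAD_EYES, Access.TO, "Food")
print(row.frame, row.is_move)  # 21705 True
```

Storing the timecode as text and converting on demand keeps the record close to what is typed into the spreadsheet, while the frame count gives a single number for sorting and for measuring durations.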

While we will be using essentially the same datasheet on both the many short and the few long samples, our plans for analysis of these two data sets differ. The short samples will be located randomly in the archive (perhaps the first such event in each fixed-time interval of “good” tape??). When an instance of shared focus of attention, in any modalities, is identified, the moves that precede it will be tracked back to the point where one or the other, or both, animals first turned their attention toward that Target. This span of tape constitutes an “event”. Such events can be analyzed in terms of the proportions of time the different modalities are actively engaged, their varying tendencies to be co-activated, any patterns of propagation of target information across modalities, and across individuals, over the course of these events, and how all these change as the infant matures. The infant’s view of the other’s orientation will also be tracked to ascertain what types of orientation information are available to her at different phases of her development, how her access to such information changes, and how effectively (??) she comes to make use of that information.
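One way the first two of these measures might be computed is sketched below. It assumes each modality's engagement with the shared Target has already been reduced to half-open frame intervals `(start, end)` within an event; that reduction, the function names, and the choice to express co-activation as a fraction of event duration are all illustrative assumptions rather than settled method.

```python
from itertools import combinations


def active_frames(intervals, event_start, event_end):
    """Set of frames, clipped to the event, covered by any interval."""
    frames = set()
    for start, end in intervals:
        frames.update(range(max(start, event_start), min(end, event_end)))
    return frames


def modality_summary(event_start, event_end, intervals_by_modality):
    """Return (proportions, coactivation) for one event.

    proportions[m]       = fraction of the event during which m is engaged
    coactivation[(a, b)] = fraction during which a and b are both engaged
    """
    duration = event_end - event_start
    if duration <= 0:
        raise ValueError("event must have positive duration")
    active = {
        m: active_frames(iv, event_start, event_end)
        for m, iv in intervals_by_modality.items()
    }
    proportions = {m: len(f) / duration for m, f in active.items()}
    coactivation = {
        (a, b): len(active[a] & active[b]) / duration
        for a, b in combinations(sorted(active), 2)
    }
    return proportions, coactivation


# Hypothetical 100-frame event for one animal
intervals = {
    "Head": [(0, 40), (60, 100)],
    "Body": [(20, 100)],
    "Hand": [(90, 100)],
}
proportions, coactivation = modality_summary(0, 100, intervals)
print(proportions)   # Head 0.8, Body 0.8, Hand 0.1
print(coactivation)  # (Body, Hand) 0.1, (Body, Head) 0.6, (Hand, Head) 0.1
```

Running the same summary on events binned by the infant's age would give a first look at how these proportions change with development.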

At this point, we presume that the 18 month period can be divided into (at least) three phases, based on the type of constraint the mom exercises on the infant’s mobility: “On Mom”, “Hand-Held”, and “Free”. We predict that certain configurations of access will typify each of these phases and constrain how the next phase develops. We also predict that we will see a certain level of proficiency develop within each of these phases, where the infant shifts from being relatively passive & opportunistic, to being more proactive & efficient in gaining the relevant access (see Theory Challenges, below).

By taking essentially the same data as above, expanded over time, on a few “Special” examples, we also hope to produce a more ethnographic account of the nature of these interactions at various phases of development. At the moment, what constitutes a “Special” event is one that we find particularly interesting or telling, such as examples that are especially effective or ineffective, that seem particularly strategic, or that involve elaborate, cascading sequences of coordinated attention (See Challenges, below). By positioning such events in the historical timeline, we hope to identify elements, derived from the above database, that typify phases, or expertise within a phase, and show how they come to be integrated into effective collaborations. For example, we might be able to look at “gaze alternation” (repeatedly turning head between other and target), or “touch other with hand”, when the infant is first exposed to them performed by others, when they first emerge in the infant’s own repertoire, and when they are finally used by the infant as attention-getting gestures. We might also directly compare elongated segments, from different periods in time, to obtain some relative measures of the complexity or efficiency of those periods. In fact, we expect that many interesting discoveries that we have not yet anticipated await us in these longer, situated samples. Plus, analyzing them is particularly satisfying. In the short segments, we have acquiesced to trading off interesting context and background information for the large numbers we need to see trends across populations of interactions. But, in the longer segments, the rich context in each lets us fill out our story schemas in a most gratifying way. Guess that’s why we call them “Special”! :-)

Methodological Challenges:

Sampling Sampling presents a variety of problems. Our biggest stumbling block at the moment concerns the “short” samples. While we feel fairly strongly that analyzing a large number of relatively brief events over a long period of time is important to our being able to establish how patterns in modality activation and the propagation of target information change with development, we also recognize that there are limits to the number of minutes in our lives! Following such events to “completion” (as we intuit that – not clear yet how we do, although we tend to agree…) can take in the vicinity of a minute, and, frame by frame, over hundreds of samples, that is just too long!

When the events involve a mother and infant just sitting together looking around (i.e. moving Heads), we can get interesting and relevant samples that only last a couple of seconds. But, many examples of “attend same Target” involve changes not just in Head, but in Body (such as locomoting toward a Target) and in Hand (such as touching the Target). These are the ones that take quite a bit longer – and we’ve got lots of them on tape! And it’s not just the overall duration that’s at issue here. There’s also the fact that Head changes tend to happen more often and at a faster rate, so there can be several of them (including away from the Target) in the time it takes the Body and Hand to finally make contact with the Target. How pertinent are these Head changes within one Target event? Would the information generated by trying to answer that question be worth the effort? Is there a feasible – and warranted – way to take the data we need without it??!

Whenever we start going in such circles (which is all too frequently!) we try to get back to basics. Our intuition about when an event is “complete” seems to be based on two things: 1) a (fairly) consistently shared Target and 2) that both the distal (visual) and proximal (somatosensory) Modalities often come into play. The latter don’t, of course, always come into play – as in the “Head only” events mentioned earlier. But they often do, and since understanding how the Modalities relate to one another is an important goal for us, clearly we need to record something of this. Plus, gaining physical contact with a Target (food, object, social other) is likely an important part of the reinforcement that helps fuel the learning that is taking place here. So, if we take the analyses we proposed above for this “many short samples” database seriously, we will need data on:

1) Tendencies for modalities to be activated and co-activated

2) Patterns of propagation of a target across participants’ modalities

3) How these change with age

We need to work out the best way to take only the data we absolutely need for these analyses and no more, so we can feasibly accomplish them. Could we somehow track only the relative timing of the first appearance of each co-targeted modality for each animal??? Or might vacillations within or across modalities be a useful measure of hesitancy or inefficiency??? Clearly there are several decisions that still need to be made…
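To see what the two candidate reductions floated here would look like in practice, the sketch below takes a list of recorded moves and extracts (a) the relative onset of the first “To” move on the shared Target for each animal and modality, and (b) a crude vacillation count, taken here to be the number of “From” moves off that Target. Both operationalizations, the tuple layout, and the function names are assumptions for discussion, not decisions.

```python
def first_onsets(moves, target):
    """Frames from the earliest 'To' move on `target` to each
    (animal, modality)'s own first 'To' move on it."""
    onsets = {}
    for frame, animal, modality, access, tgt in sorted(moves):
        if tgt == target and access == "To":
            onsets.setdefault((animal, modality), frame)
    if not onsets:
        return {}
    start = min(onsets.values())
    return {key: frame - start for key, frame in onsets.items()}


def vacillations(moves, target):
    """Count 'From' moves off `target` per (animal, modality)."""
    counts = {}
    for frame, animal, modality, access, tgt in moves:
        if tgt == target and access == "From":
            key = (animal, modality)
            counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical event: (frame, animal, modality, access, target)
moves = [
    (100, "mother", "Head", "To", "Object"),
    (112, "infant", "Head", "To", "Object"),
    (130, "infant", "Head", "From", "Object"),
    (131, "infant", "Head", "To", "A's Face"),
    (150, "infant", "Head", "To", "Object"),
    (160, "infant", "Body", "To", "Object"),
    (220, "infant", "Hand", "To", "Object"),
]

print(first_onsets(moves, "Object"))
# {('mother', 'Head'): 0, ('infant', 'Head'): 12,
#  ('infant', 'Body'): 60, ('infant', 'Hand'): 120}
print(vacillations(moves, "Object"))
# {('infant', 'Head'): 1}
```

Recording only first onsets would discard the intervening Head changes entirely, whereas the vacillation count keeps a one-number trace of them; which trade-off is acceptable is exactly the open question.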

Other sampling issues also trouble us. How many is “a lot”? Should we sample at some kind of fixed intervals through the 18 months? Given how much the length of the “short samples” may vary, how close in time can two samples be and still be treated statistically as “independent”? (See further comments on this, below.) Above we said that an event “starts” with the first accessed Move directed to the ultimately shared Target – but in practice, this is often sticky. For example, what about events that occur as a part of a long session of co-attention to a plaything, or during a prolonged foraging bout? We feel a little more confident about identifying the end point of an event as being when as many Modalities as are going to come into play for a given Target all finally do come into play (as when the infant goes and picks up and mouths a stick she saw someone else holding and mouthing). But, as addressed above, this too has its problems. And we haven’t even begun to talk about the longer segments, whose sampling criterion at the moment consists of “segments we find interesting”, and whose more “qualitative” analysis (hate that term – see more below) is still somewhat vague to us… Good thing we’re throwing this workshop!

Help with Statistics and/or Computational Modeling – Any volunteers?

Theoretical Challenges:

Apprenticeship & Dynamical Systems In attempting to study the development of the social coordination of attention, we have found that an “apprenticeship” model can be useful. This is probably because the patterns that are sometimes described as emerging during apprenticeships, in the human literature, provide nice, assimilable examples of dynamical systems at work. For example, the basic idea that mastery in a domain involves first skills that are fragmented and uncoordinated, changing to ones that are multifaceted and well integrated, and finally to the strategic deployment of isolated skill components is not unlike a description of emergent changes in any dynamical system. We are interested in getting more specific about this relationship, and even perhaps fashioning more of our analyses in terms of dynamical systems concepts (attractor states, bifurcations, etc.)

Goals Coordinating attention toward a single Target seems, at least eventually, to involve goals. If there is a goal, how do you measure “success”? “proficiency”? Is a goal the same as an efficiently attained outcome? Or reliably attained? If imitation is a goal of these animals, perhaps duplicating an observed behavior in itself = success. Do the goals change over time? Why is it that gaze following seems to be goal oriented, but that most of the shared attention experience the infant has does not – what’s different? Is the notion of a goal even necessary, or can we somehow find behavioral criteria that enable us to evidence coordination without presuming a goal?

Gesture What are the behavioral, or better yet, contextual criteria that we can use to distinguish when a gesture serves as a communicative act, strategically altering the attention of others, versus when the same act serves to provide the actor with perceptual access? Even if acts are always both, there are no doubt variations of each that can facilitate one of these functions, perhaps at the expense of the other. Interestingly, in an earlier version of our current database, we had, for each animal, not only the 4 columns for the Modalities of their attentional Access, but 4 additional columns for the Modalities of their Acts. It didn’t take much piloting to discover that the latter were redundant, as long as we added “moves” that changed attentional access to the former. The synonymity of act and access is not trivial, and will be important in developing embodied models of both imitation and inter-subjectivity.

Representation Might behavior be predictably different if 1) the infant comes to represent a “probable activity corridor” extending in front of another animal, along which the infant will search for the target vs. 2) the infant does a mental transformation of its own position & orientation to match that of the other and then envisions what it would see from that point? Might the second – the adoption of an “empathetic perspective” - follow from the first? And how is all this related to the “cosmological perspective” in which no participant’s point of view is privileged?

Technical Challenges:

Spreadsheet vs. Relational Database We have actually been pretty satisfied, so far, with using Excel in conjunction with iMovie, but people have proposed to us that using a relational database – like Access, Observer, Mangold, or even Matlab – might be better for us. We are open to hearing more, but will need some convincing that the advantages are sufficient to make us willing to overcome the learning curve associated with a new program. At the moment I (Chris) feel fairly confident that I can get Excel to do the analyses I have in mind – although one thing I’ve yet to learn in Excel is how to send the output of a calculation done here, to some cell over there…?

Posting & Publishing Video Another area in which we need educating concerns posting video to a website (using Flash?). We’d also like to discuss how (and whether) to control copying or other types of access to such “public” video (see Ethics and Politics, below). We will also need these skills when we come to publishing this study (and others!), since making video data increasingly available is important to the development of videographic research.

Sociological Challenges:

The ethics, and politics, of sharing information Our first instinct is to share video. The world needs to see what we see. But we recognize that there are legitimate concerns about staking claims to “raw” footage - proprietary rights, competition for first citation, the risk of your own work being misrepresented, etc. As uncertain as we are about all this, we are at least interested in learning what we need to know about providing and controlling access. “Free sampling”, of course, is currently presenting intriguing conundrums in the music industry, in copyright policy, in the (precious) anarchy of the e-universe, etc. And (no surprise) at the same time, across the homeland, access control is a finely-honed security tool. Just how we want to participate in all this is still not clear to us (we’d frankly just rather not!), but interfacing our work with the world is important, so we need to consider such things…

Convincing Reviewers The down-side of doing cutting edge (read: non-mainstream ;-) work is trying to get anybody to even pay attention, let alone have their minds changed. Getting published can be a bitch, and we’ve only made it harder on ourselves by doing things differently than most journal contributors do. Plus, reviewers pride themselves on their beliefs, and we are largely asking them to set all that aside. It is within this ecology that we’ve co-created and are, for the time being, stuck with, that we are faced with the harrowing task of mounting an argument for the legitimacy of our methods – methods that they already think they understand and consider sub-scientific. We damn well better be prepared!

The better we understand what it is we’re actually doing, the better off we’ll be. Thus this workshop.

In addition, for our own sakes (and, we feel, for the sake of us all, given that, politically speaking, we fear that the need for systems thinking on this planet is dire!) we consider it part of our job to whittle away at the way people think about how science is done. Traditional statistics, for example, arose as a mathematical adaptation to studying populations, and are not well geared to studying particular individuals or development. And, of course, once you start talking about “systems” and “emergent properties” and such, you will leave most practicing scientists – at least in the social sciences – behind. Plus, if you want to include detailed, play-by-play descriptions in your Results section, journals that typically publish “hard” science will quickly refer you to “softer” publications that “accept that kind of thing”… Thus, these are practical concerns as well as philosophical ones.

Consider, for example, the problem of repeated measures on the same subject and the statistical “independence” of events. This is one that comes up for us in our “many small samples” database. Having an “N” of (by some counts) only four subjects already puts us in “anecdote” territory in some journals. But, presuming we can get past that hurdle, the hundreds of samples we will generate on each infant will still be subject to exclusion from some (legitimizing) tests, by some reviewers, due to their “lack of independence”. On the one hand, we want to cry “Yes, they are NOT independent – and that’s the point!” But, upon more sober reflection, we would also want to argue that the intent of that restriction was to assure that the outcome of one sample does not determine the next. Given the complexity of the system under study, we believe that, as long as multiple moves, by both parties, towards other targets, have intervened, we should be allowed to treat the next “shared target” event as independent from the last. Deciding what constitutes “multiple moves” is not trivial, of course, and warranting our sampling criteria is a critical goal for us, in part because we want to be able to use such tests. In some respects, we are interested in making claims about a population here – but that population is not primates, or bonobos, but a set of interactions over an apprentice’s infancy. The kinds of gross trends - like changing proportions of active modalities, or an infant’s likelihood of initiating an interaction - that these tests are good at substantiating, make up an important part of the story that we are trying to tell.
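The independence criterion proposed above can be written down as a simple check, which at least forces the open choices into view. In this sketch, “multiple” is a parameter (`min_moves`, defaulting to 2 purely as a placeholder), “other targets” is read as any Target that is neither the previous nor the next shared Target, and only “To” moves are counted. Each of these readings is an assumption, as are the data layout and names.

```python
def is_independent(prev_event, next_event, moves, min_moves=2,
                   animals=("infant", "other")):
    """True if every animal made at least `min_moves` 'To' moves toward
    other targets between the end of prev_event and the start of next_event."""
    gap_start, gap_end = prev_event["end"], next_event["start"]
    shared = {prev_event["target"], next_event["target"]}
    counts = {animal: 0 for animal in animals}
    for frame, animal, access, target in moves:
        in_gap = gap_start < frame < gap_end
        if in_gap and access == "To" and target not in shared and animal in counts:
            counts[animal] += 1
    return all(count >= min_moves for count in counts.values())


# Hypothetical pair of shared-target events and the moves between them
prev_event = {"start": 0, "end": 100, "target": "Food"}
next_event = {"start": 400, "end": 460, "target": "Object"}
moves = [  # (frame, animal, access, target)
    (120, "infant", "To", "Substrate"),
    (150, "other", "To", "Offscreen"),
    (200, "infant", "To", "A's Face"),
    (260, "other", "To", "Substrate"),
    (300, "infant", "To", "Food"),  # return to the earlier Target: not counted
]

print(is_independent(prev_event, next_event, moves))               # True
print(is_independent(prev_event, next_event, moves, min_moves=3))  # False
```

Varying `min_moves` and re-running the planned tests would show how sensitive any result is to where the line for “multiple” is drawn.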

They are also insufficient to tell that story. This is where the rant about the Quantitative / Qualitative Fallacy kicks in. For one thing, given the amount of work involved in generating a multi-dimensional, multi-party, micro-ethological account of a long, complex interaction, we resent the idea that such effort shouldn’t “count” as quantitative! But in a scientific tradition that still hasn’t figured out how to address individual differences, that treats development as the attainment of landmark states, and that values holding context constant, such effort can only ever produce a “qualitative anecdote”. What such tales can tell us, that percentages and correlations cannot, is how – how the members of a dyad co-constrain one another, how transitions happen, how elements are integrated into an effective coordination. They also may be necessary for grappling with the issue of “strategic behavior”. For an act to qualify as strategic, as efficient or effective toward a goal, some assessment of value-in-context is required. While such values can be given behavioral faces – in approach and avoidance, in persistence towards or the stability of outcomes, in the promptness, elaborateness, smoothness, or even contentiousness of interactions - it is only in situated samples of sufficient length and complexity that such faces become meaningful – to us, and probably to the animals as well. In determining which acts facilitate which outcomes – whether we are talking about an animal positioning itself, gesturing to another, or two animals acting in concert – context is required, since success can only be measured in reference to local conditions. No behavior can succeed or fail – only behavior-in-context. To paraphrase Bateson in Mind and Nature, for an act to be adaptive, an environment that selects it is required. It is that ecological interaction between act and environment that only the conscientious anecdote can provide.

-----------------------

[1] Moves “To” are further differentiated within the Body Modality as “To, Pivot” or “To, Locomote” (for when there is no vs. some change in Inter-Animal Distance), and within the Hand Modality as “To, Touch” or “To, Grasp” (since these may be important distinctions for the development of gesture and tool-use).
