Film as dynamic event perception: Technological development forces realism to retreat

Author: Heiko Hecht
[published in: IMAGE 3 (issue of January 2006)]

Keywords: film perception, realism

Disciplines: psychology, film studies

I entertain the thesis that a human need holds the key to understanding event perception in film. Bazin argued that photography freed Western painting from its obsession with realism. I extend this position by claiming that it is a basic human need always to have one medium that stands for the quintessential way of pictorially rendering reality. Only the medium that produces the currently most realistic renditions has to be obsessed with realism. When motion pictures replaced still photography as the superior medium, photographs were in turn freed from the burden of realism. Movies will remain caught in this role only until a superior medium, perhaps virtual reality environments, becomes mainstream. This chapter assesses the remaining differences between natural viewing and motion pictures from the point of view of dynamic event perception. It takes a closer look at the perceptual regularities that constitute natural events and at the extent to which the same regularities can be captured in film. It then explores the violations of these regularities that occur in motion pictures. Some of these violations, such as the camera position at the time of recording differing from the spectator's viewpoint, cannot be helped. Others, such as temporal cuts and jumps between scenes, could be avoided. This raises the question of why directors choose to violate some laws of natural viewing while they stay away from violating others. Among the self-imposed limitations that directors choose for their work are spatio-temporal constraints and causality constraints. I argue that directors have violated almost every spatio-temporal law that holds for natural events. The causality of natural events, on the other hand, is rarely touched in film: objects do not spontaneously assemble out of dust, things fall down rather than up, and so on.
Thus, however freely directors play with place, time, and viewpoint, they are extremely conservative when it comes to the causality of events. Even cartoons and science fiction movies only scratch the surface and violate but a few minor causal laws. Does the psychology of dynamic event perception forbid serious violations of event causality in film? Or do directors merely follow self-imposed constraints because they are using the medium whose function it is to depict reality?

1. Introduction: the reality of film

When Daguerre announced the invention of his photographic plate technique in 1839, many artists considered it the perfect tool for achieving easily what naturalistic painting had sought all along, namely the realistic rendition of views of the world (e. g. Scharf, 1983). As a consequence, the photographic approach to realism fundamentally changed the world of pictorial art. Painters no longer attempted to render the world naturalistically but began experimenting with the medium of painting, first hesitantly, as witnessed by impressionist distortions of shape and color, and then ever more radically, as for instance with the cubist abandonment of linear perspective. André Bazin called photography “the most important event in the history of plastic arts” (Bazin, 1967, p. 16) precisely because it put painting in a position to “recover its aesthetic autonomy” by freeing it from its “obsession with realism” (ibid.). This liberation was so thorough that hardly any pictorial avenue in painting seems to await exploration. Experimentation with the medium of pictorial art reached its conceivable extreme with Kasimir Malevitch's White on White (Museum of Modern Art, New York, 1918), or certainly with a blank canvas, if we want to count it as pictorial. Thus the old medium of painting entered an experimental stage while the task of naturalistic depiction was ceded to photography.

Deviating from Bazin, who did not see a difference between still photography and film as far as the liberation aspect is concerned, I venture that the invention of motion pictures (perhaps datable to Edison's demonstration of his kinetoscope at the Chicago World Exhibition in 1893; Münsterberg, 1916) freed still photography from the pressure of being the most naturalistic form of rendering. According to classical film theory (see e.g. Anderson, 1996), the ability to display a progressive sequence of photographs, as opposed to mere snapshots, constituted a qualitative leap toward the goal of capturing the world naturalistically. Siegfried Kracauer (1960) even speaks of the redemption of physical reality and of a natural superiority of film over still photography. To him the ability to record movement amounts to capturing actuality. Interestingly, he also notes that the plain recording of a real-world event may seem less real than a carefully staged event, which can produce a better-than-real illusion of reality. In analogy to the relation between painting and photography, the new medium of film may have freed still photography from the task of naturalistic depiction. And indeed examples of experimental photography abound (see e.g. the photomontages by László Moholy-Nagy or the “rayographs” by Man Ray). We may say that by succeeding still photography in the claim to the best naturalistic rendering, film opened the way for photography to become experimental. Obviously this does not mean that still photographs are no longer used to create likenesses. Photographs nowadays fulfill a twofold role: they retain the role of naturalistic rendering that they took over from painting, and they also experiment with the medium, where arguably the extremes have been explored as well. Take for instance the superimposition of two viewpoints in some of Marcel Duchamp's photographs (Gould & Shearer, 1999) or the “forgery” of real objects in others (Peterson, 2000).
Duchamp claimed to have photographed ordinary objects, such as a snow shovel, but it turns out that the objects had been manipulated and rendered non-functional.

Given the vast changes in our notion of what constitutes a realistic pictorial rendition over the last 200 years, our views might change once more with the advent of virtual environments. However, it might also be the case that fundamental ecological constraints of event perception set the limits for the development of realistic rendition. The transition from the arrested or frozen optic array constituted by a still photograph to the progressive optic array (see Gibson, 1979) means nothing less than the discovery of the ecological way to render events. One of James Gibson's great insights is that photographs are a special case of an arrested or frozen optic array and that the motion picture has to be regarded as “the basic form of depiction” (1979, p. 293). That is, film, more than anything before it, taps into basic aspects of natural perception. But what constitutes natural perception? For a cognitive scientist it is generally undisputed that the visual system has evolved to perceive progressing visual events in order to act intelligently upon them (see e. g. Shepard, 1994). If we consider the natural perception of events and the underlying psychological processes, we will be able to identify where film deviates from them and what perceptual consequences such deviations entail. In the following pages I take the most important ecological insights and apply them to an analysis of motion pictures. I will first describe the ecological principles by which the perceptual process is taken to arrive at the immediate experience of objects and their properties, such as mass, the forces acting on them, etc. Then I will analyze where film must, for structural reasons, violate certain of these perceptual processes and where film has the choice or artistic freedom to violate others.

2. Kinematic specification of objects and events

Still picture perception always faces a dilemma: no matter how expertly a still photograph is made, the mapping between a real-world object and its depiction is asymmetric. While an object photographed from a given vantage point can produce only one particular photo, any given photo is compatible with an indefinite number of 3-D scenes of which it could be a photograph. Figure 1 illustrates this asymmetry, or inherent underspecification of the referent by the photograph.

Figure 1: The image at the top is underspecified because it can represent an indefinite number of 3-D objects. Two conceivable interpretations are shown:

Interpretation 1 suggests one bi-colored planar object. Interpretation 2 suggests two separate overlapping objects. Only additional vantage points, as provided in the middle and bottom panels, can solve the underspecification problem.

The photo at the top could represent one bi-colored object, or it could represent two separate overlapping objects. Only additional information, such as additional vantage points of the camera, can resolve the underspecification problem, as illustrated by the bottom panels. For complex objects two viewpoints may not suffice, but as soon as continuous motion is introduced, all ambiguities disappear and the object is uniquely specified. In Gibsonian terms, the observer has picked up the invariant structure of the object. Thus, natural perception hinges on the presence of motion, which gives us progressive views. This ability of motion to disambiguate the shape of objects is often referred to as the principle of structure-from-motion (SFM). For SFM it is irrelevant whether motion is introduced by a displacement of the observer or by a displacement or rotation of the object. While in still images additional vantage points are only possible through tricks such as double exposure or cubist decomposition, the movie naturally takes advantage of SFM and disambiguates the situation as soon as the camera moves around the object or the object moves with respect to the camera.
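The underspecification argument and its resolution by motion can be illustrated with a small Python sketch. It is a toy model resting on assumptions not found in the text: orthographic projection (depth is simply discarded) and two hypothetical point sets that loosely mirror the two interpretations in Figure 1, one lying in a single plane and one with part of the configuration set back in depth. From a single vantage point the two images coincide; after a 10-degree rotation of the object (geometrically equivalent to the camera moving around it) they no longer do.

```python
import math


def project(points):
    """Orthographic projection onto the image plane: keep (x, y), drop depth z."""
    return [(round(x, 6), round(y, 6)) for x, y, _z in points]


def rotate_about_vertical(points, angle_rad):
    """Rotate 3-D points (x, y, z) about the vertical y axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


# Hypothetical scenes (arbitrary units). Interpretation 1: everything lies in
# one plane. Interpretation 2: the rightmost part sits one unit further back.
one_planar_object = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
two_objects_in_depth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 1.0)]

# Frozen optic array: one vantage point, identical images.
single_view_identical = project(one_planar_object) == project(two_objects_in_depth)

# Progressive optic array: the object turns by 10 degrees.
turn = math.radians(10)
moving_view_identical = project(rotate_about_vertical(one_planar_object, turn)) == project(
    rotate_about_vertical(two_objects_in_depth, turn)
)

print(single_view_identical, moving_view_identical)  # True False
```

Any nonzero rotation would do; the 10 degrees are arbitrary. Recovering shape from real perspective image sequences is considerably harder, so the sketch captures only the logic of the argument.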

However, according to ecological theory, perceiving is above all the direct pick-up of affordances (Gibson, 1979), that is, of the possible uses and functions of objects, such as being reachable or providing a support surface for the observer. Perceiving affordances requires that we be able to perceive not only the motions or kinematics of objects but also the more complex dynamic variables such as mass, friction, force, momentum, and energy. It is the perception of dynamic variables that makes complex event perception possible. Physicists group dynamic events together in the field of classical mechanics; the collision of two billiard balls would be an example of such a dynamic event. From a perceptual point of view, the variables involved are not as easily accessible as non-dynamic variables such as size, length, or color. Nonetheless, many everyday situations require that we make judgments about dynamic events. The visual system is able to do so by virtue of a fundamental principle of event perception that also underlies film perception: the principle of kinematic specification of dynamics (KSD). KSD states that direct perceptual qualities emerge when the dynamics of a situation are sufficiently specified by its kinematics (for a description of the principle see Runeson & Frykholm, 1983; for a discussion of the difficult theoretical status of the concept see Hecht, 1996). For example, in the case of the colliding billiard balls, the specific velocity changes of the incoming and outgoing balls can only be obtained with one particular mass ratio and coefficient of restitution. KSD claims that people perceive the mass ratio directly on the basis of the kinematics of the event (i.e. the changes in the velocity vectors between the pre- and post-collision phases). Thus, the introduction of motion into pictures not only solved the underspecification problem but, for the first time, provided direct visual access to the world of dynamic events, which had hitherto been undepictable.
In many ways, the visual invariants that define objects in the real world can be easily extracted from moving images (Gibson, 1979).
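The billiard-ball example can be spelled out under the textbook idealization of a head-on, frictionless collision in one dimension, a simplification assumed here rather than a model taken from the KSD literature. With u and v denoting velocities before and after impact, conservation of momentum gives m1/m2 = -(v2 - u2)/(v1 - u1), and the coefficient of restitution is e = (v2 - v1)/(u1 - u2). The hypothetical Python sketch below first simulates a collision and then recovers both dynamic quantities from the four velocities alone, which is the sense in which the kinematics specify the dynamics.

```python
def collide(m1, m2, u1, u2, e):
    """Velocities after a head-on 1-D collision with restitution coefficient e."""
    total = m1 + m2
    momentum = m1 * u1 + m2 * u2
    v1 = (momentum + m2 * e * (u2 - u1)) / total
    v2 = (momentum + m1 * e * (u1 - u2)) / total
    return v1, v2


def dynamics_from_kinematics(u1, u2, v1, v2):
    """Recover the mass ratio m1/m2 and restitution e from velocities alone."""
    # Momentum conservation: m1 * (v1 - u1) = -m2 * (v2 - u2)
    mass_ratio = -(v2 - u2) / (v1 - u1)
    # Restitution: speed of separation divided by speed of approach
    restitution = (v2 - v1) / (u1 - u2)
    return mass_ratio, restitution


# Hypothetical event: ball 1 (mass 1) hits resting ball 2 (mass 2), e = 0.8.
u1, u2 = 3.0, 0.0
v1, v2 = collide(1.0, 2.0, u1, u2, e=0.8)

# An observer who only sees the four velocities can recover the dynamics.
ratio, e = dynamics_from_kinematics(u1, u2, v1, v2)
print(round(ratio, 3), round(e, 3))  # 0.5 0.8
```

The inversion only works because the mapping from (mass ratio, e) to the velocity changes is one-to-one in this idealized case; real collisions with spin or friction add parameters.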

Applied to film, the KSD principle claims that as long as a particular invariant is given in the kinematics of the depicted motion, the visual system of the beholder can extract that invariant directly. And in fact, studies demonstrate that even extremely simplified movies, say of an actor lifting a box of unknown weight (Runeson & Frykholm, 1981; Bingham, 1993), allow the observer to make remarkably accurate judgments of dynamic facts, such as the approximate weight of the box. Thus, the kinematic information in a simple movie is sufficient to let us perceive whether the actor lifted an empty or a filled box. One could say that the ability of the visual system to relate event kinematics to dynamic properties in natural viewing is a prerequisite for creating believable movie worlds: what makes event perception possible in the first place allows movies to be as natural as they are. This basic truth is both a boon and a bane for the ecological approach. It is a boon because it explains why movies are as powerful as they are: a large number of invariants remain the same in natural and in movie viewing. It is a bane because the visual system is confronted with something everyone but Gibson would call a cue conflict: some invariants may specify an object that can be picked up and used as food, while other invariants (the projection surface, the unnatural lighting, etc.) specify 2-D patterns on an immovable large screen. Since the ecological approach usually assumes a unique and direct specification, which is lacking here, it has difficulty explaining how two contradictory things can be specified at the same time. It is important to note that Gibson would not agree that the visual system is in a permanent state of internal conflict as long as we watch movies; rather, he speaks of a particular form of dual awareness in this case (1979, p. 292 f).
Maybe events specified in film are best thought of as a duplication of reality, although Gibson may not have liked that term either.

3. Depicting events

To understand the role of event perception in the analysis of motion pictures, we need to consider the basic relationship between the world and its depiction. To do so, let us look at the theoretical space of depictions, ordered by how closely they approximate the real world. At the bottom of this theoretical space (see Figure 2) we find pre-Renaissance painting, which conveys symbolic meaning but also some qualitative information about spatial relationships, as for example established by one surface occluding another. The discovery of linear perspective brought a leap in fidelity, while photography provided a speedy, automated way to conserve the optic array that linear perspective had already approximated. Motion pictures represent another qualitative step, as argued above. I hold that even at the extreme end of this theoretical space it is impossible to imagine a perfect depiction that is no longer distinguishable from its referent. Even with the best conceivable virtual environment system (VE), with close to infinite resolution and perfect response characteristics (no lag times), we would continue to receive information that our physical situation differs from the visual one. Such information is provided mostly by other senses, for instance odors, gravity, or wind. In other words, the unity of the senses is still lacking. However, all the visual affordances and invariants are present in the VE. And most of these are retained in the movie whenever the camera takes over the ability to explore, which the observer has lost.

Figure 2: From bottom to top, the pyramid of visual techniques shows what each of them added to pre-perspectival painting.

Thus, the movie has obvious shortcomings compared to the (unreachable) perfection of a VE, and certainly compared to the natural world. But in these shortcomings, I claim, lies the movie's unique ability to violate the laws of nature. In the space between poor drawing and perfect VE, the movie has sufficient veridicality to convey rich invariants very similar to those available in natural events, yet it need not obey the laws of nature or the laws of interactivity that constrain VEs. Hence, the movie is the perfect medium for creating fictional worlds that defy the laws of nature. Strangely, the fictional worlds that film directors have created are mostly very conventional and abide by natural law. To prove this point, I need to scrutinize violations of natural event perception, first in terms of what is possible in principle and then in terms of which violations are commonly made.

But first let me point out that I am not concerned with some attributes of the medium that may contribute to the realism of the depiction but which are of minor importance with respect to event perception. One of these is the loss of information that is part of any depiction. The still image is limited in resolution, the area it subtends (visual angle), the frame that is chosen, the two-dimensionality of the material that usually serves as the image surface, the position of the observer. All these limitations also apply to the motion picture. I believe it is fair to neglect these shortcomings. They characterize pictures and lead to the strange duality inherent in the perception of both still and moving pictures (Gibson, 1978). The duality lies in the realization that two kinds of invariants are offered simultaneously, those that specify the flat surface in the room upon which changing patterns are cast, and those that specify the objects in these patterns. When fully immersed in a movie (see e. g. Slater & Wilbur, 1997) we may forget about the first set of invariants temporarily. The second set of invariants holds likewise for natural and depicted events.

It is established, then, that movies preserve a large number of invariants that are identical to those provided by natural events. But what exactly is an event? Gibson's (1979) attempt to define it has remained the best psychology has to offer to date (but see Cutting, 1981; Hecht, 2000a). Gibson argues that “we should begin thinking of events as the primary realities and of time as an abstraction from them” (p. 100). To begin with, events are disturbances of optical structure. Consequently, depicted events have to be judged by whether they provide the same disturbances as natural events do. Among all disturbances, Gibson first identifies internal events, which are equivalent to a displacement of the point of observation. They are internal in the sense that they depend solely on the point of observation, that is, on the observer's changes of position. Gibson sets them aside because he is mainly concerned with world events. However, they are extremely important for the film maker. Let's look at the example of a head turn to the observer's right. Optical texture on the left will disappear and new things will appear on the right. This process is reversible, and it bears a striking similarity to a camera pan. The awareness of the world outside the current field of view is somewhat independent of the current view; Gibson (1979, p. 118) speaks of a sliding sample of the ambient array. This and other internal events are part and parcel of natural viewing as well as of film viewing. They have the unique characteristic that everything in the field of view moves in synchrony. Quite obviously, the efferent signal from the brain to move the head with respect to the world, which is normally responsible for any change in the field of view, is not required to enjoy a movie. Moreover, other sensory modalities, such as vestibular cues to motion, play a minor role, and their absence often goes undetected.
For instance, performance and experience in flight simulators are only marginally improved when body accelerations are imitated by putting the whole simulator on a moving platform (see e.g. Bürki-Cohen, Boothe, Soja, DiSario, Go & Longridge, 2000). From visual information alone we have no trouble noticing self-motion even when we are being moved passively. This also explains illusions such as vection, which you might have experienced on a train when you cannot tell whether your own train is departing or the train on the adjacent track is moving. Thus, the sufficiency of visual information to specify the internal event of locomotion appears to be responsible for the fact that movie-goers have no trouble placing themselves where the camera is and interpreting its motion as a repositioning of their own vantage point.
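The claim that everything in the field of view moves in synchrony during a head turn or camera pan can be checked with elementary geometry. The Python sketch below is a simplification introduced here, not part of Gibson's account: a top-down 2-D world with hypothetical objects at three distances, with angles in degrees. Turning the line of sight shifts the direction of every object by exactly the same angle, whatever its distance. Stepping sideways also shifts every direction at once, but by an amount that shrinks with distance (motion parallax), which in principle lets the visual system tell the two internal events apart.

```python
import math


def direction_deg(point, observer, heading_deg):
    """Direction of a point relative to the line of sight, in degrees.

    Top-down 2-D world: points and observer are (x, z) with z pointing ahead.
    """
    dx = point[0] - observer[0]
    dz = point[1] - observer[1]
    return math.degrees(math.atan2(dx, dz)) - heading_deg


# Hypothetical scene: three objects straight ahead at 2, 10 and 50 units.
objects = [(0.0, 2.0), (0.0, 10.0), (0.0, 50.0)]
origin = (0.0, 0.0)

before = [direction_deg(p, origin, 0.0) for p in objects]

# Head turn / camera pan: rotate the line of sight 15 degrees to the right.
after_pan = [direction_deg(p, origin, 15.0) for p in objects]
pan_shifts = [a - b for a, b in zip(after_pan, before)]

# Sideways step: move the point of observation 0.5 units to the right.
after_step = [direction_deg(p, (0.5, 0.0), 0.0) for p in objects]
step_shifts = [a - b for a, b in zip(after_step, before)]

print([round(s, 2) for s in pan_shifts])   # [-15.0, -15.0, -15.0]
print([round(s, 2) for s in step_shifts])  # [-14.04, -2.86, -0.57]
```

Negative values mean that objects drift to the left of the line of sight, as they do when the head turns or steps to the right.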

The other type of event is the external event, which comprises all optical disturbances that are not caused by the observer and must therefore emanate from outside causes. External events come in three main varieties: changes in the layout of surfaces, as produced by changes in an object's position or orientation; changes in the color and texture of surfaces, as for instance produced by a ripening fruit; and changes in the existence of surfaces, such as an object breaking into pieces. According to the KSD principle mentioned above, film is able to uniquely specify most, if not all, of these external events. That is, the kinematic information that specifies the dynamic events of occlusion, collision, disintegration, etc. is preserved in film. Interactivity is missing, but external events are sufficiently specified. Filmic events may lack uniqueness because they are accompanied by contradicting invariants that specify the canvas, but they allow the viewer's visual system to identify the external event that is represented.

4. Violations of the rules of natural viewing

Given this general mode of human perception and the artificial nature of motion pictures, it is possible to violate most of the laws that govern external events. I claim that this ability is unique to motion pictures and responsible for a good part of their fascination. In other words, the true power of a motion picture lies in its ability to specify events that are impossible in the natural world. These cases, which I call violations of natural viewing invariants, produce emotions such as joy and a sense of magic. To support this claim I will now take a closer, systematic look at different violations of internal and, in particular, external events. These violations are the key to understanding filmic event perception.

4.1. Violations in frozen renditions

The underspecification of the 3-D scene to which a picture refers means that a corresponding 3-D scene, albeit sometimes a strange-looking one, can be constructed for most pictures. Hence, in frozen renditions conventional rather than objective violations prevail. If we create a distorted perspective rendition of a shoe carton, its sides may no longer look parallel, but who is to say that such a strange carton cannot exist? We have violated conventional wisdom about shoe cartons but not any natural law. Even size violations may depict a possible world. If we stand 50 cm in front of a photograph in which a matchbox is shown 1 m wide, we are either confronted with a magnification or we have assumed the wrong viewing distance. Interestingly, the visual system is very tolerant of incorrectly assumed viewing distances, although there are noticeable differences between an altered viewing distance and a shot with a magnifying lens (see Lumsden, 1980). So, strictly speaking, these are not violations: we can conceive of 3-D worlds that correspond to these interpretations. True pictorial violations in frozen arrays are rare, but they do of course exist and have been discovered by artists, consistent with my earlier claim that still images leave little room for further exploration. Prominent examples are the drawings of Maurits Escher and the paintings of René Magritte. Escher's “Waterfall” (Figure 3) violates global depth relations, while some of Magritte's paintings violate the law of occlusion (i. e. the fact that closer objects visually cover up objects positioned behind them). This is the case, for instance, in “Le blanc seing” (Figure 4), where the lady on the horse should be occluded by the tree in front of her.
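The two readings of the matchbox example can be quantified with the standard visual-angle formula θ = 2·arctan(s / 2d), for an extent s seen from a distance d. The Python sketch below takes the 1 m print width and the 50 cm viewing distance from the text and assumes, purely for illustration, that a real matchbox is 5 cm wide. The printed matchbox then subtends 90 degrees: either it is magnified twentyfold, or the image corresponds to a real matchbox seen from only about 2.5 cm away.

```python
import math


def visual_angle_deg(size_m, distance_m):
    """Visual angle subtended by an extent of size_m seen from distance_m."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))


def distance_for_angle(size_m, angle_deg):
    """Distance at which an extent of size_m subtends angle_deg."""
    return size_m / (2 * math.tan(math.radians(angle_deg) / 2))


printed_width = 1.0      # matchbox width on the photograph (from the text)
viewing_distance = 0.5   # observer's distance from the photograph (from the text)
real_width = 0.05        # assumed size of a real matchbox, 5 cm (hypothetical)

angle = visual_angle_deg(printed_width, viewing_distance)
# Reading 1: the viewing distance is right, so the matchbox is magnified.
magnification = printed_width / real_width
# Reading 2: no magnification, so the real matchbox must be this close.
implied_distance = distance_for_angle(real_width, angle)

print(f"{angle:.0f} deg, x{magnification:.0f}, {implied_distance * 100:.1f} cm")
# → 90 deg, x20, 2.5 cm
```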

Because of the inherent ambiguity of still pictures, violations in them are quite rare. The true violations just mentioned, however, could in principle be transferred to motion pictures. Imagine water running down Escher's waterfall and the mill wheel turning. What a strange, paradoxical world we would enter in such a movie. More extreme still would be an animated version of Magritte's rider: imagine the lady's surface texture on every other tree behind which she rides. Let us note that these violations are of such a fundamental nature that it is not surprising that we cannot readily think of any motion picture that has explored them. At the same time, nothing prevents experimental film from doing so. The film work of early cinematographers such as Hans Richter, Fernand Léger, and Maya Deren took steps in this direction of “animated painting” (Deren, 1960), but this avenue was soon abandoned. This might explain why some of Maya Deren's short films, such as “Meshes of the Afternoon” (1943) with its causality-defying events, appear revolutionary to this day, although the practical technical obstacles to such experimentation have ceased to exist. Before the invention of computer animation, pictorial violations were much harder to implement than they are today.

Figure 3: M. C. Escher: Waterfall (1961). Note the inconsistencies between the top and bottom of the supporting columns. Subtle violations of depth relations create the paradoxical outcome of a perpetuum mobile in the viewer's mind.

What cannot be explained by practical arguments is that other conceivable violations of a basic pictorial nature have not been attempted either. For example, I know of no film that has used false perspective and deliberate distortions to explore their effects in a consistent manner. And this in spite of entire books on alternatives to linear perspective that might produce more veridical renditions (e. g. Barre & Flocon, 1968). To my knowledge, false perspective distortions have only been employed when they were inevitable side effects of extreme focal lengths, or in short sequences to indicate inebriation or dream states of the protagonist, as in Alfred Hitchcock's “Spellbound” (1945).

Figure 4: René Magritte: Le blanc seing (1965)

4.2. Violations in dynamic renditions

In addition to the radical violations that play with pictorial integrity, moving images open up a whole new realm of possible violations. It turns out that edited film always violates one or more tenets of event ecology. It seems worthwhile to classify these violations as follows.

Violations of space

When seated in a cinema, we typically do not assume a position that recreates the optic array we would have encountered had we been where the camera was. We assume “wrong” positions in front of the canvas and end up viewing the image from too close, too far, or from the wrong angle. Most of the time the viewer does not notice and certainly does not mind. The visual system seems to ignore or compensate for many of the non-rigid transformations that result from the “wrong” position. At least linear perspective projections viewed in this manner are robust in the face of the distortions (Cutting, 1987; Kerzel & Hecht, 1997; Yang & Kubovy, 1999). Many other inconsistencies, such as replacing a camera approach with a zoom, usually go unnoticed as well. These internal events appear to be interpreted correctly when they suggest observer motion, and they tend to be ignored when they introduce unwanted distortions. This might explain why Gibson does not elaborate on internal events and why directors have not experimented with them by distorting spatial layout or using similar tricks.

Other spatial violations are possible only in film, where 3-D space is defined through motion (SFM). Take, for instance, cases that suspend the fundamental truth that two objects cannot occupy the same space. In Ivan Galeta's “Two Times in One Space” people split, which gives them a phantom-like quality (see Bordwell & Thompson, 1997).

Violations of time

Cuts usually violate the natural flow of time, whether the observer is teleported to a different place at the same time or a scene from the past is presented as a flashback. Of all possible violations of natural event perception, temporal violations have become the most widely used and discussed film technique (Bordwell & Thompson, 1997). A typical action movie contains as many as 2000 shots or more, breaking the natural flow of events with every cut. Time lapse and slow motion are used subtly in almost every (action) movie to emphasize or de-emphasize parts of the action or to make scale models appear more natural. And in Godfrey Reggio's “Koyaanisqatsi” (1983) temporal compression and dilation have been used over their fullest range. Interestingly, repetitions are used much less frequently. An early explorer of the time violation of repetition was Leni Riefenstahl in her documentary on the Berlin Olympics (“Olympia”, 1938), which contains shots such as a series of short cuts showing several athletes at the moment of soaring off the high bar without ever showing a landing in between. At the level of the storyline, where the film constructs time, duplication and fragmentation are common tools, for instance in parallel action sequences or in fragmented action, as in Edwin Porter's “The Life of an American Fireman” (1903).

A more complex temporal violation is the reversal of time, which happens frequently at the level of the storyline (i. e. flashbacks or flash-forwards) but hardly ever at the level of single action units. I vividly remember how, after movie presentations in grade school, we begged our teacher to show part of the movie backwards, something that has become a lot more difficult in the age of video. We took enormous pleasure in this temporal reversal, presumably because of the resulting causality violations (see below). However, the fact that low-level time reversals are almost never found in Hollywood movies, with the possible exception of animated cartoons, suggests that children soon mature beyond this stage, just as they mature beyond playing peek-a-boo once object permanence has developed (e. g. Flavell, Miller & Miller, 1993).

Thus, high-level temporal violations have become standard repertoire while their low-level counterparts remain the exception. Hochberg and Brooks (1996) point out that what makes cuts visually comprehensible is not a conventionalized film grammar but rather the avoidance of unnatural apparent motion effects. This is supported by findings that the visual system is very forgiving when scene changes are introduced during eye movements or other visual disturbances. This phenomenon, known as change blindness (O'Regan, Rensink & Clark, 1999), suggests that observers fail to notice even large objects that are added to or removed from the visual field during eye movements, and presumably also during cuts. The internal representation of events is so sparse that disruptions are easily tolerated. This can explain the tolerance for high-level violations but not the system's sensitivity to low-level violations. A scene played backwards is just as smooth as its forward counterpart, but it looks wrong and, to some of us, funny. I claim that reversals only look wrong if event causality is violated (see below).
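The claim that a reversal looks wrong only when causality is violated can be illustrated with a bouncing ball. The Python sketch below rests on idealizing assumptions introduced here: a rigid floor, no air resistance, and apex heights that shrink by the factor e² per bounce, where e is the coefficient of restitution. Played forward, a partly inelastic ball loses height with every bounce; played backward, it gains height, which would require an energy source the scene does not contain. For a perfectly elastic ball (e = 1) the reversed sequence cannot be told from the forward one.

```python
def apex_heights(h0, e, bounces):
    """Apex heights of a ball dropped from h0 onto a rigid floor.

    Each impact scales speed by the restitution coefficient e, so each
    apex height scales by e**2 (air resistance ignored).
    """
    heights = [h0]
    for _ in range(bounces):
        heights.append(heights[-1] * e**2)
    return heights


def no_energy_gain(heights):
    """True if no bounce rises higher than the previous one."""
    return all(later <= earlier for earlier, later in zip(heights, heights[1:]))


lossy = apex_heights(1.0, 0.8, 5)    # hypothetical, partly inelastic ball
elastic = apex_heights(1.0, 1.0, 5)  # idealized, perfectly elastic ball

print(no_energy_gain(lossy))          # True: forward playback looks natural
print(no_energy_gain(lossy[::-1]))    # False: reversed, the ball gains height
print(no_energy_gain(elastic[::-1]))  # True: reversal is undetectable
```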

4.2.1 Violation of internal events

A number of optical changes specify what Gibson called internal events. Normally internal events go hand in hand with changes of viewing direction, head position and locomotion. In motion pictures the camera replaces the head and records the optical changes in the eye's stead. Changes in camera position, angle, focal length etc. all contribute to filmic internal events. Contrary to Hochberg and Brooks' (1996) claim that some movement-produced information is ignored or contradicted by film makers, as for instance in a trucking shot, it might be better to think of the camera as an omniscient observer. In the movie theater the spectator can locomote in ways she normally does not or does not have the means to (levitate, walk backwards without looking in that direction, fly, shrink to fit into a keyhole, etc.). Under the premise that the observer attends to the window into the film world provided by the canvas, is the director confined to playing with these optical changes never exceeding what an infinitely fast, movable and scalable observer could do? Or can some internal events be created that violate what is possible to such an idealized observer?

The first attribute of an internal event discussed by Gibson is dynamic occlusion, which consists in the progressive deletion and accretion of texture. It occurs at occlusion edges but not at edges inside an object, such as color boundaries. For instance, when a car moves along a road, some texture is deleted at the front of the car, and at the same rate texture is uncovered behind it. It is precisely this smooth deletion on one side and accretion on the other that leads the visual system to signal a moving object whose direction of motion points toward the side of texture deletion. I believe it is inconceivable to have systematic deletion and accretion but no motion, or, conversely, motion but no texture change. Thus, we are dealing with a universal law of perception: the invariant of accretion and deletion cannot be violated in film.

Perspective transformation and the apparent foreshortening of objects when the camera assumes different viewpoints, on the other hand, can be manipulated. It would make for a very strange world indeed if, every time the camera moved to the left, one particular object behaved as if the camera had moved to the right. Note, however, that such violations are not strictly impossible: the object could have turned at exactly the same time as the camera did. Likewise, changes of perspective confined to parts of the scene are consistent with those parts of the visual world warping. Such violations have been used for dream scenes and the like.

We have already touched on the next internal event, magnification and minification. It stands out because the zoom is the only standard technique specifying an internal event that cannot be reproduced by the omniscient observer described above. All other internal events stay within what this observer could experience. An approach by the observer (dolly shot) should cause dynamic occlusion, but a zoom does not. That zooms nonetheless do not look strange or unnatural is remarkable. It might be explained by our familiarity with binoculars or, more likely, by observers' failure to discriminate the subtle differences between a zoom and an actual approach.
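
The geometric difference between a zoom and a dolly can be illustrated with a minimal sketch. It assumes an idealized pinhole camera in which a point at lateral offset x and depth z projects to the image coordinate f·x/z; this model, the scene values, and the helper names project, zoom, and dolly are illustrative assumptions, not taken from the studies cited here. A zoom multiplies every image coordinate by the same factor, so a near occluding edge and a background texture element that lie on the same line of sight stay aligned. A dolly reduces both depths by the same amount, which shifts the near edge more than the background element, so the two separate in the image.

```python
# A minimal sketch assuming an idealized pinhole camera (not from the original
# text): a point at lateral offset x and depth z projects to image coordinate f * x / z.


def project(f: float, x: float, z: float) -> float:
    """Image coordinate of a point at lateral offset x and depth z."""
    return f * x / z


# Hypothetical scene: the edge of a near occluder and a background texture
# element that lie on the same line of sight from the camera.
EDGE = (1.0, 2.0)          # lateral offset 1 m, depth 2 m
BACKGROUND = (5.0, 10.0)   # lateral offset 5 m, depth 10 m
POINTS = (EDGE, BACKGROUND)


def zoom(f: float, k: float) -> list[float]:
    """Zoom: the camera stays put and the focal length is multiplied by k."""
    return [project(f * k, x, z) for x, z in POINTS]


def dolly(f: float, d: float) -> list[float]:
    """Dolly: the focal length stays fixed and the camera moves d metres forward."""
    return [project(f, x, z - d) for x, z in POINTS]


before = [project(1.0, x, z) for x, z in POINTS]
after_zoom = zoom(1.0, 2.0)    # every image coordinate is doubled, so the points stay aligned
after_dolly = dolly(1.0, 1.0)  # the near edge shifts more than the background, so they separate

if __name__ == "__main__":
    print("before:", before)
    print("zoom:  ", after_zoom)
    print("dolly: ", after_dolly)
```

Under the zoom the two points keep coinciding in the image, so no background texture is covered or uncovered at the edge. Under the dolly they drift apart, so texture next to the edge is deleted or accreted, depending on which side of the edge the occluding surface lies. This is the dynamic occlusion that a zoom lacks, although observers apparently fail to notice its absence.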

In summary, those invariants specifying internal events that can be violated are typically left intact by film makers, except to achieve special, unrealistic effects. Presumably, the transformational invariants that specify observer motion need to be left untouched in order to keep the observer from attending to the fact that she is not actually where the camera is.

4.2.2 Violation of external events

We follow Gibson in his conviction that the same motion-based invariants that solve the underspecification problem in natural viewing can also be present in motion pictures. At the director's discretion, however, they need not be: classical invariants no longer have to be invariant. We therefore need to analyze those cases where the film no longer provides the same invariants. In other words, for each external event we need to determine first whether it can be violated in film and, if so, how the violation changes the perceptual outcome. It turns out that virtually every invariant property normally specified by changes in the layout of surfaces, in their color and texture, or in their existence can be modified or destroyed in film.

Gibson (1979) emphasizes the importance of naturally occurring terrestrial events for human perception and action (see also Flach, Lintern & Larish, 1990). Fowler and Turvey (1978) extended the notion of (external) event and defined it as the minimal system, consisting of the actor and her environment, that adequately describes skilled performance (for recent discussions of the concept of event see Stoffregen, 2000; Hecht, 2000b). For our purposes it suffices to note Gibson's distinction between reversible events (e.g., the bounce of a ball) and events that are irreversible in time (e.g., the shattering of a glass). Film can, of course, easily reverse events that are irreversible in nature by running the reel backward. All external events, reversible and irreversible, are specified by changes in the color and texture of surfaces, by changes in surface existence, or by layout changes.

Color and texture changes normally go hand in hand; therefore, color should not be treated as a secondary quality. Examples are fruits turning red as they ripen or wood blackening in the fireplace. Evidently, these changes can easily be manipulated in motion pictures. However, as they often happen slowly, they may escape the viewer's attention.

Surface existence changes occur when objects change state, as when ice melts or facades crumble. Film directors have played with surface existence. For instance, the robot T-1000 in "Terminator 2" is made of "liquid metal" and can reconstitute its solid shape after being liquefied. However, such play does not violate the invariants that specify surface existence: the transformation from liquid to solid remains readily visible. In principle, a surface cannot be specified as solid and liquid at the same time. That is, surface specification is a truly universal mapping that cannot be violated.

Layout changes constitute the most important perceptual events for our purposes, as they happen on the time scale to which we are most sensitive and because they can easily be violated in film. Gibson grouped layout changes into rigid object displacements, collisions, non-rigid object deformations, surface disruptions, and surface deformations. Layout changes are due to complex forces and normally make these underlying forces visible. For instance, a sudden vertical displacement that speeds up, followed by a sudden stop and a deformation, is clear evidence of an object that has fallen. Moreover, we can easily see from the layout change alone whether the object was very light or heavier (Hecht, Kaiser & Banks, 1996), and whether it was animate or inanimate (Gelman, Durgin & Kaufman, 1995). Collisions of two objects can specify their mass ratio, surface deformations give away material properties, and surface disruptions (cracking, disintegration) specify whether we can safely walk on a surface or pick it up and throw it. The examples are endless.

It is important to note that while layout change is specified at an incontrovertible level, the significance of the layout change can easily and arbitrarily be manipulated in film. We can have the hero walk on water, the cannonball can make a detour, and bullets can be caught with bare hands. While the ingredients of layout are uniquely specified, the level of layout change that constitutes meaningful events is no longer uniquely specified in movies. I prefer to call such violations at the level of meaning causality violations. I claim that these violations are the most important category of violations unique to motion pictures. At the same time, most Hollywood movies exploit the ability to violate causality only with great caution.

5. Violation of event causality

At the higher level of meaningful perceptual events, the director can use layout specification to create countless external events that defy the very causal laws that govern our world. Layouts can be specified that are inconsistent with almost any law of physics we can think of, such as the law of gravity or the law of energy conservation. The foremost violations of this nature can be found in animated cartoons, probably because the violations were cheaper to produce this way. If Wile E. Coyote, after running off a cliff, remains suspended in mid-air for an instant before "remembering" the law of gravity and then inevitably falling to the ground, the visual system has the choice of a) reinterpreting the timing of the scene and concluding that the fall was immediate, b) questioning the pervasiveness of gravity, or c) deciding that the situation is unecological. Presumably, the scene is funny precisely because c) is concluded. Otherwise we would probably not take any particular notice. And as a matter of fact, the temporary suspension of gravity has to be timed just right for the effect to be noticed. Hecht and Kerzel (2000) presented observers with a computer-animated scene of a basketball propelled toward the floor and rebounding at an angle. When the ball's deformation was shifted to occur several frames too early or too late, observers rated the early deformation as just as natural as the canonical event, whereas the delayed deformation looked goofy. This is evidence that the visual system anticipates the mechanics of animated events even if the animation is rather crude. It must have some knowledge of the laws of mechanics at a very basic level. Thus, Wile E.'s fall is anticipated, and the brief discrepancy between anticipation and visual evidence produces the humorous effect.
Ironically, blatant and systematic causal violations are more likely to be found in animations created to assess naive knowledge of the real world through the filmic creation of impossible worlds than in box-office movies. For instance, a study that animated impossible trajectories of a beer keg dropped from an airplane in mid-flight (Kaiser, Proffitt, Whelan & Hecht, 1992) found that conceptual and perceptual biases can be closely related.

Basically, film can specify an indefinite number of layout changes and combine them so as to violate all the causal relations that govern complex natural events. One can make the case that the underspecification problem that Gibson's approach so nicely solved for natural scenes is not only unresolved in film at the level of causal interactions but even exacerbated, because a new class of possibilities arises: the unecological. Take, for instance, the bullet that slows down in front of Keanu Reeves in "The Matrix" (Wachowski Brothers, 1999) and can then easily be plucked out of the air by him. Is a real bullet specified at the moment the trigger is pulled and a fake one as it gets close to Reeves? Is the thin medium of air, specified at first, suddenly replaced by an invisible, thicker medium? Does the bullet have a propulsion of its own, or does the hero have strange powers? The plot makes us believe the latter, but without that knowledge we are at a loss. The situation is no longer uniquely specified. Here we touch on a major assumption that the viewer has to make: the assumption that we live in a terrestrial environment and that, because we have evolved in it, certain things cannot be the case. As reasonable, and unnecessary, as this assumption may be in the real world, it is no longer mandatory in the realm of film. The underspecification problem is wide open again as soon as we have to drop the assumption of a terrestrial environment.

Maybe as film viewers we basically do not want to part with this assumption. We have yet to encounter a movie that consistently carries through the violation of a basic law, such as the law of gravity. Imagine a movie in which, instead of gravity, the following law holds: everything works normally as long as objects are in contact with the ground plane, but as soon as they lose contact, they fall upward until they touch another surface or else disappear forever into the sky. We would walk around but never lift both feet off the ground outdoors. We would need no garbage collectors, and lifting someone off the ground would be murder. Such an alternate world would be strange and powerful once the viewer buys into it. However, current cutting-edge movies are not pursuing this avenue, as if the viewer could tolerate only minor modifications of terrestrial physics, and those only if limited to heroes and magic situations.

What would happen if we were forced to do away with the terrestrial-environment assumption in a thorough manner? Would our visual system be at a complete loss? Could this be the reason why many of these effects have not yet been explored by film makers? If we apply a realist interpretation to the principle of direct specification in ecological theory, such as the kinematic specification of dynamics (KSD), we have to conclude that the visual system would be at a loss once "impossible" events are specified. Indirect approaches also predict that we should have trouble perceiving such events. Indirect perception assumes that the visual system is an inference machine that solves the underspecification problem by picking the most likely interpretation. It can do so because it relies on knowledge about the world that the organism has acquired over the course of evolution (Shepard, 1994). The visual system has internalized many of these laws, and therefore deviations from them should produce striking effects.

However, I do not believe that our visual system is constrained to perceiving ecologically possible events. On the contrary, it is extremely flexible and plastic. We have no trouble understanding footage of astronauts floating in weightlessness, and we can get used to objects that fail to fall down. Fears that the visual system might not be able to handle speeds of locomotion exceeding that of a horse proved unfounded when fast railroads came along. And fears that the visual system might not tolerate spatio-temporal violations when cutting from one scene to another were likewise misguided. This would mean that, strictly speaking, Gibson's realist position can no longer be applied to avant-garde motion pictures. Gibson may not have realized the dissolving power that movies could have on his realist position.


Let us go back to the question posed at the beginning: What is the best pictorial rendition of reality? I claim that most directors strive for such a superior rendition and that they are aware of the small violations of event invariants that this requires. This is why we mostly see films that violate a select few event invariants to a small and tolerable extent. Just as painters knew that in certain instances linear perspective had to be violated, directors learn to create a film world that looks as realistic as possible. In painting and photography, for instance, spheres far from the central camera axis should, by the rules of linear perspective, be depicted as ellipses, but they look more natural when they are painted as circles, which is exactly what Renaissance painters did (Pirenne, 1970). Analogously, objective shooting of a real scene will not always produce the best rendition of an event.

Experimental film aside, movies strive for a high degree of realism. Even the animated cartoon attempts to make things look natural (deMarchi & Amiot, 1977), although it sometimes plays with its possibilities. Kracauer (1960) calls this the realistic tendency of film and, in the same train of thought, states that "What holds true of photographic film does of course not apply to animated cartoons. Unlike the former, they are called upon to picture the unreal - that which never happens." (p. 89). Thus, at least for realist film theory, there seems to be a division of labor between experimental photography and realism-oriented film. Kracauer might have agreed with us that cartoon films have a great opportunity to violate many basic causal laws, but he seems to think that this should not be done. Indeed, cartoon directors have only scratched the surface of what is possible, and when they did scratch it, it was for comic effect rather than to create unreal worlds. A notable exception is Disney's "Fantasia", which attempted to create a visual analogue of sound. For instance, the section on Bach's "Toccata and Fugue in D Minor" was used to inspire a series of entirely abstract images: shapes dance around, completely defying gravity; there is no story; the silhouette of Stokowski dissolves into blotches of color; place and time lose their narrative meaning; terrestrial causality is nonexistent; objects are reduced to their traces, etc. In this respect "Fantasia" was (and still is) a highly revolutionary film. Its flop at the box office when it was released in 1940 seems to prove the point that it was basically an experimental film (see Culhane, 1983). Its recent sequel ("Fantasia 2000"), however, has been more of a success, maybe because its flying whales appear less revolutionary after 60 years of animated cartoon evolution.

Thus, realist film deviates from true rendering (of event causality) only in order to make visible things that are normally unseen: to emphasize the small by making it big, and the transient by rendering it visible (see the revealing function of film, Kracauer, 1960). But we do not have to be realists, either in film or in reality, to benefit from an analysis of natural event perception and its potential violations. Interestingly, if we do not follow Gibson in his realism but rather assume that the visual system needs to interpret and infer its percepts in all cases, be it natural vision or filmic events, hardly anything changes in our analysis. The visual system is then confronted with a discrepancy between well-ingrained inferences in natural viewing and less ingrained or inconsistent inferences in the case of watching a movie.

Since the advent of virtual environments (VEs), we have a new generation of visual renditions that may well be, short of simulation at the neural level, the final step toward visual realism. In VEs the internal events are qualitatively different from those in the movies. The visual scene changes with head and eye movements almost the way it does in natural viewing. In other words, in film the efference-afference coupling is broken: the head and eye movements that I make while watching a movie have no consequences for the visual scene as long as I keep the screen in sight. In VEs the illusory visual world is a function of our real movements. Obviously, most extraretinal cues that normally accompany vision are still absent (vestibular stimulation, tactile feedback, kinesthetic cues, fluid shift in the body, etc.). Notwithstanding the remaining differences between natural and VE viewing, I suggest that this qualitative step in the visual media should make realist film theorists reconsider the role they reserve for traditional movies. We may well be at a turning point at which movies are freed from the burden of being the best technique of pictorial rendering that we have. Once freed from this task, maybe they can lead us into new (experimental) domains of events that are physically impossible.


I have shown that facts and theories of ecological event perception can explain why film has yet to undergo the stage that painting and still photography have already undergone, namely unconditional experimentation with the limits of the medium. As Gibson (1979) has noted, the main difference between the perceptual awareness provided by film and that provided by real events lies in the lack of the intentionality and interaction that occur naturally when we look at an object, walk up to it, touch it, etc. Granted this difference, however, optical events are usually created to be as similar to real events as possible. The reason for this lack of adventurous spirit, I claim, lies in the need to always have one medium of depiction that fulfills the necessity to render naturalistically. To this day, film represents this medium. Note that this need corresponds to Bazin's (1967) notion of the psychological need for realism. In contrast to his belief, however, I suggest a qualitative jump in the realism achieved by still photography, by film, and ultimately by virtual environments. While these media share many aesthetic features, they are vastly different from the point of view of perceptual psychology. The importance of motion was appreciated only comparatively recently (see Cutting, 2000), and the importance of action for perception is often relegated to ecological psychology. At least from an ecological perspective, the three visual media are vastly different in terms of the provisions for realism that they make.

With the new medium of virtual environments about to enter mainstream entertainment, will non-interactive film be succeeded by interactive VEs in its role? And will traditional film hence become a medium for experimentation? Extrapolating from the past, we can make this prediction once the main function of the motion picture, to tell a story, can be accomplished in VEs. A story in this new medium has to be interactive; that is, the spectator has to be able to manipulate the outcome of the story, or, in the case of historical VEs, should at least be able to move around the Waterloo battlefield as combat rages. The latter might be easier to accomplish than a spectator-contingent story development, which would require programs for robot-like agents that react to the spectator's moves. Certainly, once we are able to create interactive simulations of the quality envisioned for the holodeck in "Star Trek", the movie screen will look completely outdated. In this case we could look forward to decades of testing the limits and exploring new violations of ecological event structure in movies. I hope that my reflection on what constitutes such violations in the face of realism can be used to analyze this potential development. Maybe the innovations in some music video clips are the beginning of film freeing itself from obeying event causality. We might be in the midst of film losing its role as the prime medium to render reality. On the other hand, motion pictures may have reached a degree of realism that approximates a ceiling that cannot be surpassed in terms of what is needed for a "perfect" depiction of reality. But as mentioned above, we tend to be conservative until the next innovation teaches us otherwise.

While experienced realism holds the key to evaluating event depiction, the present chapter is not meant to come down on one side of the debate between realists and formalists (Singer, 1998). Rather, I have investigated the realism of film from a mostly ecological standpoint of event perception. My aim was to discover whether Münsterberg (1916, p. 185) was right in stating that "While the moving pictures are lifted above the world of space and time and causality and are freed from its bounds, they are certainly not without law." I hope to have shown that movies violate temporal laws all the time, spatial relations much less frequently, and event causality surprisingly little. Unlike in painting or still photography, many of the possible causal violations have not (yet) been explored, even by experimental film.

Given the large number of unresolved issues in the study of filmic event perception that we have touched upon, it is hard to understand why the psychological study of film is so limited. Maybe the recent interest in realism in VEs can change this, for two reasons. First, the study of realism has already taken many interesting turns in the context of VE displays. A number of measures of presence, albeit problematic ones, have been suggested and explored (e.g., Singer & Witmer, 1999). They could easily be applied to the study of realism, and the lack thereof, in motion pictures. Or they could be exploited to explore movie-specific questions, such as the viewer's preference for one or the other of two consciously indistinguishable shots (dolly vs. zoom). Second, the envisioned victory of VEs in the contest for the most realistic rendition tool should free us researchers from the self-imposed assumption that film is so similar to natural viewing that it does not need to be studied separately from the real world. After all, most experiments on "natural" vision these days use computer displays, which are movies at best.

I think it is fruitful to ask about event perception in film in terms of violations of natural event regularities. Doing so offers a unique criterion for placing a director's efforts within the space of what can in theory be done with the medium of film and what the director has chosen to do. If the director attempted to recreate natural event perception as closely as possible, as Evces (1994) suggests Orson Welles did in "Touch of Evil" (1958), we can gauge whether he really minimized as many of the violations as he could. On the other hand, we can now start to understand why temporal violations have been explored to the fullest, and why many causal violations have thus far not been tampered with. Is what has become a convention in the temporal domain waiting to become one in the causal domain, or is there a mainstream need to approximate reality as closely as possible in all but the temporal domain? We may have to wait until virtual environments have become the mainstream source of visual entertainment and traditional film can become more of an experimental art form.


I am grateful for the inspiring discussions and helpful hints provided by Joseph Anderson, Bettina Friedl and Robert Schwartz.


Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale, IL: Southern Illinois University Press.

Barre, A., & Flocon, A. (1987). Curvilinear perspective: From visual space to the constructed image. Berkeley: University of California Press. (First French edition published 1968).

Bazin, A. (1967). The ontology of cinema. In A. Bazin, What is cinema? (Vol. 1, pp. 9-16). Berkeley, CA: University of California Press.

Bingham, J. P. (1993). Scaling judgments of lifted weight: Lifter size and the role of the standard. Ecological Psychology, 5, 31-64.

Bordwell, D., & Thompson, K. (1997). Film art: An introduction (5th edition). New York: McGraw-Hill. (1st edition 1979)

Bürki-Cohen, J., Boothe, E., Soja, N., DiSario, R., Go, T., & Longridge, T. (2000, May). Simulator fidelity: The effect of platform motion. Proceedings of the Royal Aeronautical Society, International Conference on Flight Simulation: The Next Decade (pp. 23.1-23.7). London, UK.

Culhane, J. (1983). Walt Disney's Fantasia. New York: H. N. Abrams.

Cutting, J. E. (1981). Six tenets for event perception. Cognition, 10(1-3), 71-78.

Cutting, J. E. (1987). Rigidity in cinema seen from the front row, side aisle. Journal of Experimental Psychology: Human Perception and Performance, 13, 323-334.

Cutting, J. E. (2000). Images, imagination and movement: Pictorial representations and their development in the work of James Gibson. Perception, 29, 635-648.

DeMarchi, S., & Amiot, R. (1977). Alles über den Zeichentrick- und Animationsfilm. München: Gemsberg. (original title: Le dessin animé d'amateur et l'animation, 1959)

Deren, M. (1960). Cinematography: The creative use of reality. Daedalus, 89(1), reprinted in G. Mast, M. Cohen & L. Braudy (Eds.), Film theory and criticism: Introductory readings, 4th ed. 1992 (pp. 59-70). Oxford University Press.

Evces, M. (1994). Touch of Evil and ecological optics: Toward a demystification of conventional film editing practice. Journal of Dramatic Theory and Criticism, 8(2), 103-109.

Flach, J. M., Lintern, G., & Larish, J. F. (1990). Perceptual motor skill: A theoretical framework. In R. Warren, A. H. Wertheim, et al. (Eds.), Perception and control of self-motion. Resources for ecological psychology (pp. 327-355). Hillsdale, NJ: Lawrence Erlbaum.

Flavell, J. H., Miller, P. H., & Miller, S. A. (1993). Cognitive development. Englewood Cliffs, N.J.: Prentice-Hall (3rd ed.).

Fowler, C. A., & Turvey, M. T. (1978). Skill acquisition: An event approach with special reference to searching for the optimum of a function of several variables. In G. E. Stelmach (Ed.), Information processing in motor control and learning (pp. 1-40). New York: Academic Press.

Gelman, R., Durgin, F., & Kaufman, L. (1995). Distinguishing between animates and inanimates: Not by motion alone. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition: A multidisciplinary debate (pp. 151-184). Oxford: Clarendon Press.

Gibson, J. J. (1978). The ecological approach to the visual perception of pictures. Leonardo, 11, 227-235.

Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.

Gould, S. J., & Shearer, R. R. (1999). Boats and deckchairs. tout-fait: The Marcel Duchamp Online Studies Journal, 1(1). (http://www.ToutFait.com/issues/issue_1/Articles/boat.html, accessed Oct. 13, 2000).

Hecht, H. (1996). Heuristics and invariants in dynamic event perception: Immunized concepts or non-statements? Psychonomic Bulletin and Review, 3, 61-70.

Hecht, H. (2000a). The failings of three event perception theories. Journal for the Theory of Social Behaviour, 30, 1-25.

Hecht, H. (2000b). Are events and affordances commensurate terms? Ecological Psychology, 12, 57-63.

Hecht, H., Kaiser, M. K., & Banks, M. S. (1996). Gravitational acceleration as a cue for absolute size and distance? Perception & Psychophysics, 58, 1066-1075.

Hecht, H., & Kerzel, D. (2000). The way the ball bounces: Trick-film wisdom versus perceptual knowledge. Manuscript in preparation.

Hochberg, J., & Brooks, V. (1996). The perception of motion pictures. In M. P. Friedman & E. C. Carterette (Eds.), Cognitive ecology (2nd ed., pp. 205-292). New York: Academic Press.

Kaiser, M. K., Proffitt, D. R., Whelan, S., & Hecht, H. (1992). Influence of animation on dynamical judgments. Journal of Experimental Psychology: Human Perception and Performance, 18, 669-690.

Kerzel, D., & Hecht, H. (1997). Grenzen der perzeptuellen Robustheit bei perspektivischer Verzerrung. Zeitschrift für experimentelle Psychologie, 44, 394-430.

Kracauer, S. (1960). Theory of film: The redemption of physical reality. New York: Oxford University Press.

Lumsden, E. A. (1980). Problems of magnification and minification: An explanation of the distortions of distance, slant, shape, and velocity. In M. A. Hagen (Ed.), The perception of pictures: Vol 1. Alberti's Window: The projective model of pictorial information. New York: Academic Press.

Münsterberg, H. (1916). The photoplay: A psychological study. New York: D. Appleton.

O'Regan, J. K., Rensink, R. A., & Clark, J. J. (1999). Change-blindness as a result of "mudsplashes." Nature, 398, 34.

Peterson, I. (2000). An artist's timely riddles: Deploying scientific methods to understand a Dada artist's provocative creations. Science News, 157(1), 8.

Pirenne, M. H. (1970). Optics, painting, and photography. Cambridge University Press.

Runeson, S., & Frykholm, G. (1981). Visual perception of lifted weight. Journal of Experimental Psychology: Human Perception and Performance, 7(4), 733-740.

Runeson, S., & Frykholm, G. (1983). Kinematic specification of dynamics as an informational basis for person-and-action perception: Expectation, gender recognition, and deceptive intention. Journal of Experimental Psychology: General, 112(4), 585-615.

Scharf, A. (1983). Art and photography. London: Penguin Books. First edition 1968.

Shepard, R. N. (1994). Perceptual-cognitive universals as reflections of the world. Psychonomic Bulletin & Review, 1, 2-28.

Singer, I. (1998). Reality transformed: Film as meaning and technique. Cambridge, MA: MIT Press.

Singer, M. J., & Witmer, B. G. (1999). On selecting the right yardstick. Presence: Teleoperators and Virtual Environments, 8, 566-573.

Slater, M., & Wilbur, S. (1997). A framework for immersive virtual environments (FIVE): Speculations on the role of presence in virtual environments. Presence: Teleoperators and Virtual Environments, 6, 603-616.

Stoffregen, T. A. (2000). Affordances and events: Theory and research. Ecological Psychology, 12, 93-107.

Yang, T., & Kubovy, M. (1999). Weakening the robustness of perspective: Evidence for a modified theory of compensation in picture perception. Perception & Psychophysics, 61, 456-467.