Binding Three Kinds of Vision



Pictorial cues, together with motion and stereoscopic depth fields, can be used for perception and constitute ‘three kinds’ of vision. Edges are important image features and can be defined by any of these attributes. Are local edge and global shape detection processes attribute-specific? Three visual phenomena, believed to be due to low-level visual processes, were used as probes to address this question. (1) Tilt illusions (the misperceived orientation of a bar caused by an inducing grating) were used to investigate possible binding of edges across attributes. A double dissociation between tilt repulsion illusions (obtained with small orientation differences between inducer and bar) and tilt attraction illusions (obtained with large orientation differences) suggests that they arise from different mechanisms. Repulsion effects are believed to originate in striate cortex, and attraction effects in higher-level processing. The double dissociation was reproduced irrespective of the attributes used to create the inducing grating and the test bar, suggesting that the detection and binding of edges across attributes take place in striate cortex. (2) Luminance-based illusory contour perception is another phenomenon believed to be mediated by processes in early visual cortical areas. Illusory contours can be cued by other attributes as well. Detection of a near-threshold luminous line was facilitated when the line was superimposed on illusory contours, irrespective of the attributes used as inducers. This result suggests attribute-independent activation of edge detectors that respond to real as well as illusory contours. (3) Performance in detecting snake-like shapes composed of aligned oriented elements embedded in randomly oriented noise elements was similar irrespective of the attributes used to create the elements. Performance when the attributes alternated along the path was superior to that predicted by an independent channel model.
These results are discussed in terms of binding across attributes by feed-forward activation of orientation-selective, attribute-invariant cells (conjunction cells) at early stages of processing, and of contextual modulation and binding across visual space mediated by lateral and/or feedback signals from higher areas (dynamic binding).
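The abstract does not spell out the independent channel model. One common way to formalize such a prediction, assumed here purely for illustration, is probability summation: with alternating attributes, each attribute-specific channel detects only the elements of its own attribute, and the observer detects the path if either channel does. The sketch below converts proportion correct in a forced-choice task into detection probabilities with a high-threshold correction for guessing, combines them, and converts back. The function names and input values are hypothetical; the inputs would be performance on paths carrying only one attribute's elements.

```python
def correct_for_guessing(p_correct, chance=0.5):
    """High-threshold correction: turn proportion correct in an
    m-alternative forced-choice task (chance = 1/m) into a detection probability."""
    return max(0.0, (p_correct - chance) / (1.0 - chance))


def independent_channel_prediction(pc_channel_a, pc_channel_b, chance=0.5):
    """Predicted proportion correct if two independent channels each see
    part of the path and detection by either one suffices (probability summation)."""
    pa = correct_for_guessing(pc_channel_a, chance)
    pb = correct_for_guessing(pc_channel_b, chance)
    # The path is missed only if both channels miss it.
    p_either = 1.0 - (1.0 - pa) * (1.0 - pb)
    # Undetected trials are still answered correctly at chance.
    return chance + (1.0 - chance) * p_either
```

For instance, with hypothetical single-channel scores of 0.70 and 0.65 correct in a two-alternative task, `independent_channel_prediction(0.70, 0.65)` gives about 0.79. Observed performance on alternating paths clearly above such a prediction is the kind of result that argues against fully independent channels.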

A brief view of the problem

“Segmentation is one of the most difficult tasks in image processing” (Gonzalez & Woods, 1993, p. 413)

When we open our eyes, visual impressions seem to impinge on us with no delay. No time-consuming processes are required, and no intermediate processes mediating the visual experience are revealed to us. Seeing is such an everyday experience that it may be taken for granted until one attempts to explain it. One fundamental task for the visual system is figure-ground segmentation, in which it is decided which image parts belong to the same object. This task has been shown to be extremely difficult to perform computationally, which stands in sharp contrast to the ease with which we see what constitutes figure and ground under normal viewing conditions. How is this computationally difficult problem solved in the brain? The problem can be highlighted by presenting an image in a way that prevents the normal interpreting processes from operating. One way to do this is to transform the image from its normal representation in luminance levels into a landscape representation (Morgan, 1996). Figure 1a is easily seen as two trees with branches and leaves, and the depth relationships are easily recognized because the image mimics a typical retinal image resulting from natural viewing conditions. In Figure 1b, the brightness or luminance levels of Figure 1a are drawn as the heights of hills across the image, preserving the information[1]. Nevertheless, inspection of Figure 1b shows that the visual system is unable to use this landscape representation to separate figure from ground and recognize the trees.
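The landscape transform itself is simple: each pixel's luminance becomes the height of a surface point at the same (x, y) position. The sketch below assumes a grayscale image stored as a 2-D NumPy array; the function names and the normalization to a maximum height are illustrative choices, not taken from Morgan (1996). Because the mapping is invertible, it makes the point in the text concrete: the landscape holds the same information as the image, yet the visual system cannot use it.

```python
import numpy as np


def luminance_to_landscape(image, max_height=1.0):
    """Re-code a 2-D luminance image as a height field.

    Returns x and y coordinate grids, heights z proportional to luminance,
    and the (offset, span) needed to invert the mapping exactly.
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    lo, hi = img.min(), img.max()
    span = hi - lo if hi > lo else 1.0  # avoid dividing by zero for flat images
    ys, xs = np.mgrid[0:rows, 0:cols]   # one surface vertex per pixel
    zs = (img - lo) / span * max_height
    return xs, ys, zs, (lo, span)


def landscape_to_luminance(zs, scale, max_height=1.0):
    """Invert luminance_to_landscape: recover the original luminance values."""
    lo, span = scale
    return zs / max_height * span + lo
```

The returned `xs`, `ys`, and `zs` arrays can be passed to a 3-D surface plotter, for example Matplotlib's `plot_surface`, to produce a rendering in the style of Figure 1b.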

Fragments of information in images are somehow bound together in the visual system, resulting in the perception of shapes and patterns. Several attributes can serve as information-bearing media in this process. Attributes, as used throughout the text, refer to image surface properties. The most widely recognized among these are the pictorial ones (luminance-, color-, and texture-fields). Artists use spatial modulations of these attributes to mimic retinal images, creating illusions of depth and shape on canvas. Another attribute is the motion-field in visual images, which results from the relative motion between the eye and the environmental layout. Furthermore, we are equipped with two eyes that provide two partly overlapping, slightly different images of the surroundings. Still, we see only one fused cyclopean image, from which we obtain stereoscopic depth perception.
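Edges defined only by stereoscopic disparity or only by motion can be generated with the same image operation. The sketch below is a hypothetical illustration in the spirit of random-dot stereograms and random-dot kinematograms, not the stimulus code used in the studies. Two binary random-dot images are made identical except that a central square of texture is displaced horizontally. Viewed one at a time, neither image contains a luminance edge around the square. Presented to the two eyes, the displacement is a binocular disparity; presented in rapid succession, it is motion. The image size, square width, and shift values are arbitrary assumptions.

```python
import numpy as np


def shifted_region_pair(size=64, square=24, shift=4, seed=0):
    """Return two binary random-dot images that are identical except that a
    central square of texture is displaced horizontally by `shift` pixels.

    The square's boundary exists only in the relation between the two images,
    not in the luminance of either image alone.
    """
    start = (size - square) // 2
    stop = start + square
    if shift == 0 or abs(shift) > start:
        raise ValueError("shift must be non-zero and keep the square inside the image")
    rng = np.random.default_rng(seed)
    first = rng.integers(0, 2, size=(size, size), dtype=np.uint8)
    second = first.copy()
    # Copy the square's texture to its displaced position in the second image.
    second[start:stop, start + shift:stop + shift] = first[start:stop, start:stop]
    # Refill the strip the square vacated with fresh, uncorrelated dots.
    vacated = slice(start, start + shift) if shift > 0 else slice(stop + shift, stop)
    second[start:stop, vacated] = rng.integers(
        0, 2, size=(square, abs(shift)), dtype=np.uint8
    )
    return first, second
```

For example, `shifted_region_pair(shift=-4)` yields a pair to be shown one image per eye, giving a square defined only by disparity, while `shifted_region_pair(shift=2, seed=1)` shown as two successive frames to both eyes gives a square defined only by motion.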