Finding Clarity in a Chaotic World: How a Network Learns to Recognize Objects and Generate Invariant Representations

Published on September 25, 2023

Imagine you’re trying to make sense of a jigsaw puzzle that keeps changing its pieces and colors. It’s a tough task, but your brain is up for the challenge! In a similar way, the ventral visual processing hierarchy of the cortex handles the complex job of perceiving objects under varying conditions and filling in missing information. A recent study shows how a multilayered predictive coding network accomplishes this feat by minimizing prediction errors locally. Trained on sequences of continuously transforming objects, neurons in the network’s highest area become tuned to object identity regardless of its precise position – much like specialized inferotemporal neurons in macaque monkeys’ brains. What’s more, the network also reproduces the experimentally observed hierarchy of timescales along the visual processing stream. The study suggests that the faster decorrelation of error neurons compared to representation neurons could guide the experimental search for neural correlates of prediction errors. The network also boasts impressive generative capabilities: it can reconstruct object images even when parts of the input are occluded. This research offers an exciting step forward in our understanding of how neural networks learn to perceive objects in a dynamic world. To explore the full study, check out the link below!

The ventral visual processing hierarchy of the cortex needs to fulfill at least two key functions: perceived objects must be mapped to high-level representations invariantly of the precise viewing conditions, and a generative model must be learned that allows, for instance, occluded information to be filled in guided by visual experience. Here, we show how a multilayered predictive coding network can learn to recognize objects from the bottom up and to generate specific representations via a top-down pathway through a single learning rule: the local minimization of prediction errors. Trained on sequences of continuously transformed objects, neurons in the highest network area become tuned to object identity invariant of precise position, comparable to inferotemporal neurons in macaques. Drawing on this, the dynamic properties of invariant object representations reproduce experimentally observed hierarchies of timescales from low to high levels of the ventral processing stream. The predicted faster decorrelation of error-neuron activity compared to that of representation neurons is relevant to the experimental search for neural correlates of prediction errors. Lastly, the generative capacity of the network is confirmed by reconstructing specific object images, robust to partial occlusion of the inputs. By learning invariance from temporal continuity within a generative model, the approach generalizes the predictive coding framework to dynamic inputs in a way that is more biologically plausible than self-supervised networks relying on non-local error backpropagation. This was achieved simply by shifting the training paradigm to dynamic inputs, with little change in architecture or learning rule relative to static, input-reconstructing Hebbian predictive coding networks.
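To make the core mechanism concrete, here is a minimal sketch of local prediction-error minimization in a rate-based predictive coding layer, written in Python with NumPy. All names, dimensions, and learning rates are illustrative assumptions, not the paper’s actual model, and the sketch omits the temporal-continuity mechanism that yields invariant tuning; it only illustrates the single shared learning rule (error-driven, local Hebbian updates) and the top-down fill-in of occluded input.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-area predictive coding sketch (rate-based).
# x: lower-area activity (input), r: higher-area representation,
# W: top-down generative weights predicting x from r.
n_in, n_rep = 64, 16
W = rng.normal(scale=0.1, size=(n_in, n_rep))

def infer(x, W, n_steps=30, lr_r=0.1, mask=None):
    """Settle the representation r by locally minimizing prediction error.

    Error-neuron activity e = mask * (x - W @ r); r moves down the error
    gradient. Where mask zeros out occluded pixels, errors vanish and the
    top-down prediction W @ r fills those pixels in from what was learned.
    """
    if mask is None:
        mask = np.ones_like(x)
    r = np.zeros(W.shape[1])
    for _ in range(n_steps):
        e = mask * (x - W @ r)   # error neurons: input minus prediction
        r += lr_r * (W.T @ e)    # representation update driven by errors
    return r, e

# Train on a "sequence" of continuously transformed inputs
# (a shifting pattern stands in for a moving object).
base = rng.normal(size=n_in)
for t in range(500):
    x = np.roll(base, t % 8)
    r, e = infer(x, W)
    W += 0.01 * np.outer(e, r)   # local Hebbian weight update (error x rep)

# Generative test: reconstruct an input with a quarter of it occluded.
x = np.roll(base, 3)
mask = np.ones(n_in)
mask[: n_in // 4] = 0.0          # hide the first quarter of the input
r, _ = infer(x, W, mask=mask)
reconstruction = W @ r           # top-down fill-in of the occluded part

Note that both updates use only quantities available at the synapse (local error times local activity), which is what distinguishes this learning scheme from the non-local error backpropagation of self-supervised networks mentioned in the abstract.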

Read Full Article (External Site)
