A neural network model for adaptive timing control!

Published on October 5, 2022

Imagine you’re trying to navigate a series of obstacles, but the space of possible actions is endless. How do you learn through trial and error in such a setting? Researchers have studied this question using a timing production task and found that humans and animals adjust the variability of their behavior based on feedback in order to explore. This adjustment happens at two time scales: long-term drifts in memory and short-term, feedback-driven changes in timing variability. Previously, researchers described this process with a mathematical model called the reward-sensitive Gaussian process (RSGP), but that model offered no neurobiological account of how the adjustments happen in the brain. In this study, the authors developed a mechanistic model built on recurrent neural networks (RNNs) that simulates the process and incorporates both internal variability and external reinforcement. Unlike other models, this neural network can estimate the uncertainty associated with each outcome and distinguish task-relevant variability from unexplained variability. The work provides insight into how our brains adjust timing based on feedback and opens up new possibilities for brain-inspired reinforcement learning in continuous state control.

How do humans and animals perform trial-and-error learning when the space of possibilities is infinite? In a previous study, we used an interval timing production task and discovered an updating strategy in which the agent adjusted behavioral and neuronal noise for exploration. In the experiment, human subjects proactively generated a series of timed motor outputs, and positive or negative feedback was provided after each response based on timing accuracy. We found that the sequential motor timing varied at two temporal scales: long-term correlation around the target interval due to memory drifts, and short-term adjustments of timing variability according to feedback. We previously described these two key features of timing variability with an augmented Gaussian process, termed the reward-sensitive Gaussian process (RSGP). In a nutshell, the temporal covariance of the timing variable was updated based on the feedback history to recreate the two behavioral characteristics mentioned above. However, the RSGP was mainly descriptive and lacked a neurobiological account of how reward feedback can be used by a neural circuit to adjust motor variability. Here we provide a mechanistic model and simulate the process by borrowing the architecture of recurrent neural networks (RNNs). While the recurrent connections provided the long-term serial correlation in motor timing, we introduced reward-dependent variability in the network connectivity to facilitate reward-driven short-term variations, inspired by the stochastic nature of synaptic transmission in the brain. Our model was able to recursively generate an output sequence incorporating internal variability and external reinforcement in a Bayesian framework. We show that the model can generate the temporal structure of motor variability as a basis for the exploration-exploitation trade-off. Unlike other neural network models that search for a unique network connectivity that best matches model predictions to observations, this model can estimate the uncertainty associated with each outcome and thus does a better job of teasing apart adjustable, task-relevant variability from unexplained variability. The proposed artificial neural network model parallels the mechanisms of information processing in neural systems and can extend the framework of brain-inspired reinforcement learning (RL) to continuous state control.
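To make the core mechanism concrete, here is a minimal sketch of the idea described in the abstract: a recurrent network whose connection weights receive multiplicative noise on every trial, with the noise amplitude shrunk after rewarded (accurate) trials and expanded after unrewarded ones, plus a slowly drifting gain standing in for memory drift. This is not the authors' implementation; the network size, the threshold-crossing readout, the 850 ms target, the reward window, and the 0.9/1.1 variance-update constants are all illustrative assumptions.

```python
import numpy as np

# Sketch of a reward-modulated noisy RNN for interval production.
# Assumed (not from the paper): network size, readout rule, target interval,
# reward tolerance, and the constants controlling the variance update.

rng = np.random.default_rng(0)

N = 100                       # recurrent units (assumed)
dt = 0.01                     # integration step, seconds
target = 0.85                 # target interval in seconds (illustrative)
tolerance = 0.08              # reward window around the target (illustrative)

W = rng.normal(0, 1.2 / np.sqrt(N), (N, N))   # fixed recurrent weights
w_out = rng.normal(0, 1 / np.sqrt(N), N)      # fixed linear readout

sigma = 0.05                  # trial-to-trial synaptic-noise amplitude (exploration)
gain = 1.0                    # slowly drifting gain, a stand-in for memory drift

def produce_interval(W_trial):
    """Run the network until the readout crosses a threshold; return that time."""
    x = rng.normal(0, 0.1, N)                 # random initial state
    for step in range(1, int(2.0 / dt)):
        x = x + dt * (-x + np.tanh(W_trial @ x))
        if w_out @ x > 0.5:                   # threshold crossing = motor response
            return step * dt
    return 2.0                                # no response within 2 s

produced = []
for trial in range(200):
    # Long timescale: slow random walk in gain -> serial correlation across trials.
    gain += 0.005 * rng.normal()

    # Short timescale: multiplicative noise on the connectivity, mimicking
    # stochastic synaptic transmission; sigma sets the breadth of exploration.
    W_trial = gain * W * (1.0 + sigma * rng.normal(0, 1, W.shape))
    t_prod = produce_interval(W_trial)
    produced.append(t_prod)

    # Binary feedback on timing accuracy drives the variance adjustment:
    # shrink exploration after success, expand it after failure.
    rewarded = abs(t_prod - target) < tolerance
    sigma *= 0.9 if rewarded else 1.1
    sigma = float(np.clip(sigma, 0.01, 0.2))

print(f"mean produced interval (last 50 trials): {np.mean(produced[-50:]):.3f} s, "
      f"final sigma: {sigma:.3f}")
```

The sketch only shows the exploration knob that reward feedback turns; in the model described above, this variance control is embedded in a Bayesian framework that also estimates the uncertainty associated with each outcome.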

Read Full Article (External Site)
