A Neuroengineer’s Guide on Training Rats to Play Doom

Viktor Tóth
Published in Mindsoft · Nov 22, 2020

Before elaborating on the how, let me briefly clarify the undertaking and the necessary hardware and software components that enable automated training of animals on complex tasks.

Project description

Train rodents to play Doom II with full cerebral control. In a VR environment, teach rats or mice to kill demons and find the exit point by decoding their motor intent and translating it into in-game actions.

I would not expect anyone, least of all those from the field of neuroscience, to believe that such an arduous experiment is manageable, were it not made feasible by recent scientific and technological advancements.

Project requirements

Chronic implantation of a (high-bandwidth) neural recording and stimulation device, an AI agent proficient in playing Doom, a rodent VR environment, and semi-sophisticated learning models that enable decoding (the easy problem) and stimulation (the hard problem) of the rodent motor cortex to induce the desired behavior dictated by the AI.

The big picture: the AI instructs the stimulation model to induce brain activity associated with the right action (e.g. fire the shotgun); the brain state is decoded, and the stimulation model receives feedback on how close the stimulated action is to the one intended by the agent. The decoded action vector is translated into in-game actions. Game state is pulled from Doom and passed to the reward scheme, which portions out the positive feedback for the rodent in the form of sucrose water.
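To make the loop concrete, here is a minimal Python sketch of one iteration. Every name in it (the stuck-detection heuristic, the acquisition call, the hardware objects) is a hypothetical stand-in for a subsystem discussed later in this post, not working code for any specific rig.

```python
# One iteration of the proposed closed loop; all callables are hypothetical.
def closed_loop_step(game, agent, decoder, stimulator, spout):
    state = game.get_state()                 # pull latent game state from Doom
    intended = None
    if rat_seems_stuck(state):               # hypothetical heuristic
        intended = agent.best_action(state)  # e.g. "fire the shotgun"
        stimulator.induce(intended)          # nudge the motor cortex toward it
    neural = acquire_neural_activity()       # hypothetical recording call
    decoded = decoder.predict(neural)        # decode the rat's motor intent
    if intended is not None:                 # feedback: how close did we get?
        stimulator.reward(similarity(decoded, intended))
    game.apply(decoded)                      # translate intent into in-game actions
    if state.reward_earned:                  # e.g. a demon was hit
        spout.dispense_sucrose()             # portion out the positive feedback
```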

Training rodents on any task involves manual supervision. As an example, in the dead-simple, well-established Morris water maze task, a human assistant has to be present to rescue the rodent and show it the slightly submerged platform in the water tank if the subject is unable to locate it. Designing a more complicated spatial memory trial (like playing Doom) may complicate the experimental protocol to an unreasonable extent, which in turn invites failure.

The core idea is the following: automate the process by which the animal is steered towards the desired actions, and follow up with automated rewards to solidify that behavior. Train a reinforcement learning (RL) model to perform the given task, and stimulate the animal when it gets stuck to induce behavior identical to the AI's. Rewards may be dispensed from e.g. a programmed water spout, while stimulation has to be bioelectronic and cerebral to scale well with task complexity and the behavioral diversity expected from the rodents (e.g. movement, opening doors, shooting).

Outline

In what follows, I intend to demonstrate the viability of this research project, outline an experimental design, and dive into:

  • rodent VR experimental setups,
  • perceptual and cognitive limitations of rodents,
  • behavioral actions to associate with in-game actions,
  • brain interfacing hardware requirements,
  • decoding and stimulation modeling,
  • and training RL agents on Doom.

Teaching rodents to play Doom is not just a cool one-off venture that may lead to rat Twitch streamers or e-sport teams. For one, it introduces an experimental paradigm that allows complex spatial memory and behavioral animal experiments to be conducted without overwhelming the experimentalist. Moreover, it is a proof of concept for the employed hardware and software (decoding, stimulation): the experiment bootstraps the subject from physical behavioral actions to a neural interface capable of arbitrarily intricate behavior in a virtual environment, which eventually leads to higher-bandwidth, low-latency communication between meat and machine. The stimulation component in combination with the AI coach enables accelerated learning, a framework that is, in essence, translational to humans.

Bootstrapping from established behavioral actions to neural interfaces. Don't ask the user to explicitly train the decoding model by performing dull tasks; rather, learn to associate brain signals with actions implicitly in the background, and always try to predict ahead what the user intends to do, from muscle movements to abstract actions. (The chip is just an illustration, not FDA approved yet.)

Goal-directed learning in VR could set us up on the path of building neural interfaces to our already existing devices, tools and mediums of communication.

Rodents in VR

Both rats and mice have been entertained in VR environments plenty of times [1–9]. Experiments often constrain the animal to run on a 1D corridor, but some allow movements on a 2D plane [1–5].

Rodent VR: top [7], rear view [10] illustrations and a commercial setup.

The usual setup is the following: the rodent is positioned with its head or body fixed on top of an air-cushioned spherical ball, in front of a wide curved screen that should cover as much of the visual field as possible [1]. Rotation of the ball is captured by optical sensors gutted from computer mice, and the rotation signals are translated into movements in the virtual world. A water tube and optional air puffs may be arranged facing the rodent to deliver rewards and punishments, respectively.
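For illustration, a toy translation of ball rotation into in-game motion might look like the sketch below; the sensor geometry, resolution and gains are assumptions, not values from any of the cited setups.

```python
import numpy as np

# Toy mapping from treadmill rotation to Doom movement commands.
BALL_RADIUS_CM = 10.0   # assumed sphere radius
COUNTS_PER_CM = 400     # assumed optical-flow resolution of the mouse sensor
YAW_GAIN = 1.0          # in-game turn (deg) per degree of ball yaw

def treadmill_to_game(dy_counts, dx_counts, dt_s):
    """Vertical flow at the ball's equator = running; horizontal flow = turning."""
    forward_cm_s = (dy_counts / COUNTS_PER_CM) / dt_s
    yaw_deg_s = np.degrees((dx_counts / COUNTS_PER_CM) / BALL_RADIUS_CM) / dt_s
    return forward_cm_s, YAW_GAIN * yaw_deg_s
```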

In some settings the animal is allowed to move around without its head fixed [2]. Body-fixation allows the rodent to turn its head around, which requires wider azimuth and elevation screen coverage. Other experiments [3, 37] let the animal move and rotate freely, but they are way too complicated and, if you ask me, a dead end in the evolution of VR environments. To exploit the auditory finesse of rodents, we should place stereo speakers in front, which works for both the head- and body-fixation setups.

Body-fixation is preferable over head-fixation as it introduces less stress. Illustration taken from [2].

Previous VR environments used an air-supported ball to track movements. In our case, the animal takes action in the virtual world by movement intent alone. Regardless, the ball should be present at least in initial training sessions to synchronously record neural and movement activity right inside the experimental environment; these recordings may be used to train the neural intent decoding model.

Visual acuity

Arguing from first principles, we need to establish the capability of rodents to visually perceive and segment key entities in the game, demons and levers for instance, as their presence demands different behavioral responses: e.g. demons are shot from a distance, while levers or buttons are operated up close.

Visual acuity in rodents is assessed by measuring the ability to discriminate spatial frequencies at a given contrast level [11], i.e. the highest number of line pairs the animal can differentiate within 1 degree of its visual field. Rats at best can do around 1 cycle/degree (2 lines in 1°), while mice cap at 0.5 [12] at maximum contrast. Training rats over mice has other advantages, including the fact that the rat model is far more established in behavioral research, and rats perform at least on par with mice in learning tasks [13]. From this point on, I carry the rat model forward in my argumentation for the sake of simplicity.

Rats are dichromats (akin to human red-green colorblindness) and have overall fainter color vision than humans do [14]. Practically, rats are colorblind, and we should not rely on color vision when determining their ability to perceive entities in Doom.

To emulate the amount of information rats can visually extract from a scene, first a dichromat filter was applied to an in-game screenshot, then the color intensity was reduced by a factor of 5 to match (very handwavily) the disparity between the human and rat cone-to-rod ratios [14], and finally a Gaussian blur was applied so that spatial frequencies above 1 cycle/degree (cpd) were attenuated by more than 95%, given a 90° field of view.
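A rough sketch of such a filter pipeline in Python (numpy + scipy) follows; the dichromat approximation and the desaturation step are coarse stand-ins for proper retinal models, and only the blur cutoff follows the numbers above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

FOV_DEG = 90.0       # horizontal field of view of the rendered frame
CUTOFF_CPD = 1.0     # rat acuity limit, cycles/degree
ATTENUATION = 0.05   # keep <5% of the amplitude above the cutoff

def rat_view(rgb, fov_deg=FOV_DEG):
    img = rgb.astype(np.float32)
    # 1) crude dichromat filter: collapse the red/green axis
    rg_mean = img[..., :2].mean(axis=-1)
    img[..., 0] = rg_mean
    img[..., 1] = rg_mean
    # 2) desaturate ~5x to mimic the sparse cone population
    gray = img.mean(axis=-1, keepdims=True)
    img = gray + (img - gray) / 5.0
    # 3) Gaussian blur: solve exp(-2*pi^2*sigma^2*f^2) = ATTENUATION at the cutoff
    ppd = rgb.shape[1] / fov_deg           # pixels per degree
    f_px = CUTOFF_CPD / ppd                # cutoff in cycles/pixel
    sigma = np.sqrt(-np.log(ATTENUATION) / (2 * np.pi**2)) / f_px
    img = gaussian_filter(img, sigma=(sigma, sigma, 0))
    return np.clip(img, 0, 255).astype(np.uint8)
```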

Doom II is rendered with classic, low-res graphics, which fits the low spatial frequency discrimination ability of rats. Below is a video imitating what a rat would see playing the first level (MAP01) of Doom II. The horizontal field of view is set to the default 90°, but in the actual experiment the rat's visual field should be covered as much as possible [15]: ideally 300° horizontal and 80° vertical [2], though tighter coverage has worked before too [5].

Rodents are quite sensitive to auditory cues, which should further aid them in locating demons with auxiliary environmental noise and music eliminated.

Spatial learning

Let me make this short: a rat's home range can span up to 0.183 km², with lengths up to 311 m [1]. In their natural habitat, they traverse complex environments daily with ease. As a conservative comparison, take a look at a well-established, relatively complex lab maze and a VR maze, side by side with the first map of Doom II:

Compilation of rodent mazes: the Hampton Court maze, which has been used extensively in behavioral studies; one of Thurley's VR mazes [16] with a branching structure; and MAP01 from Doom II. Green rooms in MAP01 are accessible without uncovering secrets or pushing buttons. Red arrows indicate the shortest path from start to finish, which really shows how little spatial navigation is necessary to complete the level. Although rats need to be trained on simpler maps first, successfully finishing MAP01 should be the first major milestone of the proposed project.

I think it is established: the spatial intricacy of MAP01 is comparable to mazes used previously in rodent experiments and is far simpler than the territory of a rat’s natural habitat.

Cerebral game controller

To build a neural interface on top of Doom, we need to decode 2D movement intent and one additional behavior associated with the act of shooting. Both should be fairly accessible from the motor cortex and should not interfere with each other.

Movement

Decode locomotion speed and acceleration, combined with yaw velocity and acceleration. This has been done by fitting simple linear regression [17]: Muzzu et al. trained mice to traverse a virtual corridor allowing horizontal rotation (yaw), while recording from a 32-channel electrode array implanted in the cerebellum (not the cortex, though cortical decoding in rats has been done too, in non-virtual environments, e.g. [18]).

To train the decoder, behavioral and neural activity need to be registered simultaneously. Movement can be recorded in a free-roaming setup using either a top-view camera or a piezo pad [19]. In VR, the movements of the rat can be derived from the angular displacement of the spherical treadmill.

The aim of this research project is to build a complete neural interface to Doom. First, we need to bootstrap from actual behavioral actions, like walking on the treadmill, to initiate movements in-game. Primates have been shown to adapt and control an actuator by neural activity alone [20]. As far as I know, this form of adaptation is yet to be shown in rodents, but it could be initiated by slowly shifting control from treadmill displacements to the corresponding neural activity, and even by providing appropriate proprioception via stimulation of sensory areas to emulate the sensory byproducts of movement.

Shooting

Let’s break down the act of shooting a demon from the rat’s perspective with its surrounding stimuli and successive reward in context.

The animal would not understand that shooting is an act of killing. It would probably take the presence of a demon as a source of reward (e.g. sucrose water) and 'shooting' as a means to milk that reward cow. The shot hits, and thus delivers reward, if the demon is at the center; one hit from the pump-action shotgun floors an imp from most distances the rat can visually cover. Punishments may be delivered in the form of reduced reward or an air puff in the rat's face, so the subject doesn't become comfortable with being attacked by the enemy.

Pump-action meets imp.

If the ratio of rewards and punishments is balanced and doesn't lead to highly risk-averse behavior, the rat should learn to actively gun down demons while avoiding being hit by them. Rats have been pulling levers for decades to earn treats and stop being electroshocked; there's nothing new here, except that we would prefer them not to shoot mindlessly at everything, which could be discouraged by a tiny air puff of punishment following missed shots. If punishment by air puffs is out of the question for reasons of reward scheme complexity, the traditional way of withdrawing reward can be applied: if the rat misbehaves, shooting aimlessly, turn on a bright LED to signal failure and revoke the sugary water it would otherwise receive for the successive kill(s).
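As a sketch, the LED-plus-revocation variant could be a small state machine like this; `spout` and `led` stand in for real hardware drivers, and the 20 μl volume is a placeholder.

```python
# Toy reward scheme: reward kills, signal and revoke on aimless shots.
class RewardScheme:
    def __init__(self, spout, led, reward_ul=20):
        self.spout, self.led = spout, led
        self.reward_ul = reward_ul
        self.revoked = 0              # kills whose reward is withheld

    def on_kill(self):
        if self.revoked:
            self.revoked -= 1         # silently swallow this kill's reward
        else:
            self.spout.dispense(self.reward_ul)

    def on_aimless_shot(self):
        self.led.on()                 # bright LED signals failure
        self.revoked += 1             # revoke the next kill's reward
```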

We still need to cover the behavioral action that the rat has to perform (or intend to perform) to shoot. Such an action should be distinguishable from walking, so we can't rely on limb actions like pulling a lever, the go-to trained discrete behavior for rodents in experiments. Associating biting with gunning is one option: bites are simple, fast and regular, just like shots need to be. Bites are egocentrically local to the visual feedback (the mouth is close to the shotgun), and they should be easy to decode, being a frequent behavior. Although biting is not as easy to capture in a free-roaming scenario, it is feasible in the virtual setup via face recordings; note that we need simultaneous recordings of neural activity and the action in question to be able to train a decoding model.

Another option is nodding, or rearing: raising then dropping the head to its original elevation. Rat posture, including head positioning, has been decoded [21]. The locality of the visual feedback is even more apt than for biting, and nodding (neck joint movement) can be recorded in both free-roaming and VR conditions. However, a body-fixation VR setup is required to allow for nodding, and there's a potential source of false positive detections: the rat just looking around. To avoid false positives, a nod could be classified as the raise and drop of the head only while looking straight ahead. Overall, nods are preferable to bites, as they are easier to detect in both free-roaming and VR settings and they have been successfully decoded from the rat motor cortex.
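A minimal nod detector over a tracked head pose could look like the sketch below; all thresholds are guesses that would need tuning against real head-tracking data.

```python
# Classify a nod as a raise followed by a drop, only while facing forward.
PITCH_UP_DEG = 15      # minimum rise above baseline to count as a raise
YAW_STRAIGHT_DEG = 10  # only accept nods while looking roughly straight ahead
MAX_NOD_S = 0.5        # slower head movements count as "just looking around"

def detect_nods(t, pitch, yaw):
    nods, start = [], None
    for i in range(1, len(t)):
        looking_ahead = abs(yaw[i]) < YAW_STRAIGHT_DEG
        if start is None and pitch[i] > PITCH_UP_DEG and looking_ahead:
            start = i                               # head raised
        elif start is not None:
            if pitch[i] < PITCH_UP_DEG / 3 and looking_ahead:
                if t[i] - t[start] <= MAX_NOD_S:
                    nods.append((t[start], t[i]))   # raise + drop = nod
                start = None
            elif t[i] - t[start] > MAX_NOD_S:
                start = None                        # too slow: not a nod
    return nods
```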

Hardware interface

Electrodes need to be implanted chronically to allow reliable recording and stimulation for the span of weeks to months, due to the extensive training procedure involved in the proposed experiment. A wireless interface is not a must, but should simplify the experimental protocol and setup.

As mentioned, movement decoding has been done using 32-channel electrode arrays [21], and involuntary rearing has been semi-accurately stimulated using a single well-placed electrode in the subthalamic nucleus [22].

The behavioral repertoire can be expanded within the same setup of rodents playing first-person (shooter) games. We may be able to build more abstract, auxiliary neural interfaces by predicting intended in-game actions from proprioception-reinforced neural activity [23], speeding up the rate of rat-to-machine communication and resulting in fast-paced gameplay. The beauty of it all is that most of the work committed to this project is translational: if the developed decoding and stimulation solutions, in the form of learning models, manage to externalize brain states into in-game actions for rats, they should work on the same principles for humans. I'm not talking about transfer learning of course; I mean the transfer of model architectures. We are not that far from AIs training us in sandbox environments, accelerating learning through brain stimulation; or just from bonding with our pet hamsters over co-op gaming.

Decoding models

Decoding of neural activity into behavioral actions like muscle movements is usually performed to reconstruct just a couple degrees of freedom — for instance, primates reaching for objects [24], or rodents running on corridors [4]. Kinematics such as movement velocity and acceleration, or positional and angular information of the limb in motion are often in focus. Although modeling approaches vary, only a handful of models have stood the test of time.

Traditionally, neuroscientists like to fit tuning curves on each recorded neuron, which explicitly capture the neuron's preference, expressed as a higher accompanying firing rate, for e.g. a movement direction. As was done in the work of Kennedy and Schwartz [25], tuning curves and firing rates of multiple neurons can be combined to derive a more accurate, less uncertain estimate of arm movement direction. Explicit directional tuning does not scale well with increasing degrees of freedom: try to decode a hand puppet's range of motion by breaking it down into elemental 1D movements and then reconstructing it.
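For reference, the classic recipe goes roughly like this: fit a cosine tuning curve per neuron, then decode direction with a population vector. A toy version, with assumed array shapes:

```python
import numpy as np

def fit_cosine_tuning(rates, theta):
    """rates: (trials,) firing rates of one neuron; theta: (trials,) movement
    direction in radians. Fit r = b0 + b1*cos(theta) + b2*sin(theta)."""
    X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    b = np.linalg.lstsq(X, rates, rcond=None)[0]
    preferred = np.arctan2(b[2], b[1])      # direction of peak response
    return b, preferred

def population_vector(rates_now, preferred, baseline, modulation):
    """Weight each neuron's preferred direction by its normalized rate."""
    w = (rates_now - baseline) / modulation
    vec = (w[:, None] * np.column_stack([np.cos(preferred),
                                         np.sin(preferred)])).sum(axis=0)
    return np.arctan2(vec[1], vec[0])       # decoded movement direction
```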

As an exotic example, Kloosterman et al. argued for skipping spike detection and decoding rat position straight from spike waveform features, such as peak amplitude and spike width [26]. They model spikes as spatiotemporal Poisson processes, evaluate stochastic rates of spiking activity from the process, and use those rates to derive a posterior distribution over rat position, all from hippocampal recordings. They present valid criticism of traditional spike detection algorithms and of how the misclassification of spikes (their presence, or the unit they belong to) may lead to silent decoding errors.

Muzzu et al. fit simple linear regression to decode movement speed in a VR environment [4], taking spiking bins of 5 ms width, pre-smoothed by a Gaussian filter (σ = 50 ms). In the studies I review here, spiking rates are time-binned between 5 and 64 ms. While long bins result in decoding delays, short ones introduce high variance and render the signal very sparse at the extreme. One may cheat a little to report a small bin size and apply Gaussian smoothing on the spike counts, which may also aid the assumption of normality that underlies the linear models often employed in decoding [24].
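The binning-plus-smoothing pipeline is simple enough to sketch end to end; alignment of the speed trace to the bins is assumed to be handled upstream.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.linear_model import LinearRegression

BIN_S, SIGMA_S = 0.005, 0.050   # 5 ms bins, 50 ms Gaussian kernel, as in [4]

def fit_speed_decoder(spike_trains, speed, t_end_s):
    """spike_trains: list of spike-time arrays (s), one per unit;
    speed: one value per bin (resampled upstream)."""
    edges = np.arange(0.0, t_end_s + 1e-9, BIN_S)
    counts = np.stack([np.histogram(st, edges)[0] for st in spike_trains], axis=1)
    rates = gaussian_filter1d(counts.astype(float), sigma=SIGMA_S / BIN_S, axis=0)
    return LinearRegression().fit(rates, speed)
```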

Kalman and Wiener filters are the usual suspects when it comes to neural decoding. They can incorporate spiking rates, model and propagate uncertainty over time, though with the assumptions of normally distributed spiking rates and of linearity in their transition function. Unscented Kalman filters partly overcome the issue of linearity by allowing explicit, albeit not learnt, inclusion of nonlinearity [27]. Others have cooked up models like the one named recurrent exponential family harmonium (rEFH), which can model spiking rates as Poisson, and allow ‘arbitrary’ nonlinear interactions in their huge binary latent state [24]. But then again, are spiking rates really Poisson distributed? Not really. It is also hard to untangle the underlying assumptions and just believe that the hidden dynamics can cover all kinds of nonlinearities with no drawbacks on sample efficiency or generalizability.
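For completeness, a bare-bones Kalman filter decoder fits in a dozen lines. In practice the transition matrix A, observation matrix H and noise covariances Q, R are fit from training data by least squares; here they are taken as given.

```python
import numpy as np

def kalman_decode(Y, A, H, Q, R, z0, P0):
    """Y: (T, n_units) binned rates; z: kinematic state (e.g. speed, yaw rate)."""
    z, P, out = z0, P0, []
    for y in Y:
        z = A @ z                            # predict
        P = A @ P @ A.T + Q
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        z = z + K @ (y - H @ z)              # correct with observed rates
        P = (np.eye(len(z)) - K @ H) @ P
        out.append(z.copy())
    return np.array(out)
```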

Barroso and others compared Wiener filters to recurrent neural networks (LSTMs) in the task of decoding rat locomotion, including limb and knee angles, both models incorporating a 500 ms history of 50 ms spike bins for each prediction [28]. Glaser et al. implemented and benchmarked 10 kinds of decoding models [29], including Kalman and Wiener filters and an LSTM architecture, taking 700 ms of spiking rate history. Both studies concluded that LSTM networks beat everything else in decoding accuracy.
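An LSTM decoder in the spirit of [28, 29] is equally compact; the layer sizes below are placeholders.

```python
import torch
import torch.nn as nn

class LSTMDecoder(nn.Module):
    """A window of binned spike history in, a kinematic estimate out."""
    def __init__(self, n_units, n_kin, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_units, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_kin)

    def forward(self, x):            # x: (batch, time_bins, n_units)
        h, _ = self.lstm(x)
        return self.head(h[:, -1])   # predict kinematics at the window's end

# e.g. 10 bins of 50 ms = 500 ms of history, as in Barroso et al. [28]
decoder = LSTMDecoder(n_units=32, n_kin=2)
y_hat = decoder(torch.randn(8, 10, 32))   # (8, 2): e.g. limb and knee angle
```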

Are recurrent neural networks the way to go in the long run? If I had to bet, I'd say yes. NNs still lack appreciation in the neural decoding field and are considered more of a hack. Nevertheless, deep learning models are well-researched, accessible and very flexible. We just need to embed the appropriate inductive bias into the network, one that reflects the geometry of the electrode array and the likely connectivity of the recorded neurons. I'm working on one at the moment and will publish it soon. Regardless, decoding can be done with linear models; I am not so sure about stimulation though.

AI in the (closed-)loop

So far I have glossed over the mechanism of training rats to perform timely behavioral actions in VR, i.e. shooting demons and traversing the map. The latter has been researched: rats have been taught to perform goal-directed movements in virtual environments relying on visual cues and spatial learning; exploration is in their veins. However, we can't expect the animal to perform a behavioral action such as a bite or a nod at the right moment, facing a demon, to initiate a shot, and to do so consistently over and over again so it can link the subsequent positive reward to the context and to the action performed in said context. Such an event rarely happens at random, and that is what fundamentally makes this project challenging.

Unless we can nudge the animal in an automated manner to perform the action associated with shooting at the appropriate time. Brain stimulation as a nudging instrument scales well with the complexity of the desired behavioral action, while querying a reinforcement learning AI about the optimal action scales well with the complexity of the virtual setting.

Artificial Doom Slayer

Reinforcement learning AIs have been trained to play Doom [30, 31], and bots devoid of machine learning had been developed long before the Atari RL craze. To incorporate an RL agent as the animal's coach, just follow the gameplay of the rat, keep feeding the AI the context, i.e. the sequence of images the rat sees and the decisions it makes, and at points when the animal seems to be stuck, query the AI about the next best step and brain-stim the rat to perform it. Doom has been open source for decades now, so there is no technical limitation to writing an API that accesses latent game states, such as the position of the player, enemies, the map layout, etc.
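ViZDoom [30] already exposes most of what the coach loop needs. A sketch of the wiring, where `decode_rat_intent`, `rat_seems_stuck`, `agent` and `stimulate` are hypothetical stand-ins, and the config file is assumed to set up MAP01 with the shotgun:

```python
from vizdoom import DoomGame

game = DoomGame()
game.load_config("doom2_map01.cfg")  # assumed config: MAP01, shotgun only
game.set_objects_info_enabled(True)  # latent object positions (recent ViZDoom)
game.set_doom_map("MAP01")
game.init()

while not game.is_episode_finished():
    state = game.get_state()              # screen buffer + latent game state
    action = decode_rat_intent()          # what the rat wants to do
    if rat_seems_stuck(state):
        coached = agent.act(state.screen_buffer)
        stimulate(coached)                # nudge the rat, don't override it
    game.make_action(action)              # action: list of button states
```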

AutoDOOM bot in action.

In our stripped-down version of Doom (MAP01 with a pump-action shotgun), the AI could inform the rat in which direction to move and whether to shoot. It is possible, and likely desirable, to teach movement in the virtual environment without applying brain stimulation at first. The right movement can simply be demonstrated by forcibly rotating the spherical treadmill using wheels and servomotors, as done (manually) in [5]. It's one of the charms of this research project that it can be implemented with complexities introduced in small increments, always having a plan B for the ambitious components.

If the exploration of the map is already taken care of by the combination of the rat's innate behavior and a ball rotating under its feet, then the animal only needs to be brain-stimulated to shoot when necessary. Strictly speaking, such limited assistance can be accomplished by a simple game mod that uses latent game state information to locate nearby enemies, derives the yaw movements needed to turn towards a demon, and then signals to shoot. Such a solution would not scale to more complex action sequences and would have to be hardcoded, though it would probably be more predictable, and preferable at first.
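The hardcoded plan-B assistant is a few lines of trigonometry over the latent game state. Object position fields below follow ViZDoom's objects info, the player angle would come from a game variable, and the signaling calls are hypothetical.

```python
import math

def aim_assist(player_x, player_y, player_yaw_deg, enemies):
    """Turn toward the nearest demon; signal a shot once roughly aligned."""
    if not enemies:
        return
    e = min(enemies, key=lambda e: (e.position_x - player_x) ** 2
                                   + (e.position_y - player_y) ** 2)
    bearing = math.degrees(math.atan2(e.position_y - player_y,
                                      e.position_x - player_x))
    turn = (bearing - player_yaw_deg + 180) % 360 - 180  # shortest signed turn
    if abs(turn) < 3:
        signal_shoot()        # hypothetical: stimulate the nod
    else:
        signal_turn(turn)     # hypothetical: rotate the treadmill
```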

Nevertheless, later on RL agents may be used to guide animals in more complex settings where weapons have to be swapped, levers pulled, and keycards and items like the rad suit picked up. Guiding the subject with AIs trained on different reward schemes could be another interesting avenue of research: train a more pacifist or a Rip&Tear-esque rat by issuing different amounts of reward for killing demons.

RL agent playing deathmatch relying purely on visual pixel-level information.

Matching the reward scheme of the RL agent and the rat (administration of sucrose water) is likely beneficial for delivering consistent guidance, which, if followed, results in positive feedback for the animal. Testing reward schemes in silico on RL agents first could further help knock out schemes that might lead the rat into local minima; e.g. shooting endlessly without an enemy in sight may be remedied by a small but ever-increasing punishment (an air puff in the face) as the consecutive missed shots pile up.

Reward designs are difficult to get right, especially when combining rewards and punishments [32], with varying patterns of performance [33], and it could be a serious time-sink if misjudged. For instance, we don’t want the rat to get discouraged from shooting because of initial negative rewards. As we have a complete recording of the rat’s actions in VR, we may employ an adaptive reward scheme that increases or decreases the amplitude of the delivered reward or punishment to encourage behavior that the particular animal lacks, though the efficacy of this solution is very much uncertain.
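Both ideas, escalating punishment for consecutive misses and per-animal reward adaptation, can be prototyped as a tiny scheme like the one below; every constant is illustrative, not validated.

```python
# Toy adaptive scheme: escalate the air puff as misses pile up, and boost
# rewards for whichever behavior the animal currently lacks.
class AdaptiveScheme:
    def __init__(self, base_reward_ul=20):
        self.base_reward_ul = base_reward_ul
        self.missed_in_a_row = 0
        self.shot_rate = 0.0          # running estimate of shots per minute

    def tick(self, shots_last_minute):
        self.shot_rate = shots_last_minute   # updated by the session loop

    def on_shot(self, hit):
        if hit:
            self.missed_in_a_row = 0
            boost = 2.0 if self.shot_rate < 1.0 else 1.0  # shy shooter? pay more
            return ("reward_ul", self.base_reward_ul * boost)
        self.missed_in_a_row += 1
        # small but growing air puff, capped for animal welfare
        return ("puff_ms", min(10 * self.missed_in_a_row, 50))
```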

Stimulation modeling

Brain stimulation is hard. First, we should stick to inducing relatively simple and fast behavioral actions, such that the stimulation sequence remains short; hence the proposed biting or nodding. While decoding brain states is essentially a supervised learning or fitting problem, stimulation is a reinforcement learning or control problem. Control theory mostly deals with systems that can be described by differential equations, i.e. we need a mathematical model of how stimulation sequences influence the outcome, which is the behavioral action. That's just not possible here: the ever-shifting nonlinear dynamical interactions between millions of neurons cannot be modeled analytically.

Reinforcement learning methods overcome said constraint by optimizing a policy on rewards that are detached from the system under control and can be defined externally. So let’s say we need to stimulate the rat to nod. The stimulation model would select the right set of electrodes and stimulation parameters (amplitude, frequency, waveform, etc.) by taking the current neural state of the animal as input. The rat’s head movement is tracked and if it nods, we assign a positive reward to the stimulation sequence leading up to the nod and update our policy to prefer such stimulations.
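In its simplest form this is a contextual bandit: a softmax policy over a finite set of stimulation patterns, updated with REINFORCE on a binary did-the-rat-nod reward. A sketch, with the neural-state featurization assumed to exist upstream:

```python
import numpy as np

class StimPolicy:
    """Softmax policy over stimulation patterns; purely schematic."""
    def __init__(self, n_state, n_patterns, lr=0.05):
        self.W = np.zeros((n_patterns, n_state))
        self.lr = lr

    def act(self, s):
        logits = self.W @ s
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = np.random.choice(len(p), p=p)   # sample a stimulation pattern
        return a, p

    def update(self, s, a, p, r):
        # REINFORCE: grad of log softmax prob of action a is (onehot(a) - p) x s
        grad = (np.eye(len(p))[a] - p)[:, None] * s[None, :]
        self.W += self.lr * (r - 0.5) * grad   # 0.5 acts as a crude baseline
```

Here the reward would come from the head tracker (did a nod follow within some short window), or later from the decoder itself, as described next.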

AI nudging the animal to turn left and shoot. The involuntary turn is enforced by a servo motor and a wheel, while the shot (nod) is suggested by the AI and imposed by the stimulation model. The decoder detects the neural correlates of the nod, which is translated to an in-game shot. The successful shot is rewarded with sugar water to reinforce the preceding behavior.

When the decoding model reaches a reliable level of accuracy, we may come full circle and judge the outcome of stimulation using the behavioral state predictions of the decoder. The decoding model could monitor the subsequent, downstream neural activity, and if it detects a nod, the stimulation is deemed successful. Getting the feedback from the decoder is not trivial though, as stimulation artifacts may pollute the recorded signal; these artifacts have to be removed either by time-multiplexing stimulation and decoding, or by modeling the electrode-to-electrode stimulation artifacts and explicitly removing them [34, 35].
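The time-multiplexing option is the simpler of the two and amounts to blanking: discard any decoding bin that overlaps a stimulation pulse plus a settling window. The 10 ms window below is an assumption.

```python
def clean_bins(bin_starts_ms, bin_ms, stim_times_ms, blank_ms=10):
    """Return indices of decoding bins untouched by stimulation artifacts."""
    keep = []
    for i, b in enumerate(bin_starts_ms):
        # keep the bin only if every pulse (plus settling time) misses it
        if all(s + blank_ms <= b or s >= b + bin_ms for s in stim_times_ms):
            keep.append(i)
    return keep
```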

Stimulation models are to be trained in both free roam and VR environments: apply short stimulation sequences at random times, record the behavioral and neural activity, and reward the stim model if the induced physical response is similar to the desired one, and/or if downstream neural activity corresponds to the activity associated with the desired behavior.

Training regime

I've done some literature review and consulted some of my rodent experimentalist friends to lay out an appropriate training procedure for the proposed experiment.

Thirst is a huge motivator. To build motivation in rodents, a water deprivation period is employed, starting a couple of days (5 days in [8]) before the experiments begin. Without water restriction, rats can be motivated with sucrose water, or even with regular water if the otherwise available water is unpalatable, e.g. water with citric acid [36]. However, water-restricted rats, when given the choice to attend trials, tend to have higher trial rates, which shows a higher inclination to perform [36].

Body weight is measured daily and usually kept between 80–95% [5, 8, 17] of the original weight in the water deprivation paradigm. Water rewards can be as small as 20 μl [36], while Young et al. gave 100 μl of chocolate milk each time their mice approached the correct visual target in VR [5]. Rats can maintain health on 12–13 ml of daily water intake and are satiated at 20–24 ml. At reward sizes of 20 μl, a rat needs to stay thirsty for at least 600 reward instances a day if its exclusive source of liquid is supplied during training.
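The reward budget above is simple arithmetic:

```python
daily_intake_ul = 12_000   # ~12 ml keeps a rat healthy
reward_ul = 20             # smallest practical reward size [36]
print(daily_intake_ul // reward_ul)   # 600 rewards/day if training supplies all liquid
```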

Rodents are first acclimated to the virtual environment with their body or head fixed, standing on the ball and getting used to slurping water from a software-triggered solenoid valve. The acclimation period can range from 2 days [5, 17] to a couple of weeks [15]. The screen may be left turned off [5, 17] or on [8] for the duration of this period. Rodents can be motivated to try and move forward on the ball with small water rewards [17].

Training sessions are run once or twice a day. Session length depends on the specifics of the performed experiment, but it usually does not exceed an hour: studies report 10 [15], 30 [5], 45 [8] or 50 minutes [17]. After a few training sessions (3 days in [5]) the experimenter should be able to tell the animal’s capacity or willingness to operate the VR setup, and remove the unfit ones before electrode implantation to save resources.

My genius harness/hammock design adapted to let rats control the game by movement intent only. Fortunately, I’m not the first to fashion a harness for rodents [5, 15].

Map progression

So as not to overwhelm the rats, they should be introduced to game mechanics in increments. Custom Doom maps should be built to introduce movement, doors, and demons and their behavior (walking around, attacking), before graduating to a full-fledged MAP01 with enemies. It's preferable to use the textures and overall structure of MAP01 to design simpler versions of it, with fewer distractions and objects at first. Some randomization is necessary though, so the rats generalize and are pressured to build an understanding of the virtual space they are projected into.

Maps of increasing complexity.
  1. Practice simple movement: a corridor with a single right or left 90° turn taken from MAP01, leading to an exit door that opens into a small room with a button. The door is opened by the rat bumping into it; the rat then has to walk straight ahead and bump into the button too, to finish the map. This task teaches movement and reinforces the exit door stimulus, with rewards at arrival and when pushing the button.
  2. Practice turns and exploration of larger spaces: a corridor with multiple random turns, leading to a rectangular room with a randomly placed exit door. Reward is given when the rat passes the corridor section, when it reaches the door, and when the button is pressed, same as in the previous task. This map can be made more complex by including more turns and even a dead end (panel 2.2 above).
  3. Practice shooting: the same maps as before, but with demons (imps) positioned randomly along the corridor or in the rectangular room. First, place them in the middle of the corridor so the rat can't pass without shooting them, then in the room, in front of or around the exit door. At this stage, the imps can't move or attack. Extra juicy rewards shall follow the death of demons.
  4. Hitting moving targets: same map design and demon placement, but the demons move. At this point the rat should be thirsty for some kills.
  5. Hitting live demons: now demons can attack, and a hit taken results in either reduced subsequent rewards, or an air puff in the face.
  6. MAP01: the full-fledged entry Doom II map.
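To make the staging concrete, the curriculum could be encoded as data that drives custom map generation; every field name below is invented for illustration, not an existing map format.

```python
# Hypothetical curriculum table mirroring the stages above.
CURRICULUM = [
    {"stage": 1, "layout": "corridor_one_turn", "imps": 0},
    {"stage": 2, "layout": "corridor_random_turns_room", "imps": 0, "randomize": True},
    {"stage": 3, "layout": "corridor_random_turns_room", "imps": 2, "imps_move": False},
    {"stage": 4, "layout": "corridor_random_turns_room", "imps": 2, "imps_move": True},
    {"stage": 5, "layout": "corridor_random_turns_room", "imps": 2, "imps_attack": True},
    {"stage": 6, "layout": "MAP01"},
]
```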

Once we can establish a high-bandwidth interface and reliably record neural activity that's representative of complex behavioral actions, we have to get rid of the old habits of neuroscientific research: engineering simple linear tasks to slowly learn to decode singular dimensions of e.g. movement. When we enforce our own envisioned behavioral primitives on the brain, we are doomed to arrive at suboptimal decoding solutions; we just can't decode nuanced intent if we explicitly engineer the underlying primitives instead of implicitly learning them.

Promising next steps may be to marry machine learning with brain interfacing, record uninhibited behavior and match it to the preceding neural activity. The good news is that the brain will likely adapt to slight decoding errors if it is supplied immediate feedback on the decoded action in the form of external stimuli or brain stimulation. Video games are ideal to test such a paradigm, as 1) they are designed around feedback loops, 2) they captivate the player to an extent that a fluent, pro play becomes rewarding in itself (at least for humans), which augments learning and thus brain adaptation, and 3) there’s just an abundance of video games to try.

In essence, the proposed experimental setup is a testbed for brain decoding and stimulation solutions in a rodent model. The subject can explore highly motivated, risk-free, goal-directed behavior of varying kinds and complexity in virtual worlds, and this behavior can be accurately recorded and matched with neural activity. Video games rely on all kinds of cognitive faculties, and they engage the two major senses of vision and hearing. So if we aim to decode and, through decoding, map sensory regions, motor intent, memory consolidation/recall, planning, etc. in action, within a single setup, then training animals to play computer games seems like a solid bet. Game-AI-assisted brain stimulation enables such a training process by invoking the right behavior to be reinforced under positive feedback.

References

[1] C. Hölscher, A. Schnee, H. Dahmen, L. Setia, and H. A. Mallot, "Rats are able to navigate in virtual environments," Journal of Experimental Biology, vol. 208, no. 3, pp. 561–569, Feb. 2005, doi: 10.1242/jeb.01371.
[2] K. Thurley and A. Ayaz, "Virtual reality systems for rodents," Curr Zool, vol. 63, no. 1, pp. 109–119, Feb. 2017, doi: 10.1093/cz/zow070.
[3] J. R. Stowers et al., "Virtual reality for freely moving animals," Nature Methods, vol. 14, no. 10, Art. no. 10, Oct. 2017, doi: 10.1038/nmeth.4399.
[4] T. Muzzu, S. Mitolo, G. P. Gava, and S. R. Schultz, "Encoding of locomotion kinematics in the mouse cerebellum," PLOS ONE, vol. 13, no. 9, p. e0203900, Sep. 2018, doi: 10.1371/journal.pone.0203900.
[5] B. K. Young, J. N. Brennan, P. Wang, and N. Tian, "Virtual reality method to analyze visual recognition in mice," PLoS One, vol. 13, no. 5, p. e0196563, 2018, doi: 10.1371/journal.pone.0196563.
[6] D. A. Dombeck, C. D. Harvey, L. Tian, L. L. Looger, and D. W. Tank, "Functional imaging of hippocampal place cells at cellular resolution during virtual navigation," Nat Neurosci, vol. 13, no. 11, pp. 1433–1440, Nov. 2010, doi: 10.1038/nn.2648.
[7] M. Sato, M. Kawano, K. Mizuta, T. Islam, M. G. Lee, and Y. Hayashi, "Hippocampus-Dependent Goal Localization by Head-Fixed Mice in Virtual Reality," eNeuro, vol. 4, no. 3, May 2017, doi: 10.1523/ENEURO.0369-16.2017.
[8] C. D. Harvey, F. Collman, D. A. Dombeck, and D. W. Tank, "Intracellular dynamics of hippocampal place cells during virtual navigation," Nature, vol. 461, no. 7266, Art. no. 7266, Oct. 2009, doi: 10.1038/nature08499.
[9] M. Leinweber et al., "Two-photon Calcium Imaging in Mice Navigating a Virtual Reality Environment," J Vis Exp, no. 84, Feb. 2014, doi: 10.3791/50885.
[10] D. A. Dombeck and M. B. Reiser, "Real neuroscience in virtual worlds," Current Opinion in Neurobiology, vol. 22, no. 1, pp. 3–10, Feb. 2012, doi: 10.1016/j.conb.2011.10.015.
[11] G. T. Prusky and R. M. Douglas, "Characterization of mouse cortical spatial vision," Vision Research, vol. 44, no. 28, pp. 3411–3418, Dec. 2004, doi: 10.1016/j.visres.2004.09.001.
[12] G. T. Prusky, P. W. R. West, and R. M. Douglas, "Behavioral assessment of visual acuity in mice and rats," Vision Research, vol. 40, no. 16, pp. 2201–2209, Jul. 2000, doi: 10.1016/S0042-6989(00)00081-X.
[13] S. Jaramillo and A. M. Zador, "Mice and rats achieve similar levels of performance in an adaptive decision-making task," Front Syst Neurosci, vol. 8, Sep. 2014, doi: 10.3389/fnsys.2014.00173.
[14] G. H. Jacobs, J. A. Fenwick, and G. A. Williams, "Cone-based vision of rats for ultraviolet and visible lights," J Exp Biol, vol. 204, no. Pt 14, pp. 2439–2446, Jul. 2001.
[15] C. Hölscher, A. Schnee, H. Dahmen, L. Setia, and H. A. Mallot, "Rats are able to navigate in virtual environments," J Exp Biol, vol. 208, no. Pt 3, pp. 561–569, Feb. 2005, doi: 10.1242/jeb.01371.
[16] K. Thurley et al., "Mongolian gerbils learn to navigate in complex virtual spaces," Behavioural Brain Research, vol. 266, pp. 161–168, Jun. 2014, doi: 10.1016/j.bbr.2014.03.007.
[17] T. Muzzu, S. Mitolo, G. P. Gava, and S. R. Schultz, "Encoding of locomotion kinematics in the mouse cerebellum," PLOS ONE, vol. 13, no. 9, p. e0203900, Sep. 2018, doi: 10.1371/journal.pone.0203900.
[18] B. Mimica, B. A. Dunn, T. Tombaz, V. P. T. N. C. S. Bojja, and J. R. Whitlock, "Efficient cortical coding of 3D posture in freely behaving rats," Science, vol. 362, no. 6414, pp. 584–589, Nov. 2018, doi: 10.1126/science.aau2013.
[19] M. I. Carreño-Muñoz et al., "Detecting fine and elaborate movements with piezo sensors, from heartbeat to the temporal organization of behavior," bioRxiv, p. 2020.04.03.024711, Apr. 2020, doi: 10.1101/2020.04.03.024711.
[20] M. A. Lebedev et al., "Cortical Ensemble Adaptation to Represent Velocity of an Artificial Actuator Controlled by a Brain-Machine Interface," J. Neurosci., vol. 25, no. 19, pp. 4681–4693, May 2005, doi: 10.1523/JNEUROSCI.4088-04.2005.
[21] B. Mimica, B. A. Dunn, T. Tombaz, V. P. T. N. C. S. Bojja, and J. R. Whitlock, "Efficient cortical coding of 3D posture in freely behaving rats," Science, vol. 362, no. 6414, pp. 584–589, Nov. 2018, doi: 10.1126/science.aau2013.
[22] A. Huotarinen, S. Leino, R. K. Tuominen, and A. Laakso, "Rat subthalamic stimulation: Evaluating stimulation-induced dyskinesias, choosing stimulation currents and evaluating the anti-akinetic effect in the cylinder test," MethodsX, vol. 6, pp. 2384–2395, Oct. 2019, doi: 10.1016/j.mex.2019.10.012.
[23] M. C. Dadarlat, J. E. O'Doherty, and P. N. Sabes, "A learning-based approach to artificial sensory feedback leads to optimal integration," Nature Neuroscience, vol. 18, no. 1, Art. no. 1, Jan. 2015, doi: 10.1038/nn.3883.
[24] J. G. Makin, J. E. O'Doherty, M. M. B. Cardoso, and P. N. Sabes, "Superior arm-movement decoding from cortex with a new, unsupervised-learning algorithm," J. Neural Eng., vol. 15, no. 2, p. 026010, Jan. 2018, doi: 10.1088/1741-2552/aa9e95.
[25] S. D. Kennedy and A. B. Schwartz, "Distributed processing of movement signaling," PNAS, vol. 116, no. 52, pp. 26266–26273, Dec. 2019, doi: 10.1073/pnas.1902296116.
[26] F. Kloosterman, S. P. Layton, Z. Chen, and M. A. Wilson, "Bayesian decoding using unsorted spikes in the rat hippocampus," J Neurophysiol, vol. 111, no. 1, pp. 217–227, Jan. 2014, doi: 10.1152/jn.01046.2012.
[27] Z. Li, J. E. O'Doherty, T. L. Hanson, M. A. Lebedev, C. S. Henriquez, and M. A. L. Nicolelis, "Unscented Kalman Filter for Brain-Machine Interfaces," PLOS ONE, vol. 4, no. 7, p. e6243, Jul. 2009, doi: 10.1371/journal.pone.0006243.
[28] F. O. Barroso et al., "Decoding neural activity to predict rat locomotion using intracortical and epidural arrays," J Neural Eng, vol. 16, no. 3, p. 036005, 2019, doi: 10.1088/1741-2552/ab0698.
[29] J. I. Glaser, A. S. Benjamin, R. H. Chowdhury, M. G. Perich, L. E. Miller, and K. P. Kording, "Machine Learning for Neural Decoding," eNeuro, vol. 7, no. 4, Aug. 2020, doi: 10.1523/ENEURO.0506-19.2020.
[30] M. Wydmuch, M. Kempka, and W. Jaśkowski, "ViZDoom Competitions: Playing Doom from Pixels," arXiv:1809.03470 [cs, stat], Sep. 2018. Available: http://arxiv.org/abs/1809.03470.
[31] G. Lample and D. S. Chaplot, "Playing FPS Games with Deep Reinforcement Learning," arXiv:1609.05521 [cs], Jan. 2018. Available: http://arxiv.org/abs/1609.05521.
[32] V. Neville, J. King, I. D. Gilchrist, P. Dayan, E. S. Paul, and M. Mendl, "Reward and punisher experience alter rodent decision-making in a judgement bias task," Scientific Reports, vol. 10, no. 1, Art. no. 1, Jul. 2020, doi: 10.1038/s41598-020-68737-1.
[33] Z. C. Ashwood et al., "Mice alternate between discrete strategies during perceptual decision-making," bioRxiv, p. 2020.10.19.346353, Oct. 2020, doi: 10.1101/2020.10.19.346353.
[34] P. Sabes and J. O'Doherty, "Removal of Stimulation Artifact in Multi-Channel Neural Recordings," US20200129766A1, Apr. 30, 2020.
[35] P. N. Sabes, J. O'Doherty, and T. Hanson, "Restoring Proprioception via a Cortical Prosthesis: A Novel Learning-Based Approach," University of California, San Francisco, Oct. 2015. Available: https://apps.dtic.mil/sti/citations/AD1002593.
[36] P. Reinagel, "Training Rats Using Water Rewards Without Water Restriction," Front Behav Neurosci, vol. 12, May 2018, doi: 10.3389/fnbeh.2018.00084.
[37] N. A. D. Grosso, J. J. Graboski, W. Chen, E. Blanco-Hernández, and A. Sirota, "Virtual Reality system for freely-moving rodents," bioRxiv, p. 161232, Jul. 2017, doi: 10.1101/161232.
