A computational neuroscience–inspired CartPole agent built without neural networks, gradients, or backpropagation — only biologically plausible learning rules.
This project explores a fundamental question in computational neuroscience:
Can meaningful behavior emerge from simple biological learning rules alone?
Instead of artificial neural networks, the agent relies on mechanisms inspired by the brain:
- Membrane-potential–based action selection
- Dopamine-like reward prediction errors
- Local Hebbian/TD synaptic updates
- Sleep-based synaptic downscaling for stability
The goal is not algorithmic performance, but interpretability and biological realism in reinforcement learning.
The agent uses a small 4×2 synaptic weight matrix to map sensory inputs to motor outputs.
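Each action has a membrane potential, computed from the four sensory inputs and that action's column of the weight matrix.
The action is chosen through a winner-take-all mechanism: the action with the highest membrane potential (the argmax) is executed.
This mirrors competitive action selection in basal ganglia circuits.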
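As a rough illustration of the action-selection step, the sketch below computes one membrane potential per action as a weighted sum of the inputs and picks the largest. The weighted-sum form, the function names, and the example weights are assumptions made for illustration, not taken from the project's code. It uses plain Python lists so it runs without dependencies.

```python
# Illustrative sketch only: names and example values are assumptions.
# `weights` is a 4x2 nested list: weights[i][a] connects input i to action a.

def membrane_potentials(weights, state):
    """Return one potential per action: V[a] = sum_i state[i] * weights[i][a]."""
    n_actions = len(weights[0])
    return [
        sum(x * row[a] for x, row in zip(state, weights))
        for a in range(n_actions)
    ]


def select_action(weights, state):
    """Winner-take-all: the action with the highest potential wins.

    Ties resolve to the lowest action index.
    """
    potentials = membrane_potentials(weights, state)
    return potentials.index(max(potentials))


# CartPole observations are [cart position, cart velocity,
# pole angle, pole angular velocity]; action 0 pushes left, 1 pushes right.
W = [[0.0, 0.0], [0.0, 0.0], [-1.0, 1.0], [-0.5, 0.5]]
state = [0.0, 0.0, 0.05, 0.2]   # pole tilting and rotating to the right
action = select_action(W, state)
print(action)
```

With these hand-picked weights, a pole leaning to the right drives the "push right" potential above the "push left" one, so the cart moves under the pole.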
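Learning is driven by a dopamine-like scalar signal, a temporal-difference (TD) reward prediction error.
Synaptic updates occur only on the active input-to-action pathway, that is, the synapses feeding the action that was just taken.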
This combination of local synaptic eligibility + global dopamine modulation reflects key biological credit-assignment principles.
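One way such an update could look is sketched below. It assumes the membrane potential of the chosen action doubles as the value estimate, so the dopamine signal is `reward + gamma * max(next potentials) - current potential`, and each active synapse changes by `alpha * dopamine * input`. The learning rate `alpha`, discount `gamma`, and the function name are hypothetical; only the ±5 clipping range comes from this README.

```python
# Sketch under stated assumptions: alpha, gamma, and the exact form of the
# prediction error are illustrative, not the project's confirmed values.

def td_update(weights, state, action, reward, next_state, done,
              alpha=0.01, gamma=0.99, w_clip=5.0):
    """Apply one dopamine-modulated update in place; return the prediction error."""

    def potential(s, a):
        return sum(x * row[a] for x, row in zip(s, weights))

    n_actions = len(weights[0])
    current = potential(state, action)
    # No future value is expected once the episode has terminated.
    future = 0.0 if done else max(potential(next_state, a) for a in range(n_actions))

    # Global scalar signal: the reward prediction error.
    dopamine = reward + gamma * future - current

    # Local rule: only synapses onto the chosen action change, each scaled
    # by its own presynaptic activity (the eligibility term).
    for i, x in enumerate(state):
        updated = weights[i][action] + alpha * dopamine * x
        weights[i][action] = max(-w_clip, min(w_clip, updated))
    return dopamine
```

Note that the column for the action that was not taken is left untouched, which is what keeps the rule local.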
Every 100 episodes, the agent enters a simulated “sleep” stage, during which weak synapses are pruned.
Inspired by the Synaptic Homeostasis Hypothesis (SHY), this prevents weight explosion, reduces noise, and supports long-term stability.
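A minimal sketch of the sleep stage follows. The 100-episode interval is from the text; the pruning threshold of `0.05` and the choice to zero out weak synapses (rather than shrink them) are assumptions made for illustration.

```python
# Sketch only: PRUNE_THRESHOLD is a hypothetical value, not from the project.
SLEEP_INTERVAL = 100      # episodes between sleep stages (from the text)
PRUNE_THRESHOLD = 0.05    # assumed cutoff below which a synapse counts as "weak"


def is_sleep_episode(episode, interval=SLEEP_INTERVAL):
    """True on episodes 100, 200, 300, ... (episodes counted from 1)."""
    return episode > 0 and episode % interval == 0


def sleep_prune(weights, threshold=PRUNE_THRESHOLD):
    """Zero every synapse whose magnitude is below `threshold`, in place.

    Returns the number of synapses that were pruned.
    """
    pruned = 0
    for row in weights:
        for a, w in enumerate(row):
            if w != 0.0 and abs(w) < threshold:
                row[a] = 0.0
                pruned += 1
    return pruned
```

Strong synapses pass through sleep unchanged in this version, so only small, likely noise-driven weights are removed.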
- Environment: Gymnasium CartPole-v1
- Learning: dopamine-modulated TD update
- Action selection: membrane potentials + argmax
- Regularization: synaptic pruning during sleep
- Stability: weight clipping between -5 and +5
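The pieces listed above could be wired together roughly as follows. This is a sketch, not the project's training script: the hyperparameters (`alpha`, `gamma`, `prune_threshold`, episode count), the zero initialization, and all function names are assumptions. The environment is passed in so the loop can be exercised with any object that follows the Gymnasium `reset()`/`step()` interface.

```python
# Sketch under stated assumptions; hyperparameters and names are illustrative.

def make_cartpole():
    """Create the environment named above (needs `pip install gymnasium`)."""
    import gymnasium as gym  # imported lazily so the helpers run without it
    return gym.make("CartPole-v1")


def potentials(weights, state):
    """Membrane potential of each action for the given observation."""
    return [sum(x * row[a] for x, row in zip(state, weights))
            for a in range(len(weights[0]))]


def run_episode(env, weights, alpha=0.01, gamma=0.99, w_clip=5.0):
    """Play one episode with online learning; return the episode return."""
    state, _info = env.reset()
    total, done = 0.0, False
    while not done:
        v = potentials(weights, state)
        action = v.index(max(v))                       # argmax selection
        next_state, reward, terminated, truncated, _info = env.step(action)
        done = terminated or truncated
        future = 0.0 if terminated else max(potentials(weights, next_state))
        dopamine = reward + gamma * future - v[action]  # TD error
        for i, x in enumerate(state):
            w = weights[i][action] + alpha * dopamine * x
            weights[i][action] = max(-w_clip, min(w_clip, w))  # clip to [-5, +5]
        state, total = next_state, total + reward
    return total


def train(env, episodes=300, sleep_interval=100, prune_threshold=0.05):
    """Train from zero weights; prune weak synapses every `sleep_interval` episodes."""
    weights = [[0.0, 0.0] for _ in range(4)]
    returns = []
    for episode in range(1, episodes + 1):
        returns.append(run_episode(env, weights))
        if episode % sleep_interval == 0:              # simulated sleep
            for row in weights:
                for a, w in enumerate(row):
                    if abs(w) < prune_threshold:
                        row[a] = 0.0
    return weights, returns
```

Calling `train(make_cartpole())` would run the loop on CartPole-v1. Because action selection here is a pure argmax with no exploration noise, results depend heavily on the initial weights and learning rate.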
This system is:
- Lightweight
- Fully interpretable
- CPU-friendly (no GPU required)
- Neuroscience-inspired rather than algorithm-driven
The learning curve shows:
- Raw episode returns (grey)
- Smoothed reward trajectory (blue)
- Success threshold (green dashed)
- Sleep cycles (purple vertical lines)
Behavior reflects noisy but adaptive biological learning, rather than clean optimization.
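The smoothed trajectory can be produced with a simple trailing average over the raw returns. The sketch below is one possible version; the window of 20 episodes and the function name are assumptions, since the smoothing method is not specified here.

```python
# Sketch only: the window size is an assumed value.

def moving_average(values, window=20):
    """Trailing mean over up to `window` most recent values.

    Early entries average over however many values exist so far, so the
    output has the same length as the input.
    """
    smoothed, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed
```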
The final 4×2 weight matrix reveals how each sensory input influences the two actions.
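To read the matrix, it helps to label rows with CartPole's four observation components and columns with its two actions. The helper below is a hypothetical formatting sketch; the example weights are made up for illustration and are not the project's learned values.

```python
# Sketch only: the example matrix is invented, not a trained result.
INPUTS = ["cart position", "cart velocity", "pole angle", "pole angular velocity"]
ACTIONS = ["push left", "push right"]


def describe_weights(weights):
    """Return one text line per input showing its weight onto each action."""
    lines = []
    for name, row in zip(INPUTS, weights):
        cells = ", ".join(f"{act}: {w:+.2f}" for act, w in zip(ACTIONS, row))
        lines.append(f"{name:<22} {cells}")
    return lines


example = [[0.0, 0.0], [0.0, 0.0], [-1.0, 1.0], [-0.5, 0.5]]
for line in describe_weights(example):
    print(line)
```

A row whose two entries have opposite signs indicates an input that pushes the two actions in opposite directions, which is the pattern one would expect for pole angle.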