Aetheris is a hobbyist research project and experimental implementation exploring the intersection of State Space Models (Mamba) and Mixture of Experts (MoE).
The goal of this project was to learn by doing: attempting to combine the linear-time inference of Mamba with the sparse scaling capacity of MoE from scratch in PyTorch. It is designed as a playground for understanding these modern architectures, not as a published academic paper or production-ready foundation model.
Current LLM architectures are evolving rapidly. I built Aetheris to investigate a specific question:
Can we successfully interleave Mamba blocks (for long context) with sparse MoE layers (for capacity) to train an efficient model on consumer hardware?
This project implements a hybrid architecture that attempts to:
- Replace Attention: Use Mamba (SSM) blocks to achieve $O(N)$ sequence scaling.
- Scale Parameters Sparsely: Use MoE layers to increase model size without exploding the computational cost per token.
- Run Locally: Optimize the implementation for single-GPU training (gradient checkpointing, efficient routing).
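The last bullet relies on standard PyTorch techniques rather than anything exotic. As a hedged sketch (not the repository's actual code; `run_stack` is a made-up helper for illustration), gradient checkpointing over a stack of blocks looks roughly like this:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


def run_stack(layers: nn.ModuleList, x: torch.Tensor, use_checkpointing: bool = True) -> torch.Tensor:
    """Apply a stack of blocks, recomputing activations in the backward pass to save memory."""
    for layer in layers:
        if use_checkpointing and x.requires_grad:
            # Trades extra forward compute for a roughly depth-independent activation
            # memory footprint (use_reentrant=False is the recommended mode in recent PyTorch).
            x = checkpoint(layer, x, use_reentrant=False)
        else:
            x = layer(x)
    return x


if __name__ == "__main__":
    blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(8))
    inp = torch.randn(4, 64, requires_grad=True)
    run_stack(blocks, inp).sum().backward()
```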
Aetheris alternates between custom implementations of two core modules:
- SSMBlock (The Backbone): Implements the selective scan mechanism described in the Mamba paper. This handles the sequence mixing and "memory" of the model.
- SparseMoELayer (The Scaling): A router-based layer that dispatches tokens to Top-K experts (Feed-Forward Networks). This allows the model to "specialize" parts of its parameters for different types of tokens.
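To make the routing idea concrete, here is a minimal, self-contained top-k MoE sketch in plain PyTorch. It is an illustration under assumptions: `TinyTopKMoE` and its parameter names are invented for this example, and the repository's `SparseMoELayer` may handle dispatch, expert capacity, and load balancing differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTopKMoE(nn.Module):
    """Illustrative sparse MoE layer: route each token to its top-k experts."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); flatten to a bag of tokens for routing.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                     # (tokens, num_experts)
        gate, indices = logits.topk(self.top_k, dim=-1)  # per-token expert choices
        gate = F.softmax(gate, dim=-1)                   # renormalize over the chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Which tokens selected expert e, and in which top-k slot.
            rows, slots = (indices == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += gate[rows, slots].unsqueeze(-1) * expert(tokens[rows])
        return out.reshape_as(x)


if __name__ == "__main__":
    moe = TinyTopKMoE(d_model=64, d_ff=256)
    print(moe(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```

The loop-over-experts dispatch is the simplest correct formulation; production MoE layers usually batch tokens per expert and add an auxiliary load-balancing loss, but the routing math is the same.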
This code is provided for educational purposes and for others who want to experiment with hybrid architectures.
Option 1: Local Python Environment
git clone https://github.com/Pomilon-Intelligence-Lab/Aetheris.git
cd Aetheris
pip install -r requirements.txt
Option 2: Docker
We provide Dockerfiles for both CPU (slim) and GPU (NVIDIA) environments.
# CPU Version
docker build -t aetheris-cpu -f Dockerfile .
docker run -p 7860:7860 aetheris-cpu
# GPU Version (Requires NVIDIA Container Toolkit)
docker build -t aetheris-gpu -f Dockerfile-nvidia .
docker run --gpus all -p 7860:7860 aetheris-gpu
Aetheris includes a CLI to train, run inference, or serve the model.
1. Training (From Scratch)
# Trains a small model defined in configs/default.yaml
python -m aetheris.cli.main train --config configs/default.yaml
2. Generation (CLI)
python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir checkpoints
3. API Server (OpenAI-Compatible)
Start a local API server that simulates OpenAI's chat completions endpoint.
python -m aetheris.cli.main serve --host 0.0.0.0 --port 8000
You can then interact with it using standard tools:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "aetheris-hybrid",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true
}'
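If you prefer Python over curl, the official `openai` client (v1.x) can be pointed at the local server via `base_url`. This is a hedged example: it assumes the server's streaming responses follow OpenAI's chunk format, which is not guaranteed by this repository.

```python
# pip install openai  (v1.x client)
from openai import OpenAI

# The API key is unused by the local server but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="aetheris-hybrid",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```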
To run the test suite:
pytest tests/
You can tweak the hyperparameters in `configs/`. I've included a "Debug" config that is small enough to train on a laptop CPU for testing the code flow.
| Config File | Description |
|---|---|
| `configs/default.yaml` | Standard experimental setup (requires GPU). |
| `configs/debug.yaml` | Tiny model (2 layers) for code debugging. |
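The CLI consumes these files via `--config`, but if you want to peek at or script over the hyperparameters, a plain PyYAML load is enough. The snippet below assumes nothing about the keys inside the file; it just prints whatever the config defines.

```python
import yaml  # pip install pyyaml

with open("configs/debug.yaml") as f:
    cfg = yaml.safe_load(f)

# Show the top-level hyperparameters before launching a run.
for key, value in cfg.items():
    print(f"{key}: {value}")
```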
This project is an implementation study and relies heavily on the brilliant theoretical work of others. It is not an original invention of the Mamba or MoE concepts.
- Mamba Architecture: Gu, A., & Dao, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752
- Mixture of Experts: Shazeer, N., et al. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538
- Inspiration: Jamba (AI21 Labs) and OpenMoE.
All pre-trained checkpoints are hosted on the Hugging Face Hub.
| Model Artifact | Step | Description | Download |
|---|---|---|---|
| Aetheris-Base | 17k | Early convergence checkpoint (Loss ~1.81). Good for analyzing router behavior. | 🤗 Hugging Face |
| Aetheris-Chat | -- | Coming Soon (Post-SFT) | -- |
⚠️ Important: Aetheris uses a custom Hybrid Mamba-MoE architecture. You cannot load it directly with `transformers.AutoModel`. You must use the interface provided in this repository.
python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir path/to/checkpoints_folder  # rename the checkpoint inside to checkpoint_current.pth
Note: Better inference tooling will be added later; for now, use this scuffed version. :D
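Because the checkpoint is a plain PyTorch `.pth` file, you can also inspect it directly without the CLI. A minimal sketch, assuming the file is either a raw state dict or a dict wrapping one (the exact layout is not documented here):

```python
import torch

# Load on CPU so this works without a GPU.
ckpt = torch.load("checkpoints/checkpoint_current.pth", map_location="cpu")

# The checkpoint may be a raw state dict or nested under a key such as
# "model" or "state_dict" -- adjust as needed for your download.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

tensors = [t for t in state_dict.values() if isinstance(t, torch.Tensor)]
print(f"Tensors: {len(tensors)}")
print(f"Total parameters: {sum(t.numel() for t in tensors):,}")
```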
Note: These weights are from an experimental run. While they demonstrate the architecture's capabilities, do not expect GPT-5 or even Google Bard levels of coherence. :D This project was made for learning and fun!
MIT