Experiments on a compact artificial token language (ATL) trained with SentencePiece and evaluated against GPT-style BPE for efficiency and reasoning.
- ATL-512 produced ~112% more tokens per character than cl100k_base on WikiText-103 (0.543 vs. 0.256 tokens/char), i.e., roughly half the compression (see the efficiency sketch after this list).
- ATL-coded GSM8K prompts were ~3× longer than English (274 vs. 92 tokens on average).
- Using a small local model (distilgpt2), reasoning accuracy was 0/12 for both English and ATL prompts, so there is no evidence of quality preservation (a minimal evaluation sketch follows the list).
- Conclusion: this ATL configuration hurts efficiency; better coding and stronger models are needed.
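The tokens-per-character figures above come down to counting tokens from each tokenizer over the same text. The sketch below is a minimal version, assuming the trained model at `artifacts/atl_512.model` and the `sentencepiece` and `tiktoken` packages are available; the actual corpus handling lives in `notebooks/atl_experiments.py` and may differ.

```python
# Minimal sketch: compare tokens/char for the ATL tokenizer vs. cl100k_base.
# Assumes artifacts/atl_512.model exists and sentencepiece + tiktoken are installed.
import sentencepiece as spm
import tiktoken


def tokens_per_char(text: str) -> dict:
    """Return tokens-per-character for ATL-512 and cl100k_base on the same text."""
    atl = spm.SentencePieceProcessor(model_file="artifacts/atl_512.model")
    bpe = tiktoken.get_encoding("cl100k_base")
    n_chars = len(text)
    return {
        "atl_512": len(atl.encode(text)) / n_chars,
        "cl100k_base": len(bpe.encode(text)) / n_chars,
    }


if __name__ == "__main__":
    # The real experiment runs over WikiText-103; a short sample keeps this runnable.
    sample = "The quick brown fox jumps over the lazy dog."
    print(tokens_per_char(sample))
```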
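The reasoning check follows the usual GSM8K recipe: prompt the model, then grade the final number in its continuation. The sketch below assumes the `transformers` package and greedy decoding; the prompt template and answer extraction are illustrative choices, not the exact logic of the experiment script.

```python
# Minimal sketch: grade a GSM8K-style question with a small local model.
# Model name (distilgpt2) matches the report; prompt format and answer
# extraction here are assumptions for illustration.
import re

from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")


def answer_is_correct(question: str, gold_answer: str) -> bool:
    """Generate a continuation and compare its last number to the gold answer."""
    out = generator(question, max_new_tokens=64, do_sample=False)[0]["generated_text"]
    continuation = out[len(question):]  # generated_text includes the prompt
    numbers = re.findall(r"-?\d+(?:\.\d+)?", continuation)
    return bool(numbers) and numbers[-1] == gold_answer


if __name__ == "__main__":
    q = "Q: Tom has 3 apples and buys 4 more. How many apples does he have? A:"
    print(answer_is_correct(q, "7"))
```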
- Ensure Python 3.12+ and `uv` are available.
- From repo root: `uv venv && source .venv/bin/activate`
- Install deps: `uv sync` (or `pip install -r requirements.txt`).
- Run experiments: `python notebooks/atl_experiments.py`
- Outputs: metrics in `results/`, plots in `results/plots/`.
- `planning.md` – research plan.
- `notebooks/atl_experiments.py` – tokenizer training, efficiency stats, reasoning eval.
- `results/` – CSV/JSON metrics and plots.
- `artifacts/atl_512.model|vocab` – trained ATL tokenizer (see the loading sketch below).
- `REPORT.md` – full report with methods and findings.
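The trained ATL tokenizer in `artifacts/` can be inspected directly with `sentencepiece`. A minimal sketch, assuming only that the `.model` file is the one listed above:

```python
# Minimal sketch: load the trained ATL tokenizer and round-trip a string.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="artifacts/atl_512.model")
ids = sp.encode("hello world")                    # token ids under the 512-entry vocab
pieces = sp.encode("hello world", out_type=str)   # the corresponding subword pieces
print(len(ids), pieces)
print(sp.decode(ids))                             # should round-trip to the input
```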
See `REPORT.md` for detailed analysis and discussion.