Conversation


@CasualAutopsy CasualAutopsy commented Nov 19, 2025

Top-H's official GitHub.
Top-H's paper.


GitHub README excerpt:

Top-H Decoding:

Top-H is a training-free decoding method that balances creativity and coherence in open-ended text generation by constraining entropy at each step. It solves an entropy-constrained mass maximization problem with an efficient greedy procedure, yielding robust, high-temperature generations that remain coherent.


Overview:

Classic truncated sampling (temperature, top-k, top-p, min-p) trades off diversity vs. coherence but often ignores the shape of the next-token distribution. Top-H makes this trade-off explicit by upper-bounding the entropy of the truncated distribution relative to the original model distribution — exploring more when the model is unsure, and tightening when it is confident.
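
As a toy illustration (the numbers here are invented for exposition, not taken from the paper): suppose the next-token distribution is p = (0.7, 0.2, 0.1) and α = 0.8. Then H(p) ≈ 0.80 nats, so the entropy budget is α·H(p) ≈ 0.64 nats. Keeping the top two tokens and renormalizing gives q ≈ (0.78, 0.22) with H(q) ≈ 0.53 ≤ 0.64, so both survive; keeping all three would leave H(q) = H(p) ≈ 0.80 > 0.64, so the tail token is truncated.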

At a glance:

  • Formulates Entropy-Constrained Minimum Divergence (ECMD) and proves equivalence to Entropy-Constrained Mass Maximization (ECMM) (NP-hard).
  • Introduces a greedy approximation (Top-H) with a simple termination guarantee controlled by an entropy scale α.
  • Delivers strong empirical gains over min-p and top-p, especially at higher temperatures.

Key Features

  • 🛠 Training-free & model-agnostic — drop-in decoding; no fine-tuning.
  • 🎛 Entropy-aware truncation — caps randomness via H(q) ≤ α·H(p), recalculated every step (see the sketch after this list).
  • 🧮 Theory-backed — ECMD ⇔ ECMM (NP-hard); practical greedy rule with early-stop criterion.
  • 🔥 Robust at high temperature — maintains coherence where min-p/top-p degrade.
  • 🧪 Wide evaluation — creative writing (e.g., Alpaca-Eval, MT-Bench) and QA (GPQA, GSM8K).
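
As a rough illustration of the entropy cap above, here is a minimal NumPy sketch of a greedy, entropy-capped truncation. It follows only the README's description (keep high-probability tokens while H(q) ≤ α·H(p), recomputed every step); the function name `top_h_sketch`, the default α = 0.9, and the exact stopping rule are assumptions, not the paper's ECMM greedy procedure or the code in this PR.

```python
import numpy as np

def top_h_sketch(logits, alpha=0.9, temperature=1.0, rng=None):
    """Entropy-capped greedy truncation (illustrative only).

    Keeps the highest-probability tokens while the entropy of the
    renormalized kept set stays at or below alpha * H(p).
    """
    if rng is None:
        rng = np.random.default_rng()

    # Temperature-scaled softmax over the full vocabulary.
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()
    p = np.exp(z)
    p /= p.sum()

    # Entropy budget: alpha times the entropy of the full distribution.
    h_p = -np.sum(p * np.log(p + 1e-12))
    budget = alpha * h_p

    # Greedily grow the kept set in descending-probability order,
    # stopping before the renormalized entropy exceeds the budget.
    order = np.argsort(p)[::-1]
    kept = [order[0]]                      # always keep the top token
    for idx in order[1:]:
        cand = p[kept + [idx]]
        q = cand / cand.sum()
        h_q = -np.sum(q * np.log(q + 1e-12))
        if h_q > budget:
            break                          # this token would exceed the entropy cap
        kept.append(idx)

    # Sample a token id from the renormalized truncated distribution.
    q = p[kept] / p[kept].sum()
    return rng.choice(kept, p=q)
```

Recomputing H(q) from scratch for every candidate is O(k) per token; a practical implementation would update the entropy incrementally as tokens are added.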

📊 Results Summary

  • On creative writing benchmarks, Top-H outperforms SoTA alternatives by up to 25.63%, while preserving consistency.
  • On reasoning datasets (GSM8K, GPQA), Top-H remains robust at elevated temperatures.

Example (from paper)

Benchmark   Model           Temperature   min-p   top-p   Top-H
GSM8K       LLaMA-3.1-8B    2.0           13.72    2.65   39.35
GPQA        Phi-3-Mini      2.0           23.44   18.53   30.80

@CasualAutopsy (Author) commented

Can't find any more redundancies or ways to optimize it. I've done about as well as I can at my skill level.

@CasualAutopsy CasualAutopsy marked this pull request as ready for review November 19, 2025 04:03
@CasualAutopsy CasualAutopsy marked this pull request as draft November 19, 2025 16:58
@CasualAutopsy CasualAutopsy (Author) commented Nov 19, 2025

Set back to draft to test some cleaned-up, AI-generated optimizations.

Edit: No problems during testing. Sampler should be good to go now.

@CasualAutopsy CasualAutopsy marked this pull request as ready for review November 19, 2025 17:40
@LostRuins (Owner) commented

Tbh this looks kind of dubious and I'm very skeptical of the claims it's making. Functionally, I don't think it offers any benefit over min_p. Considering that there is basically zero adoption or notice of this "sampler", their GitHub is dead with 0 activity and 8 stars, and the math looks dodgy, I think I'll hold off on it until it gets some critical consideration. I don't want to add clutter.

@LostRuins LostRuins added the invalid This doesn't seem right label Nov 22, 2025
@LostRuins LostRuins closed this Nov 28, 2025