Hunter Heidenreich hunter-heidenreich

Hunter Heidenreich

Senior AI Research Scientist training large language and vision models at production scale, with research roots in scientific machine learning (Harvard).

One question runs under most of my work: how to represent data on the boundary between the continuous and the discrete. It has taken the form of FFT frequency bins, language tokens, molecular-dynamics trajectories modeled with mixture-density heads, OCR pixels fused with text, and now SMILES strings for chemistry. The question is the same each time; only the substrate changes.

Selected work

GutenOCR: open-weights vision-language model family (3B and 7B) for grounded document OCR, with open training code and the 1.5M-page PubMed-OCR dataset. Apache-2.0. (Built at Roots.)
Page Stream Segmentation with LLMs: COLING 2025, industry track.
Deconstructing Recurrence, Attention, and Gating: architecture transferability for forecasting chaotic dynamical systems (Harvard).
academic-tools-mcp: an MCP server giving agents identifier-routed tools across seven academic providers.

Now

Foundation-model methods for the physical sciences. My active research direction is chemical language models: tokenization, pretraining, and scaling for chemistry. Pinned repositories below span scientific computing, generative modeling, and research tooling.
