hunter-heidenreich/README.md

Hunter Heidenreich

Senior AI Research Scientist training large language and vision models at production scale, with research roots in scientific machine learning (Harvard).

One question runs under most of my work: how to represent data on the boundary between the continuous and the discrete. It has shown up as FFT frequency bins, language tokens, molecular-dynamics trajectories modeled with mixture-density heads, OCR pixels fused with text, and now SMILES strings for chemistry. The same question each time; only the substrate changes.

Selected work

Now

Foundation-model methods for the physical sciences. My active research direction is chemical language models: tokenization, pretraining, and scaling for chemistry. Pinned repositories below span scientific computing, generative modeling, and research tooling.

Pinned

  1. academic-tools-mcp Public

    MCP server giving LLM agents lean, identifier-routed tools to look up, read, and cross-reference academic papers across 7 providers (OpenAlex, arXiv, bioRxiv, ACL Anthology, Crossref, OpenCitations…

    Python 6

  2. Kabsch-Cookbook Public

    Differentiable, gradient-safe Kabsch (SVD) and Horn (quaternion) point-cloud alignment across NumPy, PyTorch, JAX, TensorFlow, and MLX.

    Python 1

  3. mini-proteins Public

    GROMACS molecular dynamics of capped dipeptides (mini-proteins) with atomic-force extraction, built to generate ML-potential training data and encourage dataset diversity.

    Shell 2 1

  4. vae Public

    A clean, fully-configurable PyTorch VAE on MNIST (Kingma & Welling 2013) with three std parameterizations and a thorough gradient/latent/KL analysis suite.

    Python 3

  5. molecular-string-renderer Public

    A Python utility for rendering 2D molecular graph images from a string representation (SMILES, SELFIES).

    Python 2

  6. pytorch-word2vec Public

    A from-scratch Word2Vec (Skip-gram / CBOW) in modern PyTorch: full softmax, hierarchical softmax, and negative sampling, with a CLI, streaming datasets, and a full test suite.

    Python