v0.1.0 (2025-02-12)

Feature

- feat: pypi packaging and auto-release with semantic release (0ff8888)

Unknown

- Merge pull request #37 from chanind/pypi-package
  feat: pypi packaging and auto-release with semantic release (a711efe)
- simplify matryoshka loss (43421f5)
- Use torch.split() instead of direct indexing for 25% speedup (505a445)
- Fix matryoshka spelling (aa45bf6)
- Fix incorrect auxk logging name (784a62a)
- Add citation (77f2690)
- Make sure to detach reconstruction before calculating aux loss (db2b564)
- Merge pull request #36 from saprmarks/aux_loss_fixes
  Aux loss fixes, standardize decoder normalization (34eefda)
- Standardize and fix topk auxk loss implementation (0af1971)
- Normalize decoder after optimizer step (200ed3b)
- Remove experimental matryoshka temperature (6c2fcfc)
- Make sure x is on the correct dtype for jumprelu when logging (c697d0f)
- Import trainers from correct relative location for submodule use (8363ff7)
- By default, don't normalize Gated activations during inference (52b0c54)
- Also update context manager for matryoshka threshold (65e7af8)
- Disable autocast for threshold tracking (17aa5d5)
- Add torch autocast to training loop (832f4a3)
- Save state dicts to cpu (3c5a5cd)
- Add an option to pass LR to TopK trainers (8316a44)
- Add April Update Standard Trainer (cfb36ff)
- Merge pull request #35 from saprmarks/code_cleanup
  Consolidate LR Schedulers, Sparsity Schedulers, and constrained optimizers (f19db98)
- Consolidate LR Schedulers, Sparsity Schedulers, and constrained optimizers (9751c57)
- Merge pull request #34 from adamkarvonen/matroyshka
  Add Matryoshka, Fix Jump ReLU training, modify initialization (92648d4)
- Add a verbose option during training (0ff687b)
- Prevent wandb cuda multiprocessing errors (370272a)
- Log dead features for batch top k SAEs (936a69c)
- Log number of dead features to wandb (77da794)
- Add trainer number to wandb name (3b03b92)
- Add notes (810dbb8)
- Add option to ignore bos tokens (c2fe5b8)
- Fix jumprelu training (ec961ac)
- Use kaiming initialization if specified in paper, fix batch_top_k aux_k_alpha (8eaa8b2)
- Format with ruff (3e31571)
- Add temperature scaling to matryoshka (ceabbc5)
- norm the correct decoder dimension (5383603)
- Fix loading matryoshkas from_pretrained() (764d4ac)
- Initial matryoshka implementation (8ade55b)
- Make sure we step the learning rate scheduler (1df47d8)
- Merge pull request #33 from saprmarks/lr_scheduling
  Lr scheduling (316dbbe)
- Properly set new parameters in end to end test (e00fd64)
- Standardize learning rate and sparsity schedules (a2d6c43)
- Merge pull request #32 from saprmarks/add_sparsity_warmup
  Add sparsity warmup (a11670f)
- Add sparsity warmup for trainers with a sparsity penalty (911b958)
- Clean up lr decay (e0db40b)
- Track lr decay implementation (f0bb66d)
- Remove leftover variable, update expected results with standard SAE improvements (9687bb9)
- Merge pull request #31 from saprmarks/add_demo
  Add option to normalize dataset, track thresholds for TopK SAEs, Fix Standard SAE (67a7857)
- Also scale topk thresholds when scaling biases (efd76b1)
- Use the correct standard SAE reconstruction loss, initialize W_dec to W_enc.T (8b95ec9)
- Add bias scaling to topk saes (484ca01)
- Fix topk bfloat16 dtype error (488a154)
- Add option to normalize dataset activations (81968f2)
- Remove demo script and graphing notebook (57f451b)
- Track thresholds for topk and batchtopk during training (b5821fd)
- Track threshold for batchtopk, rename for consistency (32d198f)
- Modularize demo script (dcc02f0)
- Begin creation of demo script (712eb98)
- Fix JumpReLU training and loading (552a8c2)
- Ensure activation buffer has the correct dtype (d416eab)
- Merge pull request #30 from adamkarvonen/add_tests
  Add end to end test, upgrade nnsight to support 0.3.0, fix bugs (c4eed3c)
- Merge pull request #26 from mntss/batchtokp_aux_fix
  Fix BatchTopKSAE training (2ec1890)
- Check for is_tuple to support mlp / attn submodules (d350415)
- Change save_steps to a list of ints (f1b9b80)
- Add early stopping in forward pass (05fe179)
- Obtain better test results using multiple batches (067bf7b)
- Fix frac_alive calculation, perform evaluation over multiple batches (dc30720)
- Complete nnsight 0.2 to 0.3 changes (807f6ef)
- Rename input to inputs per nnsight 0.3.0 (9ed4af2)
- Add a simple end to end test (fe54b00)
- Create LICENSE (32fec9c)
- Fix BatchTopKSAE training (4aea538)
- dtype for loading SAEs (932e10a)
- Merge pull request #22 from pleask/jumprelu
  Implement jumprelu training (713f638)
- Use separate wandb runs for each SAE being trained (df60f52)
- Merge branch 'main' into jumprelu (3dfc069)
- implement jumprelu training (16bdfd9)
- handle no wandb (8164d32)
- Merge pull request #20 from pleask/batchtopk
  Implement BatchTopK (b001fb0)
- separate runs for each sae being trained (7d3b127)
- add batchtopk (f08e00b)
- Move f_gate to encoder's dtype (43bdb3b)
- Ensure that x_hat is in correct dtype (3376f1b)
- Preallocate buffer memory to lower peak VRAM usage when replenishing buffer (90aff63)
- Perform logging outside of training loop to lower peak memory usage (57f8812)
- Remove triton usage (475fece)
- Revert to triton TopK implementation (d94697d)
- Add relative reconstruction bias from GDM Gated SAE paper to evaluate() (8984b01)
- git push origin main:Merge branch 'ElanaPearl-small_bug_fixes' into main (2d586e4)
- simplifying readme (9c46e06)
- simplify readme (5c96003)
- add missing imports (7f689d9)
- fix arg name in trainer_config (9577d26)
- update sae training example code (9374546)
- Merge branch 'main' of https://github.com/saprmarks/dictionary_learning into main (7d405f7)
- GatedSAE: moved feature re-normalization into encode (f628c0e)
- documenting JumpReLU SAE support (322b6c0)
- support for JumpReluAutoEncoders (57df4e7)
- Add submodule_name to PAnnealTrainer (ecdac03)
- host SAEs on huggingface (0ae37fe)
- fixed batch loading in examine_dimension (82485d7)
- Merge pull request #17 from saprmarks/collab
  Merge Collab Branch (cdf8222)
- moved experimental trainers to collab-dev (8d1d581)
- Merge branch 'main' into collab (dda38b9)
- Update README.md (4d6c6a6)
- remove a sentence (2d40ed5)
- add a list of trainers to the README (746927a)
- add architecture details to README (60422a8)
- make wandb integration optional (a26c4e5)
- make wandb integration optional (0bdc871)
- Fix tutorial 404 (deb3df7)
- Add missing values to config (9e44ea9)
- changed TrainerTopK class name (c52ff00)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (c04ee3b)
- fixed loss_recovered to incorporate top_k (6be5635)
- fixed TopK loss (spotted by Anish) (a3b71f7)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (40bcdf6)
- naming conventions (5ff7fa1)
- small fix to triton kernel (5d21265)
- small updates for eval (585e820)
- added some housekeeping stuff to top_k (5559c2c)
- add support for Top-k SAEs (2d549d0)
- add transcoder eval (8446f4f)
- add transcoder support (c590a25)
- added wandb finish to trainer (113c042)
- fixed anneal end bug (fbd9ee4)
- added layer and lm_name (d173235)
- adding layer and lm_name to trainer config (6168ee0)
- make tracer_args optional (31b2828)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (87d2b58)
- bug fix evaluating CE loss with NNsight models (f8d81a1)
- Combining P Annealing and Anthropic Update (44318e9)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (43e9ca6)
- removing normalization (7a98d77)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (5f2b598)
- added buffer for NNsight models (not LanguageModel classes) as an extra class. We'll want to combine the three buffers we currently have at some point (f19d284)
- fixed nnsight model tracing issues for chess-gpt (7e8c9f9)
- added W_O projection to HeadBuffer (47bd4cd)
- added support for training SAEs on individual heads (a0e3119)
- added support for training SAEs on individual heads (47351b4)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (7de0bd3)
- default hyperparameter adjustments (a09346b)
- normalization in gated_new (104aba2)
- fixing bug where inputs can get overwritten (93fd46e)
- fixing tuple bug (b05dcaf)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (73b5663)
- multiple steps debugging (de3eef1)
- adding gradient pursuit function (72941f1)
- bugfix (53aabc0)
- bugfix (91691b5)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (9ce7d80)
- logging more things (8498a75)
- changing initialization for AutoEncoderNew (c7ee7ec)
- fixing gated SAE encoder scheme (4084bc3)
- changes to gatedSAE API (9e001d1)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (05b397b)
- changing initialization (ebe0d57)
- finished combining gated and p-annealing (4c08614)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (8e0a6f9)
- gated_anneal first steps (ba8b8fa)
- jump SAE (873b764)
- adapted loss logging in p_anneal (33997c0)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (1eecbda)
- merging gated and Anthropic SAEs (b6a24d0)
- revert trainer naming (c0af6d9)
- restored trainer naming (2ec3c67)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (fe7e93b)
- various changes (32027ae)
- debug panneal (463907d)
- debug panneal (8c00100)
- debug panneal (dc632cd)
- debug panneal (166f6a9)
- debug panneal (bcebaa6)
- debug pannealing (446c568)
- p_annealing loss buffer (e4d4a35)
- implement Ben's p-annealing strategy (06a27f0)
- panneal changes (fe4ff6f)
- logging trainer names to wandb (f9c5e45)
- bugfixes for StandardTrainerNew (70acd85)
- trainer for new anthropic infrastructure (531c285)
- adding r_mag parameter to GSAE (198ddf4)
- gatedSAE trainer (3567d6d)
- cosmetic change (0200976)
- GatedAutoEncoder class (2cfc47b)
- p annealing not affected by resampling (ad8d837)
- integrated trainer update (c7613d3)
- Merge branch 'collab' into p_annealing (933b80c)
- fixed p calculation (9837a6f)
- getting rid of useless seed argument (377c762)
- trainer initializes SAE (7dffb66)
- trainer initialized SAE (6e80590)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (c58d23d)
- changes to lista p_anneal trainers (3cc6642)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (9dfd3db)
- decoupled lr warmup and p warmup in p_anneal trainer (c3c1645)
- Merge pull request #14 from saprmarks/p_annealing
  added annealing and trainer_param_callback (61927bc)
- cosmetic changes to interp (4a7966f)
- Merge branch 'collab' of https://github.com/saprmarks/dictionary_learning into collab (c76818e)
- Merge pull request #13 from jannik-brinkmann/collab
  add ListaTrainer (d4d2fd9)
- additional evaluation metrics (fa2ec08)
- add GroupSAETrainer (60e6068)
- added annealing and trainer_param_callback (18e3fca)
- Merge remote-tracking branch 'upstream/collab' into collab (4650c2a)
- fixing neuron resampling (a346be9)
- improvements to saving and logging (4a1d7ae)
- can export buffer config (d19d8d9)
- fixing evaluation.py (c91a581)
- fixing bug in neuron resampling (67a03c7)
- add ListaTrainer (880f570)
- fixing neuron resampling in standard trainer (3406262)
- improvements to training and evaluating (b111d40)
- Factoring out SAETrainer class (fabd001)
- updating syntax for buffer (035a0f9)
- updating readme for from_pretrained (70e8c2a)
- from_pretrained (db96abc)
- Change syntax for specifying activation dimensions and batch sizes (bdf1f19)
- Merge branch 'main' of https://github.com/saprmarks/dictionary_learning into main (86c7475)
- activation_dim for IdentityDict is optional (be1b68c)
- update umap requirement (776b53e)
- Merge pull request #10 from adamkarvonen/shell_script_change
  Add sae_set_name to local_path for dictionary downloader (33b5a6b)
- Add sae_set_name to local_path for dictionary downloader (d6163be)
- dispatch no longer needed when loading models (69c32ca)
- removed in_and_out option for activation buffer (cf6ad1d)
- updating readme with 10_32768 dictionaries (614883f)
- upgrade to nnsight 0.2 (cbc5f79)
- downloader script (7a305c5)
- fixing device issue in buffer (b1b44f1)
- added pretrained_dictionary_downloader.sh (0028ebe)
- added pretrained_dictionary_downloader.sh (8b63d8d)
- added pretrained_dictionary_downloader.sh (6771aff)
- efficiency improvements (94844d4)
- adding identity dict (76bd32f)
- debugging interp (2f75db3)
- Merge branch 'main' of https://github.com/saprmarks/dictionary_learning into main (86812f5)
- warns user when evaluating without enough data (246c472)
- cleaning up interp (95d7310)
- examine_dimension returns mbottom_tokens and logit stats (40137ff)
- continuing merge (db693a6)
- progress on merge (949b3a7)
- changes to buffer.py (792546b)
- fixing some things in buffer.py (f58688e)
- updating requirements (a54b496)
- updating requirements (a1db591)
- identity dictionary (5e1f35e)
- bug fix for neuron resampling (b281b53)
- UMAP visualizations (81f8e1f)
- better normalization for ghost_loss (fc74af7)
- neuron resampling without replacement (4565e9a)
- simplifications to interp functions (2318666)
- Second nnsight 0.2 pass through (3bcebed)
- Conversion to nnsight 0.2 first pass (cac410a)
- detaching another thing in ghost grads (2f212d6)
- Neuron resampling no longer errors when resampling zero neurons (376dd3b)
- NNsight v0.2 Updates (90bbc76)
- cosmetic improvements to buffer.py (b2bd5f0)
- fix to ghost grads (9531fe5)
- fixing table formatting (0e69c8c)
- Fixing some table formatting (75f927f)
- gpt2-small support (f82146c)
- fixing bug relevant to UnifiedTransformer support (9ec9ce4)
- Getting rid of histograms (31d09d7)
- Fixing tables in readme (5934011)
- Updates to the readme (a5ca51e)
- Fixing ghost grad bugs (633d583)
- Handling ghost grad case with no dead neurons (4f19425)
- adding support for buffer on other devices (f3cf296)
- support for ghost grads (25d2a62)
- add an implementation of ghost gradients (2e09210)
- fixing a bug with warmup, adding utils (47bbde1)
- remove HF arg from buffer. rename search_utils to interp (7276f17)
- typo fix (3f6b922)
- Merge branch 'main' of https://github.com/saprmarks/dictionary_learning into main (278084b)
- added utils for converting hf dataset to generator (82fff19)
- add ablated token effects to ; restore support for HF datasets (799e2ca)
- merge in function for examining features (986bf96)
- easier submodule/dictionary feature examination (2c8b985)
- Adding lr warmup after every time neurons are resampled (429c582)
- fixing issues with EmptyStream exception (39ff6e1)
- Minor changes due to updates in nnsight (49bbbac)
- Revert "restore support for streaming HF datasets"
  This reverts commit b43527b. (23ada98)
- restore support for streaming HF datasets (b43527b)
- first version of automatic feature labeling (c6753f6)
- Add feature_effect function to search_utils.py (0ada2c6)
- Merge branch 'main' of https://github.com/saprmarks/dictionary_learning into main (fab70b1)
- adding sqrt to MSE (63b2174)
- Merge pull request #1 from cadentj/main
  Update README.md (fd79bb3)
- Update README.md (cf5ec24)
- Update README.md (55f33f2)
- evaluation.py (2edf59e)
- evaluating dictionaries (71e28fb)
- Removing experimental use of sqrt on MSELoss (865bbb5)
- Adding readme, evaluation, cleaning up (ddac948)
- some stuff for saving dicts (d1f0e21)
- removing device from buffer (398f15c)
- Merge branch 'main' of https://github.com/saprmarks/dictionary_learning into main (7f013c2)
- lr schedule + enabling stretched mlp (4eaf7e3)
- add random feature search (e58cc67)
- restore HF support and progress bar (7e2b6c6)
- Merge branch 'main' of https://github.com/saprmarks/dictionary_learning into main (d33ef05)
- more support for saving checkpoints (0ca258a)
- fix unit column bug + add scheduler (5a05c8c)
- fix merge bugs: checkpointing support (9c5bbd8)
- Merge: add HF datasets and checkpointing (ccf6ed1)
- checkpointing, progress bar, HF dataset support (fd8a3ee)
- progress bar for training autoencoders (0a8064d)
- implementing neuron resampling (f9b9d02)
- lotsa stuff (bc09ba4)
- adding init.py file for imports (3d9fd43)
- modifying buffer (ba9441b)
- first commit (ea89e90)
- Initial commit (741f4d6)