Granite Four #13550
Draft: gabe-l-hart wants to merge 138 commits into ggml-org:master from gabe-l-hart:GraniteFour
+1,557 −638
Commits (138)
271104c wip: llama : separate recurrent states from the KV cache (compilade)
8db1e4d llama : use std::find for seq_nodes in llama_rs_cache (compilade)
0028010 llama : state checkpoints for recurrent models (compilade)
0c8b3b2 llama : correctly handle more edge cases for the rs cache (compilade)
d66849f Merge branch 'master' into compilade/refactor-kv-cache (compilade)
a09db95 llama : rename many llama_kv_cache_* functions (compilade)
c460ff1 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
b6fafd1 llama : remove useless return value for some llama_cache_* functions (compilade)
b7ec12e Merge branch 'master' into compilade/refactor-kv-cache (compilade)
3b57b55 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
7e13f19 llama : rethink recurrent state cell counts (compilade)
cbc743e llama : support Jamba (compilade)
0fd13e9 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
61a88a1 llama : fix BERT inference without KV cache (compilade)
ea2e63e convert-hf : check for unprocessed Jamba experts (compilade)
fc59407 convert-hf : support Mini-Jamba conversion (compilade)
181dadf llama : fix Jamba quantization sanity checks (compilade)
3a414b0 llama : sequence-length-aware batch splitting (compilade)
4e4c41e Merge branch 'master' into compilade/refactor-kv-cache (compilade)
3587a94 llama : use equal-sequence-length sub-batches for recurrent models (compilade)
5d3c7b9 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
72eea49 llama : fix batch split output count for embeddings (compilade)
18d1c14 llama : minimize swaps when reordering logits (compilade)
61200ef llama : fix edge case finding batch seq_id of split recurrent cell (compilade)
eb589d5 llama : avoid copies for simple batch splits (compilade)
8fb57ac llama : use im2col and mul_mat to perform convolution for Mamba (compilade)
17f6c1e llama : fix .base() compilation error on Windows (compilade)
fee3c1d llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL (compilade)
6840ac0 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
372482d llama : rename llama_cache to llama_past (compilade)
43d8d4b examples : replace llama_kv_cache_seq_* with llama_past_seq_* (compilade)
ff794f5 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
33425a7 mamba : fix non-contiguous usage of ggml_silu (compilade)
10c3c41 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
9b38f8b Merge branch 'master' into compilade/refactor-kv-cache (compilade)
1f0fea7 llama : initial Mamba-2 support (compilade)
dceff23 ggml : SIMD ggml_ssm_scan for Mamba-2 (compilade)
2bfe9de llama : support running Mamba-Codestral-7B-v0.1 (compilade)
aff9692 llama : fix Mamba-2 conv state saving (compilade)
e04910d llama : remove unused variable (compilade)
fa358e7 llama : add missing break (compilade)
38913dc convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present (compilade)
bc320ef Merge branch 'master' into compilade/refactor-kv-cache (compilade)
fcb889c llama : session saving and reloading for hybrid models (compilade)
a03e32a Merge branch 'master' into compilade/refactor-kv-cache (compilade)
9d3f44d convert_hf : fix Jamba conversion (compilade)
5f62db7 llama : fix mixed signedness comparison (compilade)
375de5b llama : use unused n_embd_k_gqa in k_shift (compilade)
4bb4b22 llama : begin renaming llama_past back to llama_kv_cache (compilade)
63ac36b Merge branch 'master' into compilade/refactor-kv-cache (compilade)
0e601ca Merge branch 'master' into compilade/mamba2 (compilade)
273e7a4 llama : avoid redundant state copy for Mamba 1 and 2 (compilade)
7d6cb36 Merge branch 'master' into compilade/mamba2 (compilade)
2c77d79 metal : attempt to adapt SSM_SCAN for Mamba-2 (compilade)
87b97d0 metal : fix SSM_SCAN pipeline scope (compilade)
03d0e6e metal : use log and exp instead of log1pf and expf in SSM_SCAN (compilade)
7a351ab metal : remove unused arguments for SSM_SCAN (compilade)
8b15bc6 metal : add back n_seqs to SSM_SCAN args (compilade)
5b8ec2b metal : fix SSM_SCAN state head offset (compilade)
62b09b3 metal : fix wrong number of tokens per sequence in SSM_SCAN (compilade)
124c222 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
038d958 Merge branch 'master' into compilade/mamba2 (compilade)
805512a ggml : remove unused fast broadcast path in GGML_MUL (compilade)
7d16e1b Merge branch 'master' into compilade/mamba2 (compilade)
3bc7103 ggml : avoid multiply by D in GGML_OP_SSM_SCAN (compilade)
8d8f065 Merge branch 'master' into compilade/mamba2 (compilade)
b4e9c59 convert : fix flake8 lint (compilade)
8006f3b llama : remove implicit recurrent state rollbacks (compilade)
691698e Merge branch 'master' into compilade/refactor-kv-cache (compilade)
e3fe612 llama : partially apply clang-format style (compilade)
1ee6c48 Merge branch 'master' into compilade/mamba2 (compilade)
c9ecf62 Merge branch 'master' into compilade/mamba2 (compilade)
35d06fa Merge branch 'master' into compilade/mamba2 (compilade)
cf4f0a4 metal : fix confusion between ; and , (compilade)
6def5cd metal : add missing args for nb references in ssm_scan_f32_group (compilade)
791998b metal : single-user mamba2 inference works (compilade)
94c3d53 kv-cache : remove const_cast when setting inputs for s_copy (compilade)
929fe85 Merge branch 'master' into compilade/mamba2 (compilade)
d55b0d0 convert : avoid AutoConfig for Mamba and Mamba2 hparams (compilade)
e94f393 kv-cache : allow context shift for recurrent models (compilade)
9864bfc Merge branch 'master' into compilade/mamba2 (compilade)
2fa5f2c graph : fix recurrent state copies when avoiding copies (compilade)
757aa62 ggml : fix mamba2 ssm scan when compiled with SVE (compilade)
0b6f6be ggml-cpu : reorder SVE FMA for consistency with other SIMD arches (compilade)
a42f239 Merge branch 'master' into compilade/mamba2 (compilade)
f8c7cae cuda : implement ssm scan for Mamba2 (compilade)
830e554 Merge branch 'master' into compilade/mamba2 (compilade)
afdb669 Merge branch 'master' into compilade/mamba2 (compilade)
28881af feat: Add conversion for Bamba models (gabe-l-hart)
c43259b feat: Add Granite 4 conversion (gabe-l-hart)
26816fd feat: Plumb bamba through llama-arch (gabe-l-hart)
b901947 feat: Add bamba to llama_arch_is_hybrid_recurrent (gabe-l-hart)
fc56325 feat: Add optional mamba ssm_in bias tensor (gabe-l-hart)
b3453dc feat: Add template specialization for get_arr to load a vector<uint32… (gabe-l-hart)
13e8d3d feat: Use an explicit bool to determine mamaba vs mamba2 (gabe-l-hart)
b435dce feat: Isolate mamba(2) and granite attention layer building in static… (gabe-l-hart)
3d4c36b fix: Use per-layer sizes in granite build_attention_layer (gabe-l-hart)
0d28bf6 feat: First (broken) pass at end-to-end Bamba implementation (gabe-l-hart)
ed6216a fix: Only do Granite multipliers if set (gabe-l-hart)
a6f9f90 refactor: Pull granite ffn portion into a static function and reuse i… (gabe-l-hart)
de4d870 feat(py): Allow gguf duplicate keys if they match by value and type (gabe-l-hart)
7c2b0b8 refactor(py): Simplify granitemoehybrid conversion to use parents better (gabe-l-hart)
915f1e3 feat: Add GRANITE_MOE_HYBRID through llama-arch (gabe-l-hart)
d0d3723 feat: Support GRANITE_MOE_HYBRID in llama-model (gabe-l-hart)
2ca3416 style: Fix flake8 errors (gabe-l-hart)
3c22e1d fix: Fix recurrent cache get after rebase (gabe-l-hart)
08493bf fix: Fix hybrid granite implementation for signature changes in build… (gabe-l-hart)
ed15012 refactor: Refactor relationship between non-hybrid classes and hybrid… (gabe-l-hart)
40e2346 refactor: Implement the full copy-paste version to duplicate the laye… (gabe-l-hart)
a9dcc84 refactor: Rename llm_build_hybrid_mamba -> llm_build_granite_hybrid (gabe-l-hart)
dc1d109 mamba : fix mismatched new and delete size for llm_build_mamba (compilade)
fdc9a8d Merge remote-tracking branch 'origin/compilade/mamba2' into mamba2-sync (gabe-l-hart)
2b263e6 Merge branch 'mamba2-sync' into GraniteFour (gabe-l-hart)
66a7a43 memory : correctly handle failure in apply() (ggerganov)
8cb4df5 Merge remote-tracking branch 'origin/master' into GraniteFour (gabe-l-hart)
f13f5bc Merge remote-tracking branch 'origin/gg/memory-is-fail' into GraniteFour (gabe-l-hart)
6cac586 Merge remote-tracking branch 'origin/master' into GraniteFour (gabe-l-hart)
28361c4 Merge remote-tracking branch 'origin/master' into GraniteFour (gabe-l-hart)
bb2bb37 Merge remote-tracking branch 'origin/master' into GraniteFour (gabe-l-hart)
8f9b513 style: Remove TODO for adding first hybrid models to the switch (gabe-l-hart)
eaec9c6 fix: Fix bad merge in tensor_mapping.py w/ SSM_NORM (gabe-l-hart)
1085cf9 fix: Fix bad merge resolution with variable renames/moves in llm_buil… (gabe-l-hart)
b6d772f docs: Fix comment about duplicate key check (gabe-l-hart)
bb590f2 fix: Conform to standard way of initializing inp_out_ids (gabe-l-hart)
1c21a04 Merge remote-tracking branch 'origin/master' into GraniteFour (gabe-l-hart)
2bcaf64 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
908e655 convert : fix jamba conv1d shape squeezing (compilade)
d7f4d73 Merge remote-tracking branch 'origin/master' into GraniteFour (gabe-l-hart)
e100153 Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int… (gabe-l-hart)
4b5f673 fix: Fix input initialization in granite_hybrid after removal of hybr… (gabe-l-hart)
0796726 fix: Use llm_graph_context_mamba in llm_build_granite_hybrid (gabe-l-hart)
f7fa1b1 refactor: Refactor mamba2/granite/jamba/granite_hybrid relationships … (gabe-l-hart)
4682e21 Merge branch 'master' into compilade/refactor-kv-cache (compilade)
20f8e43 graph : add back hybrid memory graph input (compilade)
07c252f model : add Jamba to Mamba-specific hparams printing (compilade)
2e1431f Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int… (gabe-l-hart)
5c32e80 fix: Fix input setup after upstream merge (gabe-l-hart)
f9d6dd1 Merge remote-tracking branch 'origin/master' into GraniteFour (gabe-l-hart)
Review comment: I pulled these into the class so that they can be set differently by derived conversion classes and then used in the common methods below.
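The comment above describes a pattern used in the Python conversion scripts: hoisting values into class-level attributes so that derived conversion classes can override them while the shared methods in the base class stay untouched. A minimal sketch of that idea, with all class and attribute names invented for illustration (they are not the actual converter API):

```python
# Sketch only: illustrates overriding class attributes rather than methods.
class BaseModelConverter:
    # Defaults hoisted to class scope; subclasses re-point these instead of
    # copy-pasting the methods that consume them.
    n_experts_key = "num_experts"

    def __init__(self, hparams: dict):
        self.hparams = hparams

    def num_experts(self) -> int:
        # Common method reads through the (possibly overridden) class attribute.
        return self.hparams.get(self.n_experts_key, 0)


class GraniteMoeHybridConverter(BaseModelConverter):
    # Derived class changes only the lookup key; the shared logic is reused.
    n_experts_key = "moe_num_experts"


base = BaseModelConverter({"num_experts": 8})
hybrid = GraniteMoeHybridConverter({"moe_num_experts": 64})
print(base.num_experts())    # 8
print(hybrid.num_experts())  # 64
```

Keeping the variation in data (a class attribute) rather than behavior (an overridden method) means a new model family often needs only a few attribute lines, not a reimplementation of the common conversion steps.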