Replies: 1 comment
The simple way to do it is just to create two generators, each with its own cache, but both referencing the same model. They should work independently:

```python
generator_1 = ExLlamaV2StreamingGenerator(model, cache_1, tokenizer)
generator_2 = ExLlamaV2StreamingGenerator(model, cache_2, tokenizer)

generator_1.begin_stream_ex(...)
generator_2.begin_stream_ex(...)

while True:
    res_1 = generator_1.stream_ex()
    res_2 = generator_2.stream_ex()
    ...
```

Just note that even though the model is stateless, it's not thread-safe. I plan to replace the generator/cache system with a more versatile paged attention scheme soon.
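For context, here is a fuller, self-contained sketch of the same pattern. The generator calls (`ExLlamaV2StreamingGenerator`, `begin_stream_ex`, `stream_ex`) are the ones from the reply above; the loading and sampling setup (`ExLlamaV2Config`, `ExLlamaV2`, `ExLlamaV2Cache`, `ExLlamaV2Tokenizer`, `ExLlamaV2Sampler`) follows the usual exllamav2 boilerplate and may need adjusting to your installed version. The model path and prompts are placeholders.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2StreamingGenerator, ExLlamaV2Sampler

# Load the model weights once.
config = ExLlamaV2Config()
config.model_dir = "/path/to/model"   # placeholder path
config.prepare()

model = ExLlamaV2(config)
model.load()

tokenizer = ExLlamaV2Tokenizer(config)

# One cache per independent generation stream; each holds its own KV state,
# but both point at the same shared weights.
cache_1 = ExLlamaV2Cache(model)
cache_2 = ExLlamaV2Cache(model)

generator_1 = ExLlamaV2StreamingGenerator(model, cache_1, tokenizer)
generator_2 = ExLlamaV2StreamingGenerator(model, cache_2, tokenizer)

settings = ExLlamaV2Sampler.Settings()

# Start two unrelated generations against the same model.
generator_1.begin_stream_ex(tokenizer.encode("Main-line prompt"), settings)
generator_2.begin_stream_ex(tokenizer.encode("Verifier prompt"), settings)

# Interleave decoding steps from a single loop. The model is stateless but
# not thread-safe, so keep all calls on one thread.
text_1, text_2 = "", ""
done_1 = done_2 = False
while not (done_1 and done_2):
    if not done_1:
        res_1 = generator_1.stream_ex()
        text_1 += res_1["chunk"]
        done_1 = res_1["eos"]
    if not done_2:
        res_2 = generator_2.stream_ex()
        text_2 += res_2["chunk"]
        done_2 = res_2["eos"]
```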
Hi there!
I'm looking for a way to share model weights among two or more generators/caches.
The reason for this:
I want to keep one cache for my "main line" iterative generations and have other caches for auxiliary generations (mainly agent/verifier tasks). Of course I could use batching instead, but that would hurt performance because of the fixed batch size, even when only some of the slots are in use (or am I getting something wrong here?).
Thanks!