ggml : implement GLU for split up/gate #14181
base: cisc/unary-reglu-geglu-swiglu
Conversation
The CUDA code looks good to me.
Yay, mind generating a plot again for some large-ish model?
The "?b" LLaMA model is Mistral Small. Differences of ~1% are difficult to see in plots. Table with more GPUs:
Ok, so as expected a lot less beneficial for split tensors. Will it gain another percent or so for MoEs?
I added Vulkan support for split GLU.
@0cc4m BTW, I see you check for
Yes, the Vulkan support is only for contiguous tensors. GLSL (the shader language we use) has no support for pointers, so non-contiguous support isn't as simple to implement as it is for CPU, CUDA and Metal. That's why I did only contiguous for now. If necessary, we can add non-contiguous support at a later point.
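For reference, the operation being discussed is SwiGLU applied to split up/gate inputs: the gate half goes through SiLU and is multiplied elementwise with the up half. Below is a minimal scalar sketch of the math (not the ggml API, and which half is gated is a per-model convention):

```python
import math

def silu(x: float) -> float:
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def swiglu_split(up, gate):
    # Split-tensor SwiGLU: activation on the gate half, then an
    # elementwise multiply with the up half. The contiguous case the
    # Vulkan shader handles is exactly this flat loop; non-contiguous
    # inputs would additionally need per-element stride arithmetic.
    return [silu(g) * u for u, g in zip(up, gate)]
```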
917b5b5 to 42c2870
Had to refactor and deduplicate the SYCL code before adding support for split up+gate. Let me know if there are any issues. Edit: Adding SYCL test results (on A750):
Edit 2: Added chatglm and GEGLU with tanh approximation.
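The GEGLU variant mentioned here uses the tanh approximation of GELU on the gate half. A small reference sketch of that formula (again just the scalar math, not ggml's kernel):

```python
import math

def gelu_tanh(x: float) -> float:
    # tanh approximation of GELU:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

def geglu_tanh_split(up, gate):
    # GEGLU over split up/gate inputs with tanh-approximate GELU;
    # same elementwise structure as SwiGLU, different activation.
    return [gelu_tanh(g) * u for u, g in zip(up, gate)]
```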
Nice, can you run a test with chatglm too?
Implement GLU for split up/gate.
Builds upon #14158
@0cc4m @ggerganov PTAL for adding support to Metal/Vulkan.
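To illustrate what "split up/gate" means relative to the fused case: with fused weights, one tensor holds both halves concatenated along the feature dimension; with split weights, up and gate live in separate tensors (e.g. separate ffn_up/ffn_gate weights). A hedged sketch of the two layouts (the fused [gate | up] ordering is an assumption for illustration; actual layout varies by model):

```python
def glu_fused(x, act):
    # Fused layout: one buffer holding [gate | up] halves concatenated
    # along the feature dimension (assumed ordering).
    n = len(x) // 2
    gate, up = x[:n], x[n:]
    return [act(g) * u for g, u in zip(gate, up)]

def glu_split(up, gate, act):
    # Split layout (this PR): up and gate are separate buffers, so the
    # GLU op takes two inputs instead of slicing one.
    return [act(g) * u for u, g in zip(up, gate)]
```

Both paths must produce identical results for the same activation; the split variant avoids having to concatenate the two projections first.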