Describe the feature request
Add support for llama.cpp inference and benchmarking.
Describe the solution you'd like
- Update modelling_llama_skip.py to support exporting the model to GGUF
- Add and dispatch inference to llama.cpp with the sparse transformers GGUF
- Update run_benchmark.py to support llama.cpp
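
As a rough sketch of the last item, run_benchmark.py could route GGUF checkpoints to llama.cpp and keep other formats on the existing PyTorch path. The helper name `select_backend` and the backend labels here are hypothetical, not taken from the repo:

```python
from pathlib import Path

def select_backend(model_path: str) -> str:
    """Pick an inference backend from the checkpoint file type.

    Hypothetical dispatch helper: GGUF files go to llama.cpp,
    anything else stays on the current PyTorch path.
    """
    if Path(model_path).suffix == ".gguf":
        return "llama.cpp"
    return "pytorch"

# Example: select_backend("llama-skip.gguf") picks the llama.cpp path,
# select_backend("model.safetensors") keeps the PyTorch path.
```

The actual flag names and backend wiring would depend on how run_benchmark.py parses its arguments today.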