# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
# Expected Behavior

My GGML-converted models should be easy to convert to GGUF.

I know the conversion tools aren't guaranteed, but I'd like to file this in case anybody else has a workaround or a more version-flexible option. I would love to see any version of GGML/GGJT supported if possible. Instead, my GGML files converted earlier are apparently not supported for conversion to GGUF.

Is there any tool that shows the format and version details of a model file? Happy to contribute one if there isn't.
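In case it helps, here is a minimal sketch of such a tool. The magic constants are my reading of the loaders in this repo and may be incomplete; the script name and everything else about it is hypothetical:

```python
#!/usr/bin/env python3
# show_model_version.py -- hypothetical helper, not an official tool.
# Prints the container format and version of a GGML/GGMF/GGJT/GGUF file.
import struct
import sys

# magic (read as a little-endian uint32) -> (format name, has a version field)
MAGICS = {
    0x67676d6c: ('GGML', False),  # b'lmgg' on disk; original unversioned format
    0x67676d66: ('GGMF', True),
    0x67676a74: ('GGJT', True),
    0x46554747: ('GGUF', True),   # file starts with the bytes b'GGUF'
}

def describe(path):
    with open(path, 'rb') as fp:
        header = fp.read(8)
    if len(header) < 8:
        return 'file too short to contain a header'
    magic, maybe_version = struct.unpack('<II', header)
    name, versioned = MAGICS.get(magic, (None, False))
    if name is None:
        return f'unknown magic 0x{magic:08x}'
    return f'{name} v{maybe_version}' if versioned else f'{name} (unversioned)'

if __name__ == '__main__':
    print(describe(sys.argv[1]))
```

Run as e.g. `python3 show_model_version.py llama-2-70b/ggml-model-f32.bin`; for a file this converter rejects it would presumably print something like `GGJT v1` or `GGML (unversioned)`.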
# Current Behavior

```
$ python3 ./convert-llama-ggmlv3-to-gguf.py -i llama-2-70b/ggml-model-f32.bin -o test.gguf
=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===
* Scanning GGML input file
Traceback (most recent call last):
  File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 353, in <module>
    main()
  File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 335, in main
    offset = model.load(data, 0)
             ^^^^^^^^^^^^^^^^^^^
  File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 125, in load
    offset += self.validate_header(data, offset)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 121, in validate_header
    raise ValueError('Only GGJTv3 supported')
ValueError: Only GGJTv3 supported
```
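For context, the check that raises here reads the first two little-endian uint32 values of the file and accepts only the GGJT magic with version 3, so unversioned GGML and GGJT v1/v2 files are rejected outright. A minimal reconstruction of that validation (my own sketch, not the script's exact code):

```python
import struct

GGJT_MAGIC = 0x67676a74  # the bytes b'tjgg' on disk, read as a little-endian uint32

def validate_header(data, offset=0):
    # Only a GGJT magic followed by version 3 passes; anything else raises.
    magic, version = struct.unpack_from('<II', data, offset)
    if magic != GGJT_MAGIC or version != 3:
        raise ValueError('Only GGJTv3 supported')
    return 8  # header bytes consumed (magic + version)
```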
# Environment and Context

Working with models on the following machine.

- Physical (or virtual) hardware you are using, e.g. for Linux:

Physical machine running Fedora 38; probably irrelevant given this is a Python script.
```
$ lscpu
Architecture:           x86_64
CPU op-mode(s):         32-bit, 64-bit
Address sizes:          46 bits physical, 48 bits virtual
Byte Order:             Little Endian
CPU(s):                 56
On-line CPU(s) list:    0-55
Vendor ID:              GenuineIntel
Model name:             Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
CPU family:             6
Model:                  79
Thread(s) per core:     2
Core(s) per socket:     14
Socket(s):              2
Stepping:               1
CPU(s) scaling MHz:     40%
CPU max MHz:            3200.0000
CPU min MHz:            1200.0000
BogoMIPS:               3990.92
Flags:                  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features:
  Virtualization:       VT-x
Caches (sum of all):
  L1d:                  896 KiB (28 instances)
  L1i:                  896 KiB (28 instances)
  L2:                   7 MiB (28 instances)
  L3:                   70 MiB (2 instances)
NUMA:
  NUMA node(s):         2
  NUMA node0 CPU(s):    0-13,28-41
  NUMA node1 CPU(s):    14-27,42-55
```
- Operating System, e.g. for Linux:

```
$ uname -a
Linux z840 6.4.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 23 17:46:49 UTC 2023 x86_64 GNU/Linux
```

- SDK version, e.g. for Linux:

```
$ python3 --version
Python 3.11.4
```

(`make --version` and `g++ --version` are omitted; only the Python script is involved here.)
# Failure Information (for bugs)

The failing command and full traceback are shown under Current Behavior above.
# Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

1. Convert any of the PTH models to GGML using a previous, unversioned commit of the convert script (see the sketch below).
2. Convert the resulting GGML file to GGUF with the command given above.
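For concreteness, the sequence looks roughly like this; the first command is from memory of an older checkout, so the exact script name and flags may differ by commit:

```
# step 1: on an older (pre-GGUF) llama.cpp checkout
python3 ./convert.py llama-2-70b/ --outtype f32

# step 2: on the current checkout
python3 ./convert-llama-ggmlv3-to-gguf.py -i llama-2-70b/ggml-model-f32.bin -o test.gguf
```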