@spiffcs (contributor) commented on Oct 14, 2025

Description

Adds a new AI artifact cataloger to Syft that detects and parses GGUF (GPT-Generated Unified Format) model files. This includes a dedicated package type, a header parser, and a metadata structure, so Syft can identify and describe .gguf models like any other package in the artifacts section of the Syft SBOM.

What’s New

  • ModelPkg type added to syft/pkg/type.go: ModelPkg Type = "model"
  • GGUFFileMetadata struct introduced (syft/pkg/gguf.go) with fields for model format, architecture, quantization, tensor count, and similar attributes. A general key-value map also holds less common metadata values.
  • Header parser (cataloger/aiartifact/parse_gguf.go) that reads the GGUF magic number, extracts key metadata (name, license, architecture, etc.), and applies safety limits to guard against corrupted files.
  • Cataloger (cataloger/aiartifact/cataloger.go) that locates .gguf files, reads only the header (up to 10 MB), and creates packages of type=model tagged with ai-artifact, model, gguf, and ml.

Happy to rename the cataloger's namespace from aiartifact to whatever comes out of this review.

Integration hooks are added in package_tasks.go and packagemetadata/names.go. The cataloger works for both directory and image scans.

Output Support

Syft JSON: includes full gguf-file-metadata with all parsed fields.
CycloneDX 1.6 ML-BOM: emits machine-learning-model components with GGUF metadata encoded as properties.
SPDX: to follow in a separate PR
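For the CycloneDX output, "GGUF metadata encoded as properties" means flattening the metadata into CycloneDX name/value property pairs. The sketch below shows that idea only: modelMetadata is a hypothetical, trimmed-down stand-in for GGUFFileMetadata, the "syft:gguf:" property-name prefix is an assumption rather than Syft's actual naming, and the metadata values in main are made-up examples.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
)

// property mirrors a CycloneDX properties entry: a plain name/value pair.
type property struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// modelMetadata is a hypothetical, trimmed-down stand-in for GGUFFileMetadata.
type modelMetadata struct {
	Architecture string
	Quantization string
	TensorCount  uint64
	Extra        map[string]string // less common GGUF key-value pairs
}

// toProperties flattens metadata into name/value pairs. The "syft:gguf:"
// prefix is an assumption for illustration, not Syft's actual naming.
func toProperties(m modelMetadata) []property {
	props := []property{
		{Name: "syft:gguf:architecture", Value: m.Architecture},
		{Name: "syft:gguf:quantization", Value: m.Quantization},
		{Name: "syft:gguf:tensorCount", Value: strconv.FormatUint(m.TensorCount, 10)},
	}
	// Go map iteration order is random; sort keys so the SBOM output is stable.
	keys := make([]string, 0, len(m.Extra))
	for k := range m.Extra {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		props = append(props, property{Name: "syft:gguf:" + k, Value: m.Extra[k]})
	}
	return props
}

func main() {
	// Made-up example values, not taken from a real model.
	m := modelMetadata{
		Architecture: "llama",
		Quantization: "Q4_K_M",
		TensorCount:  291,
		Extra:        map[string]string{"general.license": "apache-2.0"},
	}
	out, err := json.MarshalIndent(toProperties(m), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Sorting the free-form keys matters because SBOMs are often diffed; without it, two scans of the same model could emit properties in different orders.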

Demo

TODO

Fast Follow

  • OCI Support for Docker Model Registry
  • Hugging Face Source support

Type of change

  • New feature (non-breaking change which adds functionality)
  • Documentation (updates the documentation)

Checklist:

  • I have added unit tests that cover changed behavior
  • I have tested my code in common scenarios and confirmed there are no regressions
  • I have added comments to my code, particularly in hard-to-understand sections

@github-actions bot added the json-schema label (Changes the json schema) on Oct 14, 2025
@spiffcs changed the title from "feat: 4184 gguf parser (ai artifact cataloger)" to "feat: 4184 gguf parser (ai artifact cataloger) part 1" on Oct 14, 2025
@spiffcs linked an issue that may be closed by this pull request on Oct 14, 2025
@spiffcs marked this pull request as ready for review on October 14, 2025 at 10:09
Signed-off-by: Christopher Phillips <[email protected]>
Comment on lines 30 to 52
// Read and validate the GGUF file header using LimitedReader to prevent OOM
// We use LimitedReader to cap reads at maxHeaderSize (50MB)
limitedReader := &io.LimitedReader{R: reader, N: maxHeaderSize}
headerData, err := readHeader(limitedReader)
if err != nil {
	return nil, nil, fmt.Errorf("failed to read GGUF header: %w", err)
}

// Create a temporary file for the library to parse
// The library requires a file path, so we create a temp file
tempFile, err := os.CreateTemp("", "syft-gguf-*.gguf")
if err != nil {
	return nil, nil, fmt.Errorf("failed to create temp file: %w", err)
}
tempPath := tempFile.Name()
defer os.Remove(tempPath)

// Write the validated header data to temp file
if _, err := tempFile.Write(headerData); err != nil {
	tempFile.Close()
	return nil, nil, fmt.Errorf("failed to write to temp file: %w", err)
}
tempFile.Close()

Suggested change
// Create a temporary file for the library to parse
// The library requires a file path, so we create a temp file
tempFile, err := os.CreateTemp("", "syft-gguf-*.gguf")
if err != nil {
	return nil, nil, fmt.Errorf("failed to create temp file: %w", err)
}
tempPath := tempFile.Name()
defer os.Remove(tempPath)

// Read and validate the GGUF file header using LimitedReader to prevent OOM
// We use LimitedReader to cap reads at maxHeaderSize (50MB)
limitedReader := &io.LimitedReader{R: reader, N: maxHeaderSize}
// copyHeader checks the GGUF magic number and copies the header straight into
// the temp file, so there is no separate headerData buffer to write afterwards
if err := copyHeader(tempFile, limitedReader); err != nil {
	tempFile.Close()
	return nil, nil, fmt.Errorf("failed to read GGUF header: %w", err)
}
tempFile.Close()

With this change, copyHeader only needs to do a quick magic-number check followed by a simple copy().

@spiffcs merged commit 4a60c41 into main on Nov 13, 2025
12 checks passed
@spiffcs deleted the 4184-gguf-parser branch on November 13, 2025 at 22:43
Labels

json-schema (Changes the json schema)

Development

Successfully merging this pull request may close these issues.

feat: add support for cataloging GGUF models

3 participants