Scoring bug: component_basic/external_references always scores as missing due to jsonpath and key-casing mismatches #76

@rocklambros

Summary

The external_references field in the component_basic scoring category always scores as missing, even when the generated BOM contains a fully populated externalReferences array with multiple entries. As a result, every model loses ~2.9 points out of 100 regardless of how complete its metadata is.

The root cause is two independent bugs in how the field is detected:

  1. Incorrect jsonpath in field_registry.json: The path is $.component.externalReferences (singular component) but the CycloneDX BOM structure uses $.components[0].externalReferences (plural components with array index).
  2. Snake_case/camelCase mismatch in fallback checker: The fallback check_field_in_aibom() checks if "external_references" in component but the CycloneDX key is "externalReferences" (camelCase).

Environment

  • aibom-generator version: v1.0.2 (commit 67829cb)
  • Python: 3.12
  • OS: Linux (Ubuntu)

Steps to Reproduce

  1. Run the AIBOM generator against any HuggingFace model that has external references:
python3 -m src.cli "rockCO78/crosswalk-v7c" --output /tmp/test_aibom.json --verbose
  2. Observe that the output shows Component Basic: 17.1/20 instead of 20/20.

  3. Inspect the generated BOM to confirm externalReferences IS present:

import json

with open("/tmp/test_aibom.json") as f:
    bom = json.load(f)
comp = bom["components"][0]

# The field EXISTS in the BOM under the correct CycloneDX key:
print("externalReferences" in comp)  # True
print(len(comp["externalReferences"]))  # 5 entries

# But the scorer looks for the wrong key:
print("external_references" in comp)  # False

  4. Verify the BOM top-level structure uses components (plural), not component:
print("component" in bom)   # False
print("components" in bom)  # True

Root Cause Analysis

Bug 1: Incorrect jsonpath in field_registry.json

File: src/models/field_registry.json

The external_references field definition uses:

{
  "external_references": {
    "category": "component_basic",
    "jsonpath": "$.component.externalReferences",
    ...
  }
}

The jsonpath $.component.externalReferences navigates to bom["component"]["externalReferences"], but the CycloneDX BOM structure (both 1.6 and 1.7) uses bom["components"][0]["externalReferences"] -- note the plural components with an array index.

For comparison, the other component_basic fields all use the correct plural path:

Field                 jsonpath
name                  $.components[0].name
type                  $.components[0].type
component_version     $.components[0].version
purl                  $.components[0].purl
description           $.components[0].description
licenses              $.components[0].licenses
external_references   $.component.externalReferences (incorrect)

The inconsistency is clear: external_references is the only field using singular $.component instead of plural $.components[0].

This causes FieldRegistryManager.detect_field_presence() -> _get_nested_value() to fail at lines 258-260 of src/models/registry.py:

if isinstance(current, dict) and part in current:
    current = current[part]
else:
    return False, None  # <-- hits this because bom["component"] doesn't exist
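
To make the failure concrete, here is a minimal sketch of a dotted-path walker. It is not the project's _get_nested_value(); the resolve_path helper and the sample bom dict are hypothetical, and the walker only understands key and key[index] segments. It shows the singular path failing at the first segment while the plural path resolves.

```python
import re

def resolve_path(doc, path):
    """Walk a simple '$.a.b[0].c' style path; return (found, value)."""
    current = doc
    for part in path.removeprefix("$.").split("."):
        match = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", part)
        if match is None:
            return False, None
        key, index = match.group(1), match.group(2)
        # Same guard as the snippet above: the key must exist in a dict.
        if not (isinstance(current, dict) and key in current):
            return False, None
        current = current[key]
        if index is not None:
            if not (isinstance(current, list) and int(index) < len(current)):
                return False, None
            current = current[int(index)]
    return True, current

# Hypothetical BOM fragment shaped like the generator's output.
bom = {
    "components": [
        {"name": "demo", "externalReferences": [{"url": "https://example.com"}]}
    ]
}

broken = resolve_path(bom, "$.component.externalReferences")
fixed = resolve_path(bom, "$.components[0].externalReferences")
print(broken)                   # (False, None)
print(fixed[0], len(fixed[1]))  # True 1
```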

Bug 2: Snake_case field name vs camelCase BOM key in fallback checker

File: src/models/scoring.py, lines 93-98

When the jsonpath-based detection fails (bug 1), the scorer falls back to check_field_in_aibom(). The relevant code:

# Lines 93-98 of scoring.py
components = aibom.get("components", [])
if components:
    component = components[0]
    if field in component:  # field = "external_references"
        return True          # BOM key = "externalReferences" -- no match

The field registry names this field external_references (snake_case), but CycloneDX uses externalReferences (camelCase). The if field in component check performs a literal key lookup, so "external_references" in {"externalReferences": [...]} returns False.

Other fields avoid this problem because their registry names match their BOM keys exactly (e.g., name, type, purl, description, licenses). The component_version field also has a name mismatch (component_version vs version), but it is rescued by the jsonpath-based detection (bug 1 doesn't affect it because its jsonpath $.components[0].version is correct).

Suggested Fix

Fix 1 (field_registry.json): Change the jsonpath from singular to plural:

- "jsonpath": "$.component.externalReferences",
+ "jsonpath": "$.components[0].externalReferences",

This alone should fix the scoring because the enhanced checker (check_field_with_enhanced_results) tries the jsonpath-based detection first (lines 154-158 of scoring.py), and if that succeeds, it never reaches the fallback.
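
A small consistency check could catch this class of typo in future registry edits. The sketch below assumes the registry maps field names to dicts with category and jsonpath keys, as in the excerpt above. The full layout of field_registry.json is elided in this report, so find_suspect_paths and sample_registry are hypothetical.

```python
EXPECTED_PREFIX = "$.components[0]."

def find_suspect_paths(registry, category="component_basic"):
    """Return {field: jsonpath} for entries in `category` whose path
    does not start with the expected component prefix."""
    return {
        field: spec.get("jsonpath")
        for field, spec in registry.items()
        if spec.get("category") == category
        and not (spec.get("jsonpath") or "").startswith(EXPECTED_PREFIX)
    }

# Hypothetical excerpt; only the keys shown in this report are used.
sample_registry = {
    "name": {"category": "component_basic", "jsonpath": "$.components[0].name"},
    "purl": {"category": "component_basic", "jsonpath": "$.components[0].purl"},
    "external_references": {
        "category": "component_basic",
        "jsonpath": "$.component.externalReferences",
    },
}

suspects = find_suspect_paths(sample_registry)
print(suspects)  # {'external_references': '$.component.externalReferences'}
```

Run as a unit test against the real registry, a check like this would fail as soon as a component_basic entry used the singular path.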

Fix 2 (scoring.py, defense-in-depth): Add a camelCase alias check in the fallback, or normalize field names before lookup:

# Option A: explicit alias map
FIELD_ALIASES = {
    "external_references": "externalReferences",
    "component_version": "version",
}

# In check_field_in_aibom(), line 97:
field_key = FIELD_ALIASES.get(field, field)
if field_key in component:
    return True

Fix 1 is sufficient on its own. Fix 2 provides defense-in-depth against similar issues in future field additions.
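
The "normalize field names before lookup" alternative mentioned in Fix 2 might look roughly like this. It is a sketch only: snake_to_camel and field_in_component are hypothetical helpers, not existing functions in scoring.py.

```python
def snake_to_camel(name):
    """Convert 'external_references' to 'externalReferences'."""
    head, *rest = name.split("_")
    return head + "".join(word.capitalize() for word in rest)

def field_in_component(field, component):
    """Match either the registry name or its camelCase form."""
    return field in component or snake_to_camel(field) in component

# Hypothetical component dict with CycloneDX-style keys.
component = {"name": "demo", "version": "1.0", "externalReferences": []}

print(field_in_component("external_references", component))  # True
print(field_in_component("component_version", component))    # False
```

Normalization covers external_references, but component_version becomes componentVersion rather than version, so that field would still need the explicit alias from Option A.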

Impact

  • Every model scored by the AIBOM generator loses ~2.86 points (1/7 * 20) in the component_basic category, even when the BOM correctly contains external references.
  • This makes it impossible to achieve 100/100 completeness.
  • The issue affects both CycloneDX 1.6 and 1.7 output since both use the components (plural) array structure.
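
The point loss can be reproduced with a few lines of arithmetic, assuming (as the 1/7 * 20 figure above implies) that the seven component_basic fields are weighted equally. The constant names are illustrative only.

```python
CATEGORY_MAX = 20.0  # component_basic maximum, per the scoring output
FIELD_COUNT = 7      # component_basic fields listed in this report

per_field = CATEGORY_MAX / FIELD_COUNT
with_bug = per_field * (FIELD_COUNT - 1)  # one field always scored as missing

print(f"lost: {per_field:.2f}, category score: {with_bug:.1f}")
# lost: 2.86, category score: 17.1
```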

Additional Context

I discovered this while publishing a model (rockCO78/crosswalk-v7c) and maximizing the AIBOM completeness score. The model card covers all 35 non-GGUF fields in the registry, achieving 97.1/100 -- with the remaining 2.9 points lost entirely to this bug.

Scoring output:

Completeness Score: 97.1/100

Section Breakdown:
  - Required Fields: 20/20
  - Metadata: 20/20
  - Component Basic: 17.1/20    <-- should be 20/20
  - Component Model Card: 30/30
  - External References: 10/10
