feat: add trainingDataAvailable flag and verification warnings for unlinked or missing HuggingFace datasets

## Problem
When generating an AIBOM for certain models, the output includes training  dataset references that resolve to 0 results or non-existent pages on  HuggingFace Hub. The AIBOM is generated without any warning, creating a false sense of completeness.

This is a transparency and audit gap, organizations relying on AIBOM output for compliance may assume training data is  documented when it is not.

## Reproduction
1. Run: `python3 -m src.cli <model-id>`
2. Observe dataset references in AIBOM output
3. Click dataset link on HuggingFace — page returns 0 results or 404

## Suggested Fix
Add a verification step that checks dataset existence via the HF Hub API during extraction. If a dataset cannot be verified, surface it explicitly 
in the AIBOM metadata rather than silently including a dead reference.

## Proposed Metadata Schema

### When datasets cannot be verified:
```
{
  "name": "genai:aibom:trainingDataAvailable",
  "value": "false"
},
{
  "name": "genai:aibom:trainingDataWarning",
  "value": "Training datasets were referenced but could not be verified on Hugging Face Hub. Dataset may not exist, be disabled or be inaccessible."
}
```

### When datasets are successfully verified:
```
{
  "name": "genai:aibom:trainingDataAvailable",
  "value": "true"
},
{
  "name": "genai:aibom:trainingDataStatus",
  "value": "Training datasets verified: Dataset(s) exist and are accessible on Hugging Face Hub."
}
```

## Design Intent
Both states are explicit. A consumer of the AIBOM always knows whether training data was verified or not, there is no silent middle ground.
This is consistent with how completeness scoring already works in this project.

## Why This Matters
Silent inclusion of unverifiable dataset references undermines the core purpose of AIBOM, supply chain transparency. A verifiably incomplete AIBOM is more trustworthy than a silently incomplete one.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add trainingDataAvailable flag and verification warnings for unlinked or missing HuggingFace datasets #80

Problem

Reproduction

Suggested Fix

Proposed Metadata Schema

When datasets cannot be verified:

When datasets are successfully verified:

Design Intent

Why This Matters

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

feat: add trainingDataAvailable flag and verification warnings for unlinked or missing HuggingFace datasets #80

Description

Problem

Reproduction

Suggested Fix

Proposed Metadata Schema

When datasets cannot be verified:

When datasets are successfully verified:

Design Intent

Why This Matters

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions