Commit 53d5c69

Track identity evidences in VDR (owasp-dep-scan#424)
* Validation fixes
  Signed-off-by: Prabhu Subramanian <[email protected]>
  Track identity evidences
  Signed-off-by: Prabhu Subramanian <[email protected]>
* Bug fix
  Signed-off-by: Prabhu Subramanian <[email protected]>
* Bug fix
  Signed-off-by: Prabhu Subramanian <[email protected]>
* Docs
  Signed-off-by: Prabhu Subramanian <[email protected]>
* Docs
  Signed-off-by: Prabhu Subramanian <[email protected]>
* Bug fix
  Signed-off-by: Prabhu Subramanian <[email protected]>

---------

Signed-off-by: Prabhu Subramanian <[email protected]>
1 parent e96489f commit 53d5c69

10 files changed (+5923 -95 lines)

contrib/bom-1.6.schema.json

Lines changed: 5699 additions & 0 deletions
Large diffs are not rendered by default.

contrib/depscanGPT/README.md

Lines changed: 70 additions & 57 deletions
````diff
@@ -5,71 +5,84 @@ depscanGPT is [available](https://chatgpt.com/g/g-674f260c887c819194e465d2c65f40
 ## System prompt
 
 ```text
-# System Prompt
+# System Prompt
 
 You are depscan, an application‑security expert in Software Composition Analysis (SCA) and supply‑chain security. Your only sources of truth are:
-- JSON files the user uploads (CycloneDX VDR, SBOM, CBOM, OBOM, SaaSBOM, ML‑BOM, CSAF VEX)
-- Embedded reference docs bundled with this GPT (e.g., PROJECT_TYPES.md)
+JSON files the user uploads (CycloneDX VDR, SBOM, CBOM, OBOM, SaaSBOM, ML‑BOM, CSAF VEX)
+Embedded reference docs bundled with this GPT (e.g., PROJECT_TYPES.md)
 
 If data is missing, reply: “That information isn’t available in the provided materials.”
 
 ## Scope
 
 Answer only questions about:
-- CycloneDX BOM or VDR content
-- OASIS CSAF VEX
-- OWASP depscan, blint, or cdxgen
-
-**BOM generation & CycloneDX authoring**
-
-If the user’s question is about creating a BOM or general CycloneDX mechanics (rather than analysing an existing report), redirect them to cdxgenGPT:
-“For BOM generation, please try the dedicated assistant here → https://chatgpt.com/g/g-673bfeb4037481919be8a2cd1bf868d2-cdxgen ”
-
-For anything else, respond: “I’m sorry, but I can only help with BOM and VDR‑related queries.”
-
-## Interaction flow
-1. Greeting (first turn only) – “Hello, I’m OWASP depscan — how can I help with your BOM or VDR?”
-2. Ask for a JSON file or a specific question.
-3. Never offer to create sample BOM/VDR files.
-
-## Analysis rules
-- VDR: use vulnerabilities, severity, analysis, etc.
-- SBOM/CBOM/OBOM/ML‑BOM: use components, purl, licenses, properties, etc.
-- SaaSBOM: use services, endpoints, authenticated, data.classification.
-- Infer ecosystem from purl (pkg:npm → npm, pkg:pypi → Python).
-- If coverage is unclear, suggest regenerating with depscan `--profile research` or `--reachability-analyzer SemanticReachability`.
-
-## Understanding depscan reports
-
-**Input expectations**
-- If the user’s question involves scan results but no report is attached, ask them to upload `depscan.html` or `depscan.txt` (console output) — whichever they have handy.
-- Accept CycloneDX VDR JSON alongside the HTML/TXT when both are supplied.
-- If key details (e.g., reachable flows, service endpoints, remediation notes) are missing from the uploaded depscan.html or depscan.txt, tell the user: “Please rerun depscan with the `--explain` flag and attach the regenerated report for a detailed analysis.”
-
-**How to analyse the report (JSON, HTML or TXT)**
-1. When summarizing a VDR JSON file, if an annotations array exists and any annotator.name is "owasp-depscan", prefer the text field as the primary summary. Choose the latest timestamped annotation if multiple exist.
-2. In TEXT and HTML files, locate the “Dependency Scan Results (BOM)” table → extract package, CVE, severity, score and fix version.
-1. Use the “Reachable / Endpoint‑Reachable / Top Priority” sections to explain exploitability and remediation order.
-2. Parse the “Service Endpoints” and “Reachable Flows” tables to highlight insecure routes or code hotspots.
-3. Everything you state must be quoted or paraphrased from the uploaded report; if a datum is absent, say so plainly.
-
-**Response rules**
-- Never guess, extrapolate or add external CVE intelligence.
-- Keep the normal style limits (≤ 2 sentences or ≤ 3 bullets).
-- When advising fixes, repeat only the fix version shown in the report; do not suggest alternative versions.
-
-## Reference look‑ups
-- For supported languages/frameworks, consult PROJECT_TYPES.md and quote it.
-- If unsupported, direct the user to open a “Premium Issue” in the cdxgen GitHub repo (link on request).
-
-## Response style
-- ≤ 2 sentences (or ≤ 3 brief bullet points).
-- No jokes or small talk.
-- Don’t add unsolicited suggestions.
-
-## Feedback nudge
-
-When a user expresses satisfaction, once per session invite them to review depscanGPT on social media or donate to the OWASP Foundation.
+• CycloneDX BOM or VDR content
+• OASIS CSAF VEX
+• OWASP depscan, blint, or cdxgen
+
+## BOM generation & CycloneDX authoring
+
+If the user’s question is about creating a BOM or general CycloneDX mechanics (rather than analyzing an existing report), redirect them:
+
+“For BOM generation, please try the dedicated assistant here → https://chatgpt.com/g/g-673bfeb4037481919be8a2cd1bf868d2-cdxgen”
+
+For any other unrelated request, respond:
+
+“I’m sorry, but I can only help with BOM and VDR-related queries.”
+
+## Interaction Flow
+1. Greeting (first turn only): “Hello, I’m OWASP depscan — how can I help with your BOM or VDR?”. Display the ascii logo from "Optional ASCII logo" occasionally.
+2. Request a JSON file or specific question.
+3. Never offer to create sample BOM/VDR files.
+
+## Analysis Rules
+• VDR: Only use vulnerabilities, analysis, annotations, severity.
+• SBOM/CBOM/OBOM/ML‑BOM: Only use components, purl, licenses, properties.
+• SaaSBOM: Only use services, endpoints, authenticated, data.classification.
+• Infer the ecosystem solely from purl fields (e.g., pkg:npm → npm).
+• If coverage is unclear, suggest rerunning depscan with --profile research or --reachability-analyzer SemanticReachability.
+
+## Understanding Depscan Reports (TXT/HTML)
+• If the user provides a depscan.txt or depscan.html, accept it.
+• Prefer annotations array from VDR when summarizing vulnerabilities, picking the latest timestamp if multiple exist.
+• Parse and use:
+• “Dependency Scan Results (BOM)” table: extract package name, CVE, severity, fix version.
+• “Reachable / Endpoint-Reachable / Top Priority” sections: highlight exploitability and remediation order.
+• “Service Endpoints” and “Reachable Flows” tables: highlight insecure code paths.
+• “Next Steps” section: treat this as **mandatory source of truth** for recommending actions if present.
+• **Never extrapolate** beyond what the reports or annotations explicitly state.
+
+## Automatic Build Manager Command Generation
+
+When a “Next Steps” section exists:
+• If a “Fix Version” and “Package” are specified, generate a build tool command based solely on:
+• the purl format (e.g., pkg:nuget, pkg:npm, pkg:maven)
+• any explicitly provided project hints (e.g., .csproj paths).
+• Only use standard native command syntax:
+• NuGet (.NET projects):
+dotnet add <path>.csproj package <package-name> --version <fix-version>
+• npm projects:
+npm install <package-name>@<fix-version> --save
+• Maven projects:
+Suggest manually updating pom.xml or using:
+mvn versions:set -DnewVersion=<fix-version>
+• **Do not infer missing information.**
+• **Do not recommend upgrades for packages without a fix version provided.**
+
+## Response Rules
+• Never guess, extrapolate, or add external CVE intelligence.
+• Responses must match exact data and structure from the uploaded depscan or VDR.
+• When advising a fix, **repeat exactly** the “Fix Version” shown in the report — no alternative versions or speculations.
+• If multiple “Next Steps” exist, treat them independently.
+
+## Style
+• Keep all responses ≤ 2 sentences or ≤ 3 bullets unless user asks for expanded details.
+• No jokes, small talk, or promotional suggestions.
+• Do not insert external links unless specifically asked.
+
+## Feedback Nudge
+
+When a user expresses satisfaction, invite them once per session to review depscanGPT on social media or donate to the OWASP Foundation.
 
 ## Optional ASCII logo
 
````
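Both versions of the prompt tell the assistant to prefer depscan's own VDR annotations and, when several exist, the most recent one. That selection rule is small enough to sketch in Python over a `json.load`-ed VDR. `latest_depscan_summary` is a hypothetical helper, not part of depscan. Because the prompt refers to `annotator.name` while CycloneDX nests the name under `annotator.component` (or `organization`, `individual`, `service`), the sketch assumes either shape may appear.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Set


def _annotator_names(annotator: Dict[str, Any]) -> Set[Optional[str]]:
    """Collect the annotator's name, whether stored flat or under a CycloneDX sub-object."""
    names = {annotator.get("name")}
    for kind in ("component", "organization", "individual", "service"):
        nested = annotator.get(kind)
        if isinstance(nested, dict):
            names.add(nested.get("name"))
    return names


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat offset-less timestamps as UTC so they compare with offset-aware ones.
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def latest_depscan_summary(vdr: Dict[str, Any]) -> Optional[str]:
    """Return the text of the newest owasp-depscan annotation, or None if there is none."""
    candidates = [
        a
        for a in vdr.get("annotations") or []
        if "owasp-depscan" in _annotator_names(a.get("annotator") or {})
        and a.get("timestamp")
        and a.get("text")
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda a: _parse_ts(a["timestamp"]))["text"]
```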

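The new "Automatic Build Manager Command Generation" section is essentially a lookup from purl type to a command template. A minimal sketch of that mapping, assuming the package name, purl and fix version have already been read from a report's "Next Steps" section; `upgrade_command` and its `csproj_path` parameter are hypothetical names, and the templates are copied from the prompt rather than taken from any depscan API.

```python
from typing import Optional


def purl_type(purl: str) -> str:
    """Return the purl type, e.g. "npm" for "pkg:npm/lodash@4.17.20"."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    # Some producers write "pkg:/npm/..."; tolerate leading slashes.
    return purl[len("pkg:"):].lstrip("/").split("/", 1)[0].lower()


def upgrade_command(
    purl: str,
    package: str,
    fix_version: Optional[str],
    csproj_path: Optional[str] = None,
) -> Optional[str]:
    """Map a report's package and fix version to a native command (hypothetical helper).

    Returns None whenever the prompt's rules forbid a suggestion.
    """
    if not fix_version:
        # "Do not recommend upgrades for packages without a fix version provided."
        return None
    kind = purl_type(purl)
    if kind == "nuget":
        if not csproj_path:
            # "Do not infer missing information": no explicit project hint, no guessed path.
            return None
        return f"dotnet add {csproj_path} package {package} --version {fix_version}"
    if kind == "npm":
        return f"npm install {package}@{fix_version} --save"
    if kind == "maven":
        # Template copied verbatim from the prompt.
        return f"mvn versions:set -DnewVersion={fix_version}"
    # Ecosystems without a template in the prompt get no command.
    return None
```

Returning `None` encodes the two bolded rules: no fix version means no recommendation, and a NuGet upgrade without an explicit `.csproj` hint is not guessed. Note that `mvn versions:set` changes the project's own `<version>` rather than a dependency version, so for dependency upgrades the prompt's alternative of editing `pom.xml` is the one that applies.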
contrib/vex-validate.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -23,13 +23,13 @@ def build_args():
 
 
 def vvex(vex_json):
-schema = os.path.join(os.path.dirname(__file__), "bom-1.5.schema.json")
+schema = os.path.join(os.path.dirname(__file__), "bom-1.6.schema.json")
 with open(schema, mode="r") as sp:
 with open(vex_json, mode="r") as vp:
 vex_obj = json.load(vp)
 try:
 validate(instance=vex_obj, schema=json.load(sp))
-print("VEX file is valid")
+print("VDR/VEX file is valid")
 except ValidationError as ve:
 print(ve)
 sys.exit(1)
```
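`vvex` now validates against the newly added `bom-1.6.schema.json`, matching the `specVersion` of `"1.6"` that `update_tools_metadata` writes into fresh VDRs, and its success message covers VDRs as well as VEX. Since the schema file name is still hard-coded, one option is to derive it from the document itself. A minimal sketch, where `schema_for` is a hypothetical helper; the diff only shows `bom-1.6.schema.json` being added, so whether other versions sit alongside it in `contrib/` is an assumption.

```python
import os


def schema_for(doc: dict, schema_dir: str) -> str:
    """Return the bundled bom-<specVersion>.schema.json path for a CycloneDX document.

    Hypothetical helper: assumes schemas are stored side by side as
    bom-1.5.schema.json, bom-1.6.schema.json, ... in schema_dir.
    """
    spec_version = doc.get("specVersion")
    if not spec_version:
        raise ValueError("document has no specVersion; is it CycloneDX JSON?")
    path = os.path.join(schema_dir, f"bom-{spec_version}.schema.json")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no bundled schema for specVersion {spec_version}: {path}")
    return path
```

Inside `vvex`, `schema_for(vex_obj, os.path.dirname(__file__))` could replace the hard-coded `os.path.join`, provided `vex_obj` is loaded before the schema file is opened.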

depscan/cli.py

Lines changed: 21 additions & 10 deletions
```diff
@@ -170,17 +170,22 @@ def vdr_analyze_summarize(
 vdr_file = os.path.join(bom_dir, DEPSCAN_DEFAULT_VDR_FILE)
 if vdr_result.success:
 pkg_vulnerabilities = vdr_result.pkg_vulnerabilities
+cdx_vdr_data = None
 # Always create VDR files even when empty
 if pkg_vulnerabilities is not None:
 # Case 1: Single BOM file resulting in a single VDR file
 if bom_file:
-if bom_data := json_load(bom_file, log=LOG):
-export_bom(bom_data, ds_version, pkg_vulnerabilities, vdr_file)
+cdx_vdr_data = json_load(bom_file, log=LOG)
 # Case 2: Multiple BOM files in a bom directory
 elif bom_dir:
-bom_data = create_empty_vdr(pkg_list, ds_version)
-export_bom(bom_data, ds_version, pkg_vulnerabilities, vdr_file)
-LOG.debug(f"The VDR file '{vdr_file}' was created successfully.")
+cdx_vdr_data = create_empty_vdr(pkg_list, ds_version)
+if cdx_vdr_data:
+export_bom(cdx_vdr_data, ds_version, pkg_vulnerabilities, vdr_file)
+LOG.debug(f"The VDR file '{vdr_file}' was created successfully.")
+else:
+LOG.debug(
+f"VDR file '{vdr_file}' was not created for the type {project_type}."
+)
 summary = summary_stats(pkg_vulnerabilities)
 elif bom_dir or bom_file or pkg_list:
 LOG.info("No vulnerabilities found for project type '%s'!", project_type)
```
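This hunk separates choosing the VDR payload from writing it: a single BOM file is loaded and reused, a BOM directory gets a freshly built empty VDR, and `export_bom` runs only when one of those yielded data. A condensed Python sketch of that control flow; `write_vdr` and the callables `load_json`, `make_empty_vdr` and `export` are hypothetical stand-ins for the surrounding code, `json_load`, `create_empty_vdr` and `export_bom`, with logging and the vulnerability arguments left out.

```python
from typing import Any, Callable, Dict, List, Optional


def write_vdr(
    bom_file: Optional[str],
    bom_dir: Optional[str],
    pkg_list: List[dict],
    load_json: Callable[[str], Optional[Dict[str, Any]]],
    make_empty_vdr: Callable[[List[dict]], Dict[str, Any]],
    export: Callable[[Dict[str, Any]], None],
) -> bool:
    """Return True if a VDR was exported (simplified mirror of the new branch)."""
    cdx_vdr_data = None
    if bom_file:
        # Case 1: reuse the single input BOM; the loader may return None on failure.
        cdx_vdr_data = load_json(bom_file)
    elif bom_dir:
        # Case 2: several BOMs in a directory, so start from an empty VDR skeleton.
        cdx_vdr_data = make_empty_vdr(pkg_list)
    if cdx_vdr_data:
        export(cdx_vdr_data)
        return True
    # The real code logs "VDR file ... was not created" here.
    return False
```

Compared with the removed lines, a BOM file that fails to load now reaches the `else` branch and logs that no VDR was created, and the "created successfully" debug message is emitted for the single-file case as well as the directory case.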
```diff
@@ -656,10 +661,13 @@ def run_depscan(args):
 or (vuln_analyzer == "auto" and bom_dir_mode)
 ):
 if args.reachability_analyzer == "SemanticReachability":
-LOG.info(
-"Semantic Reachability analysis requested for project type '%s'. This might take a while ...",
-project_type,
-)
+if not args.bom_dir:
+LOG.info(
+"Semantic Reachability analysis requested for project type '%s'. This might take a while ...",
+project_type,
+)
+else:
+LOG.info("Attempting semantic analysis based on existing data at '%s'", args.bom_dir)
 else:
 LOG.info(
 "Lifecycle-based vulnerability analysis requested for project type '%s'. This might take a while ...",
@@ -862,7 +870,9 @@ def run_depscan(args):
 else:
 LOG.debug("Vulnerability database loaded from %s", config.VDB_BIN_FILE)
 if len(pkg_list) > 1:
-if args.bom:
+if project_type == "bom":
+LOG.info("Scanning CycloneDX xBOMs and atom slices")
+elif args.bom:
 LOG.info(
 "Scanning %s with type %s",
 args.bom,
@@ -935,6 +945,7 @@ def run_depscan(args):
 project_type,
 src_dir,
 args.bom_dir or reports_dir,
+vdr_file,
 vdr_result,
 args.explanation_mode,
 )
```

depscan/lib/bom.py

Lines changed: 50 additions & 16 deletions
```diff
@@ -1,6 +1,9 @@
 import os
 import shutil
 import sys
+import uuid
+from collections import defaultdict
+from datetime import datetime, timezone
 from urllib.parse import unquote_plus
 
 from blint.cyclonedx.spec import CycloneDX
@@ -438,8 +441,8 @@ def create_lifecycle_boms(cdxgen_lib, src_dir, options):
 
 def create_empty_vdr(pkg_list, ds_version):
 components = pkg_list or []
-metadata = update_tools_metadata(None, None, ds_version)
-return {"metadata": metadata, "components": components}
+bom_data = update_tools_metadata(None, None, ds_version)
+return {**bom_data, "components": components}
 
 
 def update_tools_metadata(tools, bom_data, ds_version):
@@ -451,18 +454,31 @@ def update_tools_metadata(tools, bom_data, ds_version):
 :return: None
 """
 if not bom_data:
-bom_data = {"metadata": {}}
-components = tools.get("components", []) if tools else []
-ds_purl = f"pkg:pypi/owasp-depscan@{ds_version}"
-components.append(
-{
-"type": "application",
-"name": "owasp-depscan",
-"version": ds_version,
-"purl": ds_purl,
-"bom-ref": ds_purl,
+now_utc = datetime.now(timezone.utc)
+bom_data = {
+"bomFormat": "CycloneDX",
+"specVersion": "1.6",
+"serialNumber": f"urn:uuid:{uuid.uuid4()}",
+"version": 1,
+"metadata": {
+"timestamp": now_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
+},
 }
+components = tools.get("components", []) if tools else []
+needs_ds_component = (
+len([c for c in components if c.get("name") == "owasp-depscan"]) == 0
 )
+if needs_ds_component:
+ds_purl = f"pkg:pypi/owasp-depscan@{ds_version}"
+components.append(
+{
+"type": "application",
+"name": "owasp-depscan",
+"version": ds_version,
+"purl": ds_purl,
+"bom-ref": ds_purl,
+}
+)
 bom_data["metadata"]["tools"] = {"components": components}
 return bom_data
 
```
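After these hunks, `update_tools_metadata(None, None, ds_version)` returns a complete CycloneDX 1.6 envelope (`bomFormat`, `specVersion`, a `urn:uuid` `serialNumber`, `version` and a UTC `metadata.timestamp`) and appends the owasp-depscan tool entry only when no tool of that name is already listed. `create_empty_vdr` now spreads that envelope; the removed version nested it under a second `"metadata"` key, because the helper already returns a dict with `metadata` at its top level. The standalone re-creation below mirrors the diff for illustration (using `any()` in place of the equivalent list-length test) and is not imported from depscan.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def update_tools_metadata(
    tools: Optional[Dict[str, Any]],
    bom_data: Optional[Dict[str, Any]],
    ds_version: str,
) -> Dict[str, Any]:
    """Re-creation of the patched helper: ensure a BOM envelope and one depscan tool entry."""
    if not bom_data:
        now_utc = datetime.now(timezone.utc)
        bom_data = {
            "bomFormat": "CycloneDX",
            "specVersion": "1.6",
            "serialNumber": f"urn:uuid:{uuid.uuid4()}",
            "version": 1,
            "metadata": {"timestamp": now_utc.strftime("%Y-%m-%dT%H:%M:%SZ")},
        }
    components: List[Dict[str, Any]] = tools.get("components", []) if tools else []
    # Only add depscan itself if an earlier run has not already recorded it.
    if not any(c.get("name") == "owasp-depscan" for c in components):
        ds_purl = f"pkg:pypi/owasp-depscan@{ds_version}"
        components.append(
            {
                "type": "application",
                "name": "owasp-depscan",
                "version": ds_version,
                "purl": ds_purl,
                "bom-ref": ds_purl,
            }
        )
    bom_data["metadata"]["tools"] = {"components": components}
    return bom_data


def create_empty_vdr(pkg_list: Optional[List[dict]], ds_version: str) -> Dict[str, Any]:
    # Spread the envelope so "metadata" stays at the top level instead of being nested.
    return {**update_tools_metadata(None, None, ds_version), "components": pkg_list or []}
```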

```diff
@@ -505,16 +521,34 @@ def trim_vdr_bom_data(bom_data):
 if metadata and metadata.get("properties"):
 del metadata["properties"]
 bom_data["metadata"] = metadata
-new_components = []
+new_components = {}
+component_identities = defaultdict(list)
 for comp in components:
+identity_evidences = comp.get("evidence", {}).get("identity", []) or []
+if isinstance(identity_evidences, dict):
+identity_evidences = [identity_evidences]
 for p in (
 "properties",
 "signature",
+"url",
+"vendor",
+"licenses", # We need a better logic to retain licenses here
 ):
-if comp.get(p):
+if comp.get(p) is not None:
 del comp[p]
-new_components.append(comp)
-bom_data["components"] = new_components
+ref = comp.get("bom-ref") or comp.get("purl")
+# This is an error condition really
+if not ref:
+continue
+component_identities[ref] += identity_evidences
+if not new_components.get(ref):
+new_components[ref] = comp
+vdr_components = []
+for ref, comp in new_components.items():
+identity_evidences = component_identities[ref]
+comp["evidence"] = {"identity": identity_evidences}
+vdr_components.append(comp)
+bom_data["components"] = vdr_components
 for p in (
 "annotations",
 "signature",
```

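The `trim_vdr_bom_data` hunk is the identity-tracking part of the commit: components are keyed by `bom-ref` (falling back to `purl`), the first component seen for a key is kept, and every duplicate contributes its `evidence.identity` entries, so a package reported by several BOMs becomes one VDR component carrying all of its identity evidence. The `isinstance(..., dict)` check presumably accepts the single-object `identity` form from CycloneDX 1.5 alongside the 1.6 array form. The sketch isolates the merge; `merge_identities` is a hypothetical name, the field stripping is omitted, and unlike the diff it tolerates `"evidence": null`, where `comp.get("evidence", {})` returns `None` and the following `.get` raises.

```python
from collections import defaultdict
from typing import Any, Dict, List


def merge_identities(components: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Deduplicate by bom-ref/purl and pool identity evidence (sketch of the diff's logic)."""
    first_seen: Dict[str, Dict[str, Any]] = {}
    identities: Dict[str, list] = defaultdict(list)
    for comp in components:
        evidence = comp.get("evidence") or {}
        identity = evidence.get("identity") or []
        if isinstance(identity, dict):
            # Single identity object (CycloneDX 1.5 style): normalise to a list.
            identity = [identity]
        ref = comp.get("bom-ref") or comp.get("purl")
        if not ref:
            continue  # components without any reference are dropped, as in the diff
        identities[ref] += identity
        first_seen.setdefault(ref, comp)
    merged = []
    for ref, comp in first_seen.items():  # dicts preserve insertion order
        comp["evidence"] = {"identity": identities[ref]}
        merged.append(comp)
    return merged
```

As in the diff, merged components keep only `evidence.identity`; any other evidence such as `occurrences` or `callstack` is replaced.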
depscan/lib/explainer.py

Lines changed: 5 additions & 1 deletion
```diff
@@ -16,12 +16,16 @@
 from depscan.lib.logger import console, LOG
 
 
-def explain(project_type, src_dir, bom_dir, vdr_result, explanation_mode):
+def explain(project_type, src_dir, bom_dir, vdr_file, vdr_result, explanation_mode):
 """
 Explain the analysis and findings based on the explanation mode.
 
 :param project_type: Project type
+:param src_dir: Source directory
 :param bom_dir: BOM directory
+:param vdr_file: VDR file
+:param vdr_result: VDR Result
+:param explanation_mode: Explanation mode
 """
 pattern_methods = {}
 has_any_explanation = False
```

packages/analysis-lib/src/analysis_lib/__init__.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -71,6 +71,7 @@ class VDRResult:
 reached_purls: Optional[Dict[str, int]] = None
 reached_services: Optional[Dict[str, int]] = None
 endpoint_reached_purls: Optional[Dict[str, int]] = None
+purl_identities: Optional[Dict[str, List]] = None
 
 
 class Counts:
```
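`VDRResult` gains `purl_identities`, a map from purl to a list. The diff does not show how it is populated or read; it plausibly holds the identity evidence collected in `trim_vdr_bom_data`. As a hypothetical consumer, the sketch reports the highest identity `confidence` for every reached purl, assuming each list entry is a CycloneDX identity object whose optional `confidence` is a number between 0 and 1.

```python
from typing import Dict, List, Optional


def reached_identity_confidence(
    purl_identities: Optional[Dict[str, List[dict]]],
    reached_purls: Optional[Dict[str, int]],
) -> Dict[str, float]:
    """Highest identity confidence for each reached purl (hypothetical consumer)."""
    result: Dict[str, float] = {}
    for purl in reached_purls or {}:
        scores = [
            ev["confidence"]
            for ev in (purl_identities or {}).get(purl, [])
            if isinstance(ev.get("confidence"), (int, float))
        ]
        # Purls with no scored identity evidence are left out rather than given 0.
        if scores:
            result[purl] = max(scores)
    return result
```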
