diff --git a/docs/superpowers/plans/2026-05-10-blind-relation-gate-and-track1-champion-eval.md b/docs/superpowers/plans/2026-05-10-blind-relation-gate-and-track1-champion-eval.md new file mode 100644 index 00000000..8d471b01 --- /dev/null +++ b/docs/superpowers/plans/2026-05-10-blind-relation-gate-and-track1-champion-eval.md @@ -0,0 +1,421 @@ +# Blind Relation Gate And Track1 Champion Eval Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add a generic blind relation gate to Vulca so relation semantics are judged from the image before caption anchoring, then use it to harden AffectiveArt Track1 candidate evaluation. + +**Architecture:** Vulca keeps the reusable capability: a blind image-only relation read, a deterministic relation comparator, and content-fidelity score capping when the blind read contradicts required relations. The challenge repo remains a consumer: it runs champion/candidate audits and only accepts replacements that are clear wins under caption fidelity, artifact boundary, style, emotion, and blind relation checks. + +**Tech Stack:** Python 3, pytest, LiteLLM/Gemini VLM scoring path, existing `vulca.content_lock` and `vulca.pipeline.nodes.evaluate` modules, existing Track1 audit scripts in `/Users/yhryzy/dev/emoart-130k`. + +--- + +## File Structure + +- Modify: `src/vulca/content_lock.py` + - Add blind-relation prompt construction. + - Add deterministic blind-relation gate construction. + - Extend `apply_content_fidelity_gate` to cap high scores when blind relation decision is `reject` or `hold`. +- Modify: `src/vulca/_vlm.py` + - Add a second image-only VLM call for content locks with required relations. + - Merge blind relation gate output into `content_fidelity_gate`. + - Keep primary scoring usable if the blind VLM call fails. +- Modify: `tests/test_content_lock.py` + - Unit-test blind prompt non-anchoring, reject/hold/pass gate decisions, and score cap behavior. +- Modify: `tests/test_evaluate.py` or `tests/test_vlm_prompt.py` + - Integration-test that `score_image`/`EvaluateNode` propagates blind gate metadata without requiring live network. +- Read-only consumer: `/Users/yhryzy/dev/emoart-130k/scripts/track1_quality_review.py` + - Use existing heuristic/VLM review first. Do not mutate Track1 submission packages during this plan. + +## Task 1: Content-Lock Blind Relation Helpers + +**Files:** +- Modify: `src/vulca/content_lock.py` +- Test: `tests/test_content_lock.py` + +- [ ] **Step 1: Write failing tests** + +Add tests that express the API before implementation: + +```python +from vulca.content_lock import ( + build_blind_relation_gate, + build_blind_relation_read_prompt, +) + + +def test_blind_relation_prompt_does_not_anchor_on_caption_or_forbidden_reading(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." 
+ ) + + prompt = build_blind_relation_read_prompt(lock) + + assert "caption" not in prompt.lower() + assert "escort" not in prompt.lower() + assert "protect" not in prompt.lower() + assert "soldiers chasing civilians" not in prompt.lower() + assert "visible relationships" in prompt.lower() + + +def test_blind_relation_gate_rejects_forbidden_primary_reading(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + gate = build_blind_relation_gate( + lock, + { + "primary_reading": "Mounted soldiers appear to chase fleeing civilians.", + "apparent_relations": ["mounted soldiers chasing civilians"], + "ambiguous_readings": [], + }, + ) + + assert gate["blind_relation_decision"] == "reject" + assert "soldiers chasing civilians" in gate["blind_forbidden_readings_present"] + + +def test_blind_relation_gate_holds_ambiguous_relation_reading(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + gate = build_blind_relation_gate( + lock, + { + "primary_reading": "The riders could be escorting or pursuing the civilians.", + "apparent_relations": ["riders behind fleeing civilians"], + "ambiguous_readings": ["escort or pursuit"], + }, + ) + + assert gate["blind_relation_decision"] == "hold" + assert gate["blind_ambiguous_readings"] == ["escort or pursuit"] + + +def test_blind_relation_gate_passes_clear_escort_reading(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + gate = build_blind_relation_gate( + lock, + { + "primary_reading": "Mounted soldiers flank civilians and guide them away from burning ruins.", + "apparent_relations": ["mounted soldiers guiding civilians away from burning ruins"], + "ambiguous_readings": [], + }, + ) + + assert gate["blind_relation_decision"] == "pass" +``` + +- [ ] **Step 2: Run tests and verify RED** + +Run: + +```bash +PYTHONPATH=src pytest tests/test_content_lock.py -k "blind_relation" -q +``` + +Expected: fail because `build_blind_relation_gate` and `build_blind_relation_read_prompt` are not defined. + +- [ ] **Step 3: Implement minimal helpers** + +Add functions to `src/vulca/content_lock.py`: + +```python +def build_blind_relation_read_prompt(lock: ContentLock | dict[str, Any]) -> str: + content_lock = content_lock_from_dict(lock) if isinstance(lock, dict) else lock + if not content_lock.required_relations: + return "" + return "\n".join( + [ + "BLIND IMAGE RELATION READ:", + "Describe only what is visible in the image. 
Do not use any external prompt, sample id, filename, or expected story.",
+            "Focus on visible relationships among people, animals, vehicles, objects, threats, movement direction, gaze, weapons, gestures, and protection cues.",
+            "Return exactly one JSON object with these fields:",
+            '"visible_entities": [short strings],',
+            '"primary_reading": "one sentence describing the most natural visible relationship reading",',
+            '"apparent_relations": [short subject-relation-object strings visible in the image],',
+            '"threat_cues": [short strings],',
+            '"protective_cues": [short strings],',
+            '"ambiguous_readings": [short strings for plausible alternate readings, or empty list],',
+            '"confidence": number from 0.0 to 1.0.',
+        ]
+    )
+
+
+def build_blind_relation_gate(
+    lock: ContentLock | dict[str, Any],
+    blind_read: dict[str, Any] | None,
+) -> dict[str, Any]:
+    content_lock = content_lock_from_dict(lock) if isinstance(lock, dict) else lock
+    if not content_lock.required_relations:
+        return {"blind_relation_decision": "not_applicable"}
+    if not blind_read:
+        return {
+            "blind_relation_decision": "unavailable",
+            "blind_relation_reason": "blind relation read unavailable",
+        }
+    primary = str(blind_read.get("primary_reading") or "")
+    apparent = _as_string_list(blind_read.get("apparent_relations"))
+    ambiguous = _as_string_list(blind_read.get("ambiguous_readings"))
+    joined = " ".join([primary, *apparent]).lower()
+    forbidden_present = [
+        reading
+        for reading in content_lock.forbidden_readings
+        if _relation_reading_matches(reading, joined)
+    ]
+    decision = "pass"
+    reason = "blind read did not contradict required relations"
+    if forbidden_present:
+        decision = "reject"
+        reason = "blind read matches forbidden relation reading"
+    elif ambiguous:
+        decision = "hold"
+        reason = "blind read is ambiguous for required relations"
+    return {
+        "blind_relation_decision": decision,
+        "blind_relation_reason": reason,
+        "blind_primary_reading": primary,
+        "blind_apparent_relations": apparent,
+        "blind_ambiguous_readings": ambiguous,
+        "blind_forbidden_readings_present": forbidden_present,
+    }
+```
+
+Also add `_relation_reading_matches(reading: str, joined: str) -> bool` with conservative matching for `chasing`, `attacking`, `threatened`.
+
+- [ ] **Step 4: Run tests and verify GREEN**
+
+Run:
+
+```bash
+PYTHONPATH=src pytest tests/test_content_lock.py -k "blind_relation" -q
+```
+
+Expected: all selected tests pass.
+
+## Task 2: Score Cap For Blind Reject/Hold
+
+**Files:**
+- Modify: `src/vulca/content_lock.py`
+- Test: `tests/test_content_lock.py`
+
+- [ ] **Step 1: Write failing test**
+
+Add:
+
+```python
+def test_blind_relation_reject_caps_high_score():
+    result = {
+        "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94},
+        "weighted_total": 0.965,
+        "rationales": {},
+    }
+    gate = {
+        "blind_relation_decision": "reject",
+        "blind_forbidden_readings_present": ["soldiers chasing civilians"],
+        "blind_primary_reading": "Mounted soldiers appear to chase civilians.",
+    }
+
+    gated = apply_content_fidelity_gate(result, gate)
+
+    assert gated["weighted_total"] == 0.25
+    assert "Blind relation gate rejected image" in gated["rationales"]["content_fidelity"]
+    assert "content_fidelity_failed" in gated["risk_flags"]
+```
+
+- [ ] **Step 2: Run test and verify RED**
+
+Run:
+
+```bash
+PYTHONPATH=src pytest tests/test_content_lock.py::test_blind_relation_reject_caps_high_score -q
+```
+
+Expected: fail because current `apply_content_fidelity_gate` ignores `blind_relation_decision`.
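+
+`hold` is treated exactly like `reject` by the cap. If useful, a companion test can sit alongside the one above (optional; grounded in the same gate semantics and also RED until Step 3 lands):
+
+```python
+def test_blind_relation_hold_caps_high_score():
+    result = {
+        "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94},
+        "weighted_total": 0.965,
+        "rationales": {},
+    }
+    gate = {
+        "blind_relation_decision": "hold",
+        "blind_relation_reason": "blind read is ambiguous for required relations",
+    }
+
+    gated = apply_content_fidelity_gate(result, gate)
+
+    assert gated["weighted_total"] == 0.25
+    assert "content_fidelity_failed" in gated["risk_flags"]
+```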
+ +- [ ] **Step 3: Implement score cap** + +In `apply_content_fidelity_gate`, read: + +```python +blind_relation_decision = str(gate.get("blind_relation_decision") or "") +blind_relation_failed = blind_relation_decision in {"reject", "hold"} +``` + +Include `blind_relation_failed` in the cap condition and add rationale: + +```python +if blind_relation_failed: + rationale_parts.append( + f"Blind relation gate rejected image: {gate.get('blind_relation_reason') or blind_relation_decision}" + ) +``` + +- [ ] **Step 4: Run focused tests** + +Run: + +```bash +PYTHONPATH=src pytest tests/test_content_lock.py -k "blind_relation or relation_semantics" -q +``` + +Expected: selected tests pass. + +## Task 3: VLM Blind Read Integration + +**Files:** +- Modify: `src/vulca/_vlm.py` +- Test: `tests/test_vlm_prompt.py` or `tests/test_evaluate.py` + +- [ ] **Step 1: Write failing integration test** + +Patch `litellm.acompletion` with two responses: the normal caption-conditioned score and the blind relation read. Assert the returned `content_fidelity_gate` includes `blind_relation_decision="reject"` when the blind read says pursuit. + +```python +@pytest.mark.asyncio +async def test_score_image_adds_blind_relation_gate_for_required_relations(monkeypatch): + from vulca._vlm import score_image + from vulca.content_lock import extract_content_lock + + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + normal_response = _completion_response( + '{"L1":0.9,"L2":0.9,"L3":0.9,"L4":0.9,"L5":0.9,' + '"L1_rationale":"ok","L2_rationale":"ok","L3_rationale":"ok","L4_rationale":"ok","L5_rationale":"ok",' + '"missing_required_subjects":[],"missing_required_text_elements":[],"missing_required_surface":[],' + '"missing_required_style_attributes":[],"apparent_relations":["caption-conditioned escort"],' + '"relation_semantics_failed":false,"forbidden_readings_present":[],' + '"forbidden_visual_artifacts":[],"unwanted_visible_text":false,"output_is_artwork_itself":true,' + '"risk_flags":[]}' + ) + blind_response = _completion_response( + '{"visible_entities":["mounted soldiers","civilians"],' + '"primary_reading":"Mounted soldiers appear to chase fleeing civilians.",' + '"apparent_relations":["mounted soldiers chasing civilians"],' + '"threat_cues":[],"protective_cues":[],"ambiguous_readings":[],"confidence":0.82}' + ) + calls = [normal_response, blind_response] + + async def fake_completion(**kwargs): + return calls.pop(0) + + monkeypatch.setattr("litellm.acompletion", fake_completion) + + result = await score_image( + img_b64="iVBORw0KGgo=", + mime="image/png", + subject="track1_0747", + tradition="default", + api_key="fake-key", + content_lock=lock.to_dict(), + ) + + gate = result["content_fidelity_gate"] + assert gate["blind_relation_decision"] == "reject" + assert gate["blind_forbidden_readings_present"] == ["soldiers chasing civilians"] +``` + +- [ ] **Step 2: Run test and verify RED** + +Run: + +```bash +PYTHONPATH=src pytest tests/test_vlm_prompt.py -k "blind_relation_gate" -q +``` + +Expected: fail because `score_image` does not make a blind relation read yet. 
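+
+The Step 1 test assumes a `_completion_response` helper that wraps raw text in a LiteLLM-shaped response object. If `tests/test_vlm_prompt.py` does not already define one, a minimal sketch could look like this (the helper name and shape are assumptions; `score_image` only reads `choices[0].message.content` and `finish_reason`):
+
+```python
+from types import SimpleNamespace
+
+
+def _completion_response(content: str) -> SimpleNamespace:
+    # Minimal litellm.acompletion-shaped stub: resp.choices[0].message.content,
+    # plus finish_reason="stop" so the token-budget escalation loop exits.
+    message = SimpleNamespace(content=content)
+    return SimpleNamespace(choices=[SimpleNamespace(message=message, finish_reason="stop")])
+```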
+ +- [ ] **Step 3: Implement integration** + +In `src/vulca/_vlm.py`, after parsing the normal VLM scoring JSON and before returning data: + +```python +blind_relation_gate = None +if resolved_content_lock is not None and resolved_content_lock.required_relations: + from vulca.content_lock import build_blind_relation_gate + + blind_read = await _blind_relation_read( + img_b64=img_b64, + mime=mime, + api_key=api_key, + model=model, + api_base=api_base, + ) + blind_relation_gate = build_blind_relation_gate(resolved_content_lock, blind_read) +``` + +Merge `blind_relation_gate` into `content_fidelity_gate` when present. Add `_blind_relation_read(...)` that calls LiteLLM with the image and `build_blind_relation_read_prompt`, then parses JSON with `parse_llm_json`. On exception, return `{"_error": str(exc)}` and let `build_blind_relation_gate` decide `unavailable`. + +- [ ] **Step 4: Run focused tests** + +Run: + +```bash +PYTHONPATH=src pytest tests/test_content_lock.py tests/test_vlm_prompt.py -k "blind_relation or relation_semantics or content_fidelity" -q +``` + +Expected: selected tests pass. + +## Task 4: Challenge Evaluation Pass + +**Files:** +- Read: `/Users/yhryzy/dev/emoart-130k/submissions/track1_submission.json` +- Read: `/Users/yhryzy/dev/emoart-130k/submissions/track1_candidate_v2/submission.json` +- Read/write reports under `/Users/yhryzy/dev/emoart-130k/experiments/track1_champion_quality_review_*_20260510/` + +- [ ] **Step 1: Run heuristic full-package scan** + +Run: + +```bash +python3 scripts/track1_quality_review.py --image-dir submissions/track1/images --out-dir experiments/track1_champion_quality_review_current_20260510 --heuristic-only +python3 scripts/track1_quality_review.py --image-dir submissions/track1_candidate_v2/images --out-dir experiments/track1_champion_quality_review_candidate_v2_20260510 --heuristic-only +``` + +Expected: each command writes `heuristic_risk_rank.json`, `quality_review_report.json`, and `quality_review_report.md`. + +- [ ] **Step 2: Run live VLM review on top-risk samples** + +Use the Gemini key from Keychain without printing it: + +```bash +GEMINI_API_KEY="$(security find-generic-password -s affectiveart-gemini-api-key -a gemini -w)" python3 scripts/track1_quality_review.py --image-dir submissions/track1/images --out-dir experiments/track1_champion_quality_review_current_20260510 --model gemini-3-flash-preview --limit 40 +GEMINI_API_KEY="$(security find-generic-password -s affectiveart-gemini-api-key -a gemini -w)" python3 scripts/track1_quality_review.py --image-dir submissions/track1_candidate_v2/images --out-dir experiments/track1_champion_quality_review_candidate_v2_20260510 --model gemini-3-flash-preview --limit 40 +``` + +Expected: each report summarizes high-priority replacement risks. Quota exhaustion should be reported with exact retry delay if present. + +- [ ] **Step 3: Run 0747 blind relation live dogfood** + +After Vulca blind gate is implemented, evaluate: + +```bash +PYTHONPATH=/Users/yhryzy/dev/vulca/.worktrees/caption-fidelity-content-lock-v1/src:$PWD \ +GEMINI_API_KEY="$(security find-generic-password -s affectiveart-gemini-api-key -a gemini -w)" \ +VULCA_VLM_MODEL=gemini/gemini-3-flash-preview \ +python3 scripts/track1_challenger_130k_vulca.py \ + --sample-id track1_0747 \ + --out-dir experiments/track1_130k_compiler_gate_v1/blind_relation_gate_0747_dogfood \ + --force +``` + +Expected: the generated create JSON contains a content fidelity gate whose blind relation result does not allow a pursuit/chase image to be accepted. 
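+
+A quick assertion over the dogfood output makes that check mechanical. The JSON filename pattern below is an assumption; adjust the glob to whatever `track1_challenger_130k_vulca.py` actually writes:
+
+```python
+import json
+from pathlib import Path
+
+out_dir = Path("experiments/track1_130k_compiler_gate_v1/blind_relation_gate_0747_dogfood")
+doc = json.loads(next(out_dir.glob("*.json")).read_text())  # hypothetical filename pattern
+gate = doc["content_fidelity_gate"]
+# A pursuit/chase image must not pass: reject and hold both cap the weighted total at 0.25.
+assert gate["blind_relation_decision"] in {"reject", "hold"}, gate
+```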
+
+## Self-Review Checklist
+
+- Spec coverage: covers generic Vulca gate, 0747 false-negative root cause, and challenge-side champion evaluation.
+- Placeholder scan: no TBD/TODO placeholders remain.
+- Type consistency: helper names are stable across tests and implementation snippets: `build_blind_relation_read_prompt`, `build_blind_relation_gate`, `blind_relation_decision`.
+- Submission safety: the plan does not mutate `/Users/yhryzy/dev/emoart-130k/submissions/track1_submission.json`, `/Users/yhryzy/dev/emoart-130k/submissions/track1_submission.zip`, or `/Users/yhryzy/dev/emoart-130k/submissions/track1/images`.
diff --git a/src/vulca/_parse.py b/src/vulca/_parse.py
index 740d5495..578dd7b0 100644
--- a/src/vulca/_parse.py
+++ b/src/vulca/_parse.py
@@ -28,6 +28,10 @@ def parse_llm_json(text: str) -> dict:
     # Fix trailing commas before } or ]
     text = re.sub(r",\s*([}\]])", r"\1", text)
 
+    # Fix a rare LLM typo where a key starts with two double quotes:
+    #   { "L5": 0.7, ""missing_required_subjects": [] }
+    text = re.sub(r'([,{]\s*)""([A-Za-z_][A-Za-z0-9_]*"\s*:)', r'\1"\2', text)
+
     # Fix single quotes → double quotes (careful with apostrophes in text)
     # Only replace quotes that look like JSON keys/values
     text = re.sub(r"(?<=[\[{,:])\s*'([^']*?)'\s*(?=[,}\]:])", r' "\1"', text)
diff --git a/src/vulca/_vlm.py b/src/vulca/_vlm.py
index 99c90966..a79a2b25 100644
--- a/src/vulca/_vlm.py
+++ b/src/vulca/_vlm.py
@@ -18,6 +18,7 @@
 # Token budget: start low, escalate on truncation
 _DEFAULT_MAX_TOKENS = 3072
 _ESCALATED_MAX_TOKENS = 8192
+_CONTENT_LOCK_MAX_TOKENS = 16384
 _MAX_ESCALATION_ATTEMPTS = 1
 
 # Local (Ollama) models consistently emit >3072 tokens for the L1-L5 JSON
@@ -221,32 +222,64 @@ def _build_dynamic_suffix(
     return "\n".join(p for p in parts if p)
 
 def _extract_scoring(text: str) -> str:
-    """Extract content inside the **last** <scoring>...</scoring> block.
+    """Extract parseable scoring JSON from a two-phase VLM response.
 
-    Implements the two-phase scratchpad protocol: the model writes free-form
-    observations in <observation> tags (discarded), then structured JSON in
-    <scoring> tags (parsed). Falls back to full text for backward compatibility
-    with responses that do not use the tag protocol.
-
-    Uses rfind for the last `</scoring>` to avoid mis-matching when earlier
-    text (e.g. observation or JSON values) accidentally contains `</scoring>`.
+    Prefer the last valid <scoring> block. If the model omits scoring tags,
+    strip scratchpad <observation> blocks and return the first balanced JSON
+    object. Falling back this way avoids Gemini scratchpad braces poisoning the
+    generic JSON parser.
     """
-    close_tag = "</scoring>"
-    close_idx = text.rfind(close_tag)
-    if close_idx == -1:
-        return text
-    prefix = text[:close_idx]
-    open_tag = "<scoring>"
-    # Search backwards for the <scoring> that yields content starting with '{'
-    search_end = len(prefix)
-    while True:
-        open_idx = prefix.rfind(open_tag, 0, search_end)
-        if open_idx == -1:
-            return text
-        candidate = prefix[open_idx + len(open_tag):].strip()
+    scoring_blocks = list(
+        re.finditer(
+            r"<scoring>\s*(.*?)\s*</scoring>",
+            text,
+            flags=re.IGNORECASE | re.DOTALL,
+        )
+    )
+    for match in reversed(scoring_blocks):
+        candidate = match.group(1).strip()
         if candidate.startswith("{"):
             return candidate
-        search_end = open_idx
+
+    without_observation = re.sub(
+        r"<observation>.*?</observation>",
+        "",
+        text,
+        flags=re.IGNORECASE | re.DOTALL,
+    )
+    json_candidate = _first_balanced_json(without_observation)
+    if json_candidate:
+        return json_candidate
+    return text.strip()
+
+
+def _first_balanced_json(text: str) -> str:
+    start = text.find("{")
+    if start < 0:
+        return ""
+
+    depth = 0
+    in_string = False
+    escaped = False
+    for index, char in enumerate(text[start:], start=start):
+        if in_string:
+            if escaped:
+                escaped = False
+            elif char == "\\":
+                escaped = True
+            elif char == '"':
+                in_string = False
+            continue
+
+        if char == '"':
+            in_string = True
+        elif char == "{":
+            depth += 1
+        elif char == "}":
+            depth -= 1
+            if depth == 0:
+                return text[start : index + 1].strip()
+    return ""
 
 
 def _build_extra_dimensions_prompt(extras: list[dict]) -> str:
@@ -544,6 +577,7 @@ async def score_image(
     *,
     mode: str = "strict",
     model: str = "",
+    content_lock: dict | None = None,
 ) -> dict:
     """Call Gemini Vision to score an image on L1-L5.
 
@@ -572,9 +606,21 @@ async def score_image(
         {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{img_b64}"}},
     ]
     if subject:
-        user_parts.append({"type": "text", "text": f"Subject/context: {subject}"})
+        user_text = f"Subject/context: {subject}"
     else:
-        user_parts.append({"type": "text", "text": "Evaluate this artwork."})
+        user_text = "Evaluate this artwork."
+    resolved_content_lock = None
+    if content_lock:
+        from vulca.content_lock import (
+            build_content_fidelity_prompt,
+            content_lock_from_dict,
+        )
+
+        resolved_content_lock = content_lock_from_dict(content_lock)
+        fidelity_prompt = build_content_fidelity_prompt(resolved_content_lock)
+        if fidelity_prompt:
+            user_text = f"{user_text}\n\n{fidelity_prompt}"
+    user_parts.append({"type": "text", "text": user_text})
 
     try:
         messages = [
@@ -591,10 +637,19 @@ async def score_image(
 
         # Adaptive token budget: cloud models start small (cost-conscious),
         # local models start at the escalated budget since tokens are free
-        # and Gemma-class models regularly exceed 3072.
-        max_tokens = _LOCAL_DEFAULT_MAX_TOKENS if is_local else _DEFAULT_MAX_TOKENS
+        # and Gemma-class models regularly exceed 3072. Content-lock scoring
+        # asks for additional gate fields, so allow one larger final attempt
+        # when the model truncates twice.
+ token_budgets = ( + [_LOCAL_DEFAULT_MAX_TOKENS] + if is_local + else [_DEFAULT_MAX_TOKENS, _ESCALATED_MAX_TOKENS] + ) + if content_lock and _CONTENT_LOCK_MAX_TOKENS not in token_budgets: + token_budgets.append(_CONTENT_LOCK_MAX_TOKENS) + max_tokens = token_budgets[0] resp = None - for attempt in range(_MAX_ESCALATION_ATTEMPTS + 1): + for attempt, max_tokens in enumerate(token_budgets): # Local models (Ollama) need longer timeout for first load timeout = 300 if model.startswith("ollama") else 55 call_kwargs = dict( @@ -609,14 +664,13 @@ async def score_image( call_kwargs["api_base"] = api_base resp = await litellm.acompletion(**call_kwargs) finish_reason = getattr(resp.choices[0], "finish_reason", "stop") - if finish_reason == "length" and attempt < _MAX_ESCALATION_ATTEMPTS: + if finish_reason == "length" and attempt < len(token_budgets) - 1: logger.info( "VLM response truncated (finish_reason=length) at %d tokens; " "escalating to %d tokens", max_tokens, - _ESCALATED_MAX_TOKENS, + token_budgets[attempt + 1], ) - max_tokens = _ESCALATED_MAX_TOKENS else: break @@ -651,6 +705,29 @@ async def score_image( logger.debug("VLM debug dump failed: %s", _dump_exc) parsed_json = parse_llm_json(scoring_text) + content_fidelity_gate = None + if resolved_content_lock is not None: + from vulca.content_lock import ( + build_blind_relation_gate, + build_content_fidelity_gate, + ) + + content_fidelity_gate = build_content_fidelity_gate( + resolved_content_lock, + parsed_json, + ) + if resolved_content_lock.required_relations: + blind_read = await _blind_relation_read( + img_b64=img_b64, + mime=mime, + api_key=api_key, + model=model, + api_base=api_base, + content_lock=resolved_content_lock, + ) + content_fidelity_gate.update( + build_blind_relation_gate(resolved_content_lock, blind_read) + ) # Use _parse_vlm_response to extract and validate all fields (including extras) parsed = _parse_vlm_response(parsed_json, extra_keys=extra_keys) @@ -673,6 +750,8 @@ async def score_image( data[f"{level}_reference_technique"] = ref_techniques.get(level, "") # Include risk_flags so _engine.py can read it from the flat dict data["risk_flags"] = parsed["risk_flags"] + if content_fidelity_gate is not None: + data["content_fidelity_gate"] = content_fidelity_gate # Store extra_keys and names in data so _engine.py can split core vs extra data["_extra_keys"] = extra_keys data["_extra_names"] = {e["key"]: e["name"] for e in extra_dims[:3]} @@ -690,3 +769,52 @@ async def score_image( fallback[f"{level}_observations"] = "" fallback[f"{level}_reference_technique"] = "" return fallback + + +async def _blind_relation_read( + *, + img_b64: str, + mime: str, + api_key: str, + model: str, + api_base: str | None, + content_lock, +) -> dict: + """Run an image-only relation read without caption or intended-relation anchors.""" + try: + from vulca._parse import parse_llm_json + from vulca.content_lock import build_blind_relation_read_prompt + + prompt = build_blind_relation_read_prompt(content_lock) + if not prompt: + return {} + + user_parts = [ + {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{img_b64}"}}, + {"type": "text", "text": prompt}, + ] + call_kwargs = dict( + model=model, + messages=[ + { + "role": "system", + "content": ( + "You are a strict image-only visual relationship reader. " + "Use only visible evidence in the image." 
+ ), + }, + {"role": "user", "content": user_parts}, + ], + max_tokens=2048, + temperature=0.0, + api_key=api_key, + timeout=300 if model.startswith("ollama") else 55, + ) + if api_base: + call_kwargs["api_base"] = api_base + resp = await litellm.acompletion(**call_kwargs) + text = resp.choices[0].message.content.strip() + return parse_llm_json(text) + except Exception as exc: + logger.warning("Blind relation read failed: %s", exc) + return {"_error": str(exc)} diff --git a/src/vulca/cli.py b/src/vulca/cli.py index 54812d6f..0862820a 100644 --- a/src/vulca/cli.py +++ b/src/vulca/cli.py @@ -107,6 +107,16 @@ def main(argv: list[str] | None = None) -> None: create_p.add_argument("--ref-type", default="full", choices=["style", "composition", "full"], help="Reference type: style, composition, or full") create_p.add_argument("--colors", default="", help="Hex color palette (comma-separated, e.g. '#C87F4A,#5F8A50')") + create_p.add_argument( + "--content-lock", + action="store_true", + help="Treat explicit subjects and visible attributes in the intent as non-negotiable constraints", + ) + create_p.add_argument( + "--output-is-artwork-itself", + action="store_true", + help="Require the output to be the artwork surface itself, not a gallery/photo/mockup display", + ) create_p.add_argument("--output", "-o", default="", help="Save generated image to this path (default: ./vulca-.png)") # traditions command @@ -843,6 +853,33 @@ def _cmd_create(args: argparse.Namespace) -> None: node_params: dict[str, dict] = {} if weights: node_params["evaluate"] = {"custom_weights": weights} + if getattr(args, "content_lock", False) or getattr(args, "output_is_artwork_itself", False): + from vulca.content_lock import ContentLock, extract_content_lock + + artifact_boundary = ( + getattr(args, "output_is_artwork_itself", False) + or getattr(args, "content_lock", False) + ) + if getattr(args, "content_lock", False): + lock = extract_content_lock( + args.intent, + output_is_artwork_itself=artifact_boundary, + ) + else: + lock = ContentLock( + original_intent=" ".join(args.intent.strip().split()), + output_is_artwork_itself=artifact_boundary, + ) + if lock.has_requirements or lock.output_is_artwork_itself: + lock_data = lock.to_dict() + node_params["generate"] = { + **node_params.get("generate", {}), + "content_lock": lock_data, + } + node_params["evaluate"] = { + **node_params.get("evaluate", {}), + "content_lock": lock_data, + } pipeline_input = PipelineInput( subject=args.subject or args.intent, @@ -892,6 +929,8 @@ def _cmd_create(args: argparse.Namespace) -> None: reference=getattr(args, "reference", "") or "", ref_type=getattr(args, "ref_type", "full") or "full", colors=getattr(args, "colors", "") or "", + content_lock=getattr(args, "content_lock", False), + output_is_artwork_itself=getattr(args, "output_is_artwork_itself", False), ) except Exception as e: print(f"Error: {e}", file=sys.stderr) diff --git a/src/vulca/content_lock.py b/src/vulca/content_lock.py new file mode 100644 index 00000000..ddbb0477 --- /dev/null +++ b/src/vulca/content_lock.py @@ -0,0 +1,818 @@ +"""Content-lock helpers for caption-faithful generation and evaluation.""" + +from __future__ import annotations + +import re +from dataclasses import dataclass, field +from typing import Any + + +@dataclass(frozen=True) +class ContentLock: + """Explicit user-requested content that should survive style optimization.""" + + original_intent: str + required_subjects: list[str] = field(default_factory=list) + required_text_elements: list[str] = 
field(default_factory=list) + required_surface: list[str] = field(default_factory=list) + required_style_attributes: list[str] = field(default_factory=list) + required_mood: list[str] = field(default_factory=list) + required_relations: list[dict[str, str]] = field(default_factory=list) + composition_intent: str = "" + forbidden_readings: list[str] = field(default_factory=list) + output_is_artwork_itself: bool = False + + @property + def has_requirements(self) -> bool: + return any( + ( + self.required_subjects, + self.required_text_elements, + self.required_surface, + self.required_style_attributes, + self.required_mood, + self.required_relations, + self.composition_intent, + self.forbidden_readings, + ) + ) + + def to_dict(self) -> dict[str, object]: + return { + "original_intent": self.original_intent, + "required_subjects": list(self.required_subjects), + "required_text_elements": list(self.required_text_elements), + "required_surface": list(self.required_surface), + "required_style_attributes": list(self.required_style_attributes), + "required_mood": list(self.required_mood), + "required_relations": [dict(relation) for relation in self.required_relations], + "composition_intent": self.composition_intent, + "forbidden_readings": list(self.forbidden_readings), + "output_is_artwork_itself": self.output_is_artwork_itself, + } + + +def content_lock_from_dict(data: dict[str, Any] | ContentLock) -> ContentLock: + if isinstance(data, ContentLock): + return data + allowed = { + "original_intent", + "required_subjects", + "required_text_elements", + "required_surface", + "required_style_attributes", + "required_mood", + "required_relations", + "composition_intent", + "forbidden_readings", + "output_is_artwork_itself", + } + cleaned = {key: value for key, value in data.items() if key in allowed} + return ContentLock( + original_intent=str(cleaned.get("original_intent") or ""), + required_subjects=_as_string_list(cleaned.get("required_subjects")), + required_text_elements=_as_string_list(cleaned.get("required_text_elements")), + required_surface=_as_string_list(cleaned.get("required_surface")), + required_style_attributes=_as_string_list(cleaned.get("required_style_attributes")), + required_mood=_as_string_list(cleaned.get("required_mood")), + required_relations=_as_relation_list(cleaned.get("required_relations")), + composition_intent=str(cleaned.get("composition_intent") or ""), + forbidden_readings=_as_string_list(cleaned.get("forbidden_readings")), + output_is_artwork_itself=bool(cleaned.get("output_is_artwork_itself")), + ) + + +def extract_content_lock( + intent: str, + *, + output_is_artwork_itself: bool = True, +) -> ContentLock: + """Extract explicit visual requirements from a short caption-like intent. + + This is intentionally conservative: it locks concrete, named objects and + visible text/material requirements, but leaves broad style words to the + tradition guidance unless they are clearly phrased as explicit attributes. 
+ """ + text = " ".join(intent.strip().split()) + lower = text.lower() + + subjects = _extract_subjects(text) + subjects = _replace_known_subjects(subjects, lower) + subjects.extend(_extract_keyword_subjects(lower)) + ( + relation_subjects, + required_relations, + composition_intent, + forbidden_readings, + ) = _extract_relation_semantics(lower) + subjects.extend(relation_subjects) + + text_elements: list[str] = [] + if re.search(r"\bcircular calligraphy panel\b", lower): + text_elements.append("circular calligraphy panel") + elif re.search(r"\bvertical chinese calligraphy\b", lower): + text_elements.append("vertical Chinese calligraphy") + elif re.search(r"\bcalligraphy along the side\b", lower): + text_elements.append("calligraphy along the side") + elif re.search(r"\bcalligraphy\b", lower): + text_elements.append("calligraphy") + if re.search(r"\bred seals?\b", lower): + text_elements.append("red seals") + + surface: list[str] = [] + if re.search(r"\baged paper\b", lower): + surface.append("aged paper") + if re.search(r"\bgraph paper\b", lower): + surface.append("graph paper") + if re.search(r"\bpale beige silk ground\b", lower): + surface.append("pale beige silk ground") + if re.search(r"\bornate pale patterned border\b", lower): + surface.append("ornate pale patterned border") + + style_attributes: list[str] = [] + if re.search(r"\bgongbi vertical hanging scroll\b", lower): + style_attributes.append("Gongbi vertical hanging scroll") + elif re.search(r"\bvertical hanging scroll\b", lower): + style_attributes.append("vertical hanging scroll") + if re.search(r"\bgongbi album leaf\b", lower): + style_attributes.append("Gongbi album leaf") + if re.search(r"\brectangular frame\b", lower): + style_attributes.append("rectangular frame") + if re.search(r"\bmonochrome pencil style\b", lower): + style_attributes.append("monochrome pencil style") + if re.search(r"\bdelicate linework\b", lower): + style_attributes.append("delicate linework") + if re.search(r"\bmuted brown tones\b", lower): + style_attributes.append("muted brown tones") + if re.search(r"\bsparse brushwork\b", lower): + style_attributes.append("sparse brushwork") + + mood: list[str] = [] + if re.search(r"\bcalm scholarly composition\b", lower): + mood.append("calm scholarly composition") + + return ContentLock( + original_intent=text, + required_subjects=subjects, + required_text_elements=_dedupe(text_elements), + required_surface=_dedupe(surface), + required_style_attributes=_dedupe(style_attributes), + required_mood=_dedupe(mood), + required_relations=required_relations, + composition_intent=composition_intent, + forbidden_readings=forbidden_readings, + output_is_artwork_itself=output_is_artwork_itself, + ) + + +def build_content_lock_prompt(lock: ContentLock) -> str: + """Build generation instructions that make explicit content non-negotiable.""" + if not lock.has_requirements and not lock.output_is_artwork_itself: + return "" + + lines: list[str] = [] + if lock.output_is_artwork_itself: + lines.extend(_build_artifact_boundary_lines(lock.original_intent)) + + if lock.has_requirements: + if lines: + lines.append("") + lines.extend( + [ + "NON-NEGOTIABLE CONTENT REQUIREMENTS:", + ( + "The following requirements come from the user's explicit request " + "and must be satisfied before style optimization." 
+ ), + ] + ) + if lock.required_subjects: + lines.append(f"- Required subjects: {', '.join(lock.required_subjects)}.") + if lock.required_text_elements: + lines.append( + f"- Required text/seal elements: {', '.join(lock.required_text_elements)}." + ) + if lock.required_surface: + lines.append(f"- Required surface/material: {', '.join(lock.required_surface)}.") + if lock.required_style_attributes: + lines.append( + f"- Required style attributes: {', '.join(lock.required_style_attributes)}." + ) + if lock.required_mood: + lines.append(f"- Required mood/composition: {', '.join(lock.required_mood)}.") + if lock.required_relations: + lines.append("RELATION SEMANTICS REQUIREMENTS:") + for relation in lock.required_relations: + lines.append(f"- {_format_required_relation(relation)}.") + if lock.composition_intent: + lines.append(f"COMPOSITION INTENT: {lock.composition_intent}.") + if lock.forbidden_readings: + lines.append(f"FORBIDDEN RELATION READINGS: {', '.join(lock.forbidden_readings)}.") + if lock.has_requirements: + lines.append( + "Do not replace these subjects with mountains, generic landscapes, " + "or unrelated tradition prototypes." + ) + lines.append( + "Do not render sample IDs, filenames, watermarks, large labels, gallery " + "walls, exhibition labels, framed museum installations, or photographed " + "artwork mockups unless the user explicitly requested them." + ) + lines.append( + "If any required subject is absent, the image is a failed response even " + "if the cultural style is strong." + ) + return "\n".join(lines) + + +def build_content_fidelity_prompt(lock: ContentLock) -> str: + """Build VLM scoring instructions for explicit content presence checks.""" + if not lock.has_requirements and not lock.output_is_artwork_itself: + return "" + + lines = [ + "CONTENT FIDELITY CHECK:", + ( + "Before final scoring, verify whether the artwork visibly contains " + "the user's non-negotiable content requirements." + ), + ] + if lock.required_subjects: + lines.append(f"- Required subjects: {', '.join(lock.required_subjects)}") + if lock.required_text_elements: + lines.append( + f"- Required text/seal elements: {', '.join(lock.required_text_elements)}" + ) + if lock.required_surface: + lines.append(f"- Required surface/material: {', '.join(lock.required_surface)}") + if lock.required_style_attributes: + lines.append( + f"- Required style attributes: {', '.join(lock.required_style_attributes)}" + ) + if lock.required_relations: + lines.append( + "- Required relations: " + f"{'; '.join(_format_required_relation(relation) for relation in lock.required_relations)}" + ) + if lock.composition_intent: + lines.append(f"- Required composition intent: {lock.composition_intent}") + if lock.forbidden_readings: + lines.append(f"- Forbidden relation readings: {', '.join(lock.forbidden_readings)}") + if lock.output_is_artwork_itself: + lines.append( + ( + "- Required artifact boundary: output_is_artwork_itself must be true; " + "the image must be the requested artwork surface, not a photo, " + "gallery scene, installation, catalog/mockup, or framed display." + ) + ) + lines.extend( + [ + ( + "Also check for forbidden visual artifacts: visible sample IDs, " + "filenames, watermarks, large labels, gallery walls, exhibition " + "labels, framed museum installations, and photographed artwork mockups." 
+ ), + "Add these exact fields to the JSON inside :", + '"missing_required_subjects": [strings copied exactly from the required subjects list],', + '"missing_required_text_elements": [strings copied exactly from the required text/seal list],', + '"missing_required_surface": [strings copied exactly from the required surface/material list].', + '"missing_required_style_attributes": [strings copied exactly from the required style attributes list],', + '"apparent_relations": [short strings describing visible subject-relation-object readings],', + '"relation_semantics_failed": true or false,', + '"forbidden_readings_present": [strings copied from forbidden relation readings, or close visual readings],', + '"forbidden_visual_artifacts": [visible forbidden artifacts, or an empty list].', + '"unwanted_visible_text": true or false,', + '"output_is_artwork_itself": true or false.', + "Use an empty list when every item in a category is visible or no forbidden artifact is present.", + ] + ) + return "\n".join(lines) + + +def build_blind_relation_read_prompt(lock: ContentLock | dict[str, Any]) -> str: + """Build an image-only relation-reading prompt without caption anchors.""" + content_lock = content_lock_from_dict(lock) if isinstance(lock, dict) else lock + if not content_lock.required_relations: + return "" + + return "\n".join( + [ + "BLIND IMAGE RELATION READ:", + ( + "Describe only what is visible in the image. Do not use any " + "external prompt, sample id, filename, or expected story." + ), + ( + "Focus on visible relationships among people, animals, vehicles, " + "objects, threats, movement direction, gaze, weapons, gestures, " + "and safety cues." + ), + "Return exactly one JSON object with these fields:", + '"visible_entities": [short strings],', + ( + '"primary_reading": "one sentence describing the most natural ' + 'visible relationship reading",' + ), + ( + '"apparent_relations": [short subject-relation-object strings ' + 'visible in the image],' + ), + '"threat_cues": [short strings],', + '"safety_cues": [short strings],', + ( + '"ambiguous_readings": [short strings for plausible alternate ' + 'readings, or empty list],' + ), + '"confidence": number from 0.0 to 1.0.', + ] + ) + + +def build_blind_relation_gate( + lock: ContentLock | dict[str, Any], + blind_read: dict[str, Any] | None, +) -> dict[str, Any]: + """Compare image-only relation reading against required relations.""" + content_lock = content_lock_from_dict(lock) if isinstance(lock, dict) else lock + if not content_lock.required_relations: + return {"blind_relation_decision": "not_applicable"} + if not blind_read: + return { + "blind_relation_decision": "unavailable", + "blind_relation_reason": "blind relation read unavailable", + } + if blind_read.get("_error"): + return { + "blind_relation_decision": "unavailable", + "blind_relation_reason": str(blind_read.get("_error")), + } + + primary = str(blind_read.get("primary_reading") or "") + apparent = _as_string_list(blind_read.get("apparent_relations")) + ambiguous = _as_string_list(blind_read.get("ambiguous_readings")) + joined = " ".join([primary, *apparent]).lower() + forbidden_present = [ + reading + for reading in content_lock.forbidden_readings + if _relation_reading_matches(reading, joined) + ] + + decision = "pass" + reason = "blind read did not contradict required relations" + has_high_confidence_forbidden = any( + reading != "soldiers chasing civilians" for reading in forbidden_present + ) + if forbidden_present and has_high_confidence_forbidden: + decision = "reject" + 
reason = "blind read matches forbidden relation reading" + elif ambiguous: + decision = "hold" + reason = "blind read is ambiguous for required relations" + elif forbidden_present: + decision = "reject" + reason = "blind read matches forbidden relation reading" + + return { + "blind_relation_decision": decision, + "blind_relation_reason": reason, + "blind_primary_reading": primary, + "blind_apparent_relations": apparent, + "blind_ambiguous_readings": ambiguous, + "blind_forbidden_readings_present": forbidden_present, + } + + +def build_content_fidelity_gate( + lock: ContentLock | dict[str, Any], + scoring_data: dict[str, Any], +) -> dict[str, Any]: + """Create deterministic gate data from VLM missing-item fields.""" + content_lock = content_lock_from_dict(lock) if isinstance(lock, dict) else lock + apparent_relations = _as_string_list(scoring_data.get("apparent_relations")) + forbidden_artifacts = _as_string_list( + scoring_data.get("forbidden_visual_artifacts") + ) + inferred_artifacts = _infer_artifacts_from_readings(content_lock, apparent_relations) + unwanted_visible_text = _as_optional_bool(scoring_data.get("unwanted_visible_text")) + if "unrequested visible text labels" in inferred_artifacts: + unwanted_visible_text = True + return { + "required_subjects": list(content_lock.required_subjects), + "missing_required_subjects": _as_string_list( + scoring_data.get("missing_required_subjects") + ), + "required_text_elements": list(content_lock.required_text_elements), + "missing_required_text_elements": _as_string_list( + scoring_data.get("missing_required_text_elements") + ), + "required_surface": list(content_lock.required_surface), + "missing_required_surface": _as_string_list( + scoring_data.get("missing_required_surface") + ), + "required_style_attributes": list(content_lock.required_style_attributes), + "missing_required_style_attributes": _as_string_list( + scoring_data.get("missing_required_style_attributes") + ), + "required_relations": [ + dict(relation) for relation in content_lock.required_relations + ], + "apparent_relations": apparent_relations, + "relation_semantics_failed": _as_optional_bool( + scoring_data.get("relation_semantics_failed") + ), + "forbidden_readings": list(content_lock.forbidden_readings), + "forbidden_readings_present": _as_string_list( + scoring_data.get("forbidden_readings_present") + ), + "forbidden_visual_artifacts": _dedupe([*forbidden_artifacts, *inferred_artifacts]), + "required_output_is_artwork_itself": content_lock.output_is_artwork_itself, + "output_is_artwork_itself": _as_optional_bool( + scoring_data.get("output_is_artwork_itself") + ), + "unwanted_visible_text": unwanted_visible_text, + } + + +def apply_content_fidelity_gate(result: dict[str, Any], gate: dict[str, Any]) -> dict[str, Any]: + """Cap high scores when required caption content is known missing.""" + missing_subjects = _as_string_list(gate.get("missing_required_subjects")) + missing_text = _as_string_list(gate.get("missing_required_text_elements")) + missing_surface = _as_string_list(gate.get("missing_required_surface")) + missing_style = _as_string_list(gate.get("missing_required_style_attributes")) + relation_semantics_failed = ( + _as_optional_bool(gate.get("relation_semantics_failed")) is True + ) + forbidden_readings_present = _as_string_list( + gate.get("forbidden_readings_present") + ) + forbidden_artifacts = _as_string_list(gate.get("forbidden_visual_artifacts")) + required_artwork_itself = bool(gate.get("required_output_is_artwork_itself")) + output_is_artwork_itself = 
gate.get("output_is_artwork_itself") + unwanted_visible_text = gate.get("unwanted_visible_text") + artifact_boundary_failed = ( + required_artwork_itself and output_is_artwork_itself is False + ) + unwanted_text_failed = unwanted_visible_text is True + blind_relation_decision = str(gate.get("blind_relation_decision") or "") + blind_relation_failed = blind_relation_decision in {"reject", "hold"} + + if not ( + missing_subjects + or missing_text + or missing_surface + or missing_style + or relation_semantics_failed + or forbidden_readings_present + or forbidden_artifacts + or artifact_boundary_failed + or unwanted_text_failed + or blind_relation_failed + ): + return result + + updated = dict(result) + scores = dict(updated.get("scores") or {}) + for key in ("L1", "L3", "L4", "L5"): + scores[key] = min(float(scores.get(key, 0.0)), 0.25) + updated["scores"] = scores + updated["weighted_total"] = min(float(updated.get("weighted_total", 0.0)), 0.25) + + rationale_parts: list[str] = [] + if missing_subjects: + rationale_parts.append(f"Missing required subjects: {', '.join(missing_subjects)}") + if missing_text: + rationale_parts.append( + f"Missing required text elements: {', '.join(missing_text)}" + ) + if missing_surface: + rationale_parts.append( + f"Missing required surface/material: {', '.join(missing_surface)}" + ) + if missing_style: + rationale_parts.append( + f"Missing required style attributes: {', '.join(missing_style)}" + ) + if relation_semantics_failed: + rationale_parts.append("Relation semantics failed") + if forbidden_readings_present: + rationale_parts.append( + f"Forbidden relation readings: {', '.join(forbidden_readings_present)}" + ) + if forbidden_artifacts: + rationale_parts.append( + f"Forbidden visual artifacts: {', '.join(forbidden_artifacts)}" + ) + if artifact_boundary_failed: + rationale_parts.append("Output is not the artwork itself") + if unwanted_text_failed: + rationale_parts.append("Unwanted visible text") + if blind_relation_failed: + rationale_parts.append( + "Blind relation gate rejected image: " + f"{gate.get('blind_relation_reason') or blind_relation_decision}" + ) + rationales = dict(updated.get("rationales") or {}) + rationales["content_fidelity"] = "; ".join(rationale_parts) + updated["rationales"] = rationales + + risk_flags = list(updated.get("risk_flags") or []) + if "content_fidelity_failed" not in risk_flags: + risk_flags.append("content_fidelity_failed") + updated["risk_flags"] = risk_flags + updated["content_fidelity_gate"] = dict(gate) + return updated + + +def _extract_subjects(text: str) -> list[str]: + match = re.search( + r"\b(?:of|showing|featuring|depicting)\s+(.+?)(?:\s+" + r"(?:beside|with|on|under|against|in|at|near|over|around)\b|,\s+with\b|$)", + text, + flags=re.IGNORECASE, + ) + if not match: + return [] + + segment = match.group(1) + pieces = re.split(r",\s*(?:and\s+)?|\s+and\s+", segment) + subjects = [] + for piece in pieces: + normalized = _clean_subject(piece) + if normalized: + subjects.append(normalized) + return _dedupe(subjects) + + +def _infer_artifacts_from_readings( + lock: ContentLock, + apparent_relations: list[str], +) -> list[str]: + """Infer obvious artifact-boundary failures from VLM free-text readings.""" + joined = " ".join(apparent_relations).lower() + artifacts: list[str] = [] + if re.search( + r"\b(meme|memes|social|icon|icons|qr|chat|ui|interface|screen|app|" + r"notification|overlay|collage)\b", + joined, + ): + artifacts.append("modern UI/collage artifacts") + if _has_unrequested_text_label_reading(lock, 
joined): + artifacts.append("unrequested visible text labels") + return artifacts + + +def _has_unrequested_text_label_reading(lock: ContentLock, joined: str) -> bool: + if not re.search( + r"\b(english|label|labels|text|caption|captions|metadata|acquisition|" + r"condition report|concept|concepts|speech bubble)\b", + joined, + ): + return False + if re.search( + r"\b(english|metadata|acquisition|condition report|concept|concepts|" + r"speech bubble|sample id|filename)\b", + joined, + ): + return True + + allowed_text_terms = [ + *lock.required_text_elements, + "cyrillic" if "cyrillic" in lock.original_intent.lower() else "", + "calligraphy" if "calligraphy" in lock.original_intent.lower() else "", + "lettering" if "lettering" in lock.original_intent.lower() else "", + ] + normalized_allowed = [term.lower() for term in allowed_text_terms if term] + if normalized_allowed and any(term in joined for term in normalized_allowed): + return False + return True + + +def _replace_known_subjects(subjects: list[str], lower: str) -> list[str]: + known: list[str] = [] + if re.search(r"\bbamboo\b", lower): + known.append("bamboo") + if re.search(r"\borchid grasses?\b", lower): + known.append("orchid grasses") + elif re.search(r"\borchids?\b", lower): + known.append("orchids") + if known: + remaining = [ + subject + for subject in subjects + if "bamboo" not in subject and "orchid" not in subject + ] + return _dedupe([*known, *remaining]) + return subjects + + +def _build_artifact_boundary_lines(intent: str) -> list[str]: + lower = intent.lower() + lines = [ + "ARTIFACT BOUNDARY REQUIREMENT:", + ( + "The output image must be the artwork itself, not a photograph or " + "display of the artwork." + ), + ( + "Fill the entire canvas with the requested poster/scroll/album/artwork " + "surface." + ), + ( + "Do not include gallery walls, museum displays, framed mockups, " + "installation views, catalog layouts, UI screens, QR codes, filename " + "labels, sample IDs, watermarks, or unrequested readable text." + ), + "Do not show the artwork as an object in a room.", + ] + if re.search(r"\bposter\b|\bpropaganda poster\b", lower): + lines.append( + "Render a flat, front-facing propaganda poster artwork that fills the canvas." + ) + lines.append("Do not render a poster hanging on a wall or photographed in a room.") + if re.search(r"\bscroll\b|\balbum leaf\b|\balbum-leaf\b", lower): + lines.append("Render the scroll/album-leaf artwork as the primary image surface.") + lines.append( + "Do not render a gallery wall, catalog spread, side-by-side detail mockup, " + "or framed display." 
+ ) + return lines + + +def _extract_keyword_subjects(lower: str) -> list[str]: + subjects: list[str] = [] + for pattern, label in ( + (r"\blotus blossoms?\b", "lotus blossoms"), + (r"\bslender stems?\b", "slender stems"), + (r"\bsmall leaves\b", "small leaves"), + (r"\bdense tree-like network\b", "dense tree-like network"), + (r"\bsmall heart\b", "small heart"), + (r"\bgeometric marks\b", "geometric marks"), + (r"\bsparse branches\b", "sparse branches"), + (r"\bworkers?\b", "workers"), + (r"\bred banners?\b", "red banners"), + ): + if re.search(pattern, lower): + subjects.append(label) + if re.search(r"\bhand-drawn branching lines\b", lower): + subjects.append("hand-drawn branching lines") + elif re.search(r"\bbranching lines\b", lower): + subjects.append("branching lines") + if re.search(r"\bfinely detailed bird\b", lower): + subjects.append("finely detailed bird") + elif re.search(r"\bsmall bird\b", lower): + subjects.append("small bird") + elif re.search(r"\bbird\b", lower): + subjects.append("bird") + return _dedupe(subjects) + + +def _extract_relation_semantics( + lower: str, +) -> tuple[list[str], list[dict[str, str]], str, list[str]]: + """Extract conservative subject-relation-object locks from narrative captions.""" + has_mounted_soldiers = bool( + re.search(r"\bmounted(?:\s+[a-z-]+){0,3}\s+soldiers?\b", lower) + ) + has_fleeing_civilians = bool( + re.search(r"\b(?:fleeing|evacuating|displaced)\s+civilians?\b", lower) + or re.search(r"\bcivilians?\s+(?:as\s+they\s+)?(?:flee|evacuate)\b", lower) + or re.search(r"\bcivilians?\s+(?:fleeing|evacuating|displaced)\b", lower) + ) + has_burning_village_ruins = bool( + re.search(r"\bburning village ruins?\b|\bburning villages?\b", lower) + ) + has_aircraft_overhead = bool( + re.search(r"\baircraft overhead\b|\baircraft\b|\bplanes? 
overhead\b", lower) + ) + + if not (has_mounted_soldiers and has_fleeing_civilians and has_burning_village_ruins): + return [], [], "", [] + + subjects = [ + "mounted soldiers", + "fleeing civilians", + "burning village ruins", + ] + relations = [ + { + "subject": "mounted soldiers", + "relation": "escort/protect", + "object": "fleeing civilians", + }, + { + "subject": "fleeing civilians", + "relation": "evacuate_from", + "object": "burning village ruins", + }, + ] + composition_intent = ( + "mounted soldiers must read as escort/protect figures for fleeing " + "civilians while the civilians evacuate from burning village ruins" + ) + if has_aircraft_overhead: + subjects.append("aircraft overhead") + relations.append( + { + "subject": "aircraft overhead", + "relation": "overhead_threat_or_wartime_context", + "object": "scene", + } + ) + composition_intent += ( + " and the aircraft overhead reads as wartime threat/context" + ) + + forbidden_readings = [ + "soldiers chasing civilians", + "soldiers attacking civilians", + "civilians threatened by soldiers", + ] + return subjects, relations, composition_intent, forbidden_readings + + +def _format_required_relation(relation: dict[str, str]) -> str: + subject = relation.get("subject", "").strip() + predicate = relation.get("relation", "").strip() + obj = relation.get("object", "").strip() + if subject and predicate and obj: + return f"{subject} must read as {predicate} {obj}" + if subject and predicate: + return f"{subject} must read as {predicate}" + return subject or predicate or obj + + +def _clean_subject(value: str) -> str: + value = value.strip(" .,:;") + value = re.sub(r"^(?:a|an|the)\s+", "", value, flags=re.IGNORECASE) + value = re.sub(r"\b(?:delicate|detailed|high-quality)\s+", "", value, flags=re.IGNORECASE) + return " ".join(value.split()) + + +def _as_string_list(value: Any) -> list[str]: + if not value: + return [] + if isinstance(value, str): + return [value] + if isinstance(value, (list, tuple)): + return [str(item) for item in value if str(item).strip()] + return [] + + +def _as_relation_list(value: Any) -> list[dict[str, str]]: + if not isinstance(value, (list, tuple)): + return [] + relations: list[dict[str, str]] = [] + for item in value: + if not isinstance(item, dict): + continue + subject = str(item.get("subject") or "").strip() + relation = str(item.get("relation") or "").strip() + obj = str(item.get("object") or "").strip() + if not (subject and relation and obj): + continue + relations.append({"subject": subject, "relation": relation, "object": obj}) + return relations + + +def _as_optional_bool(value: Any) -> bool | None: + if value is None: + return None + if isinstance(value, bool): + return value + if isinstance(value, str): + normalized = value.strip().lower() + if normalized in {"true", "yes", "1"}: + return True + if normalized in {"false", "no", "0"}: + return False + return None + + +def _relation_reading_matches(reading: str, joined: str) -> bool: + normalized = reading.lower() + has_soldiers = "soldier" in joined or "rider" in joined or "mounted" in joined + has_civilians = "civilian" in joined or "people" in joined or "refugee" in joined + if "chasing" in normalized: + return has_soldiers and has_civilians and re.search(r"\bchas\w*|\bpursu\w*", joined) is not None + if "attacking" in normalized: + return has_soldiers and has_civilians and re.search(r"\battack\w*|\bassault\w*|\bshoot\w*", joined) is not None + if "threatened" in normalized: + return has_soldiers and has_civilians and re.search( + 
r"\bthreat\w*|\bmenac\w*|\bbrandish\w*|\bdrawn\s+swords?\b|" + r"\bcharge\w*\b|\bweapon\w*", + joined, + ) is not None + return reading.lower() in joined + + +def _dedupe(values: list[str]) -> list[str]: + seen: set[str] = set() + result: list[str] = [] + for value in values: + key = value.lower() + if key in seen: + continue + seen.add(key) + result.append(value) + return result diff --git a/src/vulca/create.py b/src/vulca/create.py index 2dcf447d..b09429ed 100644 --- a/src/vulca/create.py +++ b/src/vulca/create.py @@ -25,6 +25,8 @@ async def acreate( reference: str = "", ref_type: str = "full", colors: str = "", + content_lock: bool = False, + output_is_artwork_itself: bool = False, ) -> CreateResult: """Create artwork via local pipeline or remote API (async). @@ -55,6 +57,12 @@ async def acreate( reference: Reference image path or base64. Also serves as sketch input -- providers treat both identically as ``reference_image_b64``. + content_lock: + Treat explicit subjects and visible attributes in the intent as + non-negotiable generation and evaluation requirements. + output_is_artwork_itself: + Treat the requested artwork as the full output surface. Generation and + evaluation should reject gallery/display/mockup artifacts. Returns ------- @@ -80,6 +88,9 @@ async def acreate( reference=reference, ref_type=ref_type, colors=colors, + api_key=api_key, + content_lock=content_lock, + output_is_artwork_itself=output_is_artwork_itself, ) return await _create_remote( intent, @@ -88,6 +99,8 @@ async def acreate( provider=provider, base_url=base_url, api_key=api_key, + content_lock=content_lock, + output_is_artwork_itself=output_is_artwork_itself, ) @@ -104,6 +117,9 @@ async def _create_local( reference: str = "", ref_type: str = "full", colors: str = "", + api_key: str = "", + content_lock: bool = False, + output_is_artwork_itself: bool = False, ) -> CreateResult: """Run the slim pipeline engine locally.""" from vulca._image import resolve_image_input @@ -117,6 +133,31 @@ async def _create_local( if weights: node_params["evaluate"] = {"custom_weights": weights} + if content_lock or output_is_artwork_itself: + from vulca.content_lock import ContentLock, extract_content_lock + + artifact_boundary = output_is_artwork_itself or content_lock + if content_lock: + lock = extract_content_lock( + intent, + output_is_artwork_itself=artifact_boundary, + ) + else: + lock = ContentLock( + original_intent=" ".join(intent.strip().split()), + output_is_artwork_itself=artifact_boundary, + ) + if lock.has_requirements or lock.output_is_artwork_itself: + lock_data = lock.to_dict() + node_params["generate"] = { + **node_params.get("generate", {}), + "content_lock": lock_data, + } + node_params["evaluate"] = { + **node_params.get("evaluate", {}), + "content_lock": lock_data, + } + # Inject reference/colors into generate node params gen_params: dict[str, Any] = {} if reference: @@ -132,6 +173,7 @@ async def _create_local( intent=intent, tradition=tradition or "default", provider=provider, + api_key=api_key, node_params=node_params, image_provider=image_provider, eval_mode=eval_mode, @@ -163,6 +205,7 @@ async def _create_local( elif event.payload.get("image_b64"): best_image_b64 = event.payload["image_b64"] + output_dict = output.to_dict() return CreateResult( session_id=output.session_id, mode="create", @@ -178,12 +221,16 @@ async def _create_local( rounds=[r.to_dict() for r in output.rounds], summary=output.summary, recommendations=output.recommendations, + risk_flags=output.risk_flags, + 
content_fidelity_gate=output.content_fidelity_gate, + evaluation_source=output.evaluation_source, + evaluation_error=output.evaluation_error, suggestions=suggestions, deviation_types=deviation_types, eval_mode=eval_mode, latency_ms=output.total_latency_ms, cost_usd=output.total_cost_usd, - raw=output.to_dict(), + raw=output_dict, ) @@ -195,6 +242,8 @@ async def _create_remote( provider: str = "nb2", base_url: str = "", api_key: str = "", + content_lock: bool = False, + output_is_artwork_itself: bool = False, ) -> CreateResult: """Call remote VULCA API for creation.""" import httpx @@ -209,6 +258,10 @@ async def _create_remote( "provider": provider, "stream": False, } + if content_lock: + body["content_lock"] = True + if output_is_artwork_itself or content_lock: + body["output_is_artwork_itself"] = True headers = {"Content-Type": "application/json"} if key: @@ -235,6 +288,10 @@ async def _create_remote( rounds=data.get("rounds") or [], summary=data.get("summary") or "", recommendations=data.get("recommendations") or [], + risk_flags=data.get("risk_flags") or [], + content_fidelity_gate=data.get("content_fidelity_gate") or {}, + evaluation_source=data.get("evaluation_source") or "", + evaluation_error=data.get("evaluation_error") or "", latency_ms=data.get("latency_ms", 0), cost_usd=data.get("cost_usd", 0.0), raw=data, @@ -257,6 +314,8 @@ def create( reference: str = "", ref_type: str = "full", colors: str = "", + content_lock: bool = False, + output_is_artwork_itself: bool = False, ) -> CreateResult: """Create artwork (synchronous wrapper). @@ -282,6 +341,8 @@ def create( reference=reference, ref_type=ref_type, colors=colors, + content_lock=content_lock, + output_is_artwork_itself=output_is_artwork_itself, ) if loop and loop.is_running(): diff --git a/src/vulca/pipeline/engine.py b/src/vulca/pipeline/engine.py index bfb85f72..90476637 100644 --- a/src/vulca/pipeline/engine.py +++ b/src/vulca/pipeline/engine.py @@ -574,6 +574,10 @@ async def _run_one(name: str, _nodes: dict = node_instances, _ctx: object = ctx) total_rounds=len(rounds), total_latency_ms=total_ms, total_cost_usd=ctx.cost_usd, + risk_flags=ctx.get("risk_flags", []), + content_fidelity_gate=ctx.get("content_fidelity_gate", {}) or {}, + evaluation_source=ctx.get("evaluation_source", ""), + evaluation_error=ctx.get("evaluation_error", ""), summary=summary, original_intent=pipeline_input.intent or pipeline_input.subject, original_provider=pipeline_input.provider, diff --git a/src/vulca/pipeline/nodes/evaluate.py b/src/vulca/pipeline/nodes/evaluate.py index ff6d69e0..5c96f021 100644 --- a/src/vulca/pipeline/nodes/evaluate.py +++ b/src/vulca/pipeline/nodes/evaluate.py @@ -35,14 +35,17 @@ async def run(self, ctx: NodeContext) -> dict[str, Any]: if not img_b64: logger.warning("EvaluateNode: no image_b64 in context, using mock scores") result = self._mock_scores(ctx) - return self._merge_algo_scores(result, algo_scores, algo_covered_dims, weights) + merged = self._merge_algo_scores(result, algo_scores, algo_covered_dims, weights) + return self._apply_content_fidelity_gate(ctx, merged) if VLM_SCORING not in provider_capabilities(ctx.provider) or not ctx.api_key: result = self._mock_scores(ctx) - return self._merge_algo_scores(result, algo_scores, algo_covered_dims, weights) + merged = self._merge_algo_scores(result, algo_scores, algo_covered_dims, weights) + return self._apply_content_fidelity_gate(ctx, merged) result = await self._vlm_scores(ctx, img_b64, img_mime) - return self._merge_algo_scores(result, algo_scores, 
algo_covered_dims, weights) + merged = self._merge_algo_scores(result, algo_scores, algo_covered_dims, weights) + return self._apply_content_fidelity_gate(ctx, merged) @staticmethod def _detect_algo_coverage( @@ -151,6 +154,23 @@ def _get_weights(ctx: NodeContext) -> dict[str, float]: from vulca.cultural import get_weights return get_weights(ctx.tradition) + @staticmethod + def _apply_content_fidelity_gate( + ctx: NodeContext, + result: dict[str, Any], + ) -> dict[str, Any]: + node_params = ctx.get("node_params") or {} + eval_params = node_params.get("evaluate") or {} + gate = result.get("content_fidelity_gate") or eval_params.get( + "content_fidelity_gate" + ) + if not gate: + return result + + from vulca.content_lock import apply_content_fidelity_gate + + return apply_content_fidelity_gate(result, gate) + @staticmethod def _apply_locked_dimensions( new_scores: dict[str, float], @@ -206,6 +226,8 @@ async def _vlm_scores( from vulca._vlm import score_image eval_mode = ctx.get("eval_mode", "strict") + node_params = ctx.get("node_params") or {} + eval_params = node_params.get("evaluate") or {} data = await score_image( img_b64=img_b64, @@ -214,19 +236,22 @@ async def _vlm_scores( tradition=ctx.tradition, api_key=ctx.api_key, mode=eval_mode, + content_lock=eval_params.get("content_lock"), ) # If VLM failed (quota/network error), fall back to mock scores if data.get("error"): logger.warning("VLM scoring failed, falling back to mock: %s", data["error"]) - return EvaluateNode._mock_scores(ctx) + fallback = EvaluateNode._mock_scores(ctx) + fallback["evaluation_source"] = "mock_fallback" + fallback["evaluation_error"] = str(data["error"]) + return fallback scores = {f"L{i}": data.get(f"L{i}", 0.0) for i in range(1, 6)} rationales = { f"L{i}_rationale": data.get(f"L{i}_rationale", "") for i in range(1, 6) } - node_params = ctx.get("node_params") or {} locked_vlm: list[str] = (node_params.get("evaluate") or {}).get("locked_dimensions", []) previous_vlm: dict[str, float] = ctx.get("scores") or {} if locked_vlm and previous_vlm: @@ -238,5 +263,9 @@ async def _vlm_scores( return { "scores": scores, "rationales": rationales, + "risk_flags": data.get("risk_flags", []), "weighted_total": round(weighted_total, 4), + "content_fidelity_gate": data.get("content_fidelity_gate"), + "evaluation_source": "vlm", + "evaluation_error": "", } diff --git a/src/vulca/pipeline/nodes/generate.py b/src/vulca/pipeline/nodes/generate.py index cec9eb5c..89aa92a4 100644 --- a/src/vulca/pipeline/nodes/generate.py +++ b/src/vulca/pipeline/nodes/generate.py @@ -6,6 +6,7 @@ import base64 import hashlib import logging +import re import time from typing import Any @@ -52,6 +53,10 @@ } +def _looks_like_sample_id(value: str) -> bool: + return bool(re.fullmatch(r"[a-z][a-z0-9]*[_-]\d{2,}", value.strip(), re.IGNORECASE)) + + class GenerateNode(PipelineNode): """Generate an image from a text prompt via the Provider Registry. 
@@ -91,6 +96,24 @@ async def _provider_generate(
         from vulca.providers import get_image_provider
 
         prompt = ctx.get("prompt") or ctx.subject or ctx.intent
+        node_params = ctx.get("node_params") or {}
+        gen_params = node_params.get("generate") or {}
+
+        content_lock_data = gen_params.get("content_lock")
+        if content_lock_data:
+            from vulca.content_lock import (
+                build_content_lock_prompt,
+                content_lock_from_dict,
+            )
+
+            content_lock = content_lock_from_dict(content_lock_data)
+            lock_prompt = build_content_lock_prompt(content_lock)
+            if lock_prompt:
+                original_prompt = ctx.intent or prompt
+                prompt = (
+                    f"{lock_prompt}\n\n"
+                    f"USER INTENT TO PRESERVE VERBATIM:\n{original_prompt}"
+                )
 
         # Build extra kwargs with cultural guidance + improvement instructions
         extra_kwargs: dict[str, Any] = {}
@@ -106,8 +129,6 @@
         # Resolve reference image (top-level or node_params)
         ref_b64 = ctx.get("reference_image_b64") or ""
-        node_params = ctx.get("node_params") or {}
-        gen_params = node_params.get("generate") or {}
         if not ref_b64:
             ref_b64 = gen_params.get("reference_image_b64", "")
@@ -140,11 +161,15 @@
             provider_name, api_key=ctx.api_key
         )
 
+        subject_for_provider = ctx.subject or ""
+        if content_lock_data and _looks_like_sample_id(subject_for_provider):
+            subject_for_provider = ""
+
         result = await asyncio.wait_for(
             provider_instance.generate(
                 prompt,
                 tradition=ctx.tradition,
-                subject=ctx.subject or "",
+                subject=subject_for_provider,
                 reference_image_b64=ref_b64,
                 **extra_kwargs,
             ),
@@ -236,7 +261,10 @@ def _mock_generate(ctx: NodeContext) -> dict[str, Any]:
         tradition = ctx.tradition or "default"
         bg = GenerateNode._TRADITION_COLORS.get(tradition, "#5F8A50")
         tradition_display = tradition.replace("_", " ").title()
-        subject_display = (ctx.subject or "Untitled")[:50]
+        subject_value = ctx.subject or "Untitled"
+        if _looks_like_sample_id(subject_value):
+            subject_value = "Untitled"
+        subject_display = subject_value[:50]
         # Escape XML special characters
         for old, new in [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;")]:
             subject_display = subject_display.replace(old, new)
diff --git a/src/vulca/pipeline/types.py b/src/vulca/pipeline/types.py
index 70ff59f5..946f537d 100644
--- a/src/vulca/pipeline/types.py
+++ b/src/vulca/pipeline/types.py
@@ -140,6 +140,9 @@ class PipelineOutput:
     total_cost_usd: float = 0.0
     risk_flags: list[str] = field(default_factory=list)
     recommendations: list[str] = field(default_factory=list)
+    content_fidelity_gate: dict[str, Any] = field(default_factory=dict)
+    evaluation_source: str = ""
+    evaluation_error: str = ""
     interrupted_at: str = ""
     summary: str = ""
     # Preserved for HITL resume — original user inputs
@@ -165,6 +168,9 @@ def to_dict(self) -> dict[str, Any]:
             "total_cost_usd": self.total_cost_usd,
             "risk_flags": self.risk_flags,
             "recommendations": self.recommendations,
+            "content_fidelity_gate": self.content_fidelity_gate,
+            "evaluation_source": self.evaluation_source,
+            "evaluation_error": self.evaluation_error,
             "interrupted_at": self.interrupted_at,
             "summary": self.summary,
             "original_intent": self.original_intent,
diff --git a/src/vulca/providers/gemini.py b/src/vulca/providers/gemini.py
index e9206854..82c6db76 100644
--- a/src/vulca/providers/gemini.py
+++ b/src/vulca/providers/gemini.py
@@ -81,6 +81,15 @@ def _build_visible_mask_reference(mask_bytes: bytes) -> bytes:
     return buf.getvalue()
 
 
+def _iter_image_parts(response: object):
+    candidates = getattr(response, "candidates", None) or []
+    for candidate in candidates:
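+        # Defensive traversal: candidates may arrive without content or parts,
+        # so getattr with defaults keeps the generator silent instead of
+        # raising AttributeError on partial responses.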
content = getattr(candidate, "content", None) + parts = getattr(content, "parts", None) or [] + for part in parts: + yield part + + class GeminiImageProvider: """Image generation via Google Gemini API. @@ -217,19 +226,20 @@ async def _call() -> object: retryable_check=_is_retryable, ) - if response.candidates: - for part in response.candidates[0].content.parts: - if part.inline_data and part.inline_data.mime_type.startswith("image/"): - img_b64 = base64.b64encode(part.inline_data.data).decode() - return ImageResult( - image_b64=img_b64, - mime=part.inline_data.mime_type, - metadata={ - "model": self.model, - "image_size": image_size, - "aspect_ratio": aspect_ratio, - }, - ) + for part in _iter_image_parts(response): + inline_data = getattr(part, "inline_data", None) + mime_type = getattr(inline_data, "mime_type", "") + if inline_data and mime_type.startswith("image/"): + img_b64 = base64.b64encode(inline_data.data).decode() + return ImageResult( + image_b64=img_b64, + mime=mime_type, + metadata={ + "model": self.model, + "image_size": image_size, + "aspect_ratio": aspect_ratio, + }, + ) # No image data — classify the failure before surfacing the error so # users get an actionable remediation hint instead of a generic @@ -375,20 +385,21 @@ async def _call() -> object: retryable_check=_is_retryable, ) - if response.candidates: - for part in response.candidates[0].content.parts: - if part.inline_data and part.inline_data.mime_type.startswith("image/"): - img_b64 = base64.b64encode(part.inline_data.data).decode() - return ImageResult( - image_b64=img_b64, - mime=part.inline_data.mime_type, - metadata={ - "model": self.model, - "mode": "gemini_mask_adapter", - "image_size": image_size, - "aspect_ratio": aspect_ratio, - }, - ) + for part in _iter_image_parts(response): + inline_data = getattr(part, "inline_data", None) + mime_type = getattr(inline_data, "mime_type", "") + if inline_data and mime_type.startswith("image/"): + img_b64 = base64.b64encode(inline_data.data).decode() + return ImageResult( + image_b64=img_b64, + mime=mime_type, + metadata={ + "model": self.model, + "mode": "gemini_mask_adapter", + "image_size": image_size, + "aspect_ratio": aspect_ratio, + }, + ) prompt_feedback = getattr(response, "prompt_feedback", None) block_reason = getattr(prompt_feedback, "block_reason", None) if prompt_feedback else None diff --git a/src/vulca/types.py b/src/vulca/types.py index 4a237a0f..606f69c2 100644 --- a/src/vulca/types.py +++ b/src/vulca/types.py @@ -162,6 +162,18 @@ class CreateResult: recommendations: list[str] = field(default_factory=list) """Actionable recommendations.""" + risk_flags: list[str] = field(default_factory=list) + """Risk and gate flags from evaluation, e.g. content_fidelity_failed.""" + + content_fidelity_gate: dict = field(default_factory=dict) + """Content-lock/artifact-boundary audit fields used by the final score gate.""" + + evaluation_source: str = "" + """Scoring source for the final candidate, e.g. 
vlm, mock, or mock_fallback.""" + + evaluation_error: str = "" + """Non-empty when scoring fell back after a VLM or parser error.""" + suggestions: dict[str, str] = field(default_factory=dict) """Per-dimension actionable suggestions (L1→suggestion text).""" diff --git a/tests/test_cli_create_output.py b/tests/test_cli_create_output.py index 6a334cea..3597fc1c 100644 --- a/tests/test_cli_create_output.py +++ b/tests/test_cli_create_output.py @@ -5,6 +5,7 @@ import subprocess import sys from pathlib import Path +from unittest.mock import patch import pytest @@ -78,3 +79,41 @@ def test_create_image_is_valid_png(self, tmp_path): # PNG magic bytes: 89 50 4E 47 data = out_file.read_bytes() assert data[:4] == b'\x89PNG' or len(data) > 0, "File should be valid PNG or non-empty image" + + def test_create_cli_accepts_content_lock_flag(self, capsys): + """create --content-lock should pass content_lock=True to the API.""" + from vulca.cli import main + from vulca.types import CreateResult + + with patch("vulca.create", return_value=CreateResult(session_id="s1")) as mock_create: + main([ + "create", + "Ink and wash painting of bamboo beside calligraphy.", + "--content-lock", + "--provider", + "mock", + "--json", + ]) + + captured = capsys.readouterr() + assert '"session_id": "s1"' in captured.out + assert mock_create.call_args.kwargs["content_lock"] is True + + def test_create_cli_accepts_output_is_artwork_itself_flag(self, capsys): + """create --output-is-artwork-itself should pass the artifact-boundary flag.""" + from vulca.cli import main + from vulca.types import CreateResult + + with patch("vulca.create", return_value=CreateResult(session_id="s1")) as mock_create: + main([ + "create", + "Socialist Realism propaganda poster with workers.", + "--output-is-artwork-itself", + "--provider", + "mock", + "--json", + ]) + + captured = capsys.readouterr() + assert '"session_id": "s1"' in captured.out + assert mock_create.call_args.kwargs["output_is_artwork_itself"] is True diff --git a/tests/test_content_lock.py b/tests/test_content_lock.py new file mode 100644 index 00000000..e510b0ef --- /dev/null +++ b/tests/test_content_lock.py @@ -0,0 +1,624 @@ +from __future__ import annotations + +from vulca.content_lock import ( + ContentLock, + apply_content_fidelity_gate, + build_blind_relation_gate, + build_blind_relation_read_prompt, + build_content_fidelity_gate, + build_content_fidelity_prompt, + build_content_lock_prompt, + extract_content_lock, +) + + +def test_extracts_required_subjects_and_attributes_from_caption(): + lock = extract_content_lock( + "Ink and wash painting of delicate bamboo and orchid grasses beside " + "vertical Chinese calligraphy and red seals on aged paper, with sparse " + "brushwork and a calm scholarly composition." + ) + + assert lock.required_subjects == ["bamboo", "orchid grasses"] + assert lock.required_text_elements == ["vertical Chinese calligraphy", "red seals"] + assert lock.required_surface == ["aged paper"] + assert "sparse brushwork" in lock.required_style_attributes + assert "calm scholarly composition" in lock.required_mood + + +def test_extracts_generic_required_subjects_from_caption(): + lock = extract_content_lock( + "Editorial illustration of a silver astronaut, cracked moon rover, and " + "orange emergency flare under a black sky." 
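+        # Attribute adjectives ("silver", "cracked", "orange") should survive
+        # extraction as part of each subject phrase.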
+ ) + + assert lock.required_subjects == [ + "silver astronaut", + "cracked moon rover", + "orange emergency flare", + ] + + +def test_extracts_declarative_graph_paper_branching_caption(): + lock = extract_content_lock( + "Abstract hand-drawn branching lines fill a rectangular frame on graph " + "paper, forming a dense tree-like network with a small heart and " + "geometric marks in monochrome pencil style." + ) + + assert "hand-drawn branching lines" in lock.required_subjects + assert "dense tree-like network" in lock.required_subjects + assert "small heart" in lock.required_subjects + assert "geometric marks" in lock.required_subjects + assert lock.required_surface == ["graph paper"] + assert "rectangular frame" in lock.required_style_attributes + assert "monochrome pencil style" in lock.required_style_attributes + + +def test_extracts_gongbi_album_leaf_subjects_and_format(): + lock = extract_content_lock( + "Gongbi album leaf with a small bird perched beside sparse branches, " + "a circular calligraphy panel, and an ornate pale patterned border." + ) + + assert "small bird" in lock.required_subjects + assert "sparse branches" in lock.required_subjects + assert "circular calligraphy panel" in lock.required_text_elements + assert "ornate pale patterned border" in lock.required_surface + assert "Gongbi album leaf" in lock.required_style_attributes + + +def test_extracts_relation_semantics_for_escort_evacuation_caption(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + assert "mounted soldiers" in lock.required_subjects + assert "fleeing civilians" in lock.required_subjects + assert "burning village ruins" in lock.required_subjects + assert "aircraft overhead" in lock.required_subjects + assert lock.required_relations == [ + { + "subject": "mounted soldiers", + "relation": "escort/protect", + "object": "fleeing civilians", + }, + { + "subject": "fleeing civilians", + "relation": "evacuate_from", + "object": "burning village ruins", + }, + { + "subject": "aircraft overhead", + "relation": "overhead_threat_or_wartime_context", + "object": "scene", + }, + ] + assert "soldiers chasing civilians" in lock.forbidden_readings + assert "escort/protect" in lock.composition_intent + + +def test_extracts_relation_semantics_with_modifier_between_mounted_and_soldiers(): + lock = extract_content_lock( + "A Socialist Realism poster with mounted Soviet soldiers escorting and " + "protecting civilians as they flee burning village ruins, aircraft overhead." + ) + + assert "mounted soldiers" in lock.required_subjects + assert "fleeing civilians" in lock.required_subjects + assert "aircraft overhead" in lock.required_subjects + assert lock.required_relations[0] == { + "subject": "mounted soldiers", + "relation": "escort/protect", + "object": "fleeing civilians", + } + assert "soldiers chasing civilians" in lock.forbidden_readings + + +def test_content_lock_prompt_makes_subjects_non_negotiable(): + lock = extract_content_lock( + "Ink and wash painting of delicate bamboo and orchid grasses beside " + "vertical Chinese calligraphy and red seals on aged paper." 
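+        # The assertions below pin the non-negotiable wording verbatim, so any
+        # change to the prompt template must update this test deliberately.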
+ ) + + prompt = build_content_lock_prompt(lock) + + assert "NON-NEGOTIABLE CONTENT REQUIREMENTS" in prompt + assert "bamboo" in prompt + assert "orchid grasses" in prompt + assert "vertical Chinese calligraphy" in prompt + assert "red seals" in prompt + assert ( + "Do not replace these subjects with mountains, generic landscapes, " + "or unrelated tradition prototypes." + ) in prompt + + +def test_content_lock_prompt_bans_visible_ids_and_gallery_artifacts(): + lock = extract_content_lock( + "Abstract hand-drawn branching lines fill a rectangular frame on graph " + "paper in monochrome pencil style." + ) + + prompt = build_content_lock_prompt(lock) + + assert "sample IDs" in prompt + assert "gallery" in prompt.lower() + assert "large labels" in prompt + + +def test_content_lock_prompt_makes_relations_non_negotiable(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + prompt = build_content_lock_prompt(lock) + + assert "RELATION SEMANTICS REQUIREMENTS" in prompt + assert "mounted soldiers must read as escort/protect fleeing civilians" in prompt + assert "fleeing civilians must read as evacuate_from burning village ruins" in prompt + assert "COMPOSITION INTENT" in prompt + assert "FORBIDDEN RELATION READINGS" in prompt + assert "soldiers chasing civilians" in prompt + + +def test_blind_relation_prompt_does_not_anchor_on_caption_or_forbidden_reading(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + prompt = build_blind_relation_read_prompt(lock) + + assert "caption" not in prompt.lower() + assert "escort" not in prompt.lower() + assert "protect" not in prompt.lower() + assert "soldiers chasing civilians" not in prompt.lower() + assert "visible relationships" in prompt.lower() + + +def test_blind_relation_gate_rejects_forbidden_primary_reading(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + gate = build_blind_relation_gate( + lock, + { + "primary_reading": "Mounted soldiers appear to chase fleeing civilians.", + "apparent_relations": ["mounted soldiers chasing civilians"], + "ambiguous_readings": [], + }, + ) + + assert gate["blind_relation_decision"] == "reject" + assert "soldiers chasing civilians" in gate["blind_forbidden_readings_present"] + + +def test_blind_relation_gate_rejects_weapon_threat_to_civilians(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + gate = build_blind_relation_gate( + lock, + { + "primary_reading": ( + "Soldiers on horseback charge forward with drawn swords past " + "fleeing civilians." + ), + "apparent_relations": [ + "soldiers brandish swords", + "civilians flee from fire", + ], + "ambiguous_readings": [], + }, + ) + + assert gate["blind_relation_decision"] == "reject" + assert "civilians threatened by soldiers" in gate[ + "blind_forbidden_readings_present" + ] + + +def test_blind_relation_gate_rejects_weapon_threat_even_with_ambiguity(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." 
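+        # Ambiguous alternatives must not soften a weapon-threat reading:
+        # reject is expected to win over hold when threat cues are present.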
+ ) + + gate = build_blind_relation_gate( + lock, + { + "primary_reading": ( + "Soldiers on horseback charge forward with drawn swords past " + "fleeing civilians." + ), + "apparent_relations": [ + "soldiers brandish swords", + "civilians flee from fire", + ], + "ambiguous_readings": [ + "soldiers could be arriving to defend or passing through" + ], + }, + ) + + assert gate["blind_relation_decision"] == "reject" + assert "civilians threatened by soldiers" in gate[ + "blind_forbidden_readings_present" + ] + + +def test_blind_relation_gate_holds_ambiguous_relation_reading(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + gate = build_blind_relation_gate( + lock, + { + "primary_reading": "The riders could be escorting or pursuing the civilians.", + "apparent_relations": ["riders behind fleeing civilians"], + "ambiguous_readings": ["escort or pursuit"], + }, + ) + + assert gate["blind_relation_decision"] == "hold" + assert gate["blind_ambiguous_readings"] == ["escort or pursuit"] + + +def test_blind_relation_gate_passes_clear_escort_reading(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." + ) + + gate = build_blind_relation_gate( + lock, + { + "primary_reading": ( + "Mounted soldiers flank civilians and guide them away from burning ruins." + ), + "apparent_relations": [ + "mounted soldiers guiding civilians away from burning ruins" + ], + "ambiguous_readings": [], + }, + ) + + assert gate["blind_relation_decision"] == "pass" + + +def test_artifact_boundary_prompt_for_poster_requires_flat_artwork_surface(): + lock = ContentLock( + original_intent="Socialist Realism propaganda poster with workers and red banners.", + output_is_artwork_itself=True, + ) + + prompt = build_content_lock_prompt(lock) + + assert "ARTIFACT BOUNDARY REQUIREMENT" in prompt + assert "artwork itself" in prompt + assert "flat, front-facing propaganda poster artwork" in prompt + assert "poster hanging on a wall" in prompt + + +def test_artifact_boundary_prompt_for_scroll_rejects_catalog_displays(): + lock = ContentLock( + original_intent="A Gongbi vertical hanging scroll with lotus blossoms.", + output_is_artwork_itself=True, + ) + + prompt = build_content_lock_prompt(lock) + + assert "scroll/album-leaf artwork as the primary image surface" in prompt + assert "catalog spread" in prompt + assert "framed display" in prompt + + +def test_content_fidelity_prompt_requests_missing_elements(): + lock = extract_content_lock( + "Ink and wash painting of bamboo beside vertical Chinese calligraphy." + ) + + prompt = build_content_fidelity_prompt(lock) + + assert "CONTENT FIDELITY CHECK" in prompt + assert "missing_required_subjects" in prompt + assert "missing_required_text_elements" in prompt + assert "bamboo" in prompt + assert "vertical Chinese calligraphy" in prompt + assert "forbidden_visual_artifacts" in prompt + assert "output_is_artwork_itself" in prompt + assert "unwanted_visible_text" in prompt + + +def test_content_fidelity_prompt_requests_relation_semantics_fields(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." 
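+        # This wartime caption is the fixture that yields required_relations,
+        # so the fidelity prompt must surface the relation JSON fields below.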
+ ) + + prompt = build_content_fidelity_prompt(lock) + + assert "Required relations" in prompt + assert "Forbidden relation readings" in prompt + assert "apparent_relations" in prompt + assert "relation_semantics_failed" in prompt + assert "forbidden_readings_present" in prompt + + +def test_content_fidelity_prompt_requests_missing_style_attributes(): + lock = extract_content_lock( + "Abstract hand-drawn branching lines fill a rectangular frame on graph " + "paper in monochrome pencil style." + ) + + prompt = build_content_fidelity_prompt(lock) + + assert "missing_required_style_attributes" in prompt + assert "rectangular frame" in prompt + assert "monochrome pencil style" in prompt + + +def test_missing_required_subject_caps_high_score(): + result = { + "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94}, + "weighted_total": 0.965, + "rationales": {}, + } + gate = { + "required_subjects": ["bamboo", "orchid grasses"], + "missing_required_subjects": ["bamboo", "orchid grasses"], + } + + gated = apply_content_fidelity_gate(result, gate) + + assert gated["weighted_total"] == 0.25 + assert gated["scores"]["L3"] <= 0.25 + assert "content_fidelity_failed" in gated["risk_flags"] + assert ( + "Missing required subjects: bamboo, orchid grasses" + in gated["rationales"]["content_fidelity"] + ) + + +def test_missing_required_text_element_caps_high_score(): + result = { + "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94}, + "weighted_total": 0.965, + "rationales": {}, + } + gate = { + "required_text_elements": ["vertical Chinese calligraphy"], + "missing_required_text_elements": ["vertical Chinese calligraphy"], + } + + gated = apply_content_fidelity_gate(result, gate) + + assert gated["weighted_total"] == 0.25 + assert "content_fidelity_failed" in gated["risk_flags"] + assert "Missing required text elements" in gated["rationales"]["content_fidelity"] + + +def test_missing_required_style_attribute_caps_high_score(): + result = { + "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94}, + "weighted_total": 0.965, + "rationales": {}, + } + gate = { + "required_style_attributes": ["monochrome pencil style"], + "missing_required_style_attributes": ["monochrome pencil style"], + } + + gated = apply_content_fidelity_gate(result, gate) + + assert gated["weighted_total"] == 0.25 + assert "content_fidelity_failed" in gated["risk_flags"] + assert "Missing required style attributes" in gated["rationales"]["content_fidelity"] + + +def test_forbidden_visual_artifact_caps_high_score(): + result = { + "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94}, + "weighted_total": 0.965, + "rationales": {}, + } + gate = { + "forbidden_visual_artifacts": ["visible sample ID", "gallery photo mockup"], + } + + gated = apply_content_fidelity_gate(result, gate) + + assert gated["weighted_total"] == 0.25 + assert "content_fidelity_failed" in gated["risk_flags"] + assert "Forbidden visual artifacts" in gated["rationales"]["content_fidelity"] + + +def test_artifact_boundary_violation_caps_high_score(): + result = { + "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94}, + "weighted_total": 0.965, + "rationales": {}, + } + gate = { + "required_output_is_artwork_itself": True, + "output_is_artwork_itself": False, + "unwanted_visible_text": True, + } + + gated = apply_content_fidelity_gate(result, gate) + + assert gated["weighted_total"] == 0.25 + assert "content_fidelity_failed" in gated["risk_flags"] + assert "Output is not the artwork itself" in 
gated["rationales"]["content_fidelity"] + assert "Unwanted visible text" in gated["rationales"]["content_fidelity"] + + +def test_content_fidelity_gate_reads_artifact_boundary_fields(): + lock = ContentLock( + original_intent="Graph-paper branching pencil drawing.", + output_is_artwork_itself=True, + ) + + gate = build_content_fidelity_gate( + lock, + { + "forbidden_visual_artifacts": ["gallery wall"], + "unwanted_visible_text": True, + "output_is_artwork_itself": False, + }, + ) + + assert gate["required_output_is_artwork_itself"] is True + assert gate["output_is_artwork_itself"] is False + assert gate["unwanted_visible_text"] is True + assert gate["forbidden_visual_artifacts"] == ["gallery wall"] + + +def test_content_fidelity_gate_infers_modern_ui_text_artifacts(): + lock = extract_content_lock( + "A Socialist Realism propaganda poster of a triumphant armored rider " + "on horseback beside Soviet soldiers with rifles, fallen enemies in " + "the foreground, and a fortified city rising in the background under " + "bold Cyrillic lettering." + ) + + gate = build_content_fidelity_gate( + lock, + { + "apparent_relations": [ + "rider-leads-soldiers", + "memes-overlay-history", + "text-labels-concepts", + "social icons frame the city", + ], + "forbidden_visual_artifacts": [], + "unwanted_visible_text": False, + "output_is_artwork_itself": True, + }, + ) + + assert gate["unwanted_visible_text"] is True + assert "modern UI/collage artifacts" in gate["forbidden_visual_artifacts"] + assert "unrequested visible text labels" in gate["forbidden_visual_artifacts"] + + +def test_content_fidelity_gate_rejects_english_labels_even_with_allowed_cyrillic(): + lock = extract_content_lock( + "A Socialist Realism propaganda poster of a triumphant armored rider " + "under bold Cyrillic lettering." + ) + + gate = build_content_fidelity_gate( + lock, + { + "apparent_relations": [ + "bold Cyrillic headline spans the poster", + "English text labels explain concepts in the city", + ], + "forbidden_visual_artifacts": [], + "unwanted_visible_text": False, + "output_is_artwork_itself": True, + }, + ) + + assert gate["unwanted_visible_text"] is True + assert "unrequested visible text labels" in gate["forbidden_visual_artifacts"] + + +def test_content_fidelity_gate_reads_relation_semantics_fields(): + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." 
+ ) + + gate = build_content_fidelity_gate( + lock, + { + "apparent_relations": ["mounted soldiers appear to chase civilians"], + "relation_semantics_failed": True, + "forbidden_readings_present": ["soldiers chasing civilians"], + }, + ) + + assert gate["required_relations"] == lock.required_relations + assert gate["apparent_relations"] == ["mounted soldiers appear to chase civilians"] + assert gate["relation_semantics_failed"] is True + assert gate["forbidden_readings"] == lock.forbidden_readings + assert gate["forbidden_readings_present"] == ["soldiers chasing civilians"] + + +def test_relation_semantics_failure_caps_high_score(): + result = { + "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94}, + "weighted_total": 0.965, + "rationales": {}, + } + gate = { + "required_relations": [ + { + "subject": "mounted soldiers", + "relation": "escort/protect", + "object": "fleeing civilians", + } + ], + "relation_semantics_failed": True, + "forbidden_readings_present": ["soldiers chasing civilians"], + } + + gated = apply_content_fidelity_gate(result, gate) + + assert gated["weighted_total"] == 0.25 + assert gated["scores"]["L4"] <= 0.25 + assert "content_fidelity_failed" in gated["risk_flags"] + assert "Relation semantics failed" in gated["rationales"]["content_fidelity"] + assert "Forbidden relation readings: soldiers chasing civilians" in gated["rationales"]["content_fidelity"] + + +def test_blind_relation_reject_caps_high_score(): + result = { + "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94}, + "weighted_total": 0.965, + "rationales": {}, + } + gate = { + "blind_relation_decision": "reject", + "blind_relation_reason": "blind read matches forbidden relation reading", + "blind_forbidden_readings_present": ["soldiers chasing civilians"], + "blind_primary_reading": "Mounted soldiers appear to chase civilians.", + } + + gated = apply_content_fidelity_gate(result, gate) + + assert gated["weighted_total"] == 0.25 + assert ( + "Blind relation gate rejected image" + in gated["rationales"]["content_fidelity"] + ) + assert "content_fidelity_failed" in gated["risk_flags"] + + +def test_present_required_subjects_do_not_cap_score(): + result = { + "scores": {"L1": 0.95, "L2": 0.92, "L3": 1.0, "L4": 1.0, "L5": 0.94}, + "weighted_total": 0.965, + "rationales": {}, + } + gate = { + "required_subjects": ["bamboo", "orchid grasses"], + "missing_required_subjects": [], + } + + gated = apply_content_fidelity_gate(result, gate) + + assert gated["weighted_total"] == 0.965 + assert gated["scores"]["L3"] == 1.0 + assert gated.get("risk_flags", []) == [] diff --git a/tests/test_create_hitl.py b/tests/test_create_hitl.py index 86f9e7f1..7b885bec 100644 --- a/tests/test_create_hitl.py +++ b/tests/test_create_hitl.py @@ -2,9 +2,13 @@ from __future__ import annotations +import asyncio +from unittest.mock import AsyncMock, patch + import pytest -from vulca.create import acreate, create +from vulca.create import _create_local, acreate, create +from vulca.pipeline.types import PipelineOutput from vulca.types import CreateResult @@ -36,6 +40,53 @@ def test_hitl_sync(self): assert result.status == "waiting_human" assert result.interrupted_at == "decide" + def test_create_accepts_content_lock_argument(self): + result = create( + "Ink and wash painting of bamboo beside calligraphy.", + provider="mock", + mode="local", + content_lock=True, + ) + + assert result.status == "completed" + + def test_create_accepts_output_is_artwork_itself_argument(self): + result = create( + "Socialist Realism 
propaganda poster with workers.", + provider="mock", + mode="local", + output_is_artwork_itself=True, + ) + + assert result.status == "completed" + + def test_create_local_exposes_content_fidelity_audit_fields(self): + output = PipelineOutput( + session_id="s1", + status="completed", + final_scores={"L1": 0.25}, + weighted_total=0.25, + risk_flags=["content_fidelity_failed"], + content_fidelity_gate={ + "forbidden_visual_artifacts": ["visible sample ID"], + "unwanted_visible_text": True, + "output_is_artwork_itself": False, + }, + evaluation_source="mock_fallback", + evaluation_error="Could not parse JSON from LLM output", + ) + + with patch("vulca.pipeline.engine.execute", new=AsyncMock(return_value=output)): + result = asyncio.run(_create_local("test artwork", provider="mock")) + + assert result.risk_flags == ["content_fidelity_failed"] + assert result.content_fidelity_gate["forbidden_visual_artifacts"] == [ + "visible sample ID" + ] + assert result.evaluation_source == "mock_fallback" + assert result.evaluation_error == "Could not parse JSON from LLM output" + assert result.raw["content_fidelity_gate"] == result.content_fidelity_gate + class TestCreateWeights: """Custom weights change the weighted_total.""" diff --git a/tests/test_evaluate.py b/tests/test_evaluate.py index d3e5fbcb..c2efb803 100644 --- a/tests/test_evaluate.py +++ b/tests/test_evaluate.py @@ -405,3 +405,142 @@ def test_eval_result_with_skills_to_dict(self): ) d = asdict(result) assert d["skills"]["brand"]["score"] == 0.8 + + +def test_evaluate_node_applies_vlm_content_fidelity_gate(): + from vulca.content_lock import extract_content_lock + from vulca.pipeline.node import NodeContext + from vulca.pipeline.nodes import EvaluateNode + + lock = extract_content_lock( + "Ink and wash painting of bamboo beside vertical Chinese calligraphy." + ) + ctx = NodeContext( + subject="track1_0002", + intent=lock.original_intent, + tradition="chinese_xieyi", + provider="gemini", + api_key="fake-key", + ) + ctx.set("image_b64", "iVBORw0KGgo=") + ctx.set("node_params", {"evaluate": {"content_lock": lock.to_dict()}}) + + scored = { + "L1": 0.95, + "L2": 0.92, + "L3": 1.0, + "L4": 1.0, + "L5": 0.94, + "L1_rationale": "Strong image.", + "L2_rationale": "Strong technique.", + "L3_rationale": "Strong style.", + "L4_rationale": "Respectful.", + "L5_rationale": "Poetic.", + "content_fidelity_gate": { + "required_subjects": ["bamboo"], + "missing_required_subjects": ["bamboo"], + "required_text_elements": ["vertical Chinese calligraphy"], + "missing_required_text_elements": [], + }, + } + + with patch("vulca._vlm.score_image", new=AsyncMock(return_value=scored)) as mock_score: + result = asyncio.run(EvaluateNode().run(ctx)) + + assert mock_score.await_args.kwargs["content_lock"] == lock.to_dict() + assert result["weighted_total"] == 0.25 + assert result["scores"]["L3"] == 0.25 + assert "content_fidelity_failed" in result["risk_flags"] + + +def test_evaluate_node_applies_vlm_relation_semantics_gate(): + from vulca.content_lock import extract_content_lock + from vulca.pipeline.node import NodeContext + from vulca.pipeline.nodes import EvaluateNode + + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." 
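+        # Integration path: the gate arrives via the mocked score_image
+        # payload below, and EvaluateNode must apply it rather than trust the
+        # raw VLM scores.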
+ ) + ctx = NodeContext( + subject="track1_0747", + intent=lock.original_intent, + tradition="default", + provider="gemini", + api_key="fake-key", + ) + ctx.set("image_b64", "iVBORw0KGgo=") + ctx.set("node_params", {"evaluate": {"content_lock": lock.to_dict()}}) + + scored = { + "L1": 0.95, + "L2": 0.92, + "L3": 1.0, + "L4": 1.0, + "L5": 0.94, + "L1_rationale": "Strong image.", + "L2_rationale": "Strong technique.", + "L3_rationale": "Strong style.", + "L4_rationale": "Respectful.", + "L5_rationale": "Poetic.", + "content_fidelity_gate": { + "required_relations": lock.required_relations, + "apparent_relations": ["mounted soldiers appear to chase civilians"], + "relation_semantics_failed": True, + "forbidden_readings_present": ["soldiers chasing civilians"], + }, + } + + with patch("vulca._vlm.score_image", new=AsyncMock(return_value=scored)): + result = asyncio.run(EvaluateNode().run(ctx)) + + assert result["weighted_total"] == 0.25 + assert result["scores"]["L4"] == 0.25 + assert "content_fidelity_failed" in result["risk_flags"] + assert "Relation semantics failed" in result["rationales"]["content_fidelity"] + + +def test_evaluate_node_marks_vlm_parse_fallback_explicitly(): + from vulca.pipeline.node import NodeContext + from vulca.pipeline.nodes import EvaluateNode + + ctx = NodeContext( + subject="track1_0064", + intent="A Gongbi vertical hanging scroll with lotus blossoms.", + tradition="chinese_gongbi", + provider="gemini", + api_key="fake-key", + ) + ctx.set("image_b64", "iVBORw0KGgo=") + + scored = { + "error": "Could not parse JSON from LLM output", + "L1": 0.0, + "L2": 0.0, + "L3": 0.0, + "L4": 0.0, + "L5": 0.0, + } + + with patch("vulca._vlm.score_image", new=AsyncMock(return_value=scored)): + result = asyncio.run(EvaluateNode().run(ctx)) + + assert result["evaluation_source"] == "mock_fallback" + assert result["evaluation_error"] == "Could not parse JSON from LLM output" + + +def test_extract_scoring_falls_back_to_first_json_after_scratchpad(): + from vulca._parse import parse_llm_json + from vulca._vlm import _extract_scoring + + raw = """**Phase 1 - Scratchpad** + +The model ignored the requested subject. This note includes braces {not json}. 
+ +{"L1": 0.2, "L2": 0.2, "L3": 0.1, "L4": 0.1, "L5": 0.2} +""" + + scoring = _extract_scoring(raw) + parsed = parse_llm_json(scoring) + + assert parsed["L3"] == 0.1 diff --git a/tests/test_gemini_image_size.py b/tests/test_gemini_image_size.py index ec187fa4..8112efdd 100644 --- a/tests/test_gemini_image_size.py +++ b/tests/test_gemini_image_size.py @@ -5,6 +5,7 @@ import sys import types as py_types +import pytest from PIL import Image from vulca.providers.gemini import ( @@ -115,6 +116,43 @@ def test_declares_masked_edit_adapter_capabilities(self): assert caps.requires_mask_for_edits is True assert caps.supports_unmasked_edits is False + def test_generate_missing_candidate_parts_reports_no_image_data(self, monkeypatch): + class FakeImageConfig: + def __init__(self, **kwargs): + self.kwargs = kwargs + + class FakeGenerateContentConfig: + def __init__(self, **kwargs): + self.kwargs = kwargs + + class FakeModels: + def generate_content(self, *, model, contents, config): + return py_types.SimpleNamespace( + candidates=[ + py_types.SimpleNamespace( + content=py_types.SimpleNamespace(parts=None) + ) + ], + prompt_feedback=None, + ) + + class FakeClient: + def __init__(self, api_key): + self.models = FakeModels() + + fake_types = py_types.SimpleNamespace( + ImageConfig=FakeImageConfig, + GenerateContentConfig=FakeGenerateContentConfig, + ) + fake_genai = py_types.SimpleNamespace(Client=FakeClient, types=fake_types) + fake_google = py_types.SimpleNamespace(genai=fake_genai) + monkeypatch.setitem(sys.modules, "google", fake_google) + monkeypatch.setitem(sys.modules, "google.genai", fake_genai) + monkeypatch.setitem(sys.modules, "google.genai.types", fake_types) + + with pytest.raises(RuntimeError, match="Gemini returned no image data"): + asyncio.run(GeminiImageProvider(api_key="gemini-key").generate("test")) + def test_inpaint_with_mask_sends_source_and_mask_parts( self, tmp_path, diff --git a/tests/test_package.py b/tests/test_package.py index 4c100274..026d4b70 100644 --- a/tests/test_package.py +++ b/tests/test_package.py @@ -423,6 +423,16 @@ def test_parse_llm_json_trailing_comma(): assert result == {"a": 1, "b": 2} +def test_parse_llm_json_repairs_extra_quote_before_key(): + from vulca._parse import parse_llm_json + + text = '{"L5": 0.75, ""missing_required_subjects": [], "risk_flags": []}' + result = parse_llm_json(text) + + assert result["L5"] == 0.75 + assert result["missing_required_subjects"] == [] + + def test_parse_llm_json_invalid(): from vulca._parse import parse_llm_json diff --git a/tests/test_pipeline_engine.py b/tests/test_pipeline_engine.py index 17f3401b..60b114be 100644 --- a/tests/test_pipeline_engine.py +++ b/tests/test_pipeline_engine.py @@ -2,6 +2,9 @@ from __future__ import annotations +import base64 +import asyncio + import pytest from vulca.pipeline.node import NodeContext, PipelineNode @@ -100,6 +103,174 @@ async def test_mock_different_rounds(self): r2 = await node.run(ctx2) assert r1["candidate_id"] != r2["candidate_id"] + def test_mock_generate_suppresses_sample_id_text(self): + node = GenerateNode() + ctx = NodeContext(subject="track1_0301", intent="draw branching lines") + + result = node._mock_generate(ctx) + svg = base64.b64decode(result["image_b64"]).decode() + + assert "track1_0301" not in svg + + def test_generate_node_puts_content_lock_before_cultural_guidance(self): + from vulca.content_lock import extract_content_lock + from vulca.providers.base import ImageResult + + class CapturingProvider: + def __init__(self): + self.prompts = [] + + async def 
generate(self, prompt, **kwargs): + self.prompts.append(prompt) + self.kwargs = kwargs + return ImageResult( + image_b64="iVBORw0KGgo=", + mime="image/png", + metadata={"candidate_id": "captured"}, + ) + + intent = ( + "Ink and wash painting of delicate bamboo and orchid grasses beside " + "vertical Chinese calligraphy and red seals on aged paper." + ) + provider = CapturingProvider() + lock = extract_content_lock(intent) + output = asyncio.run( + execute( + FAST, + PipelineInput( + subject="track1_0002", + intent=intent, + tradition="chinese_xieyi", + provider="gemini", + image_provider=provider, + max_rounds=1, + node_params={"generate": {"content_lock": lock.to_dict()}}, + ), + ) + ) + + assert output.status == "completed" + prompt = provider.prompts[0] + assert prompt.index("NON-NEGOTIABLE CONTENT REQUIREMENTS") < prompt.index(intent) + assert "Do not replace these subjects with mountains" in prompt + + def test_generate_node_does_not_send_sample_id_as_provider_subject_with_content_lock(self): + from vulca.content_lock import extract_content_lock + from vulca.providers.base import ImageResult + + class CapturingProvider: + async def generate(self, prompt, **kwargs): + self.prompt = prompt + self.kwargs = kwargs + return ImageResult( + image_b64="iVBORw0KGgo=", + mime="image/png", + metadata={"candidate_id": "captured"}, + ) + + intent = ( + "Abstract hand-drawn branching lines fill a rectangular frame on graph " + "paper in monochrome pencil style." + ) + provider = CapturingProvider() + lock = extract_content_lock(intent) + output = asyncio.run( + execute( + FAST, + PipelineInput( + subject="track1_0301", + intent=intent, + tradition="default", + provider="gemini", + image_provider=provider, + max_rounds=1, + node_params={"generate": {"content_lock": lock.to_dict()}}, + ), + ) + ) + + assert output.status == "completed" + assert provider.kwargs["subject"] == "" + assert "track1_0301" not in provider.prompt + + def test_generate_node_puts_artifact_boundary_before_content_requirements(self): + from vulca.content_lock import extract_content_lock + from vulca.providers.base import ImageResult + + class CapturingProvider: + async def generate(self, prompt, **kwargs): + self.prompt = prompt + return ImageResult( + image_b64="iVBORw0KGgo=", + mime="image/png", + metadata={"candidate_id": "captured"}, + ) + + intent = "Socialist Realism propaganda poster with workers and red banners." + provider = CapturingProvider() + lock = extract_content_lock(intent) + output = asyncio.run( + execute( + FAST, + PipelineInput( + subject="track1_0151", + intent=intent, + tradition="default", + provider="gemini", + image_provider=provider, + max_rounds=1, + node_params={"generate": {"content_lock": lock.to_dict()}}, + ), + ) + ) + + assert output.status == "completed" + assert provider.prompt.index("ARTIFACT BOUNDARY REQUIREMENT") < provider.prompt.index( + "NON-NEGOTIABLE CONTENT REQUIREMENTS" + ) + assert "flat, front-facing propaganda poster artwork" in provider.prompt + + def test_generate_node_puts_relation_semantics_before_user_intent(self): + from vulca.content_lock import extract_content_lock + from vulca.providers.base import ImageResult + + class CapturingProvider: + async def generate(self, prompt, **kwargs): + self.prompt = prompt + return ImageResult( + image_b64="iVBORw0KGgo=", + mime="image/png", + metadata={"candidate_id": "captured"}, + ) + + intent = ( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." 
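+            # Ordering is the point: relation requirements must precede the
+            # verbatim user intent so the provider reads constraints first.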
+ ) + provider = CapturingProvider() + lock = extract_content_lock(intent) + output = asyncio.run( + execute( + FAST, + PipelineInput( + subject="track1_0747", + intent=intent, + tradition="default", + provider="gemini", + image_provider=provider, + max_rounds=1, + node_params={"generate": {"content_lock": lock.to_dict()}}, + ), + ) + ) + + assert output.status == "completed" + assert provider.prompt.index("RELATION SEMANTICS REQUIREMENTS") < provider.prompt.index( + "USER INTENT TO PRESERVE VERBATIM" + ) + assert "soldiers chasing civilians" in provider.prompt + # ── EvaluateNode ──────────────────────────────────────────────────── diff --git a/tests/test_vlm_prompt.py b/tests/test_vlm_prompt.py index d7b0e9f7..b655b53b 100644 --- a/tests/test_vlm_prompt.py +++ b/tests/test_vlm_prompt.py @@ -4,6 +4,7 @@ import pytest from vulca._vlm import ( + _CONTENT_LOCK_MAX_TOKENS, _DEFAULT_MAX_TOKENS, _ESCALATED_MAX_TOKENS, _STATIC_SCORING_PREFIX, @@ -236,3 +237,90 @@ def test_score_image_no_double_escalation(self): # _MAX_ESCALATION_ATTEMPTS=1 means at most 2 calls total assert mock_acompletion.call_count == 2 + + def test_score_image_content_lock_gets_final_large_budget_on_second_truncation(self): + truncated = _make_mock_response("length", "" + _VALID_SCORING_JSON + "") + full_resp = _make_mock_response("stop", "" + _VALID_SCORING_JSON + "") + mock_acompletion = AsyncMock(side_effect=[truncated, truncated, full_resp]) + + with patch("litellm.acompletion", mock_acompletion): + result = asyncio.run( + score_image( + img_b64="aGVsbG8=", + mime="image/png", + subject="test artwork", + tradition="chinese_xieyi", + api_key="test-key", + content_lock={"original_intent": "test artwork"}, + ) + ) + + assert mock_acompletion.call_count == 3 + assert [call.kwargs["max_tokens"] for call in mock_acompletion.call_args_list] == [ + _DEFAULT_MAX_TOKENS, + _ESCALATED_MAX_TOKENS, + _CONTENT_LOCK_MAX_TOKENS, + ] + assert result.get("L1") == pytest.approx(0.8) + + def test_score_image_adds_blind_relation_gate_for_required_relations(self): + from vulca.content_lock import extract_content_lock + + lock = extract_content_lock( + "Wartime illustration of mounted soldiers beside fleeing civilians, " + "burning village ruins, and aircraft overhead." 
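+        # Two VLM calls are expected below: the caption-anchored scoring call,
+        # then the image-only blind read whose JSON feeds the blind gate.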
+ ) + scoring_json = ( + '{"L1": 0.9, "L1_rationale": "ok", "L1_suggestion": "try", ' + '"L1_deviation_type": "traditional", "L1_observations": "", "L1_reference_technique": "", ' + '"L2": 0.9, "L2_rationale": "ok", "L2_suggestion": "try", ' + '"L2_deviation_type": "traditional", "L2_observations": "", "L2_reference_technique": "", ' + '"L3": 0.9, "L3_rationale": "ok", "L3_suggestion": "try", ' + '"L3_deviation_type": "traditional", "L3_observations": "", "L3_reference_technique": "", ' + '"L4": 0.9, "L4_rationale": "ok", "L4_suggestion": "try", ' + '"L4_deviation_type": "traditional", "L4_observations": "", "L4_reference_technique": "", ' + '"L5": 0.9, "L5_rationale": "ok", "L5_suggestion": "try", ' + '"L5_deviation_type": "traditional", "L5_observations": "", "L5_reference_technique": "", ' + '"missing_required_subjects": [], ' + '"missing_required_text_elements": [], ' + '"missing_required_surface": [], ' + '"missing_required_style_attributes": [], ' + '"apparent_relations": ["caption-conditioned escort"], ' + '"relation_semantics_failed": false, ' + '"forbidden_readings_present": [], ' + '"forbidden_visual_artifacts": [], ' + '"unwanted_visible_text": false, ' + '"output_is_artwork_itself": true, ' + '"risk_flags": []}' + ) + blind_json = ( + '{"visible_entities": ["mounted soldiers", "civilians"], ' + '"primary_reading": "Mounted soldiers appear to chase fleeing civilians.", ' + '"apparent_relations": ["mounted soldiers chasing civilians"], ' + '"threat_cues": [], ' + '"safety_cues": [], ' + '"ambiguous_readings": [], ' + '"confidence": 0.82}' + ) + normal_resp = _make_mock_response("stop", f"{scoring_json}") + blind_resp = _make_mock_response("stop", blind_json) + mock_acompletion = AsyncMock(side_effect=[normal_resp, blind_resp]) + + with patch("litellm.acompletion", mock_acompletion): + result = asyncio.run( + score_image( + img_b64="aGVsbG8=", + mime="image/png", + subject="track1_0747", + tradition="default", + api_key="test-key", + content_lock=lock.to_dict(), + ) + ) + + assert mock_acompletion.call_count == 2 + gate = result["content_fidelity_gate"] + assert gate["blind_relation_decision"] == "reject" + assert gate["blind_forbidden_readings_present"] == [ + "soldiers chasing civilians" + ]