feat(entity-graph): implement entity graph for structured relationship tracking #9
Th-Underscore wants to merge 12 commits into main from
Conversation
…p tracking
- Added EntityGraph class to manage entities and their relationships
- Integrated entity graph into StoryContextRetriever for enhanced relationship tracking
- Updated character relationships in JSON to include detailed importance structure
- Modified importance thresholds in characters.json for better narrative relevance
- Enhanced subjects schema to accommodate new importance structure with faction details
Note: Reviews paused
This branch appears to be under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough
Adds a persisted EntityGraph for story entities and integrates it into StoryContextRetriever; expands schema, example data, and templates with new descriptive fields; the summarizer cache flow now requires and copies format_templates.json.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant StoryContextRetriever
    participant EntityGraph
    participant JSONFiles as JSON Files (chars/groups/events/arcs)
    participant DiskCache as Disk Cache (entity_graph.json)
    Client->>StoryContextRetriever: Request retrieval context / new chat
    StoryContextRetriever->>EntityGraph: Initialize (history_path, schema_classes, persist=True)
    EntityGraph->>DiskCache: Check for persisted entity_graph.json
    alt Cache hit (fingerprint matches)
        EntityGraph->>EntityGraph: Load persisted graph from disk
    else Cache miss or fingerprint mismatch
        EntityGraph->>JSONFiles: Read source JSON (characters/groups/events/arcs)
        EntityGraph->>EntityGraph: Build nodes & relationships (importance, faction, events)
        EntityGraph->>DiskCache: Persist entity_graph.json
    end
    StoryContextRetriever->>EntityGraph: Query groups / relationships / events (importance thresholds)
    EntityGraph-->>StoryContextRetriever: Return filtered results
    StoryContextRetriever->>StoryContextRetriever: Merge graph results with regex/alias/text fallbacks
    StoryContextRetriever-->>Client: Return enriched retrieval context
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
agents/summarizer.py (1)
Lines 1373-1435: ⚠️ Potential issue | 🟠 Major
The per-session format_templates.json cache is not actually used.
This branch now copies format_templates.json into the world-data cache and each history directory, but FormattedData._load_format_templates() in this same file still reads from user_data/example/format_templates.json and memoizes it process-wide. As written, per-chat template snapshots never affect prompt formatting, so this new cache path is effectively dead and template changes will not travel with the session.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/summarizer.py` around lines 1373 - 1435, Summary: per-session format_templates.json files are copied into caches but never used because FormattedData._load_format_templates reads and memoizes a single global path (user_data/example/format_templates.json). Fix: modify FormattedData._load_format_templates to accept an optional path argument (or detect a caller-provided session/history directory) and first attempt to load format_templates.json from the session-specific history_path or initial_world_data_path before falling back to the global template; also change the memoization so it keys by the template file path (or disable global memoization) so each session's format_templates.json is cached/used independently. Target symbols: FormattedData._load_format_templates, any global memo/cache var used for memoization, and the code that copies format_templates.json into initial_world_data_path/history_path so the loader can find them.
🧹 Nitpick comments (1)
rag/structured_rag/entity_graph.py (1)
Line 433: Make the annotation match the default value.
`group_names: list[str] = None` is the exact implicit-Optional case Ruff flags here. Use `list[str] | None` if the repo targets Python 3.10+, otherwise `Optional[list[str]]`, so the signature matches the runtime default.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rag/structured_rag/entity_graph.py` at line 433, The parameter annotation for get_relevant_events is inconsistent with its default None; update the signature so group_names is annotated as an explicit optional type (e.g., change group_names: list[str] = None to group_names: list[str] | None if the project targets Python 3.10+, otherwise use group_names: Optional[list[str]] and import Optional) so the type matches the runtime default; ensure the change is applied to the get_relevant_events definition and any corresponding type imports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@agents/summarizer.py`:
- Around line 1409-1414: The existence check is using the wrong constant
(GLOBAL_SUBJECTS_SCHEMA_TEMPLATE_PATH) so a missing global format template will
only fail at shutil.copy; change the guard to test
GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH instead, and raise FileNotFoundError
mentioning GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH (the same symbol used in the
subsequent shutil.copy to format_templates_cache_path) so the error is raised
with the correct path before attempting shutil.copy.
In `@rag/structured_rag/context_retriever.py`:
- Around line 265-303: This helper (using
self.entity_graph.get_bidirectional_relationship in
context_retriever._get_all_relevant_character_relationships) is incorrectly
merging the backward (char2 -> char1) perspective into the same rels map,
causing characters to inherit the other party's perspective; remove the code
that appends bidir["backward"] into rels so only the forward relationship
(bidir["forward"] with importance >= correlation_threshold) is included here —
do not merge or add bidir["backward"] into rels (delete the block that checks
bidir["backward"] and appends it).
In `@rag/structured_rag/entity_graph.py`:
- Around line 159-163: Currently nodes are created using the display name as the
id (e.g., EntityNode(id=char_name, name=char_name)), which allows collisions
across entity types; change node ids to be namespaced by type (e.g., prefix with
"character:", "group:", "scene:", "event:", "chapter:", "arc:") while keeping
the human-readable label in the name field. Update all EntityNode constructions
(e.g., where EntityNode is created for characters, groups, scenes, events,
chapters, arcs noted around id=char_name and the other locations called out) to
use a typed id like f"{type}:{display_name}" and also update any code that
builds or looks up relationships/edges to use the same typed id scheme so
inserts and lookups remain consistent.
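A runnable sketch of the typed-id scheme described above, using a simplified EntityNode (the real class in `rag/structured_rag/entity_graph.py` carries more fields such as relationships and data):

```python
from dataclasses import dataclass

@dataclass
class EntityNode:
    id: str    # typed id, e.g. "character:Alice"
    name: str  # human-readable display name
    type: str

def make_node_id(entity_type: str, display_name: str) -> str:
    """Namespace ids by type so a character named 'Alice' never
    collides with a group that happens to share the name."""
    return f"{entity_type}:{display_name}"

nodes: dict[str, EntityNode] = {}
for entity_type, name in [("character", "Alice"), ("group", "Alice")]:
    node_id = make_node_id(entity_type, name)
    nodes[node_id] = EntityNode(id=node_id, name=name, type=entity_type)
```

With display-name ids the second insert would have overwritten the first; with typed ids both entities coexist, and edge lookups use the same `f"{type}:{name}"` form.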
- Around line 67-72: EntityGraph currently always loads entity_graph.json when
present (in the __init__ logic that calls
_load_from_disk/_build_from_source_files), causing stale graphs when source
JSONs change; modify loading to validate freshness by computing and storing a
fingerprint (e.g., mtimes or a checksum) of source files (characters.json,
groups.json, events.json, arcs.json) when saving via _save_to_disk, then in the
load path (_load_from_disk or the caller that decides to load) compare the
current fingerprint of those source files to the saved fingerprint and force
_build_from_source_files() (and re-run _save_to_disk) if they differ so the
graph is rebuilt whenever any source JSON changes.
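A sketch of the suggested freshness check, assuming the source files live under `history_path` with the names given in the comment; the real EntityGraph persistence API may differ.

```python
import hashlib
from pathlib import Path

SOURCE_FILES = ["characters.json", "groups.json", "events.json", "arcs.json"]

def compute_fingerprint(history_path: Path) -> str:
    """Hash each source file's (name, mtime, size) into one token;
    store it alongside entity_graph.json and rebuild on mismatch."""
    h = hashlib.sha256()
    for name in SOURCE_FILES:
        p = history_path / name
        if p.exists():
            stat = p.stat()
            h.update(f"{name}:{stat.st_mtime_ns}:{stat.st_size};".encode())
        else:
            h.update(f"{name}:missing;".encode())
    return h.hexdigest()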
- Around line 250-299: The event participants are never incorporated into the
graph because _build_events only creates raw EntityNode objects for
scenes/chapters/events; update _build_events to read the participants list from
each event's data (e.g., event_data.get("participants") or StoryEvent schema
key) and attach it to the created EntityNode (node.data["participants"] =
participants) and/or materialize edges: for each participant id/name ensure an
EntityNode exists (create if missing) and call the graph edge helper (e.g.,
self.add_edge or self.edges.append) to connect the event node id to the
participant node id. Also adjust get_entities_involved_in_event to read
node.data["participants"] (or follow the created edges) so it returns the
participant IDs instead of []. Ensure you reference the EntityNode instances
created in _build_events and use the existing edge-creation helper on the graph
class to keep consistent graph semantics.
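The participant wiring described above can be sketched with a toy graph; `ToyGraph` and its helpers are illustrative stand-ins, not the real EntityGraph API.

```python
class ToyGraph:
    def __init__(self):
        self.nodes: dict[str, dict] = {}
        self.edges: list[tuple[str, str]] = []

    def ensure_node(self, node_id: str, node_type: str):
        self.nodes.setdefault(node_id, {"type": node_type, "data": {}})

    def add_edge(self, source_id: str, target_id: str):
        self.edges.append((source_id, target_id))

    def build_event(self, event_name: str, event_data: dict):
        event_id = f"event:{event_name}"
        self.ensure_node(event_id, "event")
        participants = event_data.get("participants", [])
        self.nodes[event_id]["data"]["participants"] = participants
        for who in participants:
            char_id = f"character:{who}"
            self.ensure_node(char_id, "character")  # create if missing
            self.add_edge(event_id, char_id)

    def get_entities_involved_in_event(self, event_name: str) -> list[str]:
        node = self.nodes.get(f"event:{event_name}")
        return node["data"].get("participants", []) if node else []
```

The key point is that participants are both stored on the event node and materialized as edges, so either lookup path returns the same answer.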
- Around line 586-600: The function
get_all_important_relationships_for_characters currently ignores its threshold
parameter because it always calls get_relationship_summary without passing the
requested cutoff; update get_all_important_relationships_for_characters to
forward the threshold into get_relationship_summary (e.g., call
get_relationship_summary(char_name, threshold)) so the caller's filter is
applied, and if get_relationship_summary does not accept a threshold yet, add an
optional threshold parameter to get_relationship_summary (defaulting to 50) and
ensure its internal filtering uses that value.
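A minimal sketch of forwarding the caller's threshold instead of silently falling back to the default; the data and function bodies are illustrative stand-ins for the real EntityGraph methods.

```python
RELATIONSHIPS = {
    "Alice": [
        {"target": "Bob", "importance": 70},
        {"target": "Eve", "importance": 30},
    ],
}

def get_relationship_summary(char_name: str, threshold: int = 50) -> list[dict]:
    """Return relationships at or above the importance cutoff."""
    return [r for r in RELATIONSHIPS.get(char_name, []) if r["importance"] >= threshold]

def get_all_important_relationships_for_characters(names: list[str], threshold: int = 50) -> dict:
    # The bug was calling get_relationship_summary(n) here, which
    # dropped the caller's threshold; forwarding it fixes the filter.
    return {n: get_relationship_summary(n, threshold) for n in names}
```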
- Around line 291-299: The current loop assumes chapters is a mapping and calls
chapters.items(), which fails when events_data["chapters"] is a list per
subjects_schema.json; update the build-chapters logic (around EntityNode
creation in entity_graph.py) to handle both list and dict: if chapters is a dict
iterate chapter_name, chapter_data in chapters.items(); if it's a list iterate
the list and for each chapter use a stable identifier (e.g., chapter.get("id")
or chapter.get("name") or fallback to an index) as the EntityNode id/name and
pass the chapter object as data; ensure the created nodes still use
type="chapter" and maintain the existing EntityNode(...) call signature.
In `@user_data/example/format_templates.json`:
- Line 18: The characters template is dropping relationship and event importance
metadata; update the relationship and event renderers so they include
relation.status and the importance subfields (importance.score plus
importance.faction or importance.reason) alongside the relation name.
Concretely, in the characters template where you iterate char_data.relationships
(symbols: char_data.relationships, rel_name, rel_list, relation) extend the
output for each relation to emit relation.status and relation.importance.score
and either relation.importance.faction or relation.importance.reason (handling
both mapping and list forms the same way you handle other fields), and make the
analogous change in the events rendering loop to include event.status and
event.importance.score/faction/reason.
In `@user_data/example/subjects_schema.json`:
- Around line 27-33: The update examples that replace .importance currently only
set "score" and "reason", which breaks the new 3-field importance shape by
dropping "faction"; update every importance-update prompt/example to include the
"faction" property too (e.g., set "faction" from the existing importance.faction
when present or fall back to the defaults.faction_placeholder), and ensure any
code or template that writes .importance (search for occurrences of
".importance" in prompts/examples and the example replacement snippets)
preserves or propagates importance.faction so relationships are not downgraded
to neutral on graph rebuilds.
- Around line 382-401: The StoryEvent schema was extended with formal_name,
aliases, catalyst, outcome, and participants but the StoryEvents branching/query
prompts that drive updates/reviews still only reference name, start, end, and
summary; update those prompt templates (the StoryEvents branch/query prompts) to
explicitly include instructions and validation for the new fields (formal_name,
aliases, catalyst, outcome, participants) and to reference the corresponding
placeholders (formal_name_placeholder, aliases_placeholder,
catalyst_placeholder, outcome_placeholder, participants_placeholder) so existing
events get reviewed/updated for those fields as well.
In `@utils/schema_parser.py`:
- Around line 732-737: The method currently returns a primitive description
directly when _is_primitive_schema(main_schema) is true, breaking the prior
return envelope expected by callers; change the branch in that method so it
preserves the original envelope (e.g., return an object with the same keys as
the non-primitive path such as {"main_schema": primitive_description,
"definitions": {}} or similar) instead of returning the raw primitive dict;
update the assignment to
self.relative_relevant_json_schemas[root_schema_or_type_hint] to store the
enveloped value and ensure the method's return value signature remains identical
for both primitive and non-primitive flows (referencing _is_primitive_schema,
_get_primitive_type_description, relative_relevant_json_schemas, and
root_schema_or_type_hint to locate the code).
- Around line 762-766: The current primitive-like detection treats any object
with no "properties" as primitive, which misclassifies typed maps (objects that
use "additionalProperties") and drops their definitions; update the check in the
block that inspects main_schema (the code using main_schema.get("type"),
main_schema.get("$ref"), main_schema.get("properties")) to only return True when
the object has no properties AND has no meaningful additionalProperties (i.e.,
additionalProperties is absent or falsy), and treat cases where
additionalProperties is a schema/object (non-falsy dict) as non-primitive so the
full definition is emitted; ensure you reference
main_schema.get("additionalProperties") in the condition to avoid skipping
map-of-objects schemas.
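A sketch of the corrected primitive-like check: an object with no `properties` but a schema-valued `additionalProperties` is a typed map, not a primitive. This is a simplified standalone version of the logic discussed for `utils/schema_parser.py`, not its actual implementation.

```python
def is_primitive_schema(main_schema: dict) -> bool:
    """Return True only for genuinely primitive schemas."""
    if "$ref" in main_schema:
        return False
    if main_schema.get("type") != "object":
        return main_schema.get("type") in {"string", "integer", "number", "boolean"}
    if main_schema.get("properties"):
        return False
    # Map-of-objects: additionalProperties carries a real sub-schema,
    # so the full definition must be emitted, not a primitive stub.
    extra = main_schema.get("additionalProperties")
    if isinstance(extra, dict) and extra:
        return False
    return True
```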
---
Outside diff comments:
In `@agents/summarizer.py`:
- Around line 1373-1435: Summary: per-session format_templates.json files are
copied into caches but never used because FormattedData._load_format_templates
reads and memoizes a single global path
(user_data/example/format_templates.json). Fix: modify
FormattedData._load_format_templates to accept an optional path argument (or
detect a caller-provided session/history directory) and first attempt to load
format_templates.json from the session-specific history_path or
initial_world_data_path before falling back to the global template; also change
the memoization so it keys by the template file path (or disable global
memoization) so each session's format_templates.json is cached/used
independently. Target symbols: FormattedData._load_format_templates, any global
memo/cache var used for memoization, and the code that copies
format_templates.json into initial_world_data_path/history_path so the loader
can find them.
---
Nitpick comments:
In `@rag/structured_rag/entity_graph.py`:
- Line 433: The parameter annotation for get_relevant_events is inconsistent
with its default None; update the signature so group_names is annotated as an
explicit optional type (e.g., change group_names: list[str] = None to
group_names: list[str] | None if the project targets Python 3.10+, otherwise use
group_names: Optional[list[str]] and import Optional) so the type matches the
runtime default; ensure the change is applied to the get_relevant_events
definition and any corresponding type imports.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3eb998f0-4433-4267-91fa-20c1900a2e32
📒 Files selected for processing (11)
agents/summarizer.py, rag/structured_rag/context_retriever.py, rag/structured_rag/entity_graph.py, todo.md, user_data/example/characters.json, user_data/example/events.json, user_data/example/format_templates.json, user_data/example/general_info.json, user_data/example/groups.json, user_data/example/subjects_schema.json, utils/schema_parser.py
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.
Fixes Applied Successfully: fixed 6 file(s) based on 12 unresolved review comments. The changes have been pushed to the branch.
AI Code Review by LlamaPReview
🎯 TL;DR & Recommendation
Recommendation: Request Changes
This PR adds an entity graph for relationship tracking but introduces critical breaking changes to the data schema and creates dual-code paths that could lead to runtime failures and behavioral divergence.
📄 Documentation Diagram
This diagram documents the new entity graph workflow for initializing and querying story entity relationships.
```mermaid
sequenceDiagram
    participant UI as User Interface
    participant SCR as StoryContextRetriever
    participant EG as EntityGraph
    participant JSON as JSON Files
    participant Cache as entity_graph.json
    UI ->> SCR: Initialize with history_path
    SCR ->> EG: Initialize EntityGraph(persist=True)
    EG ->> JSON: Load/parse characters.json, groups.json, events.json, arcs.json
    note over EG,JSON: PR #35;9 added EntityGraph for relationship tracking
    EG -->> Cache: Save graph to entity_graph.json
    EG -->> SCR: Graph ready with nodes and relationships
    SCR ->> EG: Query relationships (e.g., get_bidirectional_relationship)
    EG -->> SCR: Return relationship data
    SCR -->> UI: Provide retrieval context with enhanced relationships
```
🌟 Strengths
- Introduces a structured entity graph for enhanced relationship tracking, improving data organization.
- Expands character and event data with richer metadata (biography, voice, preferences, catalyst, outcome, participants).
| Priority | File | Category | Impact Summary | Anchors |
|---|---|---|---|---|
| P1 | agents/summarizer.py | Architecture | Removal of enable_thinking parameter breaks UI integration, risking configuration errors. | path:ui/loaders.py, path:ui/utils.py |
| P1 | agents/summarizer.py | Bug | Incorrect file check may cause silent copy failures for format templates. | |
| P1 | rag/.../context_retriever.py | Architecture | Conditional graph integration creates dual-code paths, increasing bug risk and complexity. | |
| P1 | rag/.../entity_graph.py | Architecture | Schema change to importance object breaks backward compatibility for JSON consumers. | path:user_data/example/characters.json, path:user_data/example/subjects_schema.json |
| P2 | rag/.../entity_graph.py | Maintainability | No-op placeholder methods return empty lists, hiding missing functionality. | |
| P2 | rag/.../entity_graph.py | Performance | rebuild_if_needed always rebuilds, eliminating cache performance gains and causing latency. | |
| P2 | rag/.../context_retriever.py | Maintainability | Merging character lists from two sources risks duplicates and pattern creation errors. | |
🔍 Notable Themes
- Architectural Fragmentation: The introduction of EntityGraph creates a dual-layer system (graph vs. JSON) without a unified abstraction, increasing maintenance burden and risk of inconsistency.
- Data Schema Evolution: Changes to JSON structures (e.g., importance as objects) are not fully propagated across all consumers, leading to potential runtime errors or misinterpretation.
- Performance Oversights: Cache mechanisms are undermined by unconditional rebuilds, negating intended latency improvements for large datasets.
📈 Risk Diagram
This diagram illustrates the dual-code path risk where relationship lookups can use both graph and JSON sources, leading to potential inconsistency.
```mermaid
sequenceDiagram
    participant SCR as StoryContextRetriever
    participant EG as EntityGraph
    participant JSON as JSON Data
    SCR ->> SCR: Check if entity_graph exists (hasattr)
    alt EntityGraph available
        SCR ->> EG: Call get_bidirectional_relationship
        note over SCR,EG: R1(P1): Dual-code path increases complexity
        EG -->> SCR: Return graph-based relationship
    else Fallback to JSON
        SCR ->> JSON: Use _get_character_scene_relationships (JSON logic)
        note over SCR,JSON: Risk of behavioral divergence
        JSON -->> SCR: Return JSON-based relationship
    end
    SCR -->> SCR: Merge or use result (potential cache incoherence)
```
⚠️ **Unanchored Suggestions (Manual Review Recommended)**
The following suggestions could not be precisely anchored to a specific line in the diff. This can happen if the code is outside the changed lines, has been significantly refactored, or if the suggestion is a general observation. Please review them carefully in the context of the full file.
📁 File: agents/summarizer.py
The enable_thinking key was removed from the core base_state configuration dictionary. The related context from ui/loaders.py and ui/utils.py shows this parameter is still part of the UI's expected input elements list and sampler configuration. This creates a breaking change where the UI will pass a parameter (enable_thinking) that the underlying summarizer logic no longer recognizes or uses, potentially causing configuration mismatches or errors if the parameter is required elsewhere in the pipeline.
Suggestion:

```python
# Add a comment explaining the parameter is deprecated and ignored
# "enable_thinking": True,  # Deprecated - no longer used by summarizer
```

Related Code:

```python
base_state = {
    "name1": "SYSTEM",
    "name2": "DAYNA",
    "mode": "instruct",
    "chat-instruct_command": 'Continue the chat dialogue below. Write a single reply for the character "DAYNA". Answer questions flawlessly. Follow instructions to a T.\n<|prompt|>',
    "context": (
```

📁 File: rag/structured_rag/entity_graph.py
The EntityGraph._build_characters method contains logic to handle both the old integer importance format and the new object {"score": int, "reason": str, "faction": str} format introduced in the updated example JSON and schema. This is a breaking schema change for the characters.json data model. While the graph builder includes backward compatibility logic, any other part of the codebase that reads characters.json directly (e.g., the old JSON-based fallback paths in context_retriever.py) will now fail or misinterpret data if it encounters the new object format, as they expect a simple integer. The PR changes the canonical example data but does not update all consumers or provide a clear migration path.
Related Code:

```python
def _build_characters(self):
    """Parse characters.json to build character nodes and their relationships."""
    characters_path = self.history_path / "characters.json"
    if not characters_path.exists():
        return
    characters_data = load_json(characters_path)
    entries = characters_data.get("entries", characters_data)
    for char_name, char_data in entries.items():
        # ... node creation ...
        # Store relationships
        relationships = char_data.get("relationships", {})
        node.relationships = relationships
        # Add relationships to graph
        for target_name, rel_list in relationships.items():
            if not isinstance(rel_list, list):
                continue
            for rel in rel_list:
                if not isinstance(rel, dict):
                    continue
                importance_data = rel.get("importance", {})
                if isinstance(importance_data, dict):
                    importance = importance_data.get("score", 0)
                    importance_reason = importance_data.get("reason", "")
                    faction = importance_data.get("faction", "neutral")
                else:
                    importance = importance_data if isinstance(importance_data, int) else 0
                    importance_reason = ""
                    faction = "neutral"
```
```python
format_templates_cache_path = initial_world_data_path / "format_templates.json"
if not GLOBAL_SUBJECTS_SCHEMA_TEMPLATE_PATH.exists():
    raise FileNotFoundError(
        f"Global format templates template not found at {GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH}"
    )
shutil.copy(GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH, format_templates_cache_path)
```

P1 | Confidence: High
The FileNotFoundError condition incorrectly checks for the existence of GLOBAL_SUBJECTS_SCHEMA_TEMPLATE_PATH instead of GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH. This logic bug will cause the error to be raised based on the wrong file's presence, potentially allowing the copy operation to proceed when the source format templates file is missing, leading to a silent failure or a confusing error message.

Suggested fix:

```diff
-if not GLOBAL_SUBJECTS_SCHEMA_TEMPLATE_PATH.exists():
+if not GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH.exists():
     raise FileNotFoundError(
         f"Global format templates template not found at {GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH}"
     )
```
```python
"""Get relationships between two characters in the same scene, regardless of importance."""
characters_data = self.characters.get("entries", self.characters)

# Try using graph first for bidirectional relationship lookup
```

P1 | Confidence: High
The PR introduces a significant architectural change by adding a new EntityGraph data layer and conditionally using it alongside the existing JSON-based logic in context_retriever.py. The conditional integration pattern (if hasattr(self, 'entity_graph') and self.entity_graph) is repeated across multiple methods (_get_relevant_groups, _get_character_scene_relationships, _get_all_relevant_character_relationships), creating a dual-code path. This increases complexity and the risk of behavioral divergence and cache incoherence (graph vs JSON files), and makes debugging and testing more difficult. A cleaner architectural pattern (e.g., a single abstraction) would be preferable.
```python
def get_scene_characters(self, scene_name: str) -> list[str]:
    """Get all characters in a scene."""
    # This would need to be enhanced with scene->character mapping
    # For now, return empty list - can be expanded later
    return []
```

P2 | Confidence: High
Multiple methods in the new EntityGraph class (get_scene_characters, get_scene_for_characters, get_entities_involved_in_event) are implemented as no-op placeholders that return empty lists or None. While marked with TODO comments, shipping these as part of a foundational new feature creates a trap for future developers. Callers (like the updated context_retriever.py) might unknowingly rely on empty results, leading to subtle bugs when the methods are eventually implemented, as the system's behavior will change silently.

Code Suggestion:

```python
def get_scene_characters(self, scene_name: str) -> list[str]:
    raise NotImplementedError("Scene->character mapping not yet implemented in EntityGraph")
```

```python
def rebuild_if_needed(self, force: bool = False):
    """
    Rebuild the graph if source files have been modified.

    Currently always rebuilds since we don't track file modification times.
    Can be enhanced later to compare file timestamps.

    Args:
        force: If True, always rebuild. If False, currently always rebuilds.
```
P2 | Confidence: Medium
The rebuild_if_needed method, despite its name, unconditionally calls rebuild(). This negates the performance benefit of the graph's persistent cache (entity_graph.json). Every time a StoryContextRetriever is instantiated (which creates an EntityGraph), it will rebuild the graph from source JSON files, parse relationships, and save to disk, even if the source files are unchanged. For large stories, this could cause significant startup latency.

Code Suggestion:

```python
def rebuild_if_needed(self, force: bool = False):
    graph_path = self.history_path / "entity_graph.json"
    source_files = ["characters.json", "groups.json", "events.json", "arcs.json"]
    source_paths = [self.history_path / f for f in source_files]
    # If force or graph doesn't exist, rebuild
    if force or not graph_path.exists():
        self.rebuild()
        return
    # Check if any source file is newer than the cached graph
    graph_mtime = graph_path.stat().st_mtime
    for src in source_paths:
        if src.exists() and src.stat().st_mtime > graph_mtime:
            self.rebuild()
            return
```

```diff
 self.chunker = MessageChunker(history_path, self.characters, self.groups, self.events, self.current_scene)

-# Create character name patterns for recognition
+# Create character name patterns for recognition (from both JSON and graph)
```
P2 | Confidence: Medium
The method _create_character_patterns now attempts to merge character names from two sources: the loaded JSON data and the new EntityGraph. This introduces a risk of duplicate entries and increases complexity. More importantly, it assumes the graph's node names (node.name) are directly comparable to the JSON dictionary keys. If the graph-building logic (_build_characters) uses a different ID (like char_name vs. a normalized version), the merge will fail, potentially missing characters or creating duplicate patterns.

Code Suggestion:

```python
def _create_character_patterns(self) -> dict[str, re.Pattern]:
    patterns = {}
    # Use graph if available, else fall back to JSON
    if hasattr(self, 'entity_graph') and self.entity_graph:
        char_nodes = self.entity_graph.get_nodes_by_type("character")
        char_names = [node.name for node in char_nodes]
    else:
        characters_data = self.characters.get("entries", self.characters)
        char_names = list(characters_data.keys())
    # ... build patterns from char_names
```

Fixed 6 file(s) based on 12 unresolved review comments.
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Actionable comments posted: 3
🧹 Nitpick comments (1)
agents/summarizer.py (1)
Lines 1351-1353: Consider centralizing template-path constants to avoid drift.
The format_templates.json path is now defined here and also in FormattedData._load_format_templates. A single shared module-level constant would reduce future mismatch risk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/summarizer.py` around lines 1351 - 1353, The code defines GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH here and also constructs the same path inside FormattedData._load_format_templates, risking drift; refactor by centralizing the path into one module-level constant (e.g., GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH) and update FormattedData._load_format_templates to import/use that constant instead of rebuilding the path, ensuring EXTENSION_DIR usage remains consistent and removing duplicate path construction.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rag/structured_rag/entity_graph.py`:
- Around line 589-601: In get_relevant_events(), the code compares raw names
against stored relationship/node IDs so relationship and group events are never
found; update the comparisons and lookups to use the stored ID prefixes (e.g.,
compare r.source_id or r.target_id to f"character:{char_name}" or otherwise
construct the same "character:<name>" form used in relationships, and when
retrieving group nodes call get_node(f"group:{group_name}") to match the node
keys); ensure you still iterate self.relationships and node.data.get("events",
[]) as before but using the prefixed IDs so events are correctly collected.
- Around line 441-447: The code handles arcs_data mapping but leaves arcs_list
as a dict when arcs.json uses {"entries": {...}}, causing the subsequent loop to
iterate keys instead of arc objects; update the normalization after setting
arcs_list (when arcs_data is a dict) to convert a mapping into an iterable of
arc dicts (e.g., if arcs_list is a dict, replace it with its .values() or
list(...) so the for arc_data in arcs_list loop sees dicts), referencing the
existing arcs_data/arcs_list/arc_data variables in entity_graph.py.
- Line 575: The signature of get_relevant_events currently types group_names as
list[str] but defaults it to None, causing RUF013; update the parameter
annotation to be nullable by changing group_names: list[str] | None = None (or
Optional[list[str]] if you prefer typing.Optional) in the get_relevant_events
definition and ensure any downstream uses treat None as the absent-case (no
other code changes needed).
---
Nitpick comments:
In `@agents/summarizer.py`:
- Around line 1351-1353: The code defines GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH
here and also constructs the same path inside
FormattedData._load_format_templates, risking drift; refactor by centralizing
the path into one module-level constant (e.g.,
GLOBAL_FORMAT_TEMPLATES_TEMPLATE_PATH) and update
FormattedData._load_format_templates to import/use that constant instead of
rebuilding the path, ensuring EXTENSION_DIR usage remains consistent and
removing duplicate path construction.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 21e07567-74e1-4580-80f7-55898a1ba5a9
📒 Files selected for processing (6)
- agents/summarizer.py
- rag/structured_rag/context_retriever.py
- rag/structured_rag/entity_graph.py
- user_data/example/format_templates.json
- user_data/example/subjects_schema.json
- utils/schema_parser.py
✅ Files skipped from review due to trivial changes (1)
- rag/structured_rag/context_retriever.py
🚧 Files skipped from review as they are similar to previous changes (3)
- user_data/example/format_templates.json
- utils/schema_parser.py
- user_data/example/subjects_schema.json
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.
Fixes Applied Successfully
Fixed 1 file(s) based on 3 unresolved review comments. Files modified:
Commit: The changes have been pushed to the Time taken:
Fixed 1 file(s) based on 3 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Actionable comments posted: 1
🧹 Nitpick comments (4)
rag/structured_rag/entity_graph.py (4)
713-723: rebuild_if_needed ignores existing fingerprint infrastructure.
This method always calls rebuild() despite fingerprint logic existing in _load_or_build(). The method name suggests conditional behavior but causes unnecessary rebuilds every time.
♻️ Proposed fix to leverage existing fingerprint comparison
 def rebuild_if_needed(self, force: bool = False):
     """
     Rebuild the graph if source files have been modified.
-
-    Currently always rebuilds since we don't track file modification times.
-    Can be enhanced later to compare file timestamps.
-
+
     Args:
-        force: If True, always rebuild. If False, currently always rebuilds.
+        force: If True, always rebuild. If False, only rebuild if source files changed.
     """
-    self.rebuild()
+    if force:
+        self.rebuild()
+        return
+
+    graph_path = self.history_path / "entity_graph.json"
+    if not graph_path.exists():
+        self.rebuild()
+        return
+
+    # Compare fingerprints to detect changes
+    current_fingerprint = self._compute_source_fingerprint()
+    data = load_json(graph_path)
+    saved_fingerprint = data.get("source_fingerprint", {})
+
+    if current_fingerprint != saved_fingerprint:
+        self.rebuild()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rag/structured_rag/entity_graph.py` around lines 713 - 723, The method rebuild_if_needed currently always calls rebuild() and should instead leverage the existing fingerprint logic in _load_or_build: compute or obtain the current fingerprint for the source inputs (using the same mechanism used by _load_or_build), compare it to the stored/previous fingerprint, and only call self.rebuild() when force is True or the fingerprints differ; if fingerprints match do nothing. Update rebuild_if_needed to either delegate to _load_or_build for the comparison or replicate its fingerprint-computation and comparison steps, referencing the methods/fields _load_or_build, rebuild, and whatever fingerprint storage variable is used so the conditional rebuild behavior is consistent with the rest of the class.
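As a standalone sketch of the mtime-based fingerprint the proposed fix leans on (the helper name `source_fingerprint` and the mtime approach are illustrative; the actual class may hash contents instead):

```python
from pathlib import Path


def source_fingerprint(paths: list[Path]) -> dict[str, float]:
    """Record each existing source file's mtime.

    Any edit to a tracked file changes its mtime, so comparing the
    saved dict against a fresh one detects when a rebuild is needed.
    Missing files are simply omitted from the fingerprint.
    """
    return {str(p): p.stat().st_mtime for p in paths if p.exists()}
```

A `rebuild_if_needed` implementation would save this dict alongside the persisted graph and rebuild whenever a freshly computed fingerprint differs.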
634-653: get_bidirectional_relationship hardcodes character prefix.
Lines 642-643 assume entities are characters (f"character:{entity1}"), but the method name and docstring are generic. Either document this limitation or accept pre-namespaced IDs.
📝 Option A: Document the limitation
 def get_bidirectional_relationship(self, entity1: str, entity2: str) -> dict[str, Relationship]:
     """
     Get relationships between two entities in both directions.
+
+    Note: Currently only supports character entities.

     Returns:
         Dict with "forward" and "backward" Relationship objects
     """
📝 Option B: Accept namespaced IDs directly
-def get_bidirectional_relationship(self, entity1: str, entity2: str) -> dict[str, Relationship]:
+def get_bidirectional_relationship(self, entity1: str, entity2: str, entity_type: str = "character") -> dict[str, Relationship]:
     """
     Get relationships between two entities in both directions.

     Returns:
         Dict with "forward" and "backward" Relationship objects
     """
-    # Convert to namespaced IDs (assume characters for now)
-    entity1_id = f"character:{entity1}"
-    entity2_id = f"character:{entity2}"
+    entity1_id = f"{entity_type}:{entity1}"
+    entity2_id = f"{entity_type}:{entity2}"
Verify each finding against the current code and only fix it if needed. In `@rag/structured_rag/entity_graph.py` around lines 634 - 653, get_bidirectional_relationship currently hardcodes a "character:" prefix for entity1/entity2 (entity1_id = f"character:{entity1}") which makes the method non-generic; change it to accept pre-namespaced IDs or apply a default namespace only when needed by: check entity1 and entity2 for an existing namespace separator (':' ) and if absent prefix with the default namespace (e.g., "character:"), update the docstring to state it accepts either namespaced IDs or bare names, and keep the rest of the logic that matches against self.relationships and sets result["forward"] / result["backward"] unchanged.
495-499: Placeholder methods return empty defaults.
These no-op placeholders (get_scene_characters, get_scene_for_characters) return [] / None, which is safer than raising NotImplementedError. The comments clearly document this as intentional. Consider tracking these as TODO items if scene-character mapping is a planned feature.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rag/structured_rag/entity_graph.py` around lines 495 - 499, These placeholder methods (get_scene_characters and get_scene_for_characters) intentionally return empty defaults; update their docstrings and add explicit TODO/ FIXME markers and a one-line comment explaining they are intentional stubs to be implemented later, and include an issue/track-id or link to the backlog so reviewers know this is planned work (e.g., "TODO: implement scene->character mapping; tracked in ISSUE-123"). Ensure the method names get_scene_characters and get_scene_for_characters are updated accordingly and, if desired, add a lightweight logger warning inside them to surface usage of the stub at runtime.
285-403: Repeated participant-edge logic across scenes, past events, and crucial events.
The same ~20 lines for extracting participants and creating edges are duplicated three times. Consider extracting to a helper method.
♻️ Example helper extraction
def _add_participant_edges(self, node_id: str, participants: list[str]):
    """Create participant nodes and edges for an event/scene."""
    for participant in participants:
        participant_id = f"character:{participant}"
        if participant_id not in self.nodes:
            self.nodes[participant_id] = EntityNode(
                id=participant_id, type="character", name=participant, data={}
            )
        self.relationships.append(Relationship(
            source_id=node_id,
            target_id=participant_id,
            relation="participant",
            status="event_participation",
            importance=50
        ))

Then in each event-building section:
participants = scene_data.get("participants", [])
if participants:
    node.data["participants"] = participants
    self._add_participant_edges(node_id, participants)
Verify each finding against the current code and only fix it if needed. In `@rag/structured_rag/entity_graph.py` around lines 285 - 403, Duplicate participant-edge creation logic appears in the scenes, past, and crucial events loops; extract it into a helper like _add_participant_edges(self, node_id: str, participants: list[str]) that (1) creates missing participant EntityNode entries in self.nodes using ids like f"character:{participant}" and (2) appends Relationship(...) objects to self.relationships with source_id=node_id, target_id=participant_id, relation="participant", status="event_participation", importance=50; then in the scenes loop (node_id = f"scene:{scene_name}"), past loop (node_id = f"event:{event_name}"), and crucial events loop (node_id = f"event:{event_name}") replace the repeated block with setting node.data["participants"] = participants and calling self._add_participant_edges(node_id, participants).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rag/structured_rag/entity_graph.py`:
- Around line 441-455: When arcs_data is a dict and you convert arcs_list from a
mapping, preserve the dict keys as possible arc titles: instead of calling
list(arcs_list.values()) in the block handling isinstance(arcs_list, dict),
convert to an iterable of (key, value) pairs or attach the key to the arc dict
so the parsing logic can fall back to the map key when arc_data.get("title") is
missing; update the loop that processes arc_data (references: arcs_data,
arcs_list, arc_data, arc_title) to accept either a tuple (key, value) or ensure
arc_data["title"] = key when missing before continuing so arcs with no embedded
"title" are not skipped.
---
Nitpick comments:
In `@rag/structured_rag/entity_graph.py`:
- Around line 713-723: The method rebuild_if_needed currently always calls
rebuild() and should instead leverage the existing fingerprint logic in
_load_or_build: compute or obtain the current fingerprint for the source inputs
(using the same mechanism used by _load_or_build), compare it to the
stored/previous fingerprint, and only call self.rebuild() when force is True or
the fingerprints differ; if fingerprints match do nothing. Update
rebuild_if_needed to either delegate to _load_or_build for the comparison or
replicate its fingerprint-computation and comparison steps, referencing the
methods/fields _load_or_build, rebuild, and whatever fingerprint storage
variable is used so the conditional rebuild behavior is consistent with the rest
of the class.
- Around line 634-653: get_bidirectional_relationship currently hardcodes a
"character:" prefix for entity1/entity2 (entity1_id = f"character:{entity1}")
which makes the method non-generic; change it to accept pre-namespaced IDs or
apply a default namespace only when needed by: check entity1 and entity2 for an
existing namespace separator (':' ) and if absent prefix with the default
namespace (e.g., "character:"), update the docstring to state it accepts either
namespaced IDs or bare names, and keep the rest of the logic that matches
against self.relationships and sets result["forward"] / result["backward"]
unchanged.
- Around line 495-499: These placeholder methods (get_scene_characters and
get_scene_for_characters) intentionally return empty defaults; update their
docstrings and add explicit TODO/ FIXME markers and a one-line comment
explaining they are intentional stubs to be implemented later, and include an
issue/track-id or link to the backlog so reviewers know this is planned work
(e.g., "TODO: implement scene->character mapping; tracked in ISSUE-123"). Ensure
the method names get_scene_characters and get_scene_for_characters are updated
accordingly and, if desired, add a lightweight logger warning inside them to
surface usage of the stub at runtime.
- Around line 285-403: Duplicate participant-edge creation logic appears in the
scenes, past, and crucial events loops; extract it into a helper like
_add_participant_edges(self, node_id: str, participants: list[str]) that (1)
creates missing participant EntityNode entries in self.nodes using ids like
f"character:{participant}" and (2) appends Relationship(...) objects to
self.relationships with source_id=node_id, target_id=participant_id,
relation="participant", status="event_participation", importance=50; then in the
scenes loop (node_id = f"scene:{scene_name}"), past loop (node_id =
f"event:{event_name}"), and crucial events loop (node_id =
f"event:{event_name}") replace the repeated block with setting
node.data["participants"] = participants and calling
self._add_participant_edges(node_id, participants).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a6a23823-9023-43d6-8cfa-91581c21d620
📒 Files selected for processing (1)
rag/structured_rag/entity_graph.py
elif isinstance(arcs_data, dict):
    arcs_list = arcs_data.get("entries", arcs_data.get("arcs", []))
    # If arcs_list is a dict (e.g., {"entries": {...}}), convert to iterable of arc dicts
    if isinstance(arcs_list, dict):
        arcs_list = list(arcs_list.values())
else:
    return

for arc_data in arcs_list:
    if not isinstance(arc_data, dict):
        continue

    arc_title = arc_data.get("title", "")
    if not arc_title:
        continue
Arc title is lost when parsing mapping-shaped arcs.json.
When arcs.json uses {"entries": {"Arc Name": {...}}} format, Line 445 converts to .values() which discards the key. If the inner dict lacks a "title" field, Lines 453-455 skip that arc entirely.
🔧 Proposed fix to preserve arc name from dict key
elif isinstance(arcs_data, dict):
- arcs_list = arcs_data.get("entries", arcs_data.get("arcs", []))
- # If arcs_list is a dict (e.g., {"entries": {...}}), convert to iterable of arc dicts
- if isinstance(arcs_list, dict):
- arcs_list = list(arcs_list.values())
+ raw_arcs = arcs_data.get("entries", arcs_data.get("arcs", []))
+ # If raw_arcs is a dict (e.g., {"entries": {...}}), normalize to list preserving keys as titles
+ if isinstance(raw_arcs, dict):
+ arcs_list = []
+ for arc_name, arc_value in raw_arcs.items():
+ if not isinstance(arc_value, dict):
+ continue
+ normalized = dict(arc_value)
+ normalized.setdefault("title", arc_name)
+ arcs_list.append(normalized)
+ else:
+ arcs_list = raw_arcs🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rag/structured_rag/entity_graph.py` around lines 441 - 455, When arcs_data is
a dict and you convert arcs_list from a mapping, preserve the dict keys as
possible arc titles: instead of calling list(arcs_list.values()) in the block
handling isinstance(arcs_list, dict), convert to an iterable of (key, value)
pairs or attach the key to the arc dict so the parsing logic can fall back to
the map key when arc_data.get("title") is missing; update the loop that
processes arc_data (references: arcs_data, arcs_list, arc_data, arc_title) to
accept either a tuple (key, value) or ensure arc_data["title"] = key when
missing before continuing so arcs with no embedded "title" are not skipped.
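The fix above can be condensed into a standalone sketch. The helper name `normalize_arcs` is hypothetical; it mirrors the `entries`/`arcs` lookup in the diff and falls back to the mapping key when an arc lacks an embedded "title":

```python
def normalize_arcs(arcs_data):
    """Normalize arcs.json shapes into a list of arc dicts.

    Accepts a bare list, or a dict with "entries"/"arcs" holding either
    a list or a name-keyed mapping. Mapping keys are preserved as the
    arc title when the inner dict has no "title" of its own.
    """
    if isinstance(arcs_data, list):
        return [a for a in arcs_data if isinstance(a, dict)]
    if isinstance(arcs_data, dict):
        raw = arcs_data.get("entries", arcs_data.get("arcs", []))
        if isinstance(raw, dict):
            arcs = []
            for name, value in raw.items():
                if not isinstance(value, dict):
                    continue
                normalized = dict(value)
                normalized.setdefault("title", name)  # key becomes the fallback title
                arcs.append(normalized)
            return arcs
        return [a for a in raw if isinstance(a, dict)]
    return []
```

With this shape, the later `arc_data.get("title", "")` check never drops a mapping-keyed arc.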
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
agents/summarizer.py (1)
1392-1410: ⚠️ Potential issue | 🟠 Major
Invalidate initial_world_cache when the schema/templates change.
world_data_cache_hash only depends on character context, greeting, and user bio. If subjects_schema.json or format_templates.json changes, a new chat can still hit an old cache directory and silently keep using stale copied artifacts.
Consider folding the global schema/template fingerprint into cache_content_key_string, or validating the cached copies before treating the directory as a hit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/summarizer.py` around lines 1392 - 1410, The cache key for initial_world_cache (built in cache_content_key_string and hashed via hash_key to world_data_cache_hash) omits the schema/templates fingerprint so stale subjects_schema.json or format_templates.json can cause incorrect cache hits; fix by incorporating a deterministic fingerprint (e.g., a content hash or last-modified timestamp) of subjects_schema.json and format_templates.json into cache_content_key_string before calling hash_key, or alternatively add an explicit validation step after computing initial_world_data_path that compares the on-disk files' fingerprints against current fingerprints and treat the directory as a miss if they differ (refer to cache_content_key_string, hash_key, world_data_cache_hash, required_cache_files, and initial_world_data_path).
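One way to fold the schema/template files into the cache key is a content hash. This is an editor's sketch under stated assumptions (the function name `file_fingerprint` is invented here; it is not part of the PR), showing only the fingerprinting step, not the full cache-key assembly in summarizer.py:

```python
import hashlib
from pathlib import Path


def file_fingerprint(*paths: Path) -> str:
    """Hash file names and contents into a single hex digest.

    Appending this digest to cache_content_key_string would make
    world_data_cache_hash change whenever subjects_schema.json or
    format_templates.json changes, turning stale hits into misses.
    """
    h = hashlib.sha256()
    for p in paths:
        h.update(p.name.encode())
        # A missing file contributes a sentinel so its absence is also keyed
        h.update(p.read_bytes() if p.exists() else b"<missing>")
    return h.hexdigest()
```

Hashing contents rather than mtimes keeps the key stable across checkouts that touch files without changing them.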
♻️ Duplicate comments (1)
rag/structured_rag/entity_graph.py (1)
601-615: ⚠️ Potential issue | 🟠 Major
Preserve arc names when normalizing mapping-shaped arcs.json.
list(arcs_list.values()) discards the dict key. If an inner arc object does not already contain "title", the next block skips it and the arc disappears from the graph.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rag/structured_rag/entity_graph.py` around lines 601 - 615, When normalizing mapping-shaped arcs (when arcs_data is a dict and arcs_list is a dict), the current list(arcs_list.values()) drops the mapping keys so arcs whose inner object lacks "title" get skipped; instead, convert the mapping to a list of arc dicts that preserve the original key as the arc title when missing: iterate arcs_list.items() and for each (key, value) ensure the resulting arc dict (arc_data) is a dict and set arc_data["title"] = key if "title" is absent, then assign the new list back to arcs_list so downstream code that checks arc_data.get("title") works correctly (refer to symbols arcs_data, arcs_list, arc_data and the "title" field).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@agents/summarizer.py`:
- Around line 1404-1406: The new format_templates.json is being copied into
session cache but never used because FormattedData._load_format_templates()
always reads the global file and memoizes it process-wide; update
_load_format_templates to first check for a session-specific
format_templates.json (e.g., the same location used in
required_cache_files/session history) and load that if present, and change
memoization to be keyed by session id or removed so each session can use its own
snapshot; ensure the same session-cache lookup logic used where
required_cache_files is populated is referenced (e.g., the session history path
code) so per-session rendering uses the saved format_templates.json rather than
the global one.
In `@rag/structured_rag/context_retriever.py`:
- Around line 326-344: The graph-derived relationship keys are using raw
rel.target_id (e.g., "character:Bob") which breaks later matching against
characters_data; in the block that builds important_rels (inside the
entity_graph check using entity_graph.get_important_relationships and iterating
graph_rels) strip the namespace/prefix from rel.target_id before using it as the
dictionary key (and do the same in the other similar block around the 357-359
area), e.g., normalize target = rel.target_id.split(":",1)[-1] or call the
existing ID-normalization helper if one exists; then use that normalized target
when indexing important_rels and when later comparing against characters_data to
ensure consistent keys.
In `@rag/structured_rag/entity_graph.py`:
- Around line 172-178: The schema branch currently skips _build_events() and
_build_arcs() when self.schema_classes is set, causing scenes/events/arcs to be
omitted; update the logic so that either _build_from_schema() also creates
event/arc nodes (by recognizing scene/event/arc definitions and instantiating
them like other entities) or simply call self._build_events() and
self._build_arcs() after self._build_from_schema(); modify the code path in
entity_graph (the conditional around self.schema_classes and the
_build_from_schema method) to ensure event and arc nodes are always created in
schema-driven mode by adding the necessary creation logic or by invoking those
helper methods.
- Around line 316-331: The bidirectional rule matching fails because
schema_name.lower() yields plural forms (e.g., "characters") while relationship
IDs use singular prefixes (e.g., "character:"); update the rule-building logic
in the loop that populates bidir_rules (the block referencing schema_classes,
schema_class.get_relationship_fields(), and bidir_rules) to normalize
schema_name to the singular prefix used in IDs (e.g., set source_type =
schema_name[:-1].lower() if schema_name.endswith("s") else schema_name.lower()),
and likewise normalize target_type when reading config.get("target_type"), so
the later loop over self.relationships (using rel.source_id.split(":")[0] and
rel.target_id.split(":")[0]) can correctly match and set rel.bidirectional =
True.
- Around line 456-561: The code currently forces every participant into a
character node ("character:<participant>") which misclassifies groups; update
the event builders (the blocks that create event nodes and iterate participants)
so when creating participant nodes/ids you: 1) reuse an existing node id if
participant already exists in self.nodes; 2) if participant is a namespaced id
(contains ":"), use it as-is; 3) otherwise decide node type by checking a group
registry (e.g., self.groups or a known group name set) and create either
"group:<name>" or "character:<name>" accordingly; then create EntityNode with
the correct type/name and append Relationship exactly as before. Ensure this
change is applied in the scene, past, and crucial event-building sections where
participants are processed.
- Around line 239-247: The branch handling group character membership is dead
because the generic dict check fires first; change the condition order or
tighten the first condition so the special-case runs: check for
isinstance(field_data, dict) and field_name == "characters" and entity_type ==
"group" first and call _process_group_membership(node_id, field_data), then fall
back to the generic isinstance(field_data, dict) branch that calls
_process_relationship_dict(node_id, entity_name, target_type, field_data,
relationship_def, is_bidirectional). This ensures _process_group_membership is
invoked for group.characters before other dict processing.
In `@user_data/example/subjects_schema.json`:
- Around line 627-634: The GeneralInfo branch gate currently omits "objectives"
from what the branch_query_prompt_template asks the model to reconsider; update
the branch query prompt or its template (branch_query_prompt_template) for the
GeneralInfo branch to explicitly include "objectives" alongside writing style,
tone, themes, setting, rules, main objective, and synopsis so that
on_existing_scene checks consider secondary objectives; locate the GeneralInfo
schema entry (GeneralInfo) and modify its branch_query_prompt_template string to
mention "objectives" or add "objectives" to the list of fields the gate
evaluates.
---
Outside diff comments:
In `@agents/summarizer.py`:
- Around line 1392-1410: The cache key for initial_world_cache (built in
cache_content_key_string and hashed via hash_key to world_data_cache_hash) omits
the schema/templates fingerprint so stale subjects_schema.json or
format_templates.json can cause incorrect cache hits; fix by incorporating a
deterministic fingerprint (e.g., a content hash or last-modified timestamp) of
subjects_schema.json and format_templates.json into cache_content_key_string
before calling hash_key, or alternatively add an explicit validation step after
computing initial_world_data_path that compares the on-disk files' fingerprints
against current fingerprints and treat the directory as a miss if they differ
(refer to cache_content_key_string, hash_key, world_data_cache_hash,
required_cache_files, and initial_world_data_path).
---
Duplicate comments:
In `@rag/structured_rag/entity_graph.py`:
- Around line 601-615: When normalizing mapping-shaped arcs (when arcs_data is a
dict and arcs_list is a dict), the current list(arcs_list.values()) drops the
mapping keys so arcs whose inner object lacks "title" get skipped; instead,
convert the mapping to a list of arc dicts that preserve the original key as the
arc title when missing: iterate arcs_list.items() and for each (key, value)
ensure the resulting arc dict (arc_data) is a dict and set arc_data["title"] =
key if "title" is absent, then assign the new list back to arcs_list so
downstream code that checks arc_data.get("title") works correctly (refer to
symbols arcs_data, arcs_list, arc_data and the "title" field).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 32761d1f-cd6a-4bf1-8e8d-b832c8d281e4
📒 Files selected for processing (5)
- agents/summarizer.py
- rag/structured_rag/context_retriever.py
- rag/structured_rag/entity_graph.py
- user_data/example/subjects_schema.json
- utils/schema_parser.py
🚧 Files skipped from review as they are similar to previous changes (1)
- utils/schema_parser.py
required_cache_files = [
    "subjects_schema.json",
    "format_templates.json",
The new format_templates.json snapshots are currently unused.
This code now copies format_templates.json into both the cache and the session history, but FormattedData._load_format_templates() later in this file still always reads the global example file and memoizes it process-wide. The saved snapshot never affects rendering, so this change does not actually give you per-session reproducibility.
Also applies to: 1440-1445, 1464-1466
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/summarizer.py` around lines 1404 - 1406, The new format_templates.json
is being copied into session cache but never used because
FormattedData._load_format_templates() always reads the global file and memoizes
it process-wide; update _load_format_templates to first check for a
session-specific format_templates.json (e.g., the same location used in
required_cache_files/session history) and load that if present, and change
memoization to be keyed by session id or removed so each session can use its own
snapshot; ensure the same session-cache lookup logic used where
required_cache_files is populated is referenced (e.g., the session history path
code) so per-session rendering uses the saved format_templates.json rather than
the global one.
if hasattr(self, 'entity_graph') and self.entity_graph:
    graph_rels = self.entity_graph.get_important_relationships(char_name, importance_threshold)
    if graph_rels:
        important_rels = {}
        for rel in graph_rels:
            target = rel.target_id
            if target not in important_rels:
                important_rels[target] = []
            important_rels[target].append({
                "relation": rel.relation,
                "status": rel.status,
                "aliases": rel.aliases,
                "events": rel.events,
                "importance": {
                    "score": rel.importance,
                    "reason": rel.importance_reason,
                    "faction": rel.faction
                }
            })
Strip the namespace from graph target IDs before using them as relationship keys.
EntityGraph stores targets as IDs like character:Bob, but this method writes rel.target_id directly into the relationships dict. That breaks the later related_char in characters_data check and second-pass dedupe, so graph-derived relationships can be skipped or duplicated under keys like character:Bob.
Suggested fix
- target = rel.target_id
+ target = rel.target_id.split(":", 1)[1] if ":" in rel.target_id else rel.target_id
if target not in important_rels:
     important_rels[target] = []
Also applies to: 357-359
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rag/structured_rag/context_retriever.py` around lines 326 - 344, The
graph-derived relationship keys are using raw rel.target_id (e.g.,
"character:Bob") which breaks later matching against characters_data; in the
block that builds important_rels (inside the entity_graph check using
entity_graph.get_important_relationships and iterating graph_rels) strip the
namespace/prefix from rel.target_id before using it as the dictionary key (and
do the same in the other similar block around the 357-359 area), e.g., normalize
target = rel.target_id.split(":",1)[-1] or call the existing ID-normalization
helper if one exists; then use that normalized target when indexing
important_rels and when later comparing against characters_data to ensure
consistent keys.
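The normalization this comment asks for is small enough to show in isolation. A minimal sketch (the helper name `strip_namespace` is hypothetical; the retriever may inline the split instead):

```python
def strip_namespace(entity_id: str) -> str:
    """Drop a leading 'type:' prefix from a graph ID.

    "character:Bob" -> "Bob"; bare names pass through unchanged, so the
    result always matches the plain-name keys used in characters_data.
    """
    return entity_id.split(":", 1)[1] if ":" in entity_id else entity_id
```

Splitting with maxsplit=1 keeps any colons inside the name itself intact.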
if isinstance(field_data, dict):
    # Dict of target -> list of relationships (like character relationships)
    self._process_relationship_dict(
        node_id, entity_name, target_type, field_data,
        relationship_def, is_bidirectional
    )
elif isinstance(field_data, dict) and field_name == "characters" and entity_type == "group":
    # Group character membership (special case)
    self._process_group_membership(node_id, field_data)
Group.characters never reaches the membership builder.
The first if isinstance(field_data, dict) already matches characters, so the elif ... field_name == "characters" block is dead. In schema mode, group membership dicts fall into _process_relationship_dict(), which skips non-list values, so no group→character membership edges are created.
Suggested fix
- if isinstance(field_data, dict):
- # Dict of target -> list of relationships (like character relationships)
- self._process_relationship_dict(
- node_id, entity_name, target_type, field_data,
- relationship_def, is_bidirectional
- )
- elif isinstance(field_data, dict) and field_name == "characters" and entity_type == "group":
+ if field_name == "characters" and entity_type == "group" and isinstance(field_data, dict):
# Group character membership (special case)
self._process_group_membership(node_id, field_data)
+ elif isinstance(field_data, dict):
+ # Dict of target -> list of relationships (like character relationships)
+ self._process_relationship_dict(
+ node_id, entity_name, target_type, field_data,
+ relationship_def, is_bidirectional
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rag/structured_rag/entity_graph.py` around lines 239 - 247, The branch
handling group character membership is dead because the generic dict check fires
first; change the condition order or tighten the first condition so the
special-case runs: check for isinstance(field_data, dict) and field_name ==
"characters" and entity_type == "group" first and call
_process_group_membership(node_id, field_data), then fall back to the generic
isinstance(field_data, dict) branch that calls
_process_relationship_dict(node_id, entity_name, target_type, field_data,
relationship_def, is_bidirectional). This ensures _process_group_membership is
invoked for group.characters before other dict processing.
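A minimal sketch of why the original branch order shadows the special case, and how reordering fixes it. The two `_process_*` functions here are hypothetical stand-ins that only record which branch fired; the real methods live on `EntityGraph`:

```python
# Stand-ins for the real EntityGraph methods, recording which branch was taken.
calls = []

def _process_group_membership(node_id, field_data):
    calls.append("membership")

def _process_relationship_dict(node_id, name, target_type, field_data, rel_def, bidir):
    calls.append("relationship_dict")

def dispatch(node_id, field_name, entity_type, field_data):
    # Special case first: group.characters must not be swallowed by the
    # generic dict branch, which would skip its non-list values.
    if field_name == "characters" and entity_type == "group" and isinstance(field_data, dict):
        _process_group_membership(node_id, field_data)
    elif isinstance(field_data, dict):
        _process_relationship_dict(node_id, "name", "character", field_data, {}, False)

dispatch("group:guild", "characters", "group", {"Alice": ["member"]})
```

With the generic `isinstance` check first, `calls` would instead contain `"relationship_dict"` and the membership edges would never be built.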
```python
for schema_name, schema_class in self.schema_classes.items():
    rel_fields = schema_class.get_relationship_fields()
    for field_name, config in rel_fields.items():
        if config.get("bidirectional", False):
            source_type = schema_name.lower()
            target_type = config.get("target_type", "unknown")
            bidir_rules.append((source_type, target_type))

# Mark relationships as bidirectional in metadata
for rel in self.relationships:
    source_type = rel.source_id.split(":")[0]
    target_type = rel.target_id.split(":")[0]

    for src, tgt in bidir_rules:
        if source_type == src and target_type == tgt:
            rel.bidirectional = True
```
Bidirectional rules never match the stored node prefixes.
`schema_classes` keys are `"Characters"`, `"Groups"`, etc., so `schema_name.lower()` becomes `"characters"`/`"groups"`, but relationship IDs use singular prefixes like `character:` and `group:`. In schema mode these comparisons never match, so `rel.bidirectional` stays `False` even when the schema says otherwise.
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 318-318: Loop control variable field_name not used within loop body
Rename unused field_name to _field_name
(B007)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rag/structured_rag/entity_graph.py` around lines 316 - 331, The bidirectional
rule matching fails because schema_name.lower() yields plural forms (e.g.,
"characters") while relationship IDs use singular prefixes (e.g., "character:");
update the rule-building logic in the loop that populates bidir_rules (the block
referencing schema_classes, schema_class.get_relationship_fields(), and
bidir_rules) to normalize schema_name to the singular prefix used in IDs (e.g.,
set source_type = schema_name[:-1].lower() if schema_name.endswith("s") else
schema_name.lower()), and likewise normalize target_type when reading
config.get("target_type"), so the later loop over self.relationships (using
rel.source_id.split(":")[0] and rel.target_id.split(":")[0]) can correctly match
and set rel.bidirectional = True.
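One possible normalization helper, assuming (as the prompt suggests) that all schema class names are simple English plurals ending in "s":

```python
def to_node_prefix(schema_name: str) -> str:
    # Naive singularization: "Characters" -> "character", "Groups" -> "group".
    # Assumes every schema class name is a plural ending in "s"; irregular
    # plurals would need an explicit mapping instead.
    name = schema_name.lower()
    return name[:-1] if name.endswith("s") else name
```

Applying `to_node_prefix` when building `bidir_rules` (and to `target_type` from the config, if it is also plural) makes the prefixes comparable to `rel.source_id.split(":")[0]`.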
```python
    # Extract and store participants
    participants = scene_data.get("participants", [])
    if participants:
        node.data["participants"] = participants
        # Create edges to participants
        for participant in participants:
            # Ensure participant node exists (create if missing)
            participant_id = f"character:{participant}"
            if participant_id not in self.nodes:
                # Create a minimal node for this participant
                self.nodes[participant_id] = EntityNode(
                    id=participant_id,
                    type="character",
                    name=participant,
                    data={}
                )

            # Add edge from event to participant
            relationship = Relationship(
                source_id=node_id,
                target_id=participant_id,
                relation="participant",
                status="event_participation",
                importance=50
            )
            self.relationships.append(relationship)

    self.nodes[node_id] = node

# Build past events
past = events_data.get("past", {})
for event_name, event_data in past.items():
    node_id = f"event:{event_name}"
    node = EntityNode(
        id=node_id,
        type="event",
        name=event_name,
        data=event_data
    )

    # Extract and store participants
    participants = event_data.get("participants", [])
    if participants:
        node.data["participants"] = participants
        # Create edges to participants
        for participant in participants:
            # Ensure participant node exists (create if missing)
            participant_id = f"character:{participant}"
            if participant_id not in self.nodes:
                # Create a minimal node for this participant
                self.nodes[participant_id] = EntityNode(
                    id=participant_id,
                    type="character",
                    name=participant,
                    data={}
                )

            # Add edge from event to participant
            relationship = Relationship(
                source_id=node_id,
                target_id=participant_id,
                relation="participant",
                status="event_participation",
                importance=50
            )
            self.relationships.append(relationship)

    self.nodes[node_id] = node

# Build crucial events
crucial = events_data.get("events", {})
for event_name, event_data in crucial.items():
    node_id = f"event:{event_name}"
    node = EntityNode(
        id=node_id,
        type="event",
        name=event_name,
        data=event_data
    )

    # Extract and store participants
    participants = event_data.get("participants", [])
    if participants:
        node.data["participants"] = participants
        # Create edges to participants
        for participant in participants:
            # Ensure participant node exists (create if missing)
            participant_id = f"character:{participant}"
            if participant_id not in self.nodes:
                # Create a minimal node for this participant
                self.nodes[participant_id] = EntityNode(
                    id=participant_id,
                    type="character",
                    name=participant,
                    data={}
                )

            # Add edge from event to participant
            relationship = Relationship(
                source_id=node_id,
                target_id=participant_id,
                relation="participant",
                status="event_participation",
                importance=50
            )
            self.relationships.append(relationship)
```
Don't force every event participant into the `character:` namespace.
`participants` is defined as characters and/or groups in `user_data/example/subjects_schema.json`, but all three builders create `character:<participant>` nodes and edges. Group participants are therefore misclassified as characters, which makes graph-based event/group lookups inconsistent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rag/structured_rag/entity_graph.py` around lines 456 - 561, The code
currently forces every participant into a character node
("character:<participant>") which misclassifies groups; update the event
builders (the blocks that create event nodes and iterate participants) so when
creating participant nodes/ids you: 1) reuse an existing node id if participant
already exists in self.nodes; 2) if participant is a namespaced id (contains
":"), use it as-is; 3) otherwise decide node type by checking a group registry
(e.g., self.groups or a known group name set) and create either "group:<name>"
or "character:<name>" accordingly; then create EntityNode with the correct
type/name and append Relationship exactly as before. Ensure this change is
applied in the scene, past, and crucial event-building sections where
participants are processed.
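A small resolver following the three rules in the prompt above could look like this. The `group_names` registry is an assumption; the real code would need some authoritative set of known group names (e.g. the keys of the groups JSON):

```python
def resolve_participant_id(participant: str, nodes: dict, group_names: set) -> str:
    # 1) Already-namespaced ids (e.g. "group:The Guild") pass through untouched.
    if ":" in participant:
        return participant
    # 2) Reuse an existing node under either namespace before inventing one.
    for prefix in ("character", "group"):
        candidate = f"{prefix}:{participant}"
        if candidate in nodes:
            return candidate
    # 3) Fall back to the known-groups set; otherwise assume a character.
    prefix = "group" if participant in group_names else "character"
    return f"{prefix}:{participant}"
```

Each of the scene, past, and crucial builders would then call this instead of hard-coding `f"character:{participant}"`, and create the minimal fallback node with the resolved type rather than a fixed `"character"`.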
```diff
     "setting": "str",
     "rules": "list[str]",
     "objectives": "list[Objective]",
     "main_objective": "str",
     "synopsis": "str"
   },
   "defaults": {
-    "branch_query_prompt_template": "Based on the latest exchange and the current state of the story and chat as a whole, are there any significant changes, additions, or deletions required for this general information section (writing style, tone, themes, main objective, summary)? Respond with only 'NO' or 'UNCHANGED' if all the general information is probably up-to-date and no fields within it need to be checked individually. Otherwise, respond with 'YES'.",
+    "branch_query_prompt_template": "Based on the latest exchange and the current state of the story and chat as a whole, are there any significant changes, additions, or deletions required for this general information section (writing style, tone, themes, setting, rules, main objective, synopsis)? Respond with only 'NO' or 'UNCHANGED' if all the general information is probably up-to-date and no fields within it need to be checked individually. Otherwise, respond with 'YES'.",
```
Include objectives in the GeneralInfo branch gate.
`GeneralInfo` still has an `objectives` field, but this prompt only asks the model to reconsider writing style, tone, themes, setting, rules, main objective, and synopsis. During `on_existing_scene`, secondary-objective changes are now much less likely to trigger a review.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@user_data/example/subjects_schema.json` around lines 627 - 634, The
GeneralInfo branch gate currently omits "objectives" from what the
branch_query_prompt_template asks the model to reconsider; update the branch
query prompt or its template (branch_query_prompt_template) for the GeneralInfo
branch to explicitly include "objectives" alongside writing style, tone, themes,
setting, rules, main objective, and synopsis so that on_existing_scene checks
consider secondary objectives; locate the GeneralInfo schema entry (GeneralInfo)
and modify its branch_query_prompt_template string to mention "objectives" or
add "objectives" to the list of fields the gate evaluates.
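One way to guard against this class of drift is a check that every user-facing `GeneralInfo` field name appears in the gate prompt. The prompt string below is the suggested corrected template (with "objectives" added), shown here purely for illustration:

```python
# Field names the GeneralInfo gate is expected to mention, mirroring the
# schema fields above (assumed list; the real source of truth is the schema).
FIELDS = ["writing style", "tone", "themes", "setting", "rules",
          "objectives", "main objective", "synopsis"]

prompt = ("Based on the latest exchange and the current state of the story and "
          "chat as a whole, are there any significant changes, additions, or "
          "deletions required for this general information section (writing "
          "style, tone, themes, setting, rules, objectives, main objective, "
          "synopsis)? Respond with only 'NO' or 'UNCHANGED' if all the general "
          "information is probably up-to-date and no fields within it need to "
          "be checked individually. Otherwise, respond with 'YES'.")

# Any field missing from the prompt is a gate that will silently under-trigger.
missing = [f for f in FIELDS if f not in prompt]
```

With the corrected template, `missing` is empty; run against the current template, it would flag `objectives`.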