Add robust research persona system#257
Conversation
📝 WalkthroughWalkthroughAdds a private Research Persona subsystem: core models/storage/patching/projection, CLI commands and payload helpers, ingestion/audit/applications/runtime, staged build-persona workflow and docs, plus extensive tests and diagnostics/budget updates. ChangesResearch Persona local profile and applications
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI as gpd research-persona
participant Core as research_persona_cli
participant Store as persona storage
User->>CLI: audit | show | diff/apply | export-capsule
CLI->>Core: build_*_payload(...)
Core->>Store: load/validate/apply (if not dry-run)
Store-->>Core: persona/capsule/diff results
Core-->>CLI: JSON payload
CLI-->>User: Rendered result
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120+ minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
|
🤖 RoastBot: The AI needed a persona system to do research. Buddy, if the AI doesn't know who it is without a personality quiz, maybe it shouldn't be writing your papers either. |
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (2)
tests/core/test_research_persona_projection.py (1)
87-104: ⚡ Quick winAdd a regression test for axis
fact_idsprivacy filtering.Current tests validate fact-value redaction, but they do not assert that prompt/public outputs avoid exposing private fact identifiers through
axes[].fact_ids. Adding this case will lock in the prompt-safety contract.✅ Suggested test addition
+def test_prompt_projection_axis_fact_ids_do_not_reference_filtered_private_facts() -> None: + persona = _persona_with_mixed_privacy_facts() + persona.axes = [ + { + "id": "axis-methods", + "name": "methods", + "value": 0.4, + "confidence": "confirmed", + "privacy": "safe_to_share", + "fact_ids": [SAFE_FACT_ID, "rp-private-local-canary"], + "evidence_refs": [], + } + ] # type: ignore[assignment] + + projection = project_research_persona(persona, purpose="prompt") + rendered = _render(projection) + assert SAFE_FACT_ID in rendered + assert "rp-private-local-canary" not in rendered🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/core/test_research_persona_projection.py` around lines 87 - 104, Extend the existing test_project_projection_includes_project_private_only_for_project_purpose to also assert that private fact identifiers are removed from axes[].fact_ids in prompt/public projections: call project_research_persona(persona, purpose="prompt") and inspect the returned projection (or its rendered form via _render) to ensure PROJECT_PRIVATE_FACT_ID does not appear in any axes[].fact_ids for the prompt projection, while it does appear for the project projection; use the same helper symbols (_render, PROJECT_PRIVATE_FACT_ID, project_research_persona, test_project_projection_includes_project_private_only_for_project_purpose) to locate and add these assertions.tests/core/test_research_persona_help_docs.py (1)
24-29: ⚡ Quick winDecouple CLI command extraction from an unrelated sentinel function.
Parsing up to
def _integrations_config_pathmakes this test fragile to unrelatedsrc/gpd/cli.pyrefactors.Proposed simplification
def _research_persona_cli_commands() -> tuple[str, ...]: cli = _read(CLI_PATH) - start_match = re.search(r"research_persona_app\s*=\s*typer[.]Typer", cli) - assert start_match is not None - start = start_match.start() - end = cli.index("def _integrations_config_path", start) - block = cli[start:end] - return tuple(re.findall(r'`@research_persona_app`\.command\("([^"]+)"\)', block)) + return tuple(re.findall(r'`@research_persona_app`\.command\("([^"]+)"\)', cli))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/core/test_research_persona_help_docs.py` around lines 24 - 29, The test currently slices the CLI text up to the unrelated sentinel function def _integrations_config_path which makes extraction fragile; instead, after finding the research_persona_app initializer with start_match (searching for research_persona_app\s*=\s*typer\.Typer), extract the commands by applying re.findall(r'`@research_persona_app`\.command\("([^"]+)"\)', cli) on the relevant scope (preferably the whole cli string or the substring starting at start) and return that tuple—replace the block/end slicing logic (start/end/block) with a direct re.findall call scoped from start or globally so the test no longer depends on _integrations_config_path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@README.md`:
- Around line 537-546: The table cells contain literal pipe characters that
break Markdown table parsing; update each command example (e.g., the `gpd
research-persona validate`, `diff`, `apply-patch`, `audit`, `forget`,
`ingest-source`, `export-capsule`, `doppelganger`, `explain-plan`, and
`taste-check` lines) to escape the pipe characters by replacing "|" with "\|"
inside the code spans so the pipes are treated as literal characters and the
table renders correctly.
In `@src/gpd/core/research_persona.py`:
- Around line 339-343: The current early return for purpose in
{"prompt","public"} checks only fact.privacy and makes the subsequent
sensitive-category guard dead; change the logic in the code that filters facts
for prompt/public (the block referencing purpose, fact.privacy, fact.category,
and _SENSITIVE_PROMPT_CATEGORIES) to combine the checks so any fact for
prompt/public is rejected if either fact.privacy != "safe_to_share" OR
fact.category.casefold() is in _SENSITIVE_PROMPT_CATEGORIES; update the
conditional (and keep using casefold() for category comparison) to return None
when either condition is true.
- Around line 1204-1213: The axis projection is leaking private fact identifiers
because axes are projected via _project_axis without reconciling axis.fact_ids
against the filtered persona.facts; compute the set of allowed fact ids from the
projected facts (the comprehension using _project_fact) and when projecting axes
(the comprehension using _project_axis) remove or filter any axis.fact_ids not
present in that allowed set (or omit the field entirely for prompt/public
purpose). Update _project_axis or the axis comprehension to perform this
filtering so axes in prompt/public payloads cannot reference non-prompt-safe
facts.
In `@src/gpd/specs/workflows/build-persona.md`:
- Around line 211-213: The workflow doc uses patch operation names add_fact,
replace_fact, add_axis, replace_axis, add_list_item, remove_list_item (and
pointer ops) but the persona builder schema expects upsert_fact, append_list,
upsert_axis, tombstone_fact; update the workflow text to use the schema's
canonical operation names (replace add_fact/replace_fact →
upsert_fact/tombstone_fact as appropriate, add_list_item/remove_list_item →
append_list or the tombstone pattern, add_axis/replace_axis → upsert_axis) so
generated patches match the research-persona patch contract; search for the
operation names in this file and replace them with the corresponding schema
names (upsert_fact, append_list, upsert_axis, tombstone_fact) and adjust any
examples or guidance to match the builder's expected semantics.
In `@src/gpd/specs/workflows/build-persona/persona-intake.md`:
- Around line 41-44: Replace the ambiguous "gpd research-persona diff"
invocation with the explicit diff form that includes the candidate patch
artifact path so the approval boundary always diffs the exact patch; locate the
"gpd research-persona diff" command in persona-intake.md and change it to call
the diff with the same patch argument used by "gpd research-persona apply-patch"
(e.g., the candidate patch variable/path) to ensure consistent, unambiguous
behavior.
In `@tests/core/test_research_persona_builder_privacy.py`:
- Around line 61-67: In tests/core/test_research_persona_builder_privacy.py the
comparison uses line.casefold() against the storage_needles tuple which contains
mixed-case tokens (e.g. "$GPD_DATA_DIR"), causing case-fold mismatches; fix by
normalizing the needles as well (e.g., build a lowercased/casefolded version of
storage_needles before the membership check) or compare using both sides
casefolded so the check uses line.casefold() in combination with
needle.casefold(); apply the same change to the second occurrence referenced
around lines 81-83 where the other tuple is checked.
In `@tests/core/test_research_persona_phase04_05_acceptance.py`:
- Around line 234-237: The test picks module_paths[0] which is ambiguous; update
test_phase4_source_ingestion_module_exposes_patch_only_api to assert a single
canonical module by checking that len(module_paths) == 1 (or using an explicit
equality assertion) before assigning module_path = module_paths[0]; use the
existing symbols _require_existing and PHASE4_MODULE_CANDIDATES to obtain
module_paths, assert exactly one candidate, then set module_path from that
single element. Apply the same change to the other analogous test block that
uses module_paths and module_path (the one around the second occurrence).
- Around line 145-148: The current public-API collector uses ast.walk(tree)
which traverses nested definitions; change it to iterate only module-level
statements (e.g., for node in tree.body) so only top-level ast.FunctionDef,
ast.AsyncFunctionDef and ast.ClassDef names are appended to names; keep the
existing underscore-name filter and return tuple(sorted(names)) as before.
In `@tests/core/test_research_persona_system_acceptance.py`:
- Around line 288-290: The test currently runs the audit-command assertion
whenever any optional module exists; change it to only run when audit-related
optional modules are present by replacing the generic
_existing_optional_modules() gate with a check for audit modules (e.g., call a
variant like _existing_optional_modules("audit") or use an explicit membership
test against your audit module list), then keep the existing has_audit_command
logic that scans command_names against AUDIT_COMMAND_HINTS and uses _message for
the assertion failure.
---
Nitpick comments:
In `@tests/core/test_research_persona_help_docs.py`:
- Around line 24-29: The test currently slices the CLI text up to the unrelated
sentinel function def _integrations_config_path which makes extraction fragile;
instead, after finding the research_persona_app initializer with start_match
(searching for research_persona_app\s*=\s*typer\.Typer), extract the commands by
applying re.findall(r'`@research_persona_app`\.command\("([^"]+)"\)', cli) on the
relevant scope (preferably the whole cli string or the substring starting at
start) and return that tuple—replace the block/end slicing logic
(start/end/block) with a direct re.findall call scoped from start or globally so
the test no longer depends on _integrations_config_path.
In `@tests/core/test_research_persona_projection.py`:
- Around line 87-104: Extend the existing
test_project_projection_includes_project_private_only_for_project_purpose to
also assert that private fact identifiers are removed from axes[].fact_ids in
prompt/public projections: call project_research_persona(persona,
purpose="prompt") and inspect the returned projection (or its rendered form via
_render) to ensure PROJECT_PRIVATE_FACT_ID does not appear in any
axes[].fact_ids for the prompt projection, while it does appear for the project
projection; use the same helper symbols (_render, PROJECT_PRIVATE_FACT_ID,
project_research_persona,
test_project_projection_includes_project_private_only_for_project_purpose) to
locate and add these assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e4c5a861-bc16-4bed-b115-361641b3f4ab
📒 Files selected for processing (49)
README.mdsrc/gpd/agents/gpd-expertise-explainer.mdsrc/gpd/agents/gpd-persona-builder.mdsrc/gpd/agents/gpd-researcher-doppelganger.mdsrc/gpd/agents/gpd-scientific-taste.mdsrc/gpd/cli.pysrc/gpd/commands/build-persona.mdsrc/gpd/commands/help.mdsrc/gpd/core/research_persona.pysrc/gpd/core/research_persona_applications.pysrc/gpd/core/research_persona_audit.pysrc/gpd/core/research_persona_cli.pysrc/gpd/core/research_persona_ingestion.pysrc/gpd/core/research_persona_runtime.pysrc/gpd/registry.pysrc/gpd/specs/references/research-persona-applications.mdsrc/gpd/specs/workflows/build-persona-stage-manifest.jsonsrc/gpd/specs/workflows/build-persona.mdsrc/gpd/specs/workflows/build-persona/approval-and-apply.mdsrc/gpd/specs/workflows/build-persona/persona-intake.mdsrc/gpd/specs/workflows/build-persona/persona-synthesis.mdsrc/gpd/specs/workflows/build-persona/source-ingestion.mdtests/core/test_research_persona_application_surfaces.pytests/core/test_research_persona_applications.pytests/core/test_research_persona_audit.pytests/core/test_research_persona_builder_acceptance.pytests/core/test_research_persona_builder_help.pytests/core/test_research_persona_builder_privacy.pytests/core/test_research_persona_builder_registry.pytests/core/test_research_persona_cli_audit.pytests/core/test_research_persona_cli_ingestion_applications.pytests/core/test_research_persona_cli_mutation.pytests/core/test_research_persona_cli_privacy.pytests/core/test_research_persona_cli_show_validate.pytests/core/test_research_persona_cli_support.pytests/core/test_research_persona_help_docs.pytests/core/test_research_persona_ingestion_core.pytests/core/test_research_persona_models.pytests/core/test_research_persona_patch_history.pytests/core/test_research_persona_phase04_05_acceptance.pytests/core/test_research_persona_privacy_contract.pytests/core/test_research_persona_projection.pytests/core/test_research_persona_runtime.pytests/core/test_research_persona_source_ingestion_workflow.pytests/core/test_research_persona_storage.pytests/core/test_research_persona_system_acceptance.pytmp/phase-01-foundation-report.mdtmp/phase-02-cli-report.mdtmp/phase-03-persona-builder-report.md
| | `gpd research-persona validate [PROFILE_JSON|-]` | Validate stored or supplied persona JSON | | ||
| | `gpd research-persona diff PATCH_JSON|-` | Preview a candidate patch without writing | | ||
| | `gpd research-persona apply-patch PATCH_JSON|-` | Apply an approved patch, with `--dry-run` for preview only | | ||
| | `gpd research-persona audit [PROFILE_JSON|-]` | Inspect persona quality, privacy, evidence, and capsule readiness without writing | | ||
| | `gpd research-persona forget FACT_ID` | Tombstone one fact, with `--dry-run` for preview only | | ||
| | `gpd research-persona ingest-source SOURCE_JSON|-` | Build a candidate patch from an explicit source document; optional `--output` writes only that candidate patch artifact | | ||
| | `gpd research-persona export-capsule --role doppelganger` | Export a prompt-safe role capsule; roles include planner, executor, verifier, paper_writer, literature, recovery, explainer, doppelganger, and taste | | ||
| | `gpd research-persona doppelganger` | Preview likely user objections, standards, and next moves through a prompt-safe Researcher Doppelganger capsule | | ||
| | `gpd research-persona explain-plan [PLAN_JSON|-]` | Preview Expertise-Aware Explanations guidance from an explainer capsule | | ||
| | `gpd research-persona taste-check [CANDIDATE_JSON|-]` | Preview Scientific Taste Model feedback from a taste capsule | |
There was a problem hiding this comment.
Escape literal | in table command examples to prevent broken rendering
Lines 537–546 include unescaped pipe characters inside table cells, so Markdown parses extra columns and drops command text in some renderers.
Suggested fix
-| `gpd research-persona validate [PROFILE_JSON|-]` | Validate stored or supplied persona JSON |
-| `gpd research-persona diff PATCH_JSON|-` | Preview a candidate patch without writing |
-| `gpd research-persona apply-patch PATCH_JSON|-` | Apply an approved patch, with `--dry-run` for preview only |
-| `gpd research-persona audit [PROFILE_JSON|-]` | Inspect persona quality, privacy, evidence, and capsule readiness without writing |
+| `gpd research-persona validate [PROFILE_JSON\|-]` | Validate stored or supplied persona JSON |
+| `gpd research-persona diff PATCH_JSON\|-` | Preview a candidate patch without writing |
+| `gpd research-persona apply-patch PATCH_JSON\|-` | Apply an approved patch, with `--dry-run` for preview only |
+| `gpd research-persona audit [PROFILE_JSON\|-]` | Inspect persona quality, privacy, evidence, and capsule readiness without writing |
| `gpd research-persona forget FACT_ID` | Tombstone one fact, with `--dry-run` for preview only |
-| `gpd research-persona ingest-source SOURCE_JSON|-` | Build a candidate patch from an explicit source document; optional `--output` writes only that candidate patch artifact |
+| `gpd research-persona ingest-source SOURCE_JSON\|-` | Build a candidate patch from an explicit source document; optional `--output` writes only that candidate patch artifact |
| `gpd research-persona export-capsule --role doppelganger` | Export a prompt-safe role capsule; roles include planner, executor, verifier, paper_writer, literature, recovery, explainer, doppelganger, and taste |
| `gpd research-persona doppelganger` | Preview likely user objections, standards, and next moves through a prompt-safe Researcher Doppelganger capsule |
-| `gpd research-persona explain-plan [PLAN_JSON|-]` | Preview Expertise-Aware Explanations guidance from an explainer capsule |
-| `gpd research-persona taste-check [CANDIDATE_JSON|-]` | Preview Scientific Taste Model feedback from a taste capsule |
+| `gpd research-persona explain-plan [PLAN_JSON\|-]` | Preview Expertise-Aware Explanations guidance from an explainer capsule |
+| `gpd research-persona taste-check [CANDIDATE_JSON\|-]` | Preview Scientific Taste Model feedback from a taste capsule |🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 537-537: Table column count
Expected: 2; Actual: 3; Too many cells, extra data will be missing
(MD056, table-column-count)
[warning] 538-538: Table column count
Expected: 2; Actual: 3; Too many cells, extra data will be missing
(MD056, table-column-count)
[warning] 539-539: Table column count
Expected: 2; Actual: 3; Too many cells, extra data will be missing
(MD056, table-column-count)
[warning] 540-540: Table column count
Expected: 2; Actual: 3; Too many cells, extra data will be missing
(MD056, table-column-count)
[warning] 542-542: Table column count
Expected: 2; Actual: 3; Too many cells, extra data will be missing
(MD056, table-column-count)
[warning] 545-545: Table column count
Expected: 2; Actual: 3; Too many cells, extra data will be missing
(MD056, table-column-count)
[warning] 546-546: Table column count
Expected: 2; Actual: 3; Too many cells, extra data will be missing
(MD056, table-column-count)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@README.md` around lines 537 - 546, The table cells contain literal pipe
characters that break Markdown table parsing; update each command example (e.g.,
the `gpd research-persona validate`, `diff`, `apply-patch`, `audit`, `forget`,
`ingest-source`, `export-capsule`, `doppelganger`, `explain-plan`, and
`taste-check` lines) to escape the pipe characters by replacing "|" with "\|"
inside the code spans so the pipes are treated as literal characters and the
table renders correctly.
| if purpose in {"prompt", "public"} and fact.privacy != "safe_to_share": | ||
| return None | ||
| if purpose in {"prompt", "public"} and fact.category.casefold() in _SENSITIVE_PROMPT_CATEGORIES: | ||
| if fact.privacy != "safe_to_share": | ||
| return None |
There was a problem hiding this comment.
Sensitive-category prompt filter is currently ineffective.
The guard on sensitive categories is dead in prompt/public mode because Line 340 already returns for any non-safe_to_share fact, so the nested check never triggers. This currently allows contact/identity/private_paper facts through whenever they are marked safe_to_share, which conflicts with the explicit _SENSITIVE_PROMPT_CATEGORIES safety intent.
🔧 Proposed fix
- if purpose in {"prompt", "public"} and fact.privacy != "safe_to_share":
- return None
- if purpose in {"prompt", "public"} and fact.category.casefold() in _SENSITIVE_PROMPT_CATEGORIES:
- if fact.privacy != "safe_to_share":
- return None
+ if purpose in {"prompt", "public"} and fact.privacy != "safe_to_share":
+ return None
+ if purpose in {"prompt", "public"} and fact.category.casefold() in _SENSITIVE_PROMPT_CATEGORIES:
+ return None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if purpose in {"prompt", "public"} and fact.privacy != "safe_to_share": | |
| return None | |
| if purpose in {"prompt", "public"} and fact.category.casefold() in _SENSITIVE_PROMPT_CATEGORIES: | |
| if fact.privacy != "safe_to_share": | |
| return None | |
| if purpose in {"prompt", "public"} and fact.privacy != "safe_to_share": | |
| return None | |
| if purpose in {"prompt", "public"} and fact.category.casefold() in _SENSITIVE_PROMPT_CATEGORIES: | |
| return None |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/gpd/core/research_persona.py` around lines 339 - 343, The current early
return for purpose in {"prompt","public"} checks only fact.privacy and makes the
subsequent sensitive-category guard dead; change the logic in the code that
filters facts for prompt/public (the block referencing purpose, fact.privacy,
fact.category, and _SENSITIVE_PROMPT_CATEGORIES) to combine the checks so any
fact for prompt/public is rejected if either fact.privacy != "safe_to_share" OR
fact.category.casefold() is in _SENSITIVE_PROMPT_CATEGORIES; update the
conditional (and keep using casefold() for category comparison) to return None
when either condition is true.
| "facts": [ | ||
| projected | ||
| for fact in persona.facts | ||
| if (projected := _project_fact(fact, purpose=normalized_purpose)) is not None | ||
| ], | ||
| "axes": [ | ||
| projected | ||
| for axis in persona.axes | ||
| if (projected := _project_axis(axis, purpose=normalized_purpose)) is not None | ||
| ], |
There was a problem hiding this comment.
Prompt/public projection can leak private fact identifiers through axes.fact_ids.
facts are filtered by privacy, but axes are projected independently and currently preserve fact_ids as-is. If a prompt-safe axis references a non-prompt-safe fact, the private fact identifier can still be exported in prompt/public payloads.
🔧 Proposed fix
- projection: dict[str, object] = {
+ projected_facts = [
+ projected
+ for fact in persona.facts
+ if (projected := _project_fact(fact, purpose=normalized_purpose)) is not None
+ ]
+ allowed_fact_ids = {
+ fact["id"]
+ for fact in projected_facts
+ if isinstance(fact, dict) and isinstance(fact.get("id"), str)
+ }
+ projected_axes: list[dict[str, object]] = []
+ for axis in persona.axes:
+ projected = _project_axis(axis, purpose=normalized_purpose)
+ if projected is None:
+ continue
+ if normalized_purpose in {"prompt", "public"}:
+ fact_ids = projected.get("fact_ids")
+ if isinstance(fact_ids, list):
+ kept = [fid for fid in fact_ids if isinstance(fid, str) and fid in allowed_fact_ids]
+ if kept:
+ projected["fact_ids"] = kept
+ else:
+ projected.pop("fact_ids", None)
+
+ projected_axes.append(projected)
+
+ projection: dict[str, object] = {
"schema_version": persona.schema_version,
"purpose": normalized_purpose,
- "facts": [
- projected
- for fact in persona.facts
- if (projected := _project_fact(fact, purpose=normalized_purpose)) is not None
- ],
- "axes": [
- projected
- for axis in persona.axes
- if (projected := _project_axis(axis, purpose=normalized_purpose)) is not None
- ],
+ "facts": projected_facts,
+ "axes": projected_axes,
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/gpd/core/research_persona.py` around lines 1204 - 1213, The axis
projection is leaking private fact identifiers because axes are projected via
_project_axis without reconciling axis.fact_ids against the filtered
persona.facts; compute the set of allowed fact ids from the projected facts (the
comprehension using _project_fact) and when projecting axes (the comprehension
using _project_axis) remove or filter any axis.fact_ids not present in that
allowed set (or omit the field entirely for prompt/public purpose). Update
_project_axis or the axis comprehension to perform this filtering so axes in
prompt/public payloads cannot reference non-prompt-safe facts.
| - Use `add_fact`, `replace_fact`, `add_axis`, `replace_axis`, `add_list_item`, | ||
| `remove_list_item`, or pointer operations supported by the research-persona | ||
| patch schema. |
There was a problem hiding this comment.
Patch operation names conflict with the documented patch contract.
This workflow instructs add_fact/replace_fact/add_axis/..., but the builder schema in src/gpd/agents/gpd-persona-builder.md defines upsert_fact, append_list, upsert_axis, and tombstone_fact. This mismatch can produce invalid candidate patches during synthesis.
Suggested doc fix
-- Use `add_fact`, `replace_fact`, `add_axis`, `replace_axis`, `add_list_item`,
-- `remove_list_item`, or pointer operations supported by the research-persona
-- patch schema.
+- Use `upsert_fact` for fact updates, `append_list` for list projections,
+- `upsert_axis` for numeric preference axes, and `tombstone_fact` only when
+- forgetting/correcting a durable fact id, consistent with the patch schema.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - Use `add_fact`, `replace_fact`, `add_axis`, `replace_axis`, `add_list_item`, | |
| `remove_list_item`, or pointer operations supported by the research-persona | |
| patch schema. | |
| - Use `upsert_fact` for fact updates, `append_list` for list projections, | |
| `upsert_axis` for numeric preference axes, and `tombstone_fact` only when | |
| forgetting/correcting a durable fact id, consistent with the patch schema. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/gpd/specs/workflows/build-persona.md` around lines 211 - 213, The
workflow doc uses patch operation names add_fact, replace_fact, add_axis,
replace_axis, add_list_item, remove_list_item (and pointer ops) but the persona
builder schema expects upsert_fact, append_list, upsert_axis, tombstone_fact;
update the workflow text to use the schema's canonical operation names (replace
add_fact/replace_fact → upsert_fact/tombstone_fact as appropriate,
add_list_item/remove_list_item → append_list or the tombstone pattern,
add_axis/replace_axis → upsert_axis) so generated patches match the
research-persona patch contract; search for the operation names in this file and
replace them with the corresponding schema names (upsert_fact, append_list,
upsert_axis, tombstone_fact) and adjust any examples or guidance to match the
builder's expected semantics.
| ```text | ||
| gpd research-persona validate | ||
| gpd research-persona diff | ||
| gpd research-persona apply-patch |
There was a problem hiding this comment.
Use the explicit diff command form in the approval boundary.
Line 43 uses gpd research-persona diff without a patch path, but the rest of this workflow family expects the candidate patch artifact to be diffed explicitly. Keep this command consistent to avoid ambiguous operator behavior.
Suggested doc fix
gpd research-persona validate
-gpd research-persona diff
+gpd research-persona diff GPD/persona/candidate-patch.json
gpd research-persona apply-patch📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ```text | |
| gpd research-persona validate | |
| gpd research-persona diff | |
| gpd research-persona apply-patch |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/gpd/specs/workflows/build-persona/persona-intake.md` around lines 41 -
44, Replace the ambiguous "gpd research-persona diff" invocation with the
explicit diff form that includes the candidate patch artifact path so the
approval boundary always diffs the exact patch; locate the "gpd research-persona
diff" command in persona-intake.md and change it to call the diff with the same
patch argument used by "gpd research-persona apply-patch" (e.g., the candidate
patch variable/path) to ensure consistent, unambiguous behavior.
| storage_needles = ( | ||
| "research-persona/profile.json", | ||
| "research_persona/profile.json", | ||
| "~/.gpd/research-persona/profile.json", | ||
| "$GPD_DATA_DIR/research-persona/profile.json", | ||
| "${GPD_DATA_DIR:-~/.gpd}/research-persona/profile.json", | ||
| ) |
There was a problem hiding this comment.
Case-fold mismatch weakens private-profile detection.
line.casefold() is compared against storage_needles that include uppercase env var tokens, so those needles can be missed and unsafe lines may escape this test.
Suggested fix
def _non_negated_storage_lines(text: str, *, verbs: tuple[str, ...]) -> list[str]:
- storage_needles = (
+ storage_needles = tuple(
+ needle.casefold()
+ for needle in (
"research-persona/profile.json",
"research_persona/profile.json",
"~/.gpd/research-persona/profile.json",
"$GPD_DATA_DIR/research-persona/profile.json",
"${GPD_DATA_DIR:-~/.gpd}/research-persona/profile.json",
- )
+ )
+ )Also applies to: 81-83
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/core/test_research_persona_builder_privacy.py` around lines 61 - 67, In
tests/core/test_research_persona_builder_privacy.py the comparison uses
line.casefold() against the storage_needles tuple which contains mixed-case
tokens (e.g. "$GPD_DATA_DIR"), causing case-fold mismatches; fix by normalizing
the needles as well (e.g., build a lowercased/casefolded version of
storage_needles before the membership check) or compare using both sides
casefolded so the check uses line.casefold() in combination with
needle.casefold(); apply the same change to the second occurrence referenced
around lines 81-83 where the other tuple is checked.
| for node in ast.walk(tree): | ||
| if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and not node.name.startswith("_"): | ||
| names.append(node.name) | ||
| return tuple(sorted(names)) |
There was a problem hiding this comment.
Limit public API detection to module-level definitions.
Using ast.walk includes nested defs/classes, which can falsely satisfy public-helper coverage assertions.
Proposed fix
def _public_api_names(tree: ast.Module) -> tuple[str, ...]:
names: list[str] = []
- for node in ast.walk(tree):
+ for node in tree.body:
if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and not node.name.startswith("_"):
names.append(node.name)
return tuple(sorted(names))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for node in ast.walk(tree): | |
| if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and not node.name.startswith("_"): | |
| names.append(node.name) | |
| return tuple(sorted(names)) | |
| for node in tree.body: | |
| if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and not node.name.startswith("_"): | |
| names.append(node.name) | |
| return tuple(sorted(names)) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/core/test_research_persona_phase04_05_acceptance.py` around lines 145 -
148, The current public-API collector uses ast.walk(tree) which traverses nested
definitions; change it to iterate only module-level statements (e.g., for node
in tree.body) so only top-level ast.FunctionDef, ast.AsyncFunctionDef and
ast.ClassDef names are appended to names; keep the existing underscore-name
filter and return tuple(sorted(names)) as before.
| if _existing_optional_modules(): | ||
| has_audit_command = any(any(hint in command for hint in AUDIT_COMMAND_HINTS) for command in command_names) | ||
| assert has_audit_command, _message("missing optional audit CLI command", command_names) |
There was a problem hiding this comment.
Gate audit command checks on audit modules only.
Current logic triggers audit-command enforcement when any optional module exists, including non-audit modules.
Proposed fix
- if _existing_optional_modules():
+ optional_modules = set(_existing_optional_modules())
+ audit_modules = {"research_persona_audit", "research_persona_audit_log", "research_persona_runtime_audit"}
+ if optional_modules & audit_modules:
has_audit_command = any(any(hint in command for hint in AUDIT_COMMAND_HINTS) for command in command_names)
assert has_audit_command, _message("missing optional audit CLI command", command_names)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if _existing_optional_modules(): | |
| has_audit_command = any(any(hint in command for hint in AUDIT_COMMAND_HINTS) for command in command_names) | |
| assert has_audit_command, _message("missing optional audit CLI command", command_names) | |
| optional_modules = set(_existing_optional_modules()) | |
| audit_modules = {"research_persona_audit", "research_persona_audit_log", "research_persona_runtime_audit"} | |
| if optional_modules & audit_modules: | |
| has_audit_command = any(any(hint in command for hint in AUDIT_COMMAND_HINTS) for command in command_names) | |
| assert has_audit_command, _message("missing optional audit CLI command", command_names) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/core/test_research_persona_system_acceptance.py` around lines 288 -
290, The test currently runs the audit-command assertion whenever any optional
module exists; change it to only run when audit-related optional modules are
present by replacing the generic _existing_optional_modules() gate with a check
for audit modules (e.g., call a variant like _existing_optional_modules("audit")
or use an explicit membership test against your audit module list), then keep
the existing has_audit_command logic that scans command_names against
AUDIT_COMMAND_HINTS and uses _message for the assertion failure.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/gpd/cli.py (1)
5416-5433:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn a non-zero exit code when persona validation fails.
src/gpd/core/research_persona_cli.py:579-596returns a{valid: false, ...}payload instead of raising, so Line 5433 currently exits 0 for invalid persona JSON. That breaks the usualvalidatecommand contract for scripts and CI.💡 Proposed fix
def research_persona_validate( input_path: str | None = typer.Argument( None, metavar="[PROFILE_JSON|-]", help="Optional profile JSON path, or - for stdin. Defaults to the stored profile.", ), ) -> None: """Validate a research persona profile document or the stored profile.""" document = _load_json_document_or_error(input_path) if input_path is not None else None payload = _research_persona_payload( "build_validate_payload", cwd=_get_cwd(), document=document, input_path=input_path, ) _output(payload) + if isinstance(payload, dict) and payload.get("valid") is False: + raise typer.Exit(code=1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/gpd/cli.py` around lines 5416 - 5433, The validate command currently always exits 0 because research_persona_validate builds a payload via _research_persona_payload and writes it with _output without checking the result; update research_persona_validate to inspect the returned payload (e.g., check payload.get("valid") or equivalent boolean returned by _research_persona_payload), and if validation failed (valid is False) call sys.exit(non-zero) or raise typer.Exit(code=1) after emitting the payload so scripts/CI see a failing exit code; keep using _load_json_document_or_error for input handling and still call _output(payload) before exiting.
🧹 Nitpick comments (1)
tests/core/test_cli.py (1)
613-618: ⚡ Quick winDerive expected tier-mix text from canonical counts instead of hard-coding literals.
This assertion will keep breaking on profile catalog updates even when behavior is correct. Build the expected string from
_profile_tier_mix("review")so the test stays semantic-focused.Proposed patch
def _assert_cost_posture_semantics(output: str) -> None: assert _COST_TEST_RUNTIME in output assert "review" in output assert "runtime defaults" in output - assert "tier-1=14, tier-2=13, tier-3=1" in output + expected_mix = ", ".join( + f"{tier}={count}" + for tier in ("tier-1", "tier-2", "tier-3") + if (count := _profile_tier_mix("review").get(tier, 0)) > 0 + ) + assert expected_mix in output assert "Advisory only; counts profile-to-tier assignments" in output assert "set-tier-models" in output🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/core/test_cli.py` around lines 613 - 618, The test _assert_cost_posture_semantics currently asserts a hard-coded tier mix string which will break when profile counts change; replace the literal check for "tier-1=14, tier-2=13, tier-3=1" with a derived expected string built by calling _profile_tier_mix("review") (e.g., expected_tier_mix = _profile_tier_mix("review")) and assert that expected_tier_mix is in output, removing the hard-coded literal so the test stays semantic-focused; keep the other assertions unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/gpd/specs/workflows/help.md`:
- Around line 287-290: Update the gpd:build-persona command signature to match
the usage examples by either expanding the signature `gpd:build-persona
[focus|--from-current-project|--interview-only]` to include the other supported
flags (`--paper`, `--bibtex`, `--manual-patch`) or replace the signature with a
concise form that indicates more options (e.g., `gpd:build-persona
[focus|--from-current-project|--interview-only|...]`) and add a short note that
`--paper`, `--bibtex`, and `--manual-patch` are valid source-ingestion options
referenced in the usage examples; ensure the signature text and the usage
examples remain consistent.
---
Outside diff comments:
In `@src/gpd/cli.py`:
- Around line 5416-5433: The validate command currently always exits 0 because
research_persona_validate builds a payload via _research_persona_payload and
writes it with _output without checking the result; update
research_persona_validate to inspect the returned payload (e.g., check
payload.get("valid") or equivalent boolean returned by
_research_persona_payload), and if validation failed (valid is False) call
sys.exit(non-zero) or raise typer.Exit(code=1) after emitting the payload so
scripts/CI see a failing exit code; keep using _load_json_document_or_error for
input handling and still call _output(payload) before exiting.
---
Nitpick comments:
In `@tests/core/test_cli.py`:
- Around line 613-618: The test _assert_cost_posture_semantics currently asserts
a hard-coded tier mix string which will break when profile counts change;
replace the literal check for "tier-1=14, tier-2=13, tier-3=1" with a derived
expected string built by calling _profile_tier_mix("review") (e.g.,
expected_tier_mix = _profile_tier_mix("review")) and assert that
expected_tier_mix is in output, removing the hard-coded literal so the test
stays semantic-focused; keep the other assertions unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 83c05621-28cb-454b-8a29-f6894b395603
📒 Files selected for processing (36)
src/gpd/agents/gpd-expertise-explainer.mdsrc/gpd/agents/gpd-researcher-doppelganger.mdsrc/gpd/agents/gpd-scientific-taste.mdsrc/gpd/cli.pysrc/gpd/commands/build-persona.mdsrc/gpd/commands/help.mdsrc/gpd/core/config.pysrc/gpd/core/context.pysrc/gpd/core/return_fields.pysrc/gpd/core/workflow_staging.pysrc/gpd/specs/references/help/detailed-command-reference.mdsrc/gpd/specs/references/orchestration/model-profiles.mdsrc/gpd/specs/references/research/research-persona-applications.mdsrc/gpd/specs/workflows/build-persona-stage-manifest.jsonsrc/gpd/specs/workflows/help.mdsrc/gpd/specs/workflows/set-profile.mdtests/README.mdtests/adapters/test_runtime_projection_diagnostics_budget.pytests/core/test_agent_prompt_budget.pytests/core/test_cli.pytests/core/test_command_prompt_budget.pytests/core/test_config.pytests/core/test_prompt_surface_diagnostics_budget.pytests/core/test_prompt_surface_diagnostics_cli.pytests/core/test_research_persona_application_surfaces.pytests/core/test_research_persona_builder_help.pytests/core/test_research_persona_help_docs.pytests/core/test_research_persona_models.pytests/core/test_research_persona_patch_history.pytests/core/test_research_persona_phase04_05_acceptance.pytests/core/test_research_persona_privacy_contract.pytests/core/test_research_persona_storage.pytests/core/test_research_persona_system_acceptance.pytests/core/test_workflow_init_authority.pytests/core/test_workflow_init_builder_registry.pytests/repo_graph_contract.json
💤 Files with no reviewable changes (2)
- src/gpd/specs/references/research/research-persona-applications.md
- src/gpd/specs/workflows/build-persona-stage-manifest.json
✅ Files skipped from review due to trivial changes (5)
- src/gpd/specs/workflows/set-profile.md
- tests/core/test_prompt_surface_diagnostics_cli.py
- src/gpd/specs/references/help/detailed-command-reference.md
- tests/core/test_command_prompt_budget.py
- src/gpd/agents/gpd-scientific-taste.md
🚧 Files skipped from review as they are similar to previous changes (10)
- src/gpd/agents/gpd-expertise-explainer.md
- src/gpd/commands/help.md
- src/gpd/agents/gpd-researcher-doppelganger.md
- tests/core/test_research_persona_privacy_contract.py
- tests/core/test_research_persona_builder_help.py
- src/gpd/commands/build-persona.md
- tests/core/test_research_persona_system_acceptance.py
- tests/core/test_research_persona_patch_history.py
- tests/core/test_research_persona_storage.py
- tests/core/test_research_persona_models.py
| **`gpd:build-persona [focus|--from-current-project|--interview-only]`** | ||
| Build a private research persona patch from explicit interview or consented source ingestion | ||
| Usage: `gpd:build-persona --interview-only`; `gpd:build-persona --from-current-project "math/code balance and citation style"`; `gpd:build-persona --paper paper/main.tex`; `gpd:build-persona --bibtex references/library.bib`; `gpd:build-persona --manual-patch /tmp/persona-patch.json` | ||
| Notes: Emits a candidate ResearchPersonaPatch only; apply it separately with `gpd research-persona apply-patch`. Interviewing, paper imports, BibTeX imports, repository scans, and manual patch review require exact source consent. Source-derived candidates use `gpd research-persona ingest-source SOURCE_JSON|- --output PATCH_JSON`. Never writes persona storage directly. |
There was a problem hiding this comment.
Fix the signature to match the usage examples.
The command signature on line 287 shows [focus|--from-current-project|--interview-only], but the usage examples on lines 289-290 demonstrate additional options (--paper, --bibtex, --manual-patch) that aren't listed in the signature. This creates a documentation inconsistency.
Either expand the signature to include all available options or add an indicator like [...] to signal that additional options exist beyond those shown.
📝 Proposed fix to make the signature comprehensive
-**`gpd:build-persona [focus|--from-current-project|--interview-only]`**
+**`gpd:build-persona [focus|--from-current-project|--interview-only|--paper PATH|--bibtex PATH|--manual-patch PATH]`**Alternatively, if the signature should remain brief:
-**`gpd:build-persona [focus|--from-current-project|--interview-only]`**
+**`gpd:build-persona [focus|--from-current-project|--interview-only|...]`**Then add a note explaining that additional source-ingestion options (--paper, --bibtex, --manual-patch) are available.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| **`gpd:build-persona [focus|--from-current-project|--interview-only]`** | |
| Build a private research persona patch from explicit interview or consented source ingestion | |
| Usage: `gpd:build-persona --interview-only`; `gpd:build-persona --from-current-project "math/code balance and citation style"`; `gpd:build-persona --paper paper/main.tex`; `gpd:build-persona --bibtex references/library.bib`; `gpd:build-persona --manual-patch /tmp/persona-patch.json` | |
| Notes: Emits a candidate ResearchPersonaPatch only; apply it separately with `gpd research-persona apply-patch`. Interviewing, paper imports, BibTeX imports, repository scans, and manual patch review require exact source consent. Source-derived candidates use `gpd research-persona ingest-source SOURCE_JSON|- --output PATCH_JSON`. Never writes persona storage directly. | |
| **`gpd:build-persona [focus|--from-current-project|--interview-only|--paper PATH|--bibtex PATH|--manual-patch PATH]`** | |
| Build a private research persona patch from explicit interview or consented source ingestion | |
| Usage: `gpd:build-persona --interview-only`; `gpd:build-persona --from-current-project "math/code balance and citation style"`; `gpd:build-persona --paper paper/main.tex`; `gpd:build-persona --bibtex references/library.bib`; `gpd:build-persona --manual-patch /tmp/persona-patch.json` | |
| Notes: Emits a candidate ResearchPersonaPatch only; apply it separately with `gpd research-persona apply-patch`. Interviewing, paper imports, BibTeX imports, repository scans, and manual patch review require exact source consent. Source-derived candidates use `gpd research-persona ingest-source SOURCE_JSON|- --output PATCH_JSON`. Never writes persona storage directly. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/gpd/specs/workflows/help.md` around lines 287 - 290, Update the
gpd:build-persona command signature to match the usage examples by either
expanding the signature `gpd:build-persona
[focus|--from-current-project|--interview-only]` to include the other supported
flags (`--paper`, `--bibtex`, `--manual-patch`) or replace the signature with a
concise form that indicates more options (e.g., `gpd:build-persona
[focus|--from-current-project|--interview-only|...]`) and add a short note that
`--paper`, `--bibtex`, and `--manual-patch` are valid source-ingestion options
referenced in the usage examples; ensure the signature text and the usage
examples remain consistent.
Research Persona: a memory layer for how a scientist actually thinks
This PR turns
build-personafrom a profile sketch into a serious, inspectable research-persona system. The goal is not just to remember a user’s name or topic area, but to let GPD build a structured model of a researcher’s intellectual terrain: papers, collaborators, tools, mathematical style, experimental instincts, theoretical taste, explanation preferences, privacy boundaries, and recurring research moves.The result is a persona layer that can be used safely by commands and agents without dumping raw user memory into prompts.
What this unlocks
Implementation
research-persona ingest-sourceresearch-persona doppelgangerresearch-persona explain-planresearch-persona taste-checkresearch-persona auditgpd-researcher-doppelgangergpd-expertise-explainergpd-scientific-tastebuild-personaworkflow with a source-ingestion stage and richer approval/apply semantics.Design principles
Verification
uv run ruff check .: passed.uv run python scripts/sync_repo_graph_contract.py --checkuv run python scripts/render_public_surface.py --checkuv run python scripts/render_help_surface.py --checkuv run python scripts/render_bootstrap_installer_metadata.py --checkgit diff --check: passed.13132 passed, 14 skipped, 20 warningsin118.69s.26983776033.Summary by CodeRabbit
Release Notes
New Features
gpd research-personaCLI commands: show, validate, audit, diff, apply-patch, forget, and export-capsule.gpd init build-personacommand to build a candidate research persona profile through a staged workflow with explicit consent and approval.Documentation
Tests