[Feat] Implement CoreSolver 2-phase solving and the problem-solving API endpoint #43

Open
haein45 wants to merge 7 commits into dev from feat/8-core-solver

Conversation

Collaborator

@haein45 haein45 commented May 19, 2026

📌 Related issue

🏷️ PR type

  • ✨ Feature
  • 🐛 Bug fix
  • ♻️ Refactoring
  • 📝 Documentation
  • 🎨 Style
  • ✅ Test

📝 Changes

  • Preprocessor: parses @-commands and extracts ocr_text / tags
  • Router: classifies requests as general_chat / math_task using the Flash LLM with structured output
  • Planner: uses the Flash LLM to decide plan / difficulty / selected_model / use_page; emits a page_start SSE event when use_page is set
  • PlanExecutor: routes a single step via the Command API, dispatches parallel steps via the Send API, and moves to CreditSettler when all steps are complete
  • GeneralNode: streams Flash LLM tokens, emits token SSE events, and records 0.5 cr in credit_log
  • CreditSettler: sums credit_log and emits the credit_settled SSE event
  • VideoNode: stub implementation (marks the plan step status as done)
  • builder.py: lazy singleton pattern; heavy imports such as PDFNode are deferred into _build() (avoids the weasyprint OSError)
  • solve.py: switched from calling CoreSolver directly to running the full graph flow via get_graph().ainvoke(state)

📸 Screenshots

# Server starts normally
INFO:     Application startup complete.

# Math problem request flow (confirmed working up to the rate limit)
Preprocessor → Router → Planner → PlanExecutor → CoreSolver (Phase 1 complete) → Phase 2 (rate limit)

# lint / tests
uv run ruff check .  → All checks passed!
uv run pytest        → 62 passed, 7 skipped

# Execution results
(proovy-agent) haeinoh@Haeinui-MacBookAir proovy-agent % curl -N -X POST http://localhost:8000/api/v1/solve -H "Content-Type: application/json" -d '{"problem": "2x + 3 = 5일때 x의 값은?", "user_id":
  "test-user"}'
event: solve_progress
data: {"text": "수학 문제를 분석하고 코드로 검증하는 중입니다..."}

event: tool_start
data: {"name": "code_generate", "label": "검증 코드 생성 중..."}

event: tool_result
data: {"name": "code_generate", "output": "```python\nimport sympy\nx = sympy.symbols('x')\nequation = 2*x + 3 - 5\nresult = sympy.solve(equation, x)\nprint(result[0])\n```"}

event: tool_start
data: {"name": "code_execute", "label": "코드 실행 중..."}

event: tool_result
data: {"name": "code_execute", "output": "stdout:\n1\n", "success": true}

event: solve_progress
data: {"text": "검증 재시도 중... (2/5)"}

: ping - 2026-05-20 09:27:54.267244+00:00

event: token
data: {"content": "알"}

event: token
data: {"content": "았어. 방정식 2x + 3 = 5를 푸는 것을"}

event: token
data: {"content": " 도와줄게.\n\n1. 방정식 2x + 3 = "}

event: token
data: {"content": "5에서 x를 구하려면 먼저 x항을 한쪽에, 상수항을 다른 쪽에 정리해야 해. 이렇게 하려면 방정식 양쪽에서 3을 빼줘"}

event: token
data: {"content": ".\n   2x + 3 - 3 = 5 - 3\n   이렇게 하면 2x = 2가 되지.\n\n2."}

event: token
data: {"content": " 이제 x를 구하려면 방정식 양쪽을 2로 나눠줘.\n   2x / 2 = 2 / 2\n   이렇게 하면 x = 1이 되지.\n\n따라서 방정식 2x + 3 = "}

event: token
data: {"content": "5에서 x의 값은 1이야.\n"}

event: credit_settled
data: {"total": 2.0, "log": [{"node": "core_solver", "action": "llm_call", "model": "flash", "cost": 1.0}, {"node": "core_solver", "action": "code_execute", "model": null, "cost": 1.0}]}

(proovy-agent) haeinoh@Haeinui-MacBookAir proovy-agent % curl -N -X POST http://localhost:8000/api/v1/solve -H "Content-Type: application/json" -d '{"problem": "안녕하세요", "user_id": "test-user"}'
event: token
data: {"content": "안녕하세요"}

event: token
data: {"content": ", 무엇"}

event: token
data: {"content": "을 도와드릴까요?"}
[Screenshot 2026-05-20, 6:31:33 PM] [Screenshot 2026-05-20, 2:08:39 AM]

✅ Checklist

  • Ready for code review
  • Tests written and all passing
  • Code style guide followed
  • Self-review completed

📎 Other notes

  • The pdf_node/ code was not modified. Running weasyprint on a local Mac requires brew install pango
  • The OpenRouter free tier (Gemini 2.0 Flash) is limited to 15 requests per minute, so CoreSolver Phase 2 may hit a rate limit. Adding credits resolves this
  • VideoNode will be implemented in a separate branch

Summary by CodeRabbit

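Release notes

  • New features

    • Added a new /solve API endpoint for solving math problems (real-time streaming via Server-Sent Events)
    • LLM-based intent classification to distinguish general chat from math problems
    • Step-by-step plan generation and automatic execution
    • Math problem solving through code generation and verification
    • Usage tracking and credit settlement
  • Tests

    • Added unit tests for the endpoint and core functionality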

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai Bot commented May 19, 2026

Warning

Rate limit exceeded

@haein45 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 35 minutes and 23 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 618645e1-f733-479d-9775-40b1e2d2df58

📥 Commits

Reviewing files that changed from the base of the PR and between fe77cf1 and 7a49d1b.

📒 Files selected for processing (9)
  • src/proovy_agent/app/api/v1/solve.py
  • src/proovy_agent/common/llm/client.py
  • src/proovy_agent/graph/agents/core_solver/agent.py
  • src/proovy_agent/graph/builder.py
  • src/proovy_agent/graph/nodes/planner.py
  • tests/app/test_solve_endpoint.py
  • tests/graph/test_builder.py
  • tests/graph/test_credit_settler.py
  • tests/graph/test_plan_executor.py
📝 Walkthrough

A POST /api/v1/solve request is initialized into a ProovyState and run through LangGraph. Phase 1 repeats verification with the tools (code_generate / code_execute); Phase 2 streams the verified result from the LLM over SSE, while also handling the executor/emitter context and credit logging.

Changes

Overall problem-solving system

| Layer | File(s) | Summary |
| --- | --- | --- |
| Execution context variables | `src/proovy_agent/common/sandbox/executor_var.py`, `src/proovy_agent/common/sse/context.py` | Defines the current sandbox `current_executor` and the SSE `current_emitter` as ContextVars. |
| Code generation and execution tools | `src/proovy_agent/graph/tools/code_generate.py`, `src/proovy_agent/graph/tools/code_execute.py`, `src/proovy_agent/graph/tools/__init__.py` | `code_generate` uses the Flash LLM to generate Python verification code and emits SSE events; `code_execute` runs the code asynchronously in a Daytona sandbox and emits execution-result events. Both tools are exported via the package's `__all__`. |
| Two-phase CoreSolver agent | `src/proovy_agent/graph/agents/core_solver/agent.py` | Phase 1 runs a tool-bound LLM in an iterative verification loop (up to a maximum number of iterations); Phase 2 streams an LLM explanation based on the verification summary. Also covers the executor lifecycle, setting/restoring `current_executor`, SSE progress/error events, and `credit_log` recording. |
| Graph builder and execution nodes | `src/proovy_agent/graph/builder.py`, `src/proovy_agent/graph/nodes/*` | Adds `get_graph()` and the builder plus the router, preprocessor, planner, plan_executor, general_node, video_node, and credit_settler nodes, wiring up the LangGraph execution path and conditional branches. |
| Problem-solving API endpoint | `src/proovy_agent/app/schemas/solve.py`, `src/proovy_agent/app/api/v1/solve.py`, `src/proovy_agent/app/api/v1/router.py` | Adds the `SolveRequest` schema and implements POST /api/v1/solve to return an SSE `EventSourceResponse` immediately. The endpoint sets the emitter context and runs the graph in the background. |
| Tests and verification | `tests/app/*`, `tests/graph/*` | Adds unit/integration tests for the solve endpoint and for key logic including core_solver, builder, plan_executor, and credit_settler. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

준원

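"🐰 Here I come with a little carrot,
Reading the problem, shaping the code,
Matching the verification to the sound of execution,
Tokens dance over SSE;
The rabbit hops: well done! 🥕"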

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 29.03% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly summarizes the main changes: the CoreSolver 2-phase solving implementation and the /api/v1/solve endpoint. |
| Description check | ✅ Passed | The PR description completes all required template sections (related issue, PR type, changes, checklist) and includes concrete implementation details, test results, and screenshots. |
| Linked Issues check | ✅ Passed | The code changes meet all major goals of #44: implementation of Preprocessor/Router/Planner/PlanExecutor/GeneralNode/CoreSolver/CreditSettler/VideoNode, use of the Command/Send APIs, standardized SSE events, and completed tests. |
| Out of Scope Changes check | ✅ Passed | All changes fall within the scope of #44; unimplemented items (VLM OCR, credit reservation, refunds, image_generate, GET threads, the actual VideoNode implementation) are intentionally deferred to Phase 2. |

haein45 changed the title from "feat: Implement CoreSolver 2-phase solving and the problem-solving API endpoint" to "[Feat] Implement CoreSolver 2-phase solving and the problem-solving API endpoint" on May 19, 2026
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/proovy_agent/app/api/v1/solve.py`:
- Around line 42-57: The current_emitter token is set in the handler context but
reset inside the background task, causing ValueError; move the
current_emitter.set(emitter) call into the async _run() function (so set and
current_emitter.reset(token) happen in the same context), capture the returned
token inside _run, and call current_emitter.reset(token) in the _run() finally
block alongside await emitter.close(); ensure any references to token outside
_run are removed and that emitter.emit/error/close remain inside _run.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c811b74e-a4cf-496b-a421-e4dfeeeeaa6c

📥 Commits

Reviewing files that changed from the base of the PR and between 266fdd3 and 4f7e070.

📒 Files selected for processing (9)
  • src/proovy_agent/app/api/v1/router.py
  • src/proovy_agent/app/api/v1/solve.py
  • src/proovy_agent/app/schemas/solve.py
  • src/proovy_agent/common/sandbox/executor_var.py
  • src/proovy_agent/common/sse/context.py
  • src/proovy_agent/graph/agents/core_solver/agent.py
  • src/proovy_agent/graph/tools/__init__.py
  • src/proovy_agent/graph/tools/code_execute.py
  • src/proovy_agent/graph/tools/code_generate.py

Comment thread src/proovy_agent/app/api/v1/solve.py Outdated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator Author

haein45 commented May 19, 2026

@coderabbitai Can you review this now?


coderabbitai Bot commented May 19, 2026

The PR description images show a terminal demonstrating a successful end-to-end test of the /api/v1/solve SSE endpoint. The first image shows the full pipeline: a page_start event, a solve_progress event ("수학 문제를 분석하고 코드로 검증하는 중입니다..."), tool_start/tool_result events for code_generate (generating Python verification code) and code_execute (running the code with stdout output), and then another solve_progress showing iteration "2/5". The second and third images show the subsequent token events streaming the Korean explanation of the quadratic equation x² + 3x + 2 = 0 step by step, ending with the final answer x = -1, x = -2.


@haein45 Yes, I can review it right now! 🐰

I'll start reviewing PR #43.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Implement the Preprocessor / Router / Planner / PlanExecutor / GeneralNode /
CreditSettler / VideoNode (stub) nodes and build the LangGraph StateGraph.
Switch the POST /api/v1/solve endpoint to the full graph flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/proovy_agent/graph/builder.py`:
- Around line 12-20: The declared return types for get_graph() and _build() are
incorrect: they are annotated as StateGraph but actually return the result of
builder.compile(), which is a CompiledStateGraph; update the function signatures
and any _graph variable/type annotation to use CompiledStateGraph (and add an
import for CompiledStateGraph if missing) so the type hints match the
builder.compile() return type; ensure references to get_graph, _build, _graph,
and builder.compile are updated accordingly.

In `@src/proovy_agent/graph/nodes/plan_executor.py`:
- Around line 31-33: The early return in plan_executor that does "if not ready:
return Command(goto='credit_settler')" can cause premature transition while
other parallel branches are still in "running"; change the readiness check in
plan_executor to verify that there are no remaining running steps across
parallel branches (not just absence of new pending steps) before returning
Command(goto='credit_settler'). Specifically, augment the logic that uses the
ready flag to also inspect the aggregate state of in-flight steps (e.g., any
running/pending entries in the plan or step status store) and only emit
Command(goto='credit_settler') when all branches/steps are in terminal states;
otherwise keep execution suspended (return None or a retry command) so the
executor waits for running branches to finish.
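
The readiness rule requested for plan_executor can be sketched as a small pure function. This is a hedged illustration only: the `Step` dataclass, the `_DEPS` table (pdf and video waiting for solve), and the return labels `"dispatch"` / `"wait"` / `"credit_settler"` are assumptions made for the example, not the PR's actual types.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    status: str = "pending"  # pending | running | done | error


# Assumed dependency table: pdf and video steps wait for solve to finish.
_DEPS: dict[str, set[str]] = {"solve": set(), "pdf": {"solve"}, "video": {"solve"}}


def ready_steps(plan: list[Step]) -> list[int]:
    """Indices of pending steps whose dependencies are all done."""
    done = {s.action for s in plan if s.status == "done"}
    return [
        i for i, s in enumerate(plan)
        if s.status == "pending" and _DEPS[s.action] <= done
    ]


def decide(plan: list[Step]) -> str:
    """Go to credit_settler only when nothing is ready and nothing is running."""
    if ready_steps(plan):
        return "dispatch"
    if any(s.status == "running" for s in plan):
        return "wait"  # a parallel branch is still in flight
    return "credit_settler"
```

With a plan of `[Step("solve", "running"), Step("pdf")]`, `decide` returns `"wait"` rather than settling credits early.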

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f18a49e3-0224-4b0a-9d15-f0f5e1587353

📥 Commits

Reviewing files that changed from the base of the PR and between 4f7e070 and 8d74955.

📒 Files selected for processing (9)
  • src/proovy_agent/app/api/v1/solve.py
  • src/proovy_agent/graph/builder.py
  • src/proovy_agent/graph/nodes/credit_settler.py
  • src/proovy_agent/graph/nodes/general_node.py
  • src/proovy_agent/graph/nodes/plan_executor.py
  • src/proovy_agent/graph/nodes/planner.py
  • src/proovy_agent/graph/nodes/preprocessor.py
  • src/proovy_agent/graph/nodes/router.py
  • src/proovy_agent/graph/nodes/video_node.py
✅ Files skipped from review due to trivial changes (1)
  • src/proovy_agent/graph/nodes/video_node.py

Comment thread src/proovy_agent/graph/builder.py Outdated
Comment thread src/proovy_agent/graph/nodes/plan_executor.py
builder.py: change the get_graph/_build/_graph return type to CompiledStateGraph
plan_executor: prevent an early transition to credit_settler while a step is still running

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Member

@chowon442 chowon442 left a comment


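  • The current test list has no tests for plan_executor, core_solver, solve_endpoint, credit_settler, or get_graph. Please add tests for this work.

  • Having the LLM return selected_model directly in the Planner's structured output seems to mix operational policy with domain judgment.

    • It would be safer for the LLM to return only difficulty and to manage the difficulty -> selected_model mapping in code or config/env. Model aliases and the actual models can change with cost, performance, and operational policy, so it is better not to hard-code them in the prompt and schema.
    • Also, a minor point: it would be fine to use an Enum for the difficulty values. (The current approach is fine too.)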

Comment on lines +49 to +59
return [
    Send(
        _ACTION_TO_NODE[step.action],
        {
            **state.model_dump(),
            "plan": [s.model_dump() for s in plan],
            "executing_step_idx": idx,
        },
    )
    for idx, step in ready
]

The PlanExecutor parallel path fails at LangGraph runtime.

  • The plan_executor node returns list[Send] directly.
  • In current LangGraph, a node's return value must be a state update dict or a Command.
  • Send must be returned from a conditional edge, or, from a node, wrapped as Command(goto=[Send(...), ...]).

Comment on lines +37 to +43
if len(ready) == 1:
    idx, step = ready[0]
    plan[idx] = step.model_copy(update={"status": "running"})
    return Command(
        update={"plan": plan, "executing_step_idx": idx},
        goto=_ACTION_TO_NODE[step.action],
    )

plan_executor marks the single ready step as running and then routes to _ACTION_TO_NODE[step.action].

  • For a PDF request, step.action == "pdf", so it moves to pdf_node.
  • However, the current PDFNode does not update that step to done after running, so when control returns to plan_executor the PDF step stays running.
  • Please pull and fix this yourself, or pass it on to @gaeunee2. (PDFNode needs to be wrapped or modified to fit this graph protocol so that the step at executing_step_idx is updated to done/error.)

Comment on lines +71 to +132
async def _phase1_verify(
    state: ProovyState,
    emitter: object | None,
) -> tuple[str, list, int]:
    """Phase 1: internal solving + code verification. Returns (verified_summary, new_messages, execute_count)."""
    llm = get_llm(state.selected_model)
    llm_with_tools = llm.bind_tools(_TOOLS)

    system_msg = SystemMessage(_build_verify_system(state))
    messages: list = list(state.messages)
    execute_count = 0

    if emitter:
        await emitter.emit(
            "solve_progress",
            {"text": "수학 문제를 분석하고 코드로 검증하는 중입니다..."},
        )

    for iteration in range(_MAX_ITERATIONS):
        trimmed = _trim_tool_messages(messages)
        response = await llm_with_tools.ainvoke([system_msg, *trimmed])
        messages.append(response)

        if not response.tool_calls:
            # No tool calls → the LLM considers verification complete
            content = response.content
            if isinstance(content, list):
                content = "".join(
                    block.get("text", "") if isinstance(block, dict) else str(block)
                    for block in content
                )
            return str(content), messages, execute_count

        # Execute the tool calls
        for tool_call in response.tool_calls:
            tool_name = tool_call["name"]
            tool = _TOOLS_BY_NAME.get(tool_name)
            if tool is None:
                result = f"Unknown tool: {tool_name}"
            else:
                try:
                    result = await tool.ainvoke(tool_call["args"])
                    if tool_name == "code_execute":
                        execute_count += 1
                except Exception as exc:
                    result = f"Tool error: {exc}"

            messages.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))

        if emitter and iteration > 0:
            await emitter.emit(
                "solve_progress",
                {"text": f"검증 재시도 중... ({iteration + 1}/{_MAX_ITERATIONS})"},
            )

    # Max iterations reached: use the last AI message as the result
    last_ai = next(
        (m for m in reversed(messages) if isinstance(m, AIMessage) and not m.tool_calls),
        None,
    )
    summary = str(last_ai.content) if last_ai else "검증 결과를 확인하지 못했습니다."
    return summary, messages, execute_count

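CoreSolver can generate an explanation as a "verified solution" without any actual code verification.

  • If there are no tool calls, verification is immediately considered complete.
  • Even when the maximum number of iterations is reached, it does not stop with a failure; it builds a summary and proceeds to Phase 2. And Phase 2 always runs.
  • Please confirm whether this is intended. Since the core purpose of the service is to "show users only solutions verified by code execution", would it be better to track at least one successful code_execute and, on failure, move to an explicit failure state instead of generating the final explanation?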
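
The gate proposed above could look roughly like this. It is a sketch under assumptions: `Phase1Result`, `VerificationFailed`, and `require_verified` are hypothetical names, and how Phase 1 counts a "successful" execution is left to the real implementation.

```python
from dataclasses import dataclass


class VerificationFailed(Exception):
    """Raised when Phase 1 ends without a successful code execution."""


@dataclass
class Phase1Result:
    summary: str
    successful_executes: int  # number of code_execute calls that succeeded


def require_verified(result: Phase1Result) -> str:
    """Return the summary for Phase 2 only if at least one execution succeeded."""
    if result.successful_executes < 1:
        raise VerificationFailed("no successful code_execute; skipping Phase 2")
    return result.summary
```

A caller would run Phase 2 only with the value returned by `require_verified(...)` and translate `VerificationFailed` into an explicit error state or SSE event.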

Comment thread src/proovy_agent/app/api/v1/solve.py Outdated
Comment on lines +51 to +55
task: asyncio.Task[None] = asyncio.create_task(_run())
_active_tasks.add(task)
task.add_done_callback(_active_tasks.discard)

return EventSourceResponse(emitter.stream())

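The graph/LLM/sandbox work can keep running even after the SSE connection closes!

  • The endpoint splits the producer off into a background task outside the EventSourceResponse lifecycle.
  • Even if the client disconnects, _run() is not cancelled. Moreover, the queue is bounded, so if emits pile up after the consumer is gone, the producer can block and never reach cleanup.
  • Either use EventSourceResponse's data_sender_callable / close handler, or cancel the task on disconnect and guarantee that execution reaches SandboxManager.destroy_executor().
  • Please review how best to handle this, and also reflect the chosen approach for SSE disconnects in the documentation.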
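
One possible shape for the "cancel the task on disconnect" option, using only asyncio. This is a sketch that assumes the response framework closes or cancels the stream generator when the client goes away; the event names, the `cleaned_up` list, and `demo()` are illustrative stand-ins, and the `finally` in `_run` stands in for a call such as `destroy_executor()`.

```python
import asyncio
from collections.abc import AsyncIterator

cleaned_up: list[str] = []  # hypothetical record of cleanup, for illustration only


async def _run(queue: asyncio.Queue) -> None:
    """Producer: blocks on put() once the bounded queue is full."""
    try:
        for i in range(1000):
            await queue.put(f"event-{i}")
        await queue.put(None)  # sentinel: stream finished
    finally:
        cleaned_up.append("sandbox destroyed")  # stand-in for real cleanup


async def stream() -> AsyncIterator[str]:
    """Consumer: owns the producer task and cancels it when the stream is closed."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    task = asyncio.create_task(_run(queue))
    try:
        while (item := await queue.get()) is not None:
            yield item
    finally:
        task.cancel()  # no-op if the producer already finished
        await asyncio.gather(task, return_exceptions=True)


async def demo() -> list[str]:
    gen = stream()
    first = [await gen.__anext__(), await gen.__anext__()]
    await gen.aclose()  # simulates the client disconnecting mid-stream
    return first
```

Tying the task's lifetime to the generator means a blocked `put()` is interrupted by the cancellation, so the producer's `finally` still runs.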

Comment on lines +199 to +206
credit_entries.append(
    CreditEntry(
        node="core_solver",
        action="llm_call",
        model=state.selected_model,
        cost=model_cost,
    )
)

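Credit settlement significantly undercounts the actual number of LLM calls.

  • CoreSolver records the LLM credit only once for the entire run. In practice, however, there are additional calls: the repeated Phase 1 calls, the Phase 2 streaming call, and the Flash call inside code_generate.
  • The Flash calls in Router/Planner are also not currently added to credit_log.
  • Since this PR attaches CreditSettler, it would be safer to have a central accounting wrapper per actual call. Please consider other approaches as well and proceed in whichever direction works best!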
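
A central accounting wrapper of the kind suggested above might be sketched like this. Everything here is an assumption for illustration: `CreditLedger`, `charge`, `_fake_llm`, and the per-call costs are hypothetical, and this version records a charge only after the call succeeds, which is itself a policy choice.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CreditLedger:
    """Records one entry per actual LLM call instead of one per node run."""
    entries: list[dict[str, Any]] = field(default_factory=list)

    async def charge(
        self, node: str, model: str, cost: float, call: Callable[[], Awaitable[Any]]
    ) -> Any:
        result = await call()  # charge only after the call has succeeded
        self.entries.append(
            {"node": node, "action": "llm_call", "model": model, "cost": cost}
        )
        return result

    @property
    def total(self) -> float:
        return sum(entry["cost"] for entry in self.entries)


async def _fake_llm() -> str:
    """Stand-in for a real LLM call."""
    return "ok"


async def demo() -> CreditLedger:
    ledger = CreditLedger()
    await ledger.charge("router", "flash", 0.5, _fake_llm)       # hypothetical cost
    await ledger.charge("core_solver", "flash", 1.0, _fake_llm)  # one entry per call
    return ledger
```

Routing every call through one wrapper keeps the count in credit_log aligned with the number of requests actually sent.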

- solve.py: cancel the background task when the SSE stream disconnects
- builder.py: fix the CompiledStateGraph type annotation, add a PDFNode wrapper
- core_solver: add the _code_execute_succeeded helper, track llm_call_count/codegen_count separately
- plan_executor: comply with the LangGraph 2.0 API via Command(goto=[Send(...)]), add a running guard
- planner/router: return credit_log and add the page_start SSE event
- Tests: add unit tests for core_solver, plan_executor, credit_settler, builder, and solve_endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/proovy_agent/graph/nodes/planner.py (1)

51-63: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

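On the math_task path, a solve step must be guaranteed by post-processing.

If planner() trusts the LLM output as-is and returns only video/pdf steps or receives steps=[], plan_executor has no step to run and goes straight to credit_settler. This path needs validation that forces at least one solve step or fails explicitly when there is none.

Suggested fix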
     plan = [PlanStep(action=s.action, description=s.description) for s in result.steps]
+    if not any(step.action == "solve" for step in plan):
+        raise ValueError("planner must include at least one solve step for math_task")
+
     selected_model = _DIFFICULTY_TO_MODEL[result.difficulty]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/proovy_agent/graph/nodes/planner.py` around lines 51 - 63, In the
planner() function ensure the "math_task" path always yields at least one
PlanStep with action "solve": after building plan from result.steps (and before
returning/credit logging and before plan_executor runs), check result.path or
state.path for "math_task" and verify any(s.action == "solve" for s in
result.steps); if none, either append a PlanStep(action="solve",
description="Solve the math problem") to plan or raise a clear validation error
so plan_executor/credit_settler cannot proceed with an empty or non-solving
plan; update the return to reflect the adjusted plan and credit behavior.
tests/graph/test_plan_executor.py (1)

21-88: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Please add -> None to all the test functions.

The test function declarations on lines 21-88 are missing return type hints.

Proposed fix
-def test_solve_has_no_deps():
+def test_solve_has_no_deps() -> None:
@@
-def test_pdf_waits_for_solve():
+def test_pdf_waits_for_solve() -> None:
@@
-def test_pdf_ready_after_solve_done():
+def test_pdf_ready_after_solve_done() -> None:
@@
-def test_video_and_pdf_both_ready_after_solve():
+def test_video_and_pdf_both_ready_after_solve() -> None:
@@
-def test_running_step_is_not_ready():
+def test_running_step_is_not_ready() -> None:
@@
 `@pytest.mark.asyncio`
-async def test_all_done_routes_to_credit_settler():
+async def test_all_done_routes_to_credit_settler() -> None:
@@
 `@pytest.mark.asyncio`
-async def test_single_step_routes_to_correct_node():
+async def test_single_step_routes_to_correct_node() -> None:
@@
 `@pytest.mark.asyncio`
-async def test_parallel_steps_wrapped_in_command():
+async def test_parallel_steps_wrapped_in_command() -> None:
@@
 `@pytest.mark.asyncio`
-async def test_running_step_does_not_transition_to_credit_settler():
+async def test_running_step_does_not_transition_to_credit_settler() -> None:

As per coding guidelines Use Python 3.12+ with mandatory type hints in all code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/graph/test_plan_executor.py` around lines 21 - 88, Add explicit return
type hints "-> None" to every test function in this block (functions like
test_solve_has_no_deps, test_pdf_waits_for_solve,
test_pdf_ready_after_solve_done, test_video_and_pdf_both_ready_after_solve,
test_running_step_is_not_ready, test_all_done_routes_to_credit_settler,
test_single_step_routes_to_correct_node, test_parallel_steps_wrapped_in_command,
test_running_step_does_not_transition_to_credit_settler) so each def line
declares a return type; do this for both sync tests and async tests (keep the
`@pytest.mark.asyncio` decorators and async def for plan_executor-related tests)
to comply with the project’s mandatory type-hint rule.
🧹 Nitpick comments (2)
tests/app/test_solve_endpoint.py (1)

31-42: ⚡ Quick win

This test does not yet verify the stream producer.

Right now it only checks the headers, so it can pass even if _run() never starts or get_graph().ainvoke() is never called. It is safer to consume at least one event and also assert mock_graph.ainvoke.assert_awaited_once().

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/app/test_solve_endpoint.py` around lines 31 - 42, Update the
test_valid_request_returns_sse_stream to actually consume at least one SSE event
and assert the mocked graph was awaited: after posting to "/api/v1/solve"
(keeping the existing get_graph patch and mock_graph with mock_graph.ainvoke =
AsyncMock()), iterate the response streaming body (e.g. use
response.iter_lines() / response.iter_text() or another TestClient streaming
iterator) and read the first non-empty SSE line/event, then assert
mock_graph.ainvoke.assert_awaited_once() in addition to the existing status and
content-type checks; ensure you reference the test function name
test_valid_request_returns_sse_stream, the patched get_graph, and
mock_graph.ainvoke when making these assertions.
src/proovy_agent/app/api/v1/solve.py (1)

47-49: ⚡ Quick win

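The same failure can emit the error SSE twice.

core_solver() already sends an error event on verification failure and then re-raises the exception. Emitting a generic error here as well means the client receives two error events for the same failure. It would be better to mark exceptions that have already been reported to the user so the duplicate emit can be skipped.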

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/proovy_agent/app/api/v1/solve.py` around lines 47 - 49, core_solver()
already emits an "error" SSE for validation failures and re-raises, so the
generic except block in solve.py currently causes duplicate SSEs; change the
except clause to capture the exception (except Exception as e:) and only call
await emitter.emit("error", ...) if the exception does not carry a marker
indicating the error was already emitted (e.g., check hasattr(e, "sse_emitted")
or getattr(e, "sse_emitted", False) == False); also ensure core_solver() sets
that marker on exceptions it emits (e.g., set e.sse_emitted = True) so the
solve.py handler can reliably skip re-emitting, while still calling
logger.exception("solve 실행 중 오류 발생") and letting the exception propagate as
needed.
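
The marker approach described in this nitpick can be sketched as follows. It is illustrative only: `SolveError` and `events_for` are hypothetical names, and the real handler would call `emitter.emit("error", ...)` instead of returning a list.

```python
class SolveError(Exception):
    """Hypothetical exception that remembers whether an error SSE was already sent."""

    def __init__(self, message: str, *, sse_emitted: bool = False) -> None:
        super().__init__(message)
        self.sse_emitted = sse_emitted


def events_for(exc: Exception) -> list[str]:
    """Events the outer handler should still emit for this exception."""
    if getattr(exc, "sse_emitted", False):
        return []       # the inner node already told the client
    return ["error"]    # nothing was emitted yet, so report it once
```

Using `getattr` with a default keeps the check safe for ordinary exceptions that never carry the marker.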
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/proovy_agent/graph/agents/core_solver/agent.py`:
- Around line 71-73: The current _code_execute_succeeded function only treats
"exit_code: 1" as failure, causing false positives for other non-zero exit
codes; update _code_execute_succeeded to parse the numeric exit_code from the
result (e.g., with a regex for "exit_code:\s*(-?\d+)") and return False if an
exit_code is present and not equal to 0, and also return False if the result
contains "error:" (case-insensitive); if no exit_code is present, fall back to
the existing "error:" check to decide success.

In `@src/proovy_agent/graph/builder.py`:
- Around line 62-68: The current broad except Exception around importing and
instantiating PDFNode hides real runtime/initialization bugs; change the handler
to only catch import-related failures (e.g., except (ImportError,
ModuleNotFoundError) as e) so that failures inside PDFNode() bubble up, and
update the warning log (logging.getLogger(__name__).warning) to include the
exception details; keep the fallback to _pdf_stub and keep using
_pdf_step_done_wrapper(PDFNode()) for the normal path.

In `@tests/app/test_solve_endpoint.py`:
- Around line 45-56: The test test_thread_id_auto_generated currently only
checks status 200 but must verify that a thread_id was auto-generated and passed
to the graph; after posting via client.post (using the patched get_graph
returning mock_graph), assert mock_graph.ainvoke was awaited and inspect the
arguments it was called with to ensure the passed-in state's thread_id is
present/non-empty (e.g., access the first await call on mock_graph.ainvoke and
assert the state's thread_id is not None/empty); update the test to perform this
check in addition to the existing status_code assertion.

In `@tests/graph/test_builder.py`:
- Around line 15-28: Add explicit return type annotations "-> None" to the two
test functions so they comply with the project type-hint rule: update the defs
for test_get_graph_builds_only_once and test_get_graph_returns_build_result to
include "-> None" after the parameter list (i.e., def
test_get_graph_builds_only_once() -> None: and def
test_get_graph_returns_build_result() -> None:) while keeping the existing
bodies and mocks unchanged.

In `@tests/graph/test_credit_settler.py`:
- Around line 15-51: Four async test functions (test_sums_total_credit,
test_emits_credit_settled_event, test_no_emitter_does_not_raise,
test_empty_log_total_is_zero) are missing the required return type hint; update
each function definition to include the explicit return type annotation "->
None" on the async def signature (e.g., async def test_sums_total_credit(...) ->
None:) so they comply with the Python 3.12+ mandatory type-hinting guideline.
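
For the first inline comment in the list above (`_code_execute_succeeded`), the described check could be sketched like this. It assumes the tool result is a plain string that may contain an `exit_code: N` line, as the comment implies; the function name drops the leading underscore only to mark it as an example.

```python
import re

# Matches "exit_code: 0", "exit_code:1", "exit_code: -9", and so on.
_EXIT_CODE_RE = re.compile(r"exit_code:\s*(-?\d+)")


def code_execute_succeeded(result: str) -> bool:
    """Treat any non-zero exit code, or any 'error:' marker, as a failure."""
    match = _EXIT_CODE_RE.search(result)
    if match is not None and int(match.group(1)) != 0:
        return False
    # Also covers the fallback case where no exit code is present at all.
    return "error:" not in result.lower()
```

Parsing the number avoids the false positive where only `exit_code: 1` counted as a failure.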

---

Outside diff comments:
In `@src/proovy_agent/graph/nodes/planner.py`:
- Around line 51-63: In the planner() function ensure the "math_task" path
always yields at least one PlanStep with action "solve": after building plan
from result.steps (and before returning/credit logging and before plan_executor
runs), check result.path or state.path for "math_task" and verify any(s.action
== "solve" for s in result.steps); if none, either append a
PlanStep(action="solve", description="Solve the math problem") to plan or raise
a clear validation error so plan_executor/credit_settler cannot proceed with an
empty or non-solving plan; update the return to reflect the adjusted plan and
credit behavior.

In `@tests/graph/test_plan_executor.py`:
- Around line 21-88: Add explicit return type hints "-> None" to every test
function in this block (functions like test_solve_has_no_deps,
test_pdf_waits_for_solve, test_pdf_ready_after_solve_done,
test_video_and_pdf_both_ready_after_solve, test_running_step_is_not_ready,
test_all_done_routes_to_credit_settler, test_single_step_routes_to_correct_node,
test_parallel_steps_wrapped_in_command,
test_running_step_does_not_transition_to_credit_settler) so each def line
declares a return type; do this for both sync tests and async tests (keep the
`@pytest.mark.asyncio` decorators and async def for plan_executor-related tests)
to comply with the project’s mandatory type-hint rule.

---

Nitpick comments:
In `@src/proovy_agent/app/api/v1/solve.py`:
- Around line 47-49: core_solver() already emits an "error" SSE for validation
failures and re-raises, so the generic except block in solve.py currently causes
duplicate SSEs; change the except clause to capture the exception (except
Exception as e:) and only call await emitter.emit("error", ...) if the exception
does not carry a marker indicating the error was already emitted (e.g., check
hasattr(e, "sse_emitted") or getattr(e, "sse_emitted", False) == False); also
ensure core_solver() sets that marker on exceptions it emits (e.g., set
e.sse_emitted = True) so the solve.py handler can reliably skip re-emitting,
while still calling logger.exception("solve 실행 중 오류 발생") and letting the
exception propagate as needed.
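
The marker idea can be sketched as follows. `RecordingEmitter`, and the bodies of `core_solver` and `solve`, are simplified stand-ins written for illustration, not the project's code; only the `sse_emitted` attribute name comes from the comment above:

```python
import asyncio


class RecordingEmitter:
    """Hypothetical stand-in for the SSE emitter; it only records events."""

    def __init__(self) -> None:
        self.events: list[tuple[str, dict]] = []

    async def emit(self, event: str, data: dict) -> None:
        self.events.append((event, data))


async def core_solver(emitter: RecordingEmitter) -> None:
    try:
        raise ValueError("invalid problem")
    except ValueError as e:
        await emitter.emit("error", {"message": str(e)})
        e.sse_emitted = True  # mark the exception so callers skip re-emitting
        raise


async def solve(emitter: RecordingEmitter) -> None:
    try:
        await core_solver(emitter)
    except Exception as e:
        # Emit only if nobody further down already reported this error.
        if not getattr(e, "sse_emitted", False):
            await emitter.emit("error", {"message": "internal error"})


emitter = RecordingEmitter()
asyncio.run(solve(emitter))
```

In this sketch `solve` swallows the exception after handling it; a real handler would also log it and could re-raise.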

In `@tests/app/test_solve_endpoint.py`:
- Around line 31-42: Update the test_valid_request_returns_sse_stream to
actually consume at least one SSE event and assert the mocked graph was awaited:
after posting to "/api/v1/solve" (keeping the existing get_graph patch and
mock_graph with mock_graph.ainvoke = AsyncMock()), iterate the response
streaming body (e.g. use response.iter_lines() / response.iter_text() or another
TestClient streaming iterator) and read the first non-empty SSE line/event, then
assert mock_graph.ainvoke.assert_awaited_once() in addition to the existing
status and content-type checks; ensure you reference the test function name
test_valid_request_returns_sse_stream, the patched get_graph, and
mock_graph.ainvoke when making these assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 22600553-34c1-43fa-a559-ec251437f348

📥 Commits

Reviewing files that changed from the base of the PR and between 8d74955 and fe77cf1.

📒 Files selected for processing (13)
  • src/proovy_agent/app/api/v1/solve.py
  • src/proovy_agent/graph/agents/core_solver/agent.py
  • src/proovy_agent/graph/builder.py
  • src/proovy_agent/graph/nodes/plan_executor.py
  • src/proovy_agent/graph/nodes/planner.py
  • src/proovy_agent/graph/nodes/router.py
  • tests/app/test_solve_endpoint.py
  • tests/graph/__init__.py
  • tests/graph/agents/__init__.py
  • tests/graph/agents/test_core_solver.py
  • tests/graph/test_builder.py
  • tests/graph/test_credit_settler.py
  • tests/graph/test_plan_executor.py

Comment thread src/proovy_agent/graph/agents/core_solver/agent.py Outdated
Comment thread src/proovy_agent/graph/builder.py
Comment thread tests/app/test_solve_endpoint.py
Comment thread tests/graph/test_builder.py Outdated
Comment thread tests/graph/test_credit_settler.py
haein45 and others added 2 commits May 20, 2026 23:27
- gemini-2.0-flash → gemini-2.5-flash: improves structured-output accuracy
- claude-opus-4-5 → claude-sonnet-4-5: tool calls 3-4x faster, equivalent quality

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
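- _code_execute_succeeded: detect every abnormal (non-zero) exit code with an exit_code regex
- builder.py: narrow the except scope to catch only PDFNode import errors, and include the error message
- solve.py + core_solver: prevent duplicate SSE error events with an sse_emitted marker
- planner.py: automatically insert a solve step when the LLM omits it (keeps plan_executor safe)
- Tests: add -> None return type hints; add checks for the ainvoke call and thread_id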

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@haein45
Collaborator Author

haein45 commented May 21, 2026

I've fixed all of these and pushed the changes!

@haein45 haein45 requested a review from chowon442 May 22, 2026 00:42
Member

@chowon442 chowon442 left a comment

src/proovy_agent/common/sse/emitter.py

await self._queue.put(None)

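If the queue is full after a disconnect, when the consumer is already gone, close() can block here.

The SSEEmitter queue is bounded, so if _stream() stops consuming while many events are queued (as with a token stream), the put(None) that enqueues the sentinel blocks as well. The finally block in _run() then never finishes, which makes cleanup completion unreliable.

Even in a minimal reproduction, filling the queue with 256 events and then calling close() times out. To guarantee disconnect cleanup, close() should either enqueue the sentinel without blocking, or the queue drain/drop policy should be made explicit.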
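
One possible shape for a non-blocking close(), as a sketch only. The `Emitter` class below is a hypothetical reduction, not the project's SSEEmitter, and dropping the oldest pending event to make room for the sentinel is an assumed policy:

```python
import asyncio


class Emitter:
    """Hypothetical bounded-queue emitter used only to illustrate close()."""

    def __init__(self, maxsize: int = 256) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self._closed = False

    def close(self) -> None:
        """Enqueue the None sentinel without ever waiting on a full queue."""
        if self._closed:
            return
        self._closed = True
        while True:
            try:
                self._queue.put_nowait(None)
                return
            except asyncio.QueueFull:
                # Assumed policy: discard the oldest event to make room.
                self._queue.get_nowait()
```

Because close() never awaits, it cannot hang in a finally block even when no consumer is left.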


async with self._close_lock:
    if self._closed:
        logger.debug("emit() 무시됨 — 이미 닫힌 이미터 (event=%s)", event)
        return
    await self._queue.put(SSEEvent(event=event, data=data))

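emit() also awaits put() on the bounded queue while holding the lock. When the queue is full the producer stalls here, and close() then waits for the same lock, which makes the disconnect/cleanup path even more fragile.

SSE is largely best-effort real-time delivery, so it seems necessary to decide explicitly whether a full queue should apply backpressure, drop events, or time out.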
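
As an illustration of the "drop" option only, here is a sketch in which emit() never waits. `DroppingEmitter` and its `dropped` counter are hypothetical names, and dropping the newest event is an assumption; backpressure or a timeout would be equally valid policies:

```python
import asyncio


class DroppingEmitter:
    """Hypothetical emitter that drops new events when the queue is full."""

    def __init__(self, maxsize: int = 256) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self._closed = False
        self.dropped = 0  # number of events discarded because the queue was full

    def emit(self, event: str, data: dict) -> bool:
        """Enqueue without waiting; return False if the event was not queued."""
        if self._closed:
            return False
        try:
            self._queue.put_nowait((event, data))
            return True
        except asyncio.QueueFull:
            self.dropped += 1
            return False
```

With no await inside emit(), there is nothing to hold a lock across, so close() cannot end up queued behind a stalled producer.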

Comment on lines +62 to +68
try:
    from proovy_agent.graph.nodes.pdf_node.pdf_node import PDFNode

    pdf_node = _pdf_step_done_wrapper(PDFNode())
except (ImportError, ModuleNotFoundError) as e:
    logging.getLogger(__name__).warning("PDFNode 로드 실패 — 스텁으로 대체합니다. 원인: %s", e)
    pdf_node = _pdf_stub

Narrowing the broad Exception is the right direction, but the Builder requirements in #44 include preventing the OSError caused by weasyprint's native dependencies.

As written, only import-type errors fall back to the stub, so in environments where loading native libraries such as WeasyPrint/Cairo/Pango fails with OSError, the graph build itself can fail.

It seems safer to include OSError raised during the import/native loading stage in the fallback, without hiding logic bugs inside PDFNode().
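
A sketch of that split, assuming a hypothetical helper `load_node` (not part of the PR): only the import stage is guarded, so constructor bugs still surface. `ModuleNotFoundError` is a subclass of `ImportError`, so listing `ImportError` alone covers both:

```python
import importlib
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_node(module_path: str, attr: str, stub: Callable[..., Any]) -> Callable[..., Any]:
    """Import module_path.attr, falling back to stub if loading fails."""
    try:
        module = importlib.import_module(module_path)
    except (ImportError, OSError) as e:
        # ImportError: missing module; OSError: native library failed to load.
        logger.warning("%s could not be loaded, using stub: %s", module_path, e)
        return stub
    # Anything past the import (missing attribute, constructor bugs) is not swallowed.
    return getattr(module, attr)
```

Instantiating the returned class outside the try block keeps logic errors in its constructor visible instead of silently switching to the stub.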

Comment on lines +62 to +65
finally:
    # Cancel the background task when the client disconnects
    if not task.done():
        task.cancel()

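This policy may conflict with the direction of the earlier SSE issue, so it needs confirmation.

#9 describes keeping messages as the durable source so that state can be restored even if the SSE connection drops. The current implementation, however, cancels the graph task on disconnect, so the solve itself stops.

If the policy was deliberately changed to "disconnect = solve cancel", the docs and issue description should be updated to match; if not, the SSE stream lifecycle and the graph execution lifecycle should be decoupled.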

Comment on lines +49 to +62
return Command(
    update={"plan": plan},
    goto=[
        Send(
            _ACTION_TO_NODE[step.action],
            {
                **state.model_dump(),
                "plan": [s.model_dump() for s in plan],
                "executing_step_idx": idx,
            },
        )
        for idx, step in ready
    ],
)

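This parallel path can break in actual graph execution.

Because the Send payload is a dict, the state that video_node/pdf_node receive is not a ProovyState. Those nodes use attribute access such as state.plan and state.executing_step_idx, so parallel execution raises AttributeError: 'dict' object has no attribute 'plan'.

Even if the payload is changed to a ProovyState, when the video and pdf branches both update plan in the same superstep, LangGraph cannot merge the concurrent updates because the plan field is currently a plain list with no reducer.

A regression test that runs a real parallel fan-out through the graph should be added, and the way plan updates are merged should be sorted out at the same time.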
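
One way to make concurrent plan updates mergeable is a reducer attached to the field through `Annotated`, which is how LangGraph state schemas declare reducers. The sketch below is an assumption-laden illustration: `Step`, the status names and their ordering, and `merge_plan` are hypothetical, and the real plan model may differ:

```python
from typing import Annotated, TypedDict


class Step(TypedDict):
    # Hypothetical, simplified plan step.
    action: str
    status: str


# Assumed status ordering; a later status wins when two branches disagree.
_RANK = {"pending": 0, "running": 1, "done": 2}


def merge_plan(current: list[Step], update: list[Step]) -> list[Step]:
    """Merge two plans index by index, keeping the more advanced status."""
    if not current:
        return list(update)
    merged = list(current)
    for idx, step in enumerate(update):
        if idx >= len(merged):
            merged.append(step)
        elif _RANK[step["status"]] > _RANK[merged[idx]["status"]]:
            merged[idx] = step
    return merged


# In a state schema the field would be declared with the reducer attached.
PlanField = Annotated[list[Step], merge_plan]
```

With a reducer like this, each branch can return its own copy of the plan and the merge keeps whichever step status has progressed further, regardless of the order in which the branches finish.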


Development

Successfully merging this pull request may close these issues.

[Feat] Implement the full graph pipeline (Preprocessor → Router → Planner → PlanExecutor → CoreSolver → CreditSettler)

2 participants