Improve Foresight demo replay evidence and profiles #685
Open
liyizhouAI wants to merge 36 commits into
- Merge 41 upstream commits (i18n for 7 languages, security fixes, new features)
- Rebrand all MiroFish references to Foresight/先见之明 across 37 files
- Re-apply dark theme CSS overhaul (pure black/gray, no blue tints)
- Re-apply Teleport-based theme toggle (inline with brand, 20px gap)
- Restore Foresight logo and favicon
- Update GitHub links to liyizhouAI/foresight
- Update locale files, README, package.json

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Override inner container max-width constraints (.main-content, .dashboard-section, .content-area, .main-content-area, .panel-wrapper) so that dark-mode content fills widescreen displays the way light mode does natively.
Replace Zep Cloud ($25/mo) with open-source Graphiti + Neo4j:
- New graphiti_client.py: unified wrapper with a sync bridge for async Graphiti
- Modified graph_builder.py: use Graphiti add_episode instead of the Zep batch API
- Modified zep_entity_reader.py: Neo4j Cypher queries replace Zep pagination
- Modified zep_tools.py: Graphiti search replaces Zep Cloud search
- Modified zep_graph_memory_updater.py: Graphiti add_episode replaces Zep add
- Modified oasis_profile_generator.py: GraphitiClient replaces the Zep client
- Updated config.py: NEO4J_URI/USER/PASSWORD replace ZEP_API_KEY
- Updated requirements.txt: graphiti-core + neo4j replace zep-cloud
- Added PRD.md: product requirements document with token analysis
- MiniMaxEmbedder: custom EmbedderClient for MiniMax's non-OpenAI-compatible embedding API (uses a 'texts' field instead of 'input')
- Pass LLM config to OpenAIRerankerClient to avoid the OPENAI_API_KEY requirement
- Both embedder and reranker now use MiniMax API credentials
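The embedder change comes down to the request body: OpenAI-compatible embedding endpoints read an `input` field, while the MiniMax endpoint described here expects `texts`. Below is a minimal sketch of that translation, assuming a hypothetical `to_minimax_payload` helper. It does not reproduce Graphiti's `EmbedderClient` interface and makes no network call. Any other fields MiniMax may require are not covered.

```python
from typing import Any


def to_minimax_payload(openai_payload: dict[str, Any]) -> dict[str, Any]:
    """Rename the OpenAI-style 'input' field to the 'texts' field MiniMax expects.

    Hypothetical adapter: other MiniMax-specific fields (if any) are not handled.
    """
    payload = dict(openai_payload)  # shallow copy so the caller's dict is untouched
    texts = payload.pop("input")
    # OpenAI-compatible APIs accept a single string or a list; normalize to a list.
    payload["texts"] = [texts] if isinstance(texts, str) else list(texts)
    return payload
```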
- Graphiti LLM: SiliconFlow Qwen/Qwen2.5-7B-Instruct (free, supports structured output)
- Graphiti embedding: SiliconFlow BAAI/bge-m3 (free, OpenAI-compatible)
- MiniMax Coding Plan doesn't support the structured output that Graphiti needs
- Separate GRAPHITI_LLM_* config from the main LLM_* config
- Remove the _wait_for_episodes call (Graphiti processes synchronously)
- Upgrade from Qwen2.5-7B to Qwen2.5-32B-Instruct (faster, better quality)
- Add real-time progress messages: "正在处理第 X/Y 个文本块" ("Processing text chunk X/Y")
- Log per-episode processing time
- Expand the progress range from 15-55% to 15-90% (the episode polling step is gone)
- Add a separate GRAPHITI_LLM_* config for the graph-building LLM
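With the episode-polling step removed, per-chunk progress can be spread across the whole 15-90% band. Below is a sketch that assumes a linear mapping. `chunk_progress` and `progress_message` are hypothetical names; only the message text comes from the commit.

```python
def chunk_progress(done: int, total: int, start: int = 15, end: int = 90) -> int:
    """Map `done` of `total` processed chunks linearly onto the [start, end] band."""
    if total <= 0:
        return end
    fraction = min(max(done / total, 0.0), 1.0)  # clamp to [0, 1]
    return start + round(fraction * (end - start))


def progress_message(index: int, total: int) -> str:
    # Same wording as the commit: "Processing text chunk X/Y"
    return f"正在处理第 {index}/{total} 个文本块"
```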
- Add retry/home buttons when errors occur during graph building
- Improve error messages: Network Error, timeout, missing files
- Give users clear guidance when a page refresh loses the upload state
- Switch the API endpoint from HTTP to HTTPS (api.foresight.yizhou.chat)
- Convert Neo4j DateTime objects to strings via a _safe_str() helper
- Query all edge types (RELATES_TO + MENTIONS), not just RELATES_TO
- Fix get_node_edges to match any relationship type
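The standard `json` module cannot serialize the driver's temporal types, such as `neo4j.time.DateTime`, so those values must be converted first. Below is a sketch of a `_safe_str()`-style helper. It relies on duck typing: `iso_format()` for the Neo4j driver's temporal types and `isoformat()` for Python `datetime`. The commit does not show the real helper's body.

```python
import json
from typing import Any


def safe_str(value: Any) -> Any:
    """Return a JSON-serializable form of a Neo4j property value (sketch)."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, (list, tuple)):
        return [safe_str(item) for item in value]
    if hasattr(value, "iso_format"):  # neo4j.time.DateTime, Date, Time
        return value.iso_format()
    if hasattr(value, "isoformat"):  # datetime.datetime / date / time
        return value.isoformat()
    return str(value)  # last resort for anything else


def serialize_props(props: dict[str, Any]) -> str:
    """Serialize a node/edge property map to JSON after sanitizing each value."""
    return json.dumps({key: safe_str(val) for key, val in props.items()})
```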
- Remove CORS headers from nginx (let Flask-CORS handle CORS alone)
- Fix the Neo4j edge query to match all relationship types (RELATES_TO + MENTIONS)
- Convert Neo4j DateTime objects to strings for JSON serialization
- Add .serena/, .vercel/, .venv_direct/ to .gitignore
- Increase chunk size 500→2000 (reduces a 60K document from ~125 chunks to ~32)
- Use Graphiti add_episode_bulk for parallel processing (5 episodes per batch)
- Fall back to sequential processing if the bulk call fails
- Target: build the graph for a 60K document in ~10-15 min instead of 45 min
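The bulk-then-fallback flow can be sketched independently of Graphiti. `add_bulk` and `add_one` are hypothetical stand-ins for `add_episode_bulk` and `add_episode`; the real signatures differ.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")


def process_in_batches(
    items: Sequence[T],
    add_bulk: Callable[[Sequence[T]], None],
    add_one: Callable[[T], None],
    batch_size: int = 5,
) -> int:
    """Send items in batches; if a batch call fails, redo that batch one item at a time.

    Returns the number of batches that needed the sequential fallback.
    """
    fallbacks = 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        try:
            add_bulk(batch)
        except Exception:
            fallbacks += 1
            for item in batch:  # sequential fallback for this batch only
                add_one(item)
    return fallbacks
```

A bulk call can fail after writing some of its episodes. In that case the sequential retry may add those episodes again, so this pattern assumes writes are idempotent or deduplicated downstream.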
Graphiti doesn't assign custom labels the way Zep does: all nodes get the generic 'Entity' label. Updated the filter to accept all named Entity nodes instead of requiring custom labels such as 'Person' or 'Organization'.
… button
This is the v0.3 milestone commit before the v0.4 big version push.
Major themes: process replay, runtime stability, cost observability.
## New Features
- **Manus-style process replay** (frontend + backend)
  - `GET /api/simulation/<id>/replay` returns full workflow + agents + rounds + aggregate
  - `frontend/src/views/SimulationReplayView.vue` 3-column layout (workflow / actions / stats)
  - bottom scrubber with play/pause/step + 5 speed levels (0.5x-10x)
  - filters out stale actions from previous runs using the latest simulation_start timestamp
- **Token usage tracking** (`backend/app/utils/token_tracker.py`)
  - process-wide stage→model→tokens counter
  - LLMClient auto-records prompt/completion tokens after each call
  - stages tagged at API entry: step1_ontology, step2_graph_build, step3_prepare, step5_report
  - `GET /api/usage/summary` for live stats + CNY cost estimate
  - `GET /api/usage/estimate-simulation` for OASIS subprocess estimation
  - pricing table for GLM/SiliconFlow/MiniMax/OpenAI/Anthropic models
  - documented as internal-use, removed from customer-facing builds
- **Step 2 "Skip & Continue" button**
  - lets the user stop profile generation early and proceed with the profiles generated so far
  - `simulation_manager.request_accelerate()` + cancel_check in oasis_profile_generator
  - new endpoint `POST /api/simulation/prepare/accelerate`
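A process-wide stage → model → tokens counter with a cost estimate might look like the sketch below. `TokenTracker`, its method names, and the per-million-token price tuples are assumptions for illustration. They are not the `token_tracker.py` API, and no real model prices are implied.

```python
import threading
from collections import defaultdict


class TokenTracker:
    """Thread-safe stage -> model -> {prompt, completion} token counter (sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._usage: dict[str, dict[str, dict[str, int]]] = defaultdict(
            lambda: defaultdict(lambda: {"prompt": 0, "completion": 0})
        )

    def record(self, stage: str, model: str, prompt: int, completion: int) -> None:
        """Called after each LLM response with the usage it reported."""
        with self._lock:
            bucket = self._usage[stage][model]
            bucket["prompt"] += prompt
            bucket["completion"] += completion

    def estimate_cost(self, prices_per_million: dict[str, tuple[float, float]]) -> float:
        """Sum cost from (prompt, completion) prices per 1M tokens; unknown models cost 0."""
        total = 0.0
        with self._lock:
            for models in self._usage.values():
                for model, bucket in models.items():
                    p_in, p_out = prices_per_million.get(model, (0.0, 0.0))
                    total += (bucket["prompt"] * p_in + bucket["completion"] * p_out) / 1e6
        return total

    def summary(self) -> dict[str, dict[str, dict[str, int]]]:
        """Plain-dict snapshot, safe to return from an API endpoint."""
        with self._lock:
            return {s: {m: dict(b) for m, b in ms.items()} for s, ms in self._usage.items()}


TRACKER = TokenTracker()  # one shared instance per process
```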
## Critical Bug Fix
- **SIGTERM no longer kills the running simulation subprocess**
  - root cause: `SimulationRunner.register_cleanup()` registered SIGTERM/SIGINT/SIGHUP handlers
    that called `os.killpg` on every tracked sim child, even though spawn already used
    `start_new_session=True` to give children isolated sessions
  - fix: neutered `register_cleanup` to a no-op; `cleanup_all_simulations` itself preserved
    for explicit stop_simulation paths
  - validated: killed Flask backend twice, simulation subprocess kept running
  - impact: hot-reload backend code without interrupting in-flight simulations
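Removing the handlers is enough because, on POSIX, `start_new_session=True` makes `subprocess.Popen` call `setsid()` in the child. The child therefore leads its own session and process group and is outside the backend's process group. Only an explicit `os.killpg()` on the child's group, like the one the old handlers issued, reaches it. The sketch below is POSIX-only; `spawn_detached` and `stop_detached` are hypothetical names, not `SimulationRunner` methods.

```python
import os
import signal
import subprocess


def spawn_detached(args: list[str]) -> subprocess.Popen:
    """Start a child in a new session (POSIX setsid), so it leads its own process group."""
    return subprocess.Popen(args, start_new_session=True)


def register_cleanup() -> None:
    """Intentionally a no-op: SIGTERM/SIGINT/SIGHUP handlers that os.killpg() every
    tracked child are what killed in-flight simulations when the backend restarted."""


def stop_detached(child: subprocess.Popen, timeout: float = 10.0) -> None:
    """Explicit stop path: signal the child's whole process group, then reap it."""
    os.killpg(child.pid, signal.SIGTERM)  # after setsid, the pgid equals the child's pid
    try:
        child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(child.pid, signal.SIGKILL)
        child.wait()
```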
## Performance & Tuning
- semaphore 30 → 100 (twitter + reddit) for higher LLM concurrency
- found 200 agents to be the memory/cost/statistical sweet spot for an 8 GB server
  (503 agents OOMs on both platforms; 200 agents fits cleanly with a 95% confidence margin)
## Documentation
- **PRD.md** rewritten as v0.3 baseline (10 chapters + 2 appendices, 639 lines)
  - product positioning across 3 usage modes (one-shot / model-reuse / SaaS)
  - v0.4 roadmap: domestic platforms (douyin/wechat/xiaohongshu/weibo), fork sim, multi-tenant
  - operational lessons: HF mirror, Tencent PyPI mirror, GLM-4-Flash choice, agent count
  - decision log with dates
  - per-stage token/cost breakdown for a typical 200-agent run
- **README.md / README-ZH.md** updated with replay step + Graphiti+Neo4j
## Files Touched
19 files changed, 1105 insertions(+), 246 deletions(-)
First successful E2E test uncovered 7 blocking bugs. All are fixed and deployed to the production server, and the user confirmed that the full pipeline now runs end to end.

## Bug Fixes

### #1 LLMClient exponential backoff retry
- recognize RateLimitError / 429 / 5xx / 1302 / timeout / connection errors
- 5 retries with 1→2→4→8→16s backoff + random jitter
- file: backend/app/utils/llm_client.py

### #2 GLM → Qwen 32B dual-LLM fallback (ontology generation)
- when the primary LLM (Zhipu GLM-4-Flash) exhausts its retries → single-attempt fallback to SiliconFlow Qwen 2.5-32B via GRAPHITI_LLM_* config
- solves intermittent throttling caused by GLM's low RPM quota
- fallback does NOT retry (avoids a cascade)
- file: backend/app/services/ontology_generator.py

### #3 Qwen 32K context overflow
- MAX_TEXT_LENGTH_FOR_LLM 50000 → 28000 chars
- ensures prompt+response stays under 32768 tokens
- truncated docs get the marker line "[文档已截断以适应 LLM 上下文窗口]" ("[Document truncated to fit the LLM context window]")
- file: backend/app/services/ontology_generator.py

### #4 Neo4j entity summary flatten via monkey-patch
- Qwen occasionally returns {summary: {value, type, title, description}}, which Neo4j rejects (properties must be primitives)
- monkey-patch Neo4jEntityNodeOperations.save / save_bulk
- recursive _flatten_entity_property extracts .value from nested dicts, with json.dumps as the string fallback
- file: backend/app/services/graphiti_client.py

### #5 Frontend stuck at /process/new with no pending state
- pendingUpload is in-memory reactive state, lost on refresh / direct URL
- MainView.handleNewProject returned early on empty state without navigating, leaving the UI permanently in "waiting for ontology" limbo
- fix: detect empty state, log a redirect message, router.replace to Home after 800ms
- file: frontend/src/views/MainView.vue

### #6 Semaphore 100 → 30 (OASIS simulation)
- high concurrency triggered a GLM rate-limit cascade during profile/action LLM calls
- dropping to 30 eliminates 1302 retries; total runtime impact < 10%
- file: backend/scripts/run_parallel_simulation.py

### #7 SimulationReplayView redesigned to Manus cinematic style
- the previous 3-column analyst panel felt like a dashboard, not a replay
- new single-column immersive layout matching Manus's "observation window":
  - top breadcrumb: "Foresight is running Reddit simulation · Round 8/15"
  - main stage: browser chrome + native-style platform post card (Reddit subreddit header / Twitter tweet header)
  - action type badges, agent avatars with gradient palettes
  - stage meta bar: per-round action counts by type
  - bottom scrubber row: timestamp chip + Jump to live button + live indicator (pulsing green dot while the sim is running) + step/play/step buttons + 5 speed levels
  - bottom task bar: current pipeline step icon/label/progress
- dark theme (#0A0A0B base + #FF5722 accent)
- analyst mode toggle (◫/▦) preserves the original 3-column view
- auto-polling every 10s while the sim is running, auto-tracks the live position
- file: frontend/src/views/SimulationReplayView.vue

## Docs
- PRD.md bumped to v0.3.1 with full hotfix changelog + decision log entries
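Fix #4 reduces nested dict values to primitives before Neo4j writes, because Neo4j stores only primitives and homogeneous lists of them. Below is a sketch of a `_flatten_entity_property`-style function that follows the rule stated in the commit: take `.value` from nested dicts, and otherwise fall back to `json.dumps`. The list handling and the `str()` last resort are assumptions.

```python
import json
from typing import Any

PRIMITIVES = (str, int, float, bool)


def flatten_property(value: Any) -> Any:
    """Reduce a property value to a Neo4j-storable primitive or homogeneous list (sketch)."""
    if value is None or isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, dict):
        if "value" in value:
            # e.g. {"value": "...", "type": ..., "title": ..., "description": ...}
            return flatten_property(value["value"])
        return json.dumps(value, ensure_ascii=False, default=str)
    if isinstance(value, (list, tuple)):
        items = [flatten_property(item) for item in value]
        homogeneous = len({type(item) for item in items}) <= 1
        if homogeneous and all(isinstance(item, PRIMITIVES) for item in items):
            return items
        return json.dumps(items, ensure_ascii=False, default=str)
    return str(value)


def flatten_properties(props: dict[str, Any]) -> dict[str, Any]:
    """Apply flatten_property to every value before a save()/save_bulk() write."""
    return {key: flatten_property(val) for key, val in props.items()}
```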
…build

End of a long debugging rabbit hole. After ~10 attempts at patching Graphiti's compatibility with non-OpenAI LLMs (GLM, Qwen via SiliconFlow), every fix uncovered another bug. Made the strategic call to abandon Graphiti entirely for the graph-build path and write a minimal custom builder.

## Why Graphiti had to go
- Qwen 32B 32K context overflow on cumulative episode retrieval (60K-82K tokens)
- SiliconFlow Qwen 72B is also 32K (not 128K as the docs implied)
- GLM 4 Flash gives 128K context but triggers "20015 parameter invalid" on Graphiti's structured output calls (logprobs in reranker, empty input in embedder, nested dict in Neo4j writes)
- Each patch target was 1-2 levels deep in Graphiti internals
- 5 separate monkey-patches (EntityNode.save, bulk_utils, driver layer, reranker, embedder) still couldn't cover the extract_nodes path

## New architecture
backend/app/services/custom_graph_builder.py (NEW, 348 lines):
- Uses Foresight's own LLMClient (retry + GLM→Qwen fallback + token tracker)
- 1 LLM call per chunk (Graphiti needed 4-5)
- Single prompt extracts entities + relationships as JSON
- Writes directly to Neo4j via Cypher MERGE (no Graphiti dependency)
- Schema identical to what Graphiti produced, so downstream get_all_nodes / get_all_edges / zep_entity_reader all work unchanged
- ThreadPoolExecutor (10 workers) for concurrent chunk extraction
- Sequential Neo4j writes via a shared session (avoids name conflicts in dedup)
- DDL wrapped in try/except (tolerates pre-existing Graphiti indexes)
- Entity name dedup via in-memory map → single uuid per canonical name
- Regex-sanitized label / relation type to prevent Cypher injection

backend/app/api/graph.py:
- /api/graph/build endpoint now calls CustomGraphBuilder instead of builder.add_text_batches() → client.add_episodes_batch() (the old Graphiti path)

backend/app/services/graph_builder.py:
- _build_graph_worker also updated to use CustomGraphBuilder (dead code path, but kept in sync)

backend/app/models/project.py:
- Default chunk_size 500 → 250 (to keep individual LLM prompts small)

backend/app/services/graphiti_client.py:
- Kept all monkey-patches for backward compatibility; they now only affect legacy Graphiti code paths that CustomGraphBuilder bypasses entirely:
  * _patch_neo4j_driver (AsyncSession.run nested-dict sanitize)
  * _patch_reranker_for_non_openai (GLM logprobs workaround)
  * _patch_embedder_empty_input (SiliconFlow empty-input guard)
  * _patch_entity_node_ops (EntityNode.save sanitize)
  * _patch_add_nodes_and_edges_bulk_tx (bulk-episode sanitize)
- These are kept for safety: the Graphiti code path is no longer invoked in the production build flow, but methods like add_episode still exist on the class

## Performance
- Sequential: ~194 chunks × 2-5s = 10-15 min
- Concurrent (10 workers): ~194 / 10 × 3-5s = **1-2 min** (5-8x speedup)
- Rate limiting is handled by LLMClient retry/backoff, not raw thread contention

## Downstream compatibility
Verified:
- get_all_nodes: `MATCH (n:Entity) WHERE n.group_id = $gid RETURN n, labels(n)`
- get_all_edges: `MATCH (a)-[r]->(b) WHERE r.group_id = $gid RETURN r, type(r)`
- get_node / get_node_edges: MATCH by uuid

CustomGraphBuilder writes:
- (n:Entity [optional second label]) with uuid, name, summary, group_id, created_at
- [r:RELATION_TYPE] with uuid, name, fact, group_id, created_at

Schema matches exactly.

## Pipeline wiring
Step 1 ontology generation → Step 2 graph build (CustomGraphBuilder) → Step 3 profile generation (ZepEntityReader queries Neo4j) → Step 4 simulation

No changes needed downstream of Step 2.
scripts/deploy.sh unifies backend/frontend deployment:
- --backend (default): rsync app+scripts → ubuntu@124.223.92.72:/opt/foresight/backend + kill/restart Flask + health check
- --frontend: vite build + coscmd upload to COS bucket foresight-1317962478 + tccli CDN purge
- --full: both
- --no-restart: skip Flask restart
- --dry-run: preview only

Safety:
- Python syntax check before rsync (aborts on error)
- health check loop after restart (aborts if /health fails 5x)
- rsync --delete excludes __pycache__ / *.pyc / .pytest_cache
- rsync --rsync-path="sudo rsync" for remote permissions
- set -euo pipefail throughout
- "$@" instead of eval (handles spaces in the project path)

Solves:
- Manual backend deploys scattered across sessions left the v0.3 accelerate methods missing on the server until this was caught in production
- No consistent way to push the frontend and purge the CDN in one step
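`deploy.sh` itself is a shell script, but the pre-rsync syntax gate it describes can be approximated in Python. Below is a sketch that assumes a hypothetical `syntax_errors` helper. It uses the built-in `compile()`, so no `.pyc` files are written. A deploy script could abort whenever the returned list is non-empty.

```python
from pathlib import Path


def syntax_errors(root: Path) -> list[str]:
    """Compile every .py file under root in memory; return 'path:line: message' for failures."""
    errors: list[str] = []
    for path in sorted(root.rglob("*.py")):
        if "__pycache__" in path.parts:
            continue
        try:
            compile(path.read_bytes(), str(path), "exec")
        except (SyntaxError, ValueError) as exc:  # ValueError: e.g. null bytes on older Pythons
            line = getattr(exc, "lineno", None)
            errors.append(f"{path}:{line}: {exc}")
    return errors
```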
PRD.md:
- Version bumped to v0.3.3
- Architecture diagram: Graphiti → CustomGraphBuilder
- v0.3.3 changelog: CustomGraphBuilder, deploy.sh, CDN SPA fix, COS deploy
- Infrastructure decisions table updated
- Decision log: 4 new entries for 2026-04-16
- Appendix C: Full Graphiti deprecation record (10 patch attempts, why self-built)
- Appendix D: Complete deployment workflow (backend/frontend/CDN/COS)

README.md / README-ZH.md:
- Step 1 description updated: Graphiti → custom builder with 10x concurrency
- LLM JSON truncation: add _repair_truncated_json() to handle max_tokens cutoff
- Ontology generation: increase max_tokens 4096→8192 to prevent truncation
- Config generation: parallel batch processing (3 threads × 30 agents/batch) for ~3x speedup
- Rate limit: reduce semaphore 30→8 to avoid GLM API 429 storms
- HuggingFace offline: set HF_HUB_OFFLINE=1 to skip the unreachable hf-mirror.com
- Simulation recovery: auto-reset stuck "preparing" states on Flask restart
- Prepare endpoint: handle concurrent prepare requests gracefully
- Frontend: handle the page-refresh-during-prepare edge case
- API: return 429 instead of 500 for LLM rate limit errors
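When output hits `max_tokens`, the JSON usually stops mid-string or with brackets still open. The sketch below shows one way a helper like `_repair_truncated_json()` could work; it is not the project's implementation. It scans once while tracking string state and open brackets, then tries closing everything at the cut point. If that still fails to parse, it backs off to the last comma so the incomplete trailing element is dropped.

```python
import json
from typing import Any


def repair_truncated_json(text: str) -> Any:
    """Best-effort parse of JSON that was cut off mid-output (sketch)."""
    stack: list[str] = []                # expected closing brackets, innermost last
    commas: list[tuple[int, str]] = []   # (position, closers needed if we cut there)
    in_string = escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            if stack:
                stack.pop()
        elif ch == ",":
            commas.append((i, "".join(reversed(stack))))

    # 1) Close whatever is open at the cut point.
    head = text[:-1] if escaped else text  # drop a dangling backslash
    if in_string:
        head += '"'
    head = head.rstrip()
    if head.endswith(":"):
        head += " null"  # key with no value yet
    head = head.rstrip(",")
    candidates = [head + "".join(reversed(stack))]
    # 2) Otherwise drop the incomplete trailing element, backing off comma by comma.
    candidates += [text[:i] + closers for i, closers in reversed(commas)]
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("could not repair truncated JSON")
```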
…able semaphore
- Filter out non-social entity types (API, SDK, Database, etc.) from agent configs; this reduces the agent count by ~20% and means fewer LLM calls per round
- Make the semaphore configurable via the SIMULATION_SEMAPHORE env var (default 50, up from 30), allowing more concurrent LLM requests for faster round execution
- The Reddit semaphore also reads from the env var
- Estimated speedup: ~2x (25 min → ~12 min for 40 rounds)
- LLMClient: increase retries 5→8, initial backoff 1s→2s, max delay 30s→60s for GLM rate-limit recovery
- Report agent: add a 3s delay between sections to avoid sustained 429s
- Add theme-toggle-anchor to the SimulationRun/Report/Interaction/Replay views
- All pages now have a dark/light theme toggle button
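The retry schedule can be sketched as capped exponential backoff plus random jitter, using this commit's values (8 retries, 2 s base, 60 s cap). The jitter range (up to 1 s), the `status_code` attribute check, and the function names are assumptions for illustration. None of this is code from `llm_client.py`.

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0, jitter: float = 1.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based): capped exponential + jitter."""
    return min(base * 2 ** attempt, cap) + random.uniform(0.0, jitter)


def looks_retryable(exc: Exception) -> bool:
    """Hypothetical classifier for the error kinds listed in the commit notes."""
    status = getattr(exc, "status_code", None)
    if status == 429 or (isinstance(status, int) and status >= 500):
        return True
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    return "1302" in str(exc)  # GLM rate-limit code 1302, per the commit notes


def call_with_retry(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool] = looks_retryable,
    max_retries: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on a retryable error, sleep and retry, up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries or not is_retryable(exc):
                raise
            sleep(backoff_delay(attempt))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With these defaults, the waits before successive retries are roughly 2, 4, 8, 16, 32, 60, 60 and 60 seconds, each plus up to 1 s of jitter.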
Report generation now supports REPORT_LLM_* env vars to route report LLM calls through a separate provider (SiliconFlow Qwen-32B) instead of sharing the GLM rate limit with the simulation. It also passes the report LLM client to ZepToolsService so that tool calls (PanoramaSearch, QuickSearch, InsightForge, InterviewAgents) use the same provider.
Simulation completion detection now automatically triggers report generation after a 1.5s delay, eliminating the need for users to manually click the "generate report" button.
…direct
1. The report prompt now extracts the user's core questions from simulation_requirement and structures the report to answer them directly (e.g. "will it go well?", "where are the problems?") instead of generic future-prediction questions.
2. Step3Simulation auto-triggers report generation 1.5s after the simulation completes.
3. ReportView auto-redirects to the latest successful report when the current report has failed, via the /api/report/check/<simulation_id> endpoint.
- Add SimulationAnalyticsService: direct actions.jsonl access for stats, top posts, agent quotes (positive/negative), sentiment breakdown
- Add simulation_analytics tool to the ReportAgent ReAct loop
- Rewrite prompts: demand specific numbers, prohibit vague language, require verbatim agent quotes (min 5 per section)
- Refactor report_agent.py: extract prompts → report_prompts.py, data classes → report_data.py (both under 800 lines)
- Add ReportInfographic.vue: metrics cards, action distribution bars, sentiment breakdown, top agents, timeline sparkline
- Add infographic API endpoint: GET /api/report/<id>/infographic
- Pre-compute infographic data during report generation
- Increase max_tokens from 4096 to 8192 for detailed sections
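The sketch below shows the kind of aggregation a service reading `actions.jsonl` directly might do: counts per action type and per agent, tolerating a partially written last line. The field names (`action_type`, `agent_name`) and the `summarize_actions` name are hypothetical, and the actual log schema may differ.

```python
import json
from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class ActionStats:
    total: int = 0
    by_type: Counter = field(default_factory=Counter)
    by_agent: Counter = field(default_factory=Counter)
    skipped_lines: int = 0


def summarize_actions(lines: Iterable[str]) -> ActionStats:
    """Aggregate one actions.jsonl stream: counts per action type and per agent."""
    stats = ActionStats()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            stats.skipped_lines += 1  # e.g. a partially written last line
            continue
        if not isinstance(record, dict) or "action_type" not in record:
            continue  # e.g. round or simulation markers without an action
        stats.total += 1
        stats.by_type[record["action_type"]] += 1
        stats.by_agent[record.get("agent_name", "unknown")] += 1
    return stats
```

Passing an open file handle streams the file line by line. `stats.by_agent.most_common(5)` then lists the most active agents.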
Summary
Verification