feat: Mark enhancement — AI analysis, share, export (#12) #41

Closed
jessie-coco wants to merge 20 commits into kevinho:main from coco-xyz:feature/12-mark-enhancement

@jessie-coco
Collaborator

Summary

Implements issue #12 — Mark (bookmark) enhancement with AI-powered features.

New capabilities:

  • AI Analysis: POST /api/marks/:id/analyze — LLM generates structured analysis (summary, key topics, significance, related areas) + extracts topic tags
  • Share Links: POST /api/marks/:id/share creates a public share token; DELETE revokes it. Public endpoint (GET /api/marks/shared/:token) returns safe fields only (no user_id or internal IDs)
  • Export: GET /api/marks/export — Markdown (default) or JSON format with optional date range and status filters
  • Digest Preference Tuning: Generator injects bookmark topic tags into the LLM prompt so digests prioritize content matching user interests
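
One way to keep the public share endpoint limited to "safe fields only" is an explicit allowlist, sketched below. The field names and the `toPublicMark` helper are assumptions for illustration; the actual column names and response shape in this PR may differ.

```javascript
// Hypothetical allowlist of fields that are safe to expose publicly.
// The real marks columns may differ from these names.
const PUBLIC_MARK_FIELDS = ['title', 'url', 'note', 'analysis', 'tags', 'created_at'];

// Build the public payload by copying only allowlisted fields, so that
// internal columns (id, user_id, share_token, ...) are never copied over,
// even if new columns are added to the table later.
function toPublicMark(mark) {
  const out = {};
  for (const field of PUBLIC_MARK_FIELDS) {
    if (mark[field] !== undefined) out[field] = mark[field];
  }
  return out;
}

// Usage: internal identifiers are dropped from the result.
const row = { id: 7, user_id: 42, share_token: 'abc', title: 'Example', url: 'https://example.com' };
console.log(toPublicMark(row));
```

An allowlist fails closed: a column added by a later migration stays private until someone opts it in.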

Changes:

  • Migration 014: adds analysis, analyzed_at, share_token, digest_id, tags columns to marks table
  • src/db.mjs: 8 new DB functions, including getMark, updateMarkAnalysis, setMarkShareToken, getMarkByShareToken, revokeMarkShare, listMarksForExport, and getUserMarkTopics
  • src/server.mjs: 6 new API endpoints + shared callLlmApi helper
  • src/generator.mjs: bookmark topic injection into digest generation prompt
  • test/e2e.sh: 15 new tests (all passing)

Test plan

  • All 15 new tests pass (section 19)
  • No regressions in existing tests (102/109 pass; the 7 failures are pre-existing, in sections 14 and 17, caused by missing API_KEY/LLM config)
  • Boot security review
  • Test with live LLM (analyze endpoint)

🤖 Generated with Claude Code

Jessie and others added 20 commits March 1, 2026 04:47
Detailed 4-phase roadmap for ClawFeed v0.8 → v2.0 upgrade:
- Phase 1 (v0.9–v1.0): Data pipeline + personalization
- Phase 2 (v1.0–v1.5): Multi-channel push + AI interaction
- Phase 3 (v1.5–v2.0): Platform API + Source Market
- Phase 4 (v2.0+): Monetization + enterprise features

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: raw_items collection pipeline (Phase 1)

Cherry-pick from kevinho/clawfeed PR #15. Decouples source collection
from digest generation:

- Add raw_items table (migration 010) with dedup via UNIQUE constraint
- Add collector.mjs: standalone fetcher for RSS, HN, Reddit, GitHub
  Trending, Website sources with SSRF protection and concurrency pool
- Add db.mjs CRUD: insertRawItemsBatch, listRawItems,
  listRawItemsForDigest, getRawItemStats, cleanOldRawItems,
  touchSourceFetch, recordSourceError, getSourcesDueForFetch
- Add API endpoints: GET /api/raw-items, /api/raw-items/stats,
  /api/raw-items/for-digest
- Auto-pause sources after 5 consecutive fetch failures
- 30-day TTL cleanup for old raw_items
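
The auto-pause rule above can be sketched as a small state transition. This is only an illustration: the `{ failures, paused }` record shape and the `applyFetchResult` name are assumptions, not the actual recordSourceError/touchSourceFetch implementation.

```javascript
const MAX_CONSECUTIVE_FAILURES = 5; // threshold stated in the commit message

// Returns the next state of a source record after one fetch attempt.
// `source` is a hypothetical shape: { failures: number, paused: boolean }.
function applyFetchResult(source, ok) {
  // A successful fetch resets the consecutive-failure counter.
  if (ok) return { ...source, failures: 0 };
  const failures = source.failures + 1;
  // Pause once the counter reaches the threshold; never un-pause here.
  return { ...source, failures, paused: source.paused || failures >= MAX_CONSECUTIVE_FAILURES };
}
```

Resetting the counter on success is what makes the failures "consecutive": four failures followed by one success starts the count over.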

Closes #2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address security review findings from Boot + Lucy

High:
- Fix SSRF DNS rebinding (TOCTOU): pin resolved IP via custom lookup
  callback so http.get uses the same IP that was validated
- Fix IPv6-mapped IPv4 bypass: extract and validate the embedded IPv4
  from ::ffff:x.x.x.x addresses
- Add source-level permission check: /api/raw-items and /api/raw-items/stats
  now scoped to user's subscribed sources only

Medium:
- Replace DJB2 32-bit hash with sha256 for dedup_key (lower collision risk)
- Add content:encoded support in RSS parser
- Read COLLECTOR_INTERVAL/CONCURRENCY from process.env (consistency)

Other:
- Add graceful shutdown (SIGTERM/SIGINT) for --loop mode
- Add resp.setEncoding('utf8') to prevent implicit Buffer→string conversion
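
As a rough illustration of the IPv6-mapped IPv4 fix, the sketch below unwraps the dotted form `::ffff:a.b.c.d` and then applies an IPv4 blocklist. The function names and the exact set of blocked ranges are assumptions; the collector's real checks may cover more cases. Anything that is not a well-formed IPv4 address after unwrapping (including other IPv6 forms) is rejected here, which is a deliberately conservative choice for the sketch.

```javascript
// Extract the embedded IPv4 from an IPv4-mapped IPv6 address written in
// dotted form (::ffff:a.b.c.d). Other inputs are returned unchanged.
function unwrapMappedIPv4(address) {
  const match = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/i.exec(address);
  return match ? match[1] : address;
}

// True for loopback, RFC 1918 private, link-local, and 0.0.0.0/8 addresses.
// Malformed input is treated as blocked rather than guessed at.
function isBlockedIPv4(ip) {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255)) {
    return true;
  }
  const [a, b] = parts;
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Validate the resolved address, unwrapping the mapped form first.
function isBlockedAddress(address) {
  return isBlockedIPv4(unwrapMappedIPv4(address));
}
```

For the DNS rebinding fix, a check like this would need to run on the same resolved IP that the request then connects to, which is what pinning via the lookup callback achieves.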

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Jessie <jessie@coco.site>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add admin API for collector management:
- GET /api/collect/status — returns sources_due, sources_active,
  sources_paused, last_fetch_at, raw_items_total, raw_items_24h
- POST /api/collect/trigger — spawns one-shot collection process
  (API key auth required for both endpoints)

Add getCollectorStatus() DB helper for aggregated collector metrics.
Add 7 new E2E test assertions for collector endpoints (73 total).

The collector.mjs --loop mode (from PR #6) provides the core scheduling
engine. This PR adds the API layer for monitoring and manual triggers,
completing the cron integration needed for PM2-managed deployment.

Closes #4

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add LLM-powered per-user digest generation engine. Each user gets
digests based on their subscribed sources instead of seeing a single
global digest.

Changes:
- New src/generator.mjs: digest generation engine with OpenAI-compatible
  LLM API, per-user and system digest modes, CLI interface
- db.mjs: createDigest now accepts user_id; new functions for
  getUsersDueForDigest, getActiveSubscriptionSourceIds, getLastDigestTime
- server.mjs: GET /api/digests returns personalized digests for logged-in
  users; new GET /api/digest-status endpoint
- Migration 011: index on digests(user_id, type, created_at)
- .env.example: LLM config vars (LLM_API_URL/KEY/MODEL/TIMEOUT)
- package.json: generate and generate:daily scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…assignment

- callLlm: clearTimeout in resp.on('end') handler (timer leak)
- GET /api/digests: new users with no personal digests fall back to system digests
- generator.mjs: remove dead db2 = db assignment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Generator engine (src/generator.mjs): LLM-powered per-user digest from subscriptions
- System digest fallback for new users (user_id=NULL, public sources)
- DB: createDigest with user_id, getUsersDueForDigest, getActiveSubscriptionSourceIds
- API: GET /api/digests (personal + system fallback), GET /api/digest-status
- Migration 011: index on digests(user_id, type, created_at)
- Config: LLM_API_URL/KEY/MODEL/TIMEOUT, GENERATOR_MAX_ITEMS

Reviewed-by: jessie-coco, boot-coco
Closes #3
- POST /api/chat endpoint with LLM integration using digest content as context
- Chat bubble UI (bottom-right) with expandable chat box
- Conversation history persisted in sessionStorage
- Dark/light theme support, mobile responsive
- E2E tests for chat API (sections 17.1-17.4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sts (#10)

- Add email_preferences + email_log DB tables (migration 012)
- Add responsive HTML email template (dark header, clean layout)
- Add emailer.mjs: Resend-based sender with retry logic, dry-run mode
- Add server endpoints: GET/PUT /api/email/preferences, GET/POST /api/email/unsubscribe
- Add 11 E2E test cases (all passing)
- Add npm scripts: email, email:daily, email:weekly
- Update .env.example with RESEND_API_KEY, EMAIL_FROM, BASE_URL

Pending: Resend account setup + coco.xyz domain DNS verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add migration 012_telegram.sql (telegram_links, link_codes, push_log)
- Add DB functions for Telegram linking, preferences, push logging
- Create src/telegram.mjs — long-polling bot with /start, /digest, /stop, /settings
- Add server endpoints: GET/POST/PUT/DELETE /api/settings/telegram
- Hook generator.mjs to dispatch push after digest creation (fire-and-forget)
- Add TELEGRAM_BOT_TOKEN to .env.example
- Add npm telegram script

Link flow: user /start in bot → gets 6-digit code → enters in web UI → account linked.
Push: after digest generated → fork telegram.mjs --push → sends to subscribers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
H1: Escape HTML entities before markdown→HTML conversion, sanitize
    link hrefs to http/https only (prevents XSS from RSS content)
H2: GET /api/email/unsubscribe now shows confirmation page only,
    POST executes the actual unsubscribe (prevents email client
    link prefetchers from auto-unsubscribing users)
M1: Remove dead variable in sendDigestToUser
M2: upsertEmailPreference now returns fresh data after update
M4: .env.example BASE_URL defaults to localhost

Added 3 new E2E tests (18.12-18.14) for prefetcher safety verification.
Updated EMAIL_FROM to hxa.net per Kevin's domain decision.
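
The H1 fix (escape first, then convert, and restrict hrefs to http/https) can be sketched as follows. The helper names and the single-rule link renderer are assumptions for illustration; the actual email template code handles more markdown than this.

```javascript
const HTML_ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Escape the five HTML-significant characters.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// Return the href if it parses as an absolute http(s) URL, otherwise '#'.
function sanitizeHref(href) {
  try {
    const url = new URL(href);
    return url.protocol === 'http:' || url.protocol === 'https:' ? url.href : '#';
  } catch {
    return '#'; // relative or unparseable URLs are dropped
  }
}

// Minimal renderer for [text](url) links: escape the whole input first,
// then convert, so markup from feed content cannot survive as raw HTML.
function renderLinks(markdown) {
  return escapeHtml(markdown).replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, (_, text, href) =>
    `<a href="${sanitizeHref(href)}">${text}</a>`);
}
```

Escaping before conversion matters for ordering: converting first would require telling generated tags apart from tags that arrived in the RSS content.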

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- H1: Require auth + add rate limiting (20 req/min per user) on /api/chat
- H2: Add 64KB response size limit + 8K content cap on LLM responses
- H3: digest_id input validation (numeric only)
- H4: Escape error messages in chat UI (XSS fix)
- H5: Sanitize role from sessionStorage history (DOM injection fix)
- M1: Remove failed user message from history on connection error
- M3: Clear chatDigestId when navigating away from digest viewer
- Updated E2E tests: auth requirement + 5 test cases
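
A per-user limit like the one in H1 can be sketched with a sliding window. The 20 requests per minute come from the commit message; the `RateLimiter` class, its in-memory storage, and the injectable `now` parameter are assumptions for illustration.

```javascript
// Sliding-window rate limiter keyed by user id (in-memory, single process).
class RateLimiter {
  constructor(limit = 20, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // userId -> array of request timestamps (ms)
  }

  // Returns true and records the request if the user is under the limit.
  // `now` is injectable so the behaviour can be tested deterministically.
  allow(userId, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(userId) || []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(userId, recent);
    return allowed;
  }
}

// Usage: reject the request (for example with HTTP 429) when allow() is false.
const limiter = new RateLimiter();
console.log(limiter.allow('user-1'));
```

An in-memory map like this resets on restart and is not shared across processes, which may or may not be acceptable for a PM2-managed deployment.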

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ombies

C-1 (Critical): pushDigestToTelegram now checks digest.user_id matches
subscriber — prevents sending one user's personalized digest to all
Telegram subscribers. System digests (user_id=null) still go to everyone.

S-1 (High): Link code brute-force mitigation — tracks failed attempts
per code, invalidates after 5 wrong guesses within TTL window.

C-2 (Medium): Generator fork uses stdio:'ignore' instead of 'pipe' to
prevent pipe buffer exhaustion causing zombie child processes.

C-3 (Medium): Digest messages sent without parse_mode to avoid Telegram
API rejecting LLM-generated content with unescaped markdown chars.

S-3 (Medium): saveTelegramLink uses ON CONFLICT to preserve existing
enabled/digest_types preferences when re-linking.

T-1: Added 19 unit tests covering all fixes (telegram.test.mjs).
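
The recipient rule from C-1 can be summarized in a few lines. This is a sketch only: `selectRecipients` and the `{ userId, chatId }` subscriber shape are hypothetical, and the real pushDigestToTelegram may structure the check differently.

```javascript
// Decide which Telegram subscribers should receive a digest.
// `subscribers` is a hypothetical list of { userId, chatId } records.
function selectRecipients(digest, subscribers) {
  // System digests (user_id null or undefined) go to every subscriber.
  if (digest.user_id == null) return subscribers;
  // Personalized digests go only to the subscriber who owns them.
  return subscribers.filter((s) => s.userId === digest.user_id);
}
```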

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
H-1 (High): Block all bot commands in group chats — only respond in
private DMs. Prevents link code exposure and digest data leaks in groups.

H-2 (High): Store chat_username in telegram_links table. Previously the
username parameter was accepted but silently discarded.

H-3 (High): Remove internal chat_id from POST /link API response.

M-3 (Medium): Add 30s timeout to Telegram API calls (getUpdates gets
its own longer timeout matching POLL_TIMEOUT + 10s buffer).

M-4 (Medium): /digest command now filters out system digests (user_id
IS NULL) to match push notification behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge main (post-PR #15) into feat/telegram-push. Resolved conflicts:
- db.mjs: keep both email (012) and telegram (013) migrations + functions
- server.mjs: keep both email and telegram API endpoints
- Migration renamed: 012_telegram.sql → 013_telegram.sql

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: Telegram Bot push notifications (#9)
Keep both chat widget endpoints and email/telegram endpoints
from merged PRs #15 and #16. Tests sections 17 (chat) and
18 (email) both preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Migration 014: adds analysis, share_token, tags, digest_id columns to marks
- AI analysis endpoint (POST /api/marks/:id/analyze): LLM generates summary,
  topic tags, significance. Tags stored for digest preference tuning.
- Share links (POST/DELETE /api/marks/:id/share): generate/revoke public
  share tokens. Public endpoint returns safe fields only (no user_id).
- Export (GET /api/marks/export): markdown and JSON formats with date filtering
- Digest preference integration: generator.mjs injects user's bookmark topic
  tags into LLM prompt to prioritize relevant content
- Shared callLlmApi helper extracted for reuse across endpoints
- 15 new e2e tests covering all new endpoints + auth + isolation
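
The digest preference integration could look roughly like the sketch below, which appends a short hint built from the user's bookmark topics to the base prompt. The `withTopicPreferences` name, the prompt wording, and the cap of 10 topics are all assumptions; the actual prompt text in generator.mjs is not shown in this PR description.

```javascript
// Append a preference hint to the base prompt when the user has bookmark topics.
// Topics are normalized (trimmed, lowercased), de-duplicated, and capped.
function withTopicPreferences(basePrompt, topics, maxTopics = 10) {
  const unique = [...new Set(topics.map((t) => t.trim().toLowerCase()).filter(Boolean))];
  if (unique.length === 0) return basePrompt; // no bookmarks: prompt unchanged
  const list = unique.slice(0, maxTopics).join(', ');
  return `${basePrompt}\n\nThe user has bookmarked items on these topics: ${list}. Prioritize related content.`;
}
```

Capping the list keeps the prompt size bounded for users with many bookmarks.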

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jessie-coco jessie-coco closed this Mar 1, 2026

3 participants