feat(archive): add archive-posting.mjs — save job postings as PDF bef…#697
Krxshna wants to merge 8 commits into
…ore they disappear
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Adds a Node.js CLI
Changes
Archive Posting Feature
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Welcome to career-ops, @Krxshna! Thanks for your first PR. A few things to know:
We'll review your PR soon. Join our Discord if you have questions. |
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@archive-posting.mjs`:
- Line 206: The console.log call using the template literal console.log(`\
${url}`) prints a stray backslash instead of the intended newline/emoji; update
that call to match the other usage (e.g., replace the `\ ${url}` fragment with
`\n🔗 ${url}`) so the output shows a newline and link emoji before the
url—locate the console.log that references the variable url and make this
replacement.
In `@test-all.mjs`:
- Line 364: The test currently interpolates untrusted liveJobUrl into a shell
string passed to run (see the run(...) invocation that builds `node
archive-posting.mjs "${liveJobUrl}"`), which allows command injection; change
the call to pass the program and arguments separately (e.g., use
execFile/child_process spawn or run with an args array) so the URL is an
argument rather than shell-parsed, and additionally validate or strictly
whitelist/sanitize liveJobUrl (or at minimum reject quotes/semicolons/newlines)
before passing it; update any related consumer in archive-posting.mjs to read
the URL from process.argv instead of relying on a shell-expanded string.
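The finding above can be illustrated with a minimal sketch. It assumes nothing about the project's own `run(...)` helper: `runNode`, `hostileUrl`, and `echoScript` are hypothetical names introduced here, and a one-line `node -e` child stands in for `archive-posting.mjs`. The point is only that `execFileSync` with an argument array does not invoke a shell by default, so quotes and semicolons in the URL arrive in the child's `process.argv` as literal characters.

```javascript
import { execFileSync } from 'node:child_process';

// Hypothetical helper (not the project's run()): launches the current Node
// binary with an argument array. execFileSync does not spawn a shell by
// default, so each array element reaches the child as one literal argument.
function runNode(args) {
  return execFileSync(process.execPath, args, { encoding: 'utf8' });
}

// A hostile value that would break out of a double-quoted shell string.
const hostileUrl = 'https://example.com/job"; echo INJECTED; "';

// Stand-in for archive-posting.mjs: a one-line child script that echoes the
// first argument it receives via process.argv.
const echoScript = 'process.stdout.write(process.argv[1])';

const echoed = runNode(['-e', echoScript, hostileUrl]);

// Reports whether the child received the URL byte-for-byte.
console.log(echoed === hostileUrl);
```

Passing arguments this way removes the shell-parsing risk, but it does not decide whether the URL is one the archiver should fetch; that still needs separate validation.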
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ec7db61c-1d0d-48de-a586-3f314893b195
📒 Files selected for processing (3)
archive-posting.mjs, package.json, test-all.mjs
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
♻️ Duplicate comments (1)
test-all.mjs (1)
355-365: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Validate `liveJobUrl` before passing it to the archiver (Line 364).
`liveJobUrl` comes from external API data and is used directly as a crawl target. Please enforce `https` + hostname allowlisting to prevent SSRF-style fetches if upstream data is malformed or compromised.
Suggested hardening patch

     let liveJobUrl = null;
     try {
       const res = await fetch('https://boards-api.greenhouse.io/v1/boards/anthropic/jobs?content=false');
       const { jobs } = await res.json();
       liveJobUrl = jobs?.[0]?.absolute_url ?? null;
    +  if (liveJobUrl) {
    +    try {
    +      const u = new URL(liveJobUrl);
    +      const allowedHosts = new Set(['boards.greenhouse.io', 'job-boards.greenhouse.io']);
    +      if (u.protocol !== 'https:' || !allowedHosts.has(u.hostname)) {
    +        liveJobUrl = null;
    +      }
    +    } catch {
    +      liveJobUrl = null;
    +    }
    +  }
     } catch { /* offline — degrade gracefully */ }

As per coding guidelines "Check for command injection, path traversal, and SSRF. Ensure scripts handle missing data/ directories gracefully."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test-all.mjs` around lines 355 - 365, The code uses the external liveJobUrl directly in run('node', ['archive-posting.mjs', liveJobUrl], ...) which can enable SSRF/command-injection; validate liveJobUrl before invoking the archiver by parsing it (new URL(...)) and enforcing protocol === 'https:' and that url.hostname is in an allowlist of trusted hostnames (reject or warn and skip if missing/invalid), and only pass the validated string to run; ensure the validation logic is applied where liveJobUrl is set/checked and that a failing validation results in the same skip path (warn('live archive skipped ...')) rather than calling archive-posting.mjs.
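The same check can also be written as a standalone function, which may be easier to unit-test than the inline version. This is a sketch under assumptions: `validateJobUrl` and `ALLOWED_HOSTS` are hypothetical names, and the two hostnames are simply the ones listed in the review's suggested patch. It returns the normalized URL string when the input is `https` on an allowlisted host, and `null` otherwise, so the caller can fall through to its existing skip path.

```javascript
// Hostnames taken from the review's suggested patch; adjust as needed.
const ALLOWED_HOSTS = new Set(['boards.greenhouse.io', 'job-boards.greenhouse.io']);

// Hypothetical helper: returns a normalized URL string if the candidate is an
// https URL whose hostname is exactly on the allowlist, otherwise null.
function validateJobUrl(candidate, allowedHosts = ALLOWED_HOSTS) {
  if (typeof candidate !== 'string') return null;

  let parsed;
  try {
    parsed = new URL(candidate); // throws on malformed input
  } catch {
    return null;
  }

  if (parsed.protocol !== 'https:') return null;
  // Exact hostname match: a lookalike such as
  // "boards.greenhouse.io.evil.example" is rejected.
  if (!allowedHosts.has(parsed.hostname)) return null;

  return parsed.href;
}

const checkedUrl = validateJobUrl('https://boards.greenhouse.io/anthropic/jobs/123');
console.log(checkedUrl !== null ? 'archive' : 'skip');
```

Comparing the parsed `hostname` exactly, rather than using a substring or suffix match on the raw string, is what keeps lookalike domains and query-string tricks from passing.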
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 54a72fab-248e-4017-a38c-6ea663ee11fa
📒 Files selected for processing (1)
test-all.mjs
…r upstream merge

The upstream rebase added sections 11 (VERSION FILE) and a LOCATION FILTER block, leaving our ARCHIVE-POSTING section with a duplicate number and a missing closing brace on the liveJobUrl else-block. Renumbered ARCHIVE-POSTING → 12, LOCATION FILTER → 13, and added the missing } to restore valid syntax (96 passed, 0 failed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What does this PR do?
Adds `archive-posting.mjs`, a Playwright script that saves a live job posting as a rendered PDF to `jds/` before it disappears. It auto-detects the company and role from the page title and supports a `--pipeline` mode to batch-archive all pending URLs at once.
Out of scope for this PR (can be follow-up issues):
Related issue
Closes #553
Type of change
Checklist
`node test-all.mjs` runs and all tests pass.
Aligns with the current-phase goal of zero-token utilities (same pattern as scan.mjs: pure Playwright, no LLM cost). Addresses the data preservation gap identified in #553.
Questions? Join the Discord for faster feedback.