[review only — do not merge] Snapshot: dev state across work streams by ageem23 · Pull Request #1 · ageem23/career-ops

ageem23 · 2026-04-30T18:24:38Z

Internal review-only PR. Not for merge.

Goal: get CodeRabbit findings on the snapshot commit (semantic title-filter phase, LinkedIn provider, triage/scoring utilities, scan.mjs round-3 edits, small system touch-ups) before splitting into focused per-feature branches for upstream PRs.

Each work stream listed in the snapshot commit message will be split into its own branch from upstream/main once findings are addressed.

Summary by CodeRabbit

Release Notes

New Features
- Added LinkedIn job source provider with authentication, session management, and automated job scraping
- Enhanced title filtering with AI-powered semantic analysis as fallback for non-matching titles
- New CLI tools: filter pattern analyzer, LinkedIn configuration validator, and pipeline score updater

Documentation
- Extended provider contract documentation with authentication lifecycle hooks and semantic options
- Enhanced configuration templates with LinkedIn setup and semantic filtering guidance

Updates input keys from kebab-case to snake_case so the welcome workflow can dispatch its messages again. The dependabot bump to first-interaction@v3 (santifer#371) silently broke the action because v3 renamed repo-token, pr-message and issue-message to repo_token, pr_message and issue_message. Restores onboarding for first-time contributors.

Surfaces terminal pipeline states (REJECTED, DISCARDED) in the dashboard top navigation, reachable via left/right tab navigation. Includes a regression test verifying that each new tab isolates the matching status rows.

…nd current scripts
Adds modes/{fr,ja,pt,ru}/, modes/latex.md, GEMINI.md, generate-latex.mjs, scan.mjs, doctor.mjs, check-liveness.mjs, liveness-core.mjs, analyze-patterns.mjs, followup-cadence.mjs, gemini-eval.mjs, test-all.mjs, and .gemini/commands/ to the SYSTEM_PATHS array. The list had drifted from the actual repo as languages and scripts were added incrementally. Closes santifer#337.

Auto-detects tectonic on PATH and uses it as the preferred LaTeX engine, falling back to pdflatex when tectonic is not available. Tectonic is a much lighter LaTeX install (~150MB vs ~5GB for MiKTeX/MacTeX) and is now widely available via Homebrew, apt, and direct binaries. Closes santifer#394.
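A minimal sketch of that detection, assuming it amounts to probing each PATH entry for a `tectonic` binary (`tectonic.exe` on Windows). `findOnPath` and `pickLatexEngine` are hypothetical names, and the file-existence check is passed in so the logic can be exercised without touching disk:

```javascript
// Return the full path of `binary` on the given PATH string, or null.
// `exists` is injected (e.g. fs.existsSync) so the logic is testable.
function findOnPath(binary, pathEnv, exists, isWindows = false) {
  const sep = isWindows ? ';' : ':';
  const dirSep = isWindows ? '\\' : '/';
  const names = isWindows ? [`${binary}.exe`, binary] : [binary];
  for (const dir of String(pathEnv || '').split(sep)) {
    if (!dir) continue; // skip empty PATH segments
    const base = dir.replace(/[\\/]+$/, ''); // drop trailing separators
    for (const name of names) {
      const candidate = base + dirSep + name;
      if (exists(candidate)) return candidate;
    }
  }
  return null;
}

// Prefer tectonic when it is on PATH; otherwise fall back to pdflatex.
function pickLatexEngine(pathEnv, exists, isWindows = false) {
  return findOnPath('tectonic', pathEnv, exists, isWindows) ? 'tectonic' : 'pdflatex';
}
```

In Node the wiring might look like `pickLatexEngine(process.env.PATH, fs.existsSync, process.platform === 'win32')`. The sketch only falls back to pdflatex; it does not check that pdflatex is installed either.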

…tion
Adds a cv.output_format setting to profile.example.yml (html by default, latex opt-in) and updates auto-pipeline.md to branch between modes/pdf.md and modes/latex.md based on that setting. Also moves canva_resume_design_id under the cv: block for path consistency. Closes santifer#396.
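A sketch of the branch, assuming `profile` is the parsed profile.yml and that an unset value means html. `pdfModeFor` is a hypothetical helper, and throwing on unknown values is a design choice of this sketch rather than documented behavior:

```javascript
// Map each supported cv.output_format value to the mode file to follow.
const MODE_BY_FORMAT = new Map([
  ['html', 'modes/pdf.md'],    // default
  ['latex', 'modes/latex.md'], // opt-in
]);

function pdfModeFor(profile) {
  // An unset (or null) value falls back to html.
  const format = String(profile?.cv?.output_format ?? 'html').trim().toLowerCase();
  if (!MODE_BY_FORMAT.has(format)) {
    throw new Error(`Unknown cv.output_format "${format}" (expected html or latex)`);
  }
  return MODE_BY_FORMAT.get(format);
}
```

Using a `Map` instead of a plain object avoids accidentally accepting inherited keys such as `constructor`.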

… is stale
Adds a fallback path so that, when the local VERSION file lags behind the latest GitHub release tag, the updater can still detect updates correctly. This fixes the case where users on v1.5.0 were told they were up to date even though v1.6.0 was available. Closes santifer#316.
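A sketch of the version check, assuming the fallback reduces to comparing the local VERSION against the latest release tag numerically (pre-release suffixes are ignored). The helper names and the handling of the `v` tag prefix are assumptions:

```javascript
// Parse "1.6.0", "v1.6.0" or "1.6.0\n" into [major, minor, patch].
function parseVersion(v) {
  const m = String(v).trim().replace(/^v/i, '').match(/^(\d+)\.(\d+)\.(\d+)/);
  return m ? m.slice(1, 4).map(Number) : null;
}

// Numeric comparison: -1 if a < b, 0 if equal, 1 if a > b.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  if (!pa || !pb) throw new Error(`Unparseable version: ${pa ? b : a}`);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

// An update is available when the latest release tag is newer than VERSION.
function updateAvailable(localVersion, latestReleaseTag) {
  return compareVersions(localVersion, latestReleaseTag) < 0;
}
```

Comparing numerically matters: a plain string comparison would rank 1.10.0 below 1.9.0.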

DATA_CONTRACT.md was only documenting modes/de/* among the localized mode directories. Adds the four other languages (fr, ja, pt, ru) that were already shipped in the repo. Closes santifer#338.

Surfaces the tracker ID in the dashboard pipeline list so users can reference rows by ID in DMs, issue reports, and discussions. Includes regression tests for the new column rendering.

Patch + bug fixes only in this version range, no API surface changes. Signed-off by dependabot.

… overlap ratio in roleFuzzyMatch
Reduces false matches in roleFuzzyMatch by filtering out seniority tokens (junior, senior, lead, staff, principal, etc.) and location tokens before computing overlap, and by enforcing a minimum overlap ratio. Closes santifer#329 (over-matching caused duplicate detection to flag unrelated roles as the same role).
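A sketch of the filtered overlap, for illustration only: the seniority and location token lists and the 0.6 minimum ratio below are assumptions, not the values in roleFuzzyMatch, and `roleFuzzyMatchSketch` is a hypothetical name:

```javascript
// Illustrative token lists; the real lists may differ.
const SENIORITY = new Set(['junior', 'jr', 'senior', 'sr', 'lead', 'staff', 'principal']);
const LOCATION = new Set(['remote', 'hybrid', 'onsite', 'us', 'usa', 'emea']);

// Lowercase, split on non-alphanumerics, drop seniority/location noise.
function contentTokens(title) {
  return title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(t => t && !SENIORITY.has(t) && !LOCATION.has(t));
}

function roleFuzzyMatchSketch(a, b, minRatio = 0.6) {
  const ta = new Set(contentTokens(a));
  const tb = new Set(contentTokens(b));
  if (ta.size === 0 || tb.size === 0) return false;
  let overlap = 0;
  for (const t of ta) if (tb.has(t)) overlap++;
  // Ratio against the larger set, so one shared word can't match a long title.
  return overlap / Math.max(ta.size, tb.size) >= minRatio;
}
```

Dividing by the larger token set is one possible ratio definition; it keeps a single shared word such as "engineer" from matching otherwise unrelated roles.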

Translates modes/batch.md from Spanish to English in place. Companion to santifer#403 (apply.md). Both files were the source of Spanish content leaking into report outputs on certain code paths.

… variants
Adds three regex patterns to HARD_EXPIRED_PATTERNS in liveness-core.mjs to catch closed postings that were silently slipping through liveness checks. Includes a regression test against a real Singapore mycareersfuture.gov.sg posting that uses the 'Applications have closed' wording.
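A sketch of how such patterns might look. Only the 'Applications have closed' wording comes from the description above; the other two regexes and `isHardExpired` are illustrative assumptions:

```javascript
// Hypothetical patterns; no /g flag, so .test() keeps no state between calls.
const HARD_EXPIRED_PATTERNS_SKETCH = [
  /applications?\s+(?:have|has)\s+(?:now\s+)?closed/i,
  /no\s+longer\s+accepting\s+applications/i,
  /this\s+(?:job|position|posting)\s+(?:has\s+)?(?:closed|expired)/i,
];

function isHardExpired(pageText) {
  return HARD_EXPIRED_PATTERNS_SKETCH.some(re => re.test(pageText));
}
```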

Translates modes/apply.md from Spanish to English in place. Several Discord users (most recently mavetor) reported that reports came back partially in Spanish on certain code paths. This resolves the inconsistency by aligning apply.md with the language of the rest of the system documentation.

…le template
Adds verified Canada/Vancouver companies plus an automation companies block to templates/portals.example.yml. It is a pure data addition (no code changes) that covers ground the example template previously left uncovered for non-US users.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>


Adds .claude-plugin/marketplace.json and .claude-plugin/plugin.json so career-ops can be installed via `claude plugin marketplace add ./ && claude plugin install career-ops`. The plugin points at the existing .claude/skills/ path, so nothing breaks. Path migration is deferred to a follow-up RFC. Co-authored with @vl4duu (original design from santifer#275).

Adds release-please-config.json with `release-type: simple` plus an `extra-files` configuration that syncs the package.json version via JSONPath. Also bumps VERSION 1.3.0 → 1.6.0 and package.json 1.0.0 → 1.6.0 to align with the published v1.6.0 release. This resolves the version drift between VERSION, package.json, .release-please-manifest.json and CHANGELOG that broke update-system.mjs detection. Closes santifer#336, santifer#523.


…ani2112
Adds a new "Community Voices" section in CONTRIBUTORS.md for members who landed roles using career-ops and opted in publicly via the i-got-hired issue template. First entry: @logumani2112 (Backend Developer .NET, 50 listings, 1 month timeline, A-F scoring as most useful feature). Permission opt-in granted in santifer#440. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ration
Adds a writing-samples/ user-layer folder, a DATA_CONTRACT entry, and a Writing Style Calibration section in modes/_shared.md that scans user samples to extract tone, sentence structure, punctuation habits, vocabulary and voice signatures. The result is cached in the ## Writing Style section of modes/_profile.md so subsequent sessions skip the re-scan. Privacy-first: no verbatim copying, no PII retention, only abstract style descriptors. Opt-in by the presence of samples (skipped silently if none are provided). Real-world validated: the author landed an interview using this feature on 30 Apr. Closes santifer#380.

Single-commit snapshot of in-flight work for sharing with collaborators. Not intended for upstream merge: each work stream below should be split into its own focused PR after review (and, where the maintainer has flagged it, after an RFC). Work streams included:
- Plugin architecture round-3 (scan.mjs semantic phase, portals.example, CLAUDE.md): paused pending RFC per santifer's feedback on PR santifer#454
- LinkedIn provider (providers/linkedin.mjs, test-linkedin-config.mjs): needs an RFC under the same governance pattern as PR santifer#454
- Semantic title-filter phase (scan-semantic.mjs)
- Triage / scoring utilities (triage-pending.mjs, update-pipeline-scores.mjs, analyze-filter-patterns.mjs)
- Small touch-ups: batch/batch-runner.sh, dashboard/main.go, modes/contacto.md, package.json
- .gitignore additions for article-digest.md and .claude/settings.json
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-30T18:24:44Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)

main

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 2f76bc54-9449-4fef-8e72-93198349bafd

ageem23 · 2026-04-30T18:30:28Z

@coderabbitai review

coderabbitai · 2026-04-30T18:30:54Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

batch/batch-runner.sh (1)
73-81: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard --model before reading $2.

With set -u, invoking the script as batch-runner.sh --model aborts on an unbound $2 instead of producing a usable error. Please validate that a value is present before assigning MODEL here (ideally with the same guard for the other value-taking flags in this parser).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/batch-runner.sh` around lines 73 - 81, The flag parser currently
assigns value-taking flags (e.g., --model -> MODEL) without verifying that a
following argument exists, which causes an unbound variable error under set -u;
update the case branches that consume a value (at least the --model branch, and
ideally --parallel, --start-from, --max-retries, --min-score) to check that a
value is present (for example test that $# -ge 2 or that "${2-}" is non-empty)
before assigning MODEL="$2" and shifting, and if the value is missing emit a
clear usage/error message and exit non-zero.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Line 23: Remove the overly-broad ignore entry "batch/*" from .gitignore
because it causes system files like batch/batch-runner.sh to be ignored,
prevents negation entries such as "!batch/logs/.gitkeep" and
"!batch/tracker-additions/.gitkeep" from re-including files, and makes the
earlier "batch/bat*" pattern redundant; keep the existing specific patterns (the
entries around "batch/bat*", "!batch/logs/.gitkeep", and
"!batch/tracker-additions/.gitkeep") and delete the "batch/*" line so system and
explicitly negated files remain tracked.

In `@analyze-filter-patterns.mjs`:
- Around line 36-43: The CLI numeric flags may produce NaN via parseInt
(including when a flag is present with no value); update the parsing around
argValue, sinceDays, minFreq, and topN to validate the returned value: after
calling argValue('--since','30') etc., attempt to parseInt and then check
Number.isFinite(result) and result > 0 (or other domain constraints), and if
invalid either fall back to the default or throw/exit with a clear usage error
message; also ensure argValue detects the “flag present but no next token” case
(i.e., args.indexOf(flag) returns last index) and treat that as invalid so you
log/exit rather than propagating NaN into downstream date math and threshold
logic.

In `@dashboard/main.go`:
- Around line 29-42: Remove the fixed "i < 6" cap and instead loop until the
filesystem root by using an infinite loop that updates parent :=
filepath.Dir(cur) and breaks when parent == cur; for each filepath.Join(cur,
"applications.md") and filepath.Join(cur, "data", "applications.md") call use
os.Stat and if err == nil return cur; if err != nil and !os.IsNotExist(err)
treat it as a real filesystem error (do not silently treat it as "not found") —
for example return start (preserving existing behavior) or propagate/log the
error — and only ignore errors where os.IsNotExist(err) is true so the search
continues up to the root.

In `@modes/contacto.md`:
- Around line 65-79: The "Name verification gate" block is written in English
inside a primarily Spanish mode and should be made language-consistent: either
translate the entire "Name verification gate (MANDATORY — do this BEFORE
presenting the message to the user):" section (including checks 1–4: Exact-match
check, No-substitution check, Right-person check, Present format) into Spanish
so it matches the rest of the contacto mode, or move this gate into a
language-specific file for English; ensure the translated terms and the example
salutation line (“Sending to: {Full Name} ({linkedin URL}) — addressing them as
"{salutation in message}"”) preserve the exact verification steps and the
instruction to STOP and ask for confirmation if any check fails.
- Around line 74-77: Update the "Present format" guidance so the "Sending to"
line is explicitly described as a pre-send verification header and not part of
the outbound LinkedIn message body or its 300‑character limit; specifically
modify the "Present format" block (the one that shows: **Sending to:** {Full
Name} ({linkedin URL}) — addressing them as "{salutation in message}") to state
it is metadata only, must be displayed above the message for verification, must
not be copied into the message body, and must not be counted against the
300‑char message limit.

In `@providers/linkedin.mjs`:
- Around line 224-236: The unwrapRedirect function currently accepts any URL
scheme when decoding the nested LinkedIn redirect; update the validation so that
after parsing the decoded value (the variable decoded) into a URL object you
explicitly check the URL's protocol and only accept 'http:' or 'https:'—if the
protocol is anything else, return the original trimmed href; ensure this check
happens inside the try block after creating the URL and before returning decoded
(and keep existing fallback returns intact).
- Around line 211-217: slugify currently can return an empty string for
non-Latin titles, causing saveJd to write to jds/.md and collide; update slugify
(or the caller saveJd) to fall back to a stable non-empty identifier when
slugify(text) === '' (e.g., compute a short stable hash of the original text or
use a deterministic UID derived from the title/company and timestamp) and use
that fallback for file names/URLs; additionally, respect language modes by
detecting German/French/Japanese postings and, if config/profile.yml has
language.modes_dir set (e.g., modes/de, modes/fr, modes/ja), load modes from
that directory instead of default modes/ to suggest appropriate language modes.

In `@scan-semantic.mjs`:
- Around line 89-93: The current fallback finds braceStart and braceEnd using
lastIndexOf which can include extra trailing braces; instead scan forward from
braceStart to find the matching closing brace by tracking a depth counter
(increment on '{', decrement on '}') and stop when depth returns to zero to get
the index of the first balanced object, then call JSON.parse on
text.slice(braceStart, matchingIndex + 1); update the code around
braceStart/braceEnd and the JSON.parse call to use this matchingIndex approach
so only the first balanced JSON object is parsed.
- Around line 130-131: hasSemanticBackend currently swallows errors from
resolveBackend (hiding bad CAREER_OPS_SEMANTIC_BACKEND overrides); change it to
let invalid override errors propagate (or return a structured Error) instead of
returning null—i.e., remove the empty catch that returns null and either rethrow
the caught error from hasSemanticBackend or return an object like { error } so
callers can log it before falling back; update references to
hasSemanticBackend/resolveBackend (and any callers expecting null) to handle the
propagated error or structured error accordingly.

In `@scan.mjs`:
- Around line 271-298: The code allows a bare "--login" to slip through because
loginProviderId can be undefined; update the login handling to detect when the
flag is present but no provider id was supplied (check loginFlag !== -1 &&
!loginProviderId) and immediately print a usage/error message and exit (set
process.exitCode = 1 and return or process.exit(1)). Place this check before
calling providers.get(loginProviderId) and before invoking provider.login(),
referencing the existing loginFlag and loginProviderId symbols so a missing
value fails fast instead of continuing into the normal scan path.

In `@templates/portals.example.yml`:
- Around line 30-35: The comment in the semantic phase block is misleading about
override precedence; update the text so it explicitly states the actual
resolution order used at runtime: first check the CAREER_OPS_SEMANTIC_BACKEND
environment variable, then check for ANTHROPIC_API_KEY, then look for the
`claude` CLI in PATH, and if none are available neutrals are rejected; mention
the possible values for CAREER_OPS_SEMANTIC_BACKEND (api|cli) and reference the
semantic phase/title filtering to make intent clear.
- Around line 119-123: The comment for the archetypes block incorrectly states
it is used only when ANTHROPIC_API_KEY is set; update the text to reflect that
the archetypes are consumed by both API and CLI semantic flows
(scan-semantic.mjs passes archetypes to both backends). Edit the archetypes
description to remove the "only" qualifier and explicitly note that these
canonical role-family descriptions are used by the semantic phase for both the
API and Claude CLI paths so users know the section is not ignored when using the
CLI.

In `@test-linkedin-config.mjs`:
- Around line 136-140: After calling loadConfig(), validate that
config?.tracked_companies is an array before using .map(): if config is missing
or tracked_companies is undefined treat it as an empty array, but if
tracked_companies exists and is not an Array, log an error and exit process with
code 2; update the code around the variables config, entries and linkedinEntries
(the block that sets const config = loadConfig(); const entries =
config?.tracked_companies || []; const linkedinEntries = ...) to perform an
explicit Array.isArray check and handle the non-array case as described.
- Around line 112-120: The KNOWN set advertises delay_pages and delay_searches
as supported but the validator in test-linkedin-config.mjs never checks their
shapes, which causes providers/linkedin.mjs to later assume a [min,max] tuple
and fail; update the validation loop to validate that entry.delay_pages and
entry.delay_searches (when present) are arrays of two numeric values with
min<=max (or else push a warning/error), or remove those keys from the KNOWN set
if you choose not to validate them; reference the KNOWN constant and the entry
object in test-linkedin-config.mjs and ensure compatibility with the tuple
expectation in providers/linkedin.mjs.

In `@triage-pending.mjs`:
- Around line 188-206: The location-scoring logic in the triage code hardcodes a
Chicago-first, US-remote preference (see the location variable, the regex checks
and the score/breakdown mutations) which should instead read per-user geo
preferences from profile config; update the scorer to load the profile's geo
preferences (e.g., include allowlist/denylist, remote-preference, and
preferred-US-hubs) from config/profile.yml or the profile API and replace the
inlined regex/weights with a small policy evaluator that applies those
configured rules to entry.location, then adjust score and breakdown accordingly
(keep the same score/breakdown API but derive +5/+3/-10 etc. from config values
rather than hardcoded literals and remove the fixed Chicago/IL assumptions).

In `@update-pipeline-scores.mjs`:
- Around line 13-24: The script currently calls fs.readFileSync(STATE) and
fs.readdirSync(REPORTS) unguarded and will crash on a clean checkout; modify the
logic around STATE/REPORTS so missing artifacts are treated as empty input:
check existence (fs.existsSync or try/catch) before reading and if STATE is
missing set stateLines = [] (so urlMap stays empty) and if REPORTS is missing
set reportFiles = [] (or skip report processing), then return/exit early or
no-op the update step so the pipeline is left untouched; update references to
stateLines, urlMap, and reportFiles accordingly to handle empty arrays.
- Around line 38-40: findUrl currently only matches http(s) URLs so
"local:jds/..." links get dropped when rewriting pipeline; update the findUrl
function to recognize and return local:jds/... URIs as well (e.g. match either
https?://\S+ or local:jds/\S+ using a single regex) so callers that rewrite
entries (the code paths around the existing findUrl and the logic in the 56-76
region) will preserve those local:jds links.

---

Outside diff comments:
In `@batch/batch-runner.sh`:
- Around line 73-81: The flag parser currently assigns value-taking flags (e.g.,
--model -> MODEL) without verifying that a following argument exists, which
causes an unbound variable error under set -u; update the case branches that
consume a value (at least the --model branch, and ideally --parallel,
--start-from, --max-retries, --min-score) to check that a value is present (for
example test that $# -ge 2 or that "${2-}" is non-empty) before assigning
MODEL="$2" and shifting, and if the value is missing emit a clear usage/error
message and exit non-zero.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 1ce9e49a-3afd-4c87-af73-375e7d082103

📥 Commits

Reviewing files that changed from the base of the PR and between 7f77834 and 410961c.

📒 Files selected for processing (14)

.gitignore
CLAUDE.md
analyze-filter-patterns.mjs
batch/batch-runner.sh
dashboard/main.go
modes/contacto.md
package.json
providers/linkedin.mjs
scan-semantic.mjs
scan.mjs
templates/portals.example.yml
test-linkedin-config.mjs
triage-pending.mjs
update-pipeline-scores.mjs

batch/* matched the whole batch/ directory, which:
- broke the !batch/logs/.gitkeep and !batch/tracker-additions/.gitkeep re-includes (gitignore can't un-ignore a file once its parent dir is ignored)
- ignored system files like batch/batch-runner.sh that ship with the repo
- was redundant with the more specific batch/logs/*, batch/bat*, batch/batch-state.tsv, batch/batch-input.tsv, and batch/tracker-additions/**/*.tsv rules already in the file
Removing it lets the .gitkeep negations work and keeps system files trackable, while the specific TSV/log rules continue to ignore state artifacts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… URLs
Two fixes for the pipeline rewriter:
1. Match local:jds/... in addition to http(s):// when extracting URLs from pipeline lines. Without this, every LinkedIn-provider entry (which uses the local: convention because the search URL isn't reachable post-session) was silently dropped from the regenerated pipeline.
2. Treat a missing batch-state.tsv or reports/ directory as "nothing to refresh" instead of throwing. A clean checkout or a run before the first batch no longer crashes on script startup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
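A sketch of both fixes, assuming pipeline lines separate fields with ` | `. `readOrEmpty` is a hypothetical helper that takes the read function as a parameter (e.g. `fs.readFileSync` or `fs.readdirSync`) and treats only ENOENT as "nothing to refresh":

```javascript
// Match http(s) URLs and local:jds/... references; stop at whitespace or "|".
const URL_RE = /(?:https?:\/\/|local:jds\/)[^\s|]+/;

function findUrl(line) {
  const m = line.match(URL_RE);
  return m ? m[0] : null;
}

// Read via `read`; a missing file/dir means "nothing to refresh" (null),
// while any other error (permissions, EISDIR, ...) still surfaces.
function readOrEmpty(read, path) {
  try {
    return read(path);
  } catch (err) {
    if (err && err.code === 'ENOENT') return null;
    throw err;
  }
}
```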

`node scan.mjs --login` (with no provider id, or followed by another flag) used to fall through silently into a normal scan: loginProviderId was undefined, so the login branch's `if (loginProviderId)` check was skipped. That is a surprising side effect for what is clearly a setup/auth flag. It now rejects a bare --login, or --login followed by another flag, with a usage hint and exit 1 before any provider work runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
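A sketch of the fail-fast check; `parseLoginArg` and its return shape are hypothetical:

```javascript
// Detect --login and require a provider id right after it.
function parseLoginArg(argv) {
  const i = argv.indexOf('--login');
  if (i === -1) return { login: false };
  const providerId = argv[i + 1];
  // A missing value, or another flag in its place, is a usage error.
  if (!providerId || providerId.startsWith('--')) {
    return { login: true, error: 'Usage: node scan.mjs --login <provider-id>' };
  }
  return { login: true, providerId };
}
```

A caller would print `error` and exit 1 before touching any provider, e.g. `const r = parseLoginArg(process.argv.slice(2)); if (r.error) { console.error(r.error); process.exit(1); }`.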

…end override
Two fixes for the semantic phase:
1. extractJson() walks the response one character at a time, tracking brace depth (and JSON string state), instead of slicing to the last '}'. Models sometimes emit valid JSON followed by trailing commentary that contains stray braces; the previous implementation treated the whole chunk as malformed and rejected every neutral title in that batch.
2. hasSemanticBackend() now logs the resolveBackend() error to stderr before returning null when CAREER_OPS_SEMANTIC_BACKEND holds an invalid value. The behavior contract (return null when no backend is usable) is unchanged, so existing callers still fall back, but a config typo no longer silently disables the semantic phase.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
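A sketch of the depth-tracking extractor described above (the scan-semantic.mjs version may differ in details). It returns the first balanced top-level object and ignores braces inside JSON strings:

```javascript
function extractJson(text) {
  const start = text.indexOf('{');
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a string only quotes and escapes matter; braces are data.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}') {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          return null; // balanced, but not valid JSON
        }
      }
    }
  }
  return null; // no complete object found
}
```

For an input like `{"n": 1} Note: ignore {this}`, slicing to the last '}' hands JSON.parse the trailing commentary too, while the depth walk stops right after `{"n": 1}`.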

… schemes
Two fixes for the LinkedIn provider:
1. slugify() previously stripped titles/companies written entirely in non-Latin scripts (Japanese, Arabic, Cyrillic, etc.) down to '', so every such posting collided onto jds/.md / local:jds/.md. It now falls back to a short SHA-1 prefix (jd-<10hex>) when the Latin slug is empty, so each unique input still gets its own file.
2. unwrapRedirect() decoded LinkedIn /safety/go redirects with `new URL(decoded)` and returned the result without checking the scheme, so javascript:, file:, data: and similar URLs could silently land in _application_url and the JD frontmatter. It now drops anything outside http:/https: and returns ''.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
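A sketch of both fixes, with two assumptions flagged: the hash is a dependency-free 32-bit FNV-1a over code points (so `jd-` plus 8 hex chars), standing in for the SHA-1 prefix the commit uses; and the redirect target is assumed to live in a `url` query parameter on a linkedin.com `/safety/go` path:

```javascript
// 32-bit FNV-1a over code points, as 8 lowercase hex chars
// (stand-in for the SHA-1 prefix described in the commit).
function fnv1aHex(str) {
  let h = 0x811c9dc5;
  for (const ch of str) {
    h ^= ch.codePointAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return (h >>> 0).toString(16).padStart(8, '0');
}

function slugify(text) {
  const raw = String(text);
  const slug = raw
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '') // drop combining accents: "é" -> "e"
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .slice(0, 60)
    .replace(/^-+|-+$/g, '');
  // Non-Latin titles strip to ""; fall back to a stable hash-based name.
  return slug || `jd-${fnv1aHex(raw)}`;
}

function unwrapRedirect(href) {
  const trimmed = String(href || '').trim();
  let outer;
  try {
    outer = new URL(trimmed);
  } catch {
    return trimmed; // not an absolute URL; leave as-is
  }
  const isLinkedInGo = /(^|\.)linkedin\.com$/.test(outer.hostname)
    && outer.pathname.startsWith('/safety/go');
  const target = isLinkedInGo ? outer.searchParams.get('url') : null;
  if (!target) return trimmed; // not a redirect we know how to unwrap
  try {
    const decoded = new URL(target); // searchParams.get already percent-decodes
    // Only web URLs survive; javascript:, file:, data:, etc. become "".
    return decoded.protocol === 'http:' || decoded.protocol === 'https:' ? decoded.href : '';
  } catch {
    return '';
  }
}
```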

fix(scan): pre-evaluation fuzzy dedup against pipeline.md

# Conflicts: # scan.mjs

The fuzzy match requires ≥2 overlapping content words after stopword stripping. That let byte-identical short roles slip past it:
- "VP Engineering" → tokens: [] (vp <3 chars, engineering stopword)
- "CTIO AI Engineering Manager" → tokens: ["ctio"] (engineer + manager stopword'd, ai <3 chars)
Two identical strings that each tokenize to ≤1 content word yield overlap < 2, so they never match. The real applications.md had 8 such duplicate clusters that dedup-tracker.mjs refused to merge even though the strings were literally equal.
Fix: layer an exact normalize-and-compare in front of the fuzzy path at both call sites:
- dedup-utils.roleMatch gains the layered fallback (used by dedup-tracker.mjs for post-hoc cleanup).
- scan.mjs.findCompanyRoleDup adds the same fallback at scan time, so future scans dedup these cases before evaluation.
Seen entries now also cache normRole alongside tokens and exactKey, so the check is a single map lookup rather than a re-tokenize.
Measured impact on the current applications.md (447 → 432 rows):
- dedup-tracker.mjs now removes 15 duplicates (was 5).
- verify-pipeline.mjs reports 0 warnings (was 11).
- Newly merged cases: Mitchell Martin Lead AI Engineer, Medidata SVP of Engineering, LTV.ai Head of Engineering, OP Recruiting VP Engineering, Veeva Systems VP Engineering (3→1), UKG Careers Lead AI Engineer, Motion Recruitment VP of Engineering, PwC CTIO AI Engineering Manager, Visionary Tech Lead AI Architect.
The fuzzy path is unchanged for all longer/substantive titles; the exact match only fires when the normalized strings are equal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
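A sketch of the layered check. The stopword list and the 3-character cutoff follow the tokenization examples above but are otherwise assumptions, and `normalize` is a simplified stand-in for the real normalizeRole:

```javascript
// Illustrative stopwords; the real list is likely longer.
const STOPWORDS = new Set(['engineering', 'engineer', 'manager', 'of', 'the', 'and']);

const normalize = s => s.toLowerCase().replace(/\s+/g, ' ').trim();

// Content words: >= 3 chars and not a stopword.
function contentWords(role) {
  return normalize(role)
    .split(/[^a-z0-9]+/)
    .filter(t => t.length >= 3 && !STOPWORDS.has(t));
}

function roleMatch(a, b) {
  // 1. Exact path: identical normalized strings always match, even when
  //    they tokenize to fewer than two content words.
  if (normalize(a) === normalize(b)) return true;
  // 2. Fuzzy path (unchanged): require >= 2 overlapping content words.
  const tb = new Set(contentWords(b));
  const overlap = new Set(contentWords(a).filter(t => tb.has(t))).size;
  return overlap >= 2;
}
```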

Address CodeRabbit finding on PR #25: the exact-match fallback in roleMatch relies on normalizeRole as the canonical key, but the existing normalizer had two gaps that defeated the fallback:
1. Unsupported punctuation was stripped without inserting a space, so "VP,Engineering" collapsed to "vpengineering", a single bogus token that fuzzy matching can't recover. Now any run of disallowed chars maps to a single space, then a \s+ collapse normalizes the result.
2. Slash separators kept their surrounding whitespace, so "AI/ML Engineer" and "AI / ML Engineer" hashed differently. Now \s*\/\s* canonicalizes to a no-whitespace "/" before the final space collapse.
After the fix, "VP Engineering", "VP,Engineering" and "VP, Engineering" all produce the same key, as do "AI/ML Engineer" and "AI / ML Engineer". Regression-tested against all the existing PR #25 unit cases, with no behavior change for the already-passing fuzzy and exact-match scenarios.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
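A sketch of the corrected normalizer, assuming the allowed characters are lowercase letters, digits, whitespace, `/`, `+` and `#` (the real allow-list may differ):

```javascript
function normalizeRole(role) {
  return String(role)
    .toLowerCase()
    .replace(/[^a-z0-9\/+#\s]+/g, ' ') // 1. disallowed runs become one space
    .replace(/\s*\/\s*/g, '/')          // 2. " / " canonicalizes to "/"
    .replace(/\s+/g, ' ')               // final whitespace collapse
    .trim();
}
```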

fix(dedup): exact-normalized match before fuzzy path

# Conflicts: # scan.mjs

# Conflicts: # dashboard/internal/ui/screens/pipeline.go # dashboard/internal/ui/screens/viewer.go # modes/contacto.md # modes/pipeline.md # providers/_http.mjs # providers/ashby.mjs # providers/greenhouse.mjs # providers/lever.mjs # scan.mjs

# Conflicts: # batch/batch-runner.sh # modes/contacto.md # scan.mjs # templates/portals.example.yml

Esc no longer quits since PR santifer#526 repurposed it to clear the search. The upstream help bar was updated to match; the local merge lost that edit, so the bar advertised a quit key that did nothing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

batch-runner.sh records every evaluated offer in batch/batch-state.tsv but never writes back to data/pipeline.md. Offers processed via batch mode therefore stay in the pipeline "Pendientes" inbox indefinitely: the next scan and the next `/career-ops pipeline` run re-surface them, producing duplicate reports and tracker rows. For anyone running batch regularly, the inbox never drains. Add reconcile-pipeline.mjs: for each completed/skipped entry in batch-state.tsv whose URL is still in the pipeline.md "Pendientes" section, move the line to "Procesadas" with its report link, score and PDF flag. It is idempotent (an already-moved entry is a no-op), so it is safe to run after every batch. batch-runner.sh now calls it from merge_tracker(), between the tracker merge and the integrity check. It is also exposed as `npm run reconcile` for standalone use and covered by test-all.mjs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
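A sketch of the idempotent move on the markdown text, assuming pending lines look like `- [ ] <url> | ...` and that `batchState` maps a URL to `{ score, report, pdf }` for completed/skipped entries. The appended field format is hypothetical; if the processed section is missing, this sketch appends a `## Procesadas` header:

```javascript
// Move pending lines whose URL has a finished batch entry into the processed
// section. A second run is a no-op because moved lines are no longer under
// "## Pendientes".
function reconcile(pipelineMd, batchState) {
  const kept = [];
  const moved = [];
  let section = null;
  for (const line of pipelineMd.split('\n')) {
    const heading = line.match(/^##\s+(.+)$/);
    if (heading) section = heading[1].trim();
    const url = (line.match(/https?:\/\/[^\s|]+/) || [])[0];
    const entry = url ? batchState.get(url) : undefined;
    if (section === 'Pendientes' && entry && line.startsWith('- [ ]')) {
      const pdf = entry.pdf ? 'yes' : 'no';
      moved.push(`${line.replace('- [ ]', '- [x]')} | ${entry.score}/5 | [report](${entry.report}) | PDF ${pdf}`);
      continue; // drop the line from Pendientes
    }
    kept.push(line);
  }
  if (moved.length === 0) return pipelineMd; // nothing to do
  let at = kept.findIndex(l => /^##\s+Procesadas\b/.test(l));
  if (at === -1) {
    kept.push('', '## Procesadas'); // create the section if it is missing
    at = kept.length - 1;
  }
  kept.splice(at + 1, 0, ...moved); // insert right under the heading
  return kept.join('\n');
}
```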

- reconcile-pipeline.mjs: constrain user-supplied --state/--pipeline paths to the repository tree. A path that escapes via `..`, or an absolute target outside the repo, is now rejected before any read or write, closing a path-traversal vector.
- reconcile-pipeline.mjs: when creating a missing processed section, match the pending section's language (`## Processed` for English pipelines, `## Procesadas` for Spanish) instead of always writing the Spanish header.
- batch/README.md: note that the script reconciles `skipped` entries too (not just `completed`), and that entries without a report file on disk are left in place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

resolveInsideRepo() validated --state/--pipeline lexically only, so a symlink inside the repo could still resolve to a target outside it. Resolve the repo root and the target (or its parent, when the target does not exist yet) with realpathSync before the boundary check. Addresses CodeRabbit review feedback on santifer#712. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A directory passed as --state/--pipeline (e.g. `--state batch`) cleared existsSync() and the boundary check, then crashed later with an unhandled EISDIR from readFileSync/copyFileSync. resolveInsideRepo() now rejects directory targets up front with a clear message. Addresses CodeRabbit review feedback on santifer#712. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
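The two guards described above (realpath before the boundary check, then rejecting directories) can be sketched as follows. `realpath`, `exists` and `isDirectory` are injected and POSIX separators are assumed; in Node they could be `fs.realpathSync`, `fs.existsSync` and `p => fs.statSync(p).isDirectory()`. This is not the actual resolveInsideRepo() code:

```javascript
// Resolve `target` (relative to the repo, or absolute) and refuse anything that
// lands outside the repo after symlink resolution, or that is a directory.
function resolveInsideRepo(repoRoot, target, { realpath, exists, isDirectory }) {
  const root = realpath(repoRoot);
  const abs = target.startsWith('/') ? target : `${root}/${target}`;
  let real;
  if (exists(abs)) {
    real = realpath(abs); // follows symlinks, including one at the leaf
  } else {
    // Not created yet: resolve the parent directory, then re-append the name.
    const cut = abs.lastIndexOf('/');
    real = `${realpath(abs.slice(0, cut) || '/')}/${abs.slice(cut + 1)}`;
  }
  // Compare against root + "/" so "/repo-evil" is not treated as inside "/repo".
  if (real !== root && !real.startsWith(`${root}/`)) {
    throw new Error(`Refusing path outside the repository: ${target}`);
  }
  if (exists(real) && isDirectory(real)) {
    throw new Error(`Expected a file, got a directory: ${target}`);
  }
  return real;
}
```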

In prompt mode, contacto emitted the contacts table and add-task.mjs block itself (MODE B Steps 5/6), but in prompt mode the LinkedIn research has not run, so those came out as placeholders with no verified names and no dates. Move the add-task.mjs block into the research prompt as section 5, so the web Claude (which does the research) produces it with the real verified names. Each task line now carries a required --due date computed from a new "Today's date" anchor in the prompt. Claude Code resolves the App# once up front and bakes it into the prompt; it no longer emits a table or task block itself in prompt mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add nightly-careerops.sh: a wrapper that runs the portal scan, imports new pipeline.md offers into batch-input.tsv (URL-deduped, capped at 30 per night), and runs the batch evaluator. It is intended to be driven by a daily OS scheduler for unattended runs.

Add a candidate-agnostic Evaluation Early-Exit rule to the _profile template: workers run Blocks A-B first and short-circuit to an abbreviated SKIP report when an offer hits a clear disqualifier, skipping the expensive Blocks C-F. All specifics are read from config/profile.yml and cv.md, so the rule is portable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
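The import step (URL dedup plus a nightly cap of 30) might look like this. The wrapper itself is a shell script, so this JavaScript version is only an illustration; `normalizeUrl`, `selectOffersToImport`, and the choice to ignore fragments and trailing slashes when comparing URLs are assumptions.

```javascript
// Comparison key for a URL: origin + path (trailing slashes dropped) + query.
// Fragments are ignored because they never change the posting.
function normalizeUrl(url) {
  const u = new URL(url.trim());
  return u.origin + u.pathname.replace(/\/+$/, "") + u.search;
}

// Pick pipeline URLs not already in batch-input.tsv, keeping input order and
// stopping at the cap. Also dedupes URLs within the pipeline list itself.
function selectOffersToImport(pipelineUrls, existingUrls, cap = 30) {
  const seen = new Set(existingUrls.map(normalizeUrl));
  const picked = [];
  for (const url of pipelineUrls) {
    if (picked.length >= cap) break;
    const key = normalizeUrl(url);
    if (seen.has(key)) continue;
    seen.add(key);
    picked.push(url);
  }
  return picked;
}
```

Offers beyond the cap simply stay pending in pipeline.md for a later run.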

With core.autocrlf=true on Windows, Git rewrites *.sh files to CRLF on checkout, which breaks the bash shebang when the scripts run under a scheduler or on macOS/Linux. Pin *.sh to eol=lf so nightly-careerops.sh and batch-runner.sh stay portable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The 21:00 nightly run consistently left overflow in Pendientes because of the 30-offer cap. Add a --no-scan flag so a second Task Scheduler entry (career-ops-nightly-late, at 03:00) can clear those leftovers without spending the scan budget twice. Running at 03:00 lets Claude's 5-hour token bucket reset by 08:00. Under --no-scan, log files get a -late suffix so the two runs don't clobber each other.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
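A sketch of how --no-scan could drive both the scan step and the log name, written in JavaScript for illustration even though nightly-careerops.sh is bash. The `tmp/nightly-<stamp>.log` pattern and the name `planNightlyRun` are assumptions; only the `-late` suffix comes from the commit.

```javascript
// Decide whether to scan and where to log, based on the command-line flags.
function planNightlyRun(argv, stamp) {
  const noScan = argv.includes("--no-scan");
  return {
    runScan: !noScan,
    // The -late suffix keeps the 03:00 log separate from the 21:00 one.
    logFile: `tmp/nightly-${stamp}${noScan ? "-late" : ""}.log`,
  };
}
```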

- Auto-sleep 60s before scan.mjs on non-interactive runs (Task Scheduler after wake-from-sleep) so the VPN can settle before providers are hit. Configurable via --wait=SECS / --no-wait.
- The morning summary leads with a scan-error banner (count + first 5 providers) so failures don't get buried in the verbose log.
- The 03:00 late run now appends to the evening summary instead of overwriting it, so both runs are visible in tmp/nightly-latest-summary.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
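The wait flags and the banner from the first two bullets, sketched in JavaScript as hypothetical helpers. How interactivity is detected, the "last flag wins" rule, and the exact banner text are assumptions.

```javascript
// Seconds to sleep before scan.mjs. The caller decides `interactive`
// (for example, whether stdin is a TTY); explicit flags override the default.
function resolveScanWaitSeconds(argv, interactive, defaultSecs = 60) {
  let secs = interactive ? 0 : defaultSecs;
  for (const arg of argv) {
    if (arg === "--no-wait") {
      secs = 0;
    } else if (arg.startsWith("--wait=")) {
      const raw = arg.slice("--wait=".length);
      if (!/^\d+$/.test(raw)) {
        throw new Error(`invalid ${arg}: expected whole seconds`);
      }
      secs = Number(raw);
    }
  }
  return secs;
}

// Banner for the top of the morning summary: failure count plus the first
// five failing providers; empty when the scan was clean.
function scanErrorBanner(failedProviders) {
  if (failedProviders.length === 0) return "";
  const shown = failedProviders.slice(0, 5).join(", ");
  const more = failedProviders.length > 5 ? ", ..." : "";
  return `SCAN ERRORS: ${failedProviders.length} (${shown}${more})`;
}
```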

applyStatusUpdate already cascades a thank-you task on Interview; mirror that for terminal statuses by auto-completing every pending task for the application when its status moves to Rejected or Discarded. This stops stale follow-up reminders from surfacing in the dashboard once the app is closed out. Adds autoCompletePendingTasksForApp() and converts the cascade dispatch to a switch on the normalized status. Bails early when app.Number <= 0, consistent with the existing Interview path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
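The switch-based dispatch can be sketched as below. The real implementation is Go; this JavaScript version assumes a flat `{ app, text, done }` task shape and a hypothetical thank-you task text, and returns a new array rather than writing tasks.md.

```javascript
// Cascade side effects of a status change onto an application's tasks.
function cascadeForStatus(appNumber, newStatus, tasks) {
  // Same guard as the existing Interview path: no cascade for unnumbered apps.
  if (appNumber <= 0) return tasks;
  switch (newStatus.trim().toLowerCase()) {
    case "interview":
      return [...tasks, { app: appNumber, text: "Send thank-you note", done: false }];
    case "rejected":
    case "discarded":
      // Terminal status: close every pending task for this application only.
      return tasks.map((t) => (t.app === appNumber && !t.done ? { ...t, done: true } : t));
    default:
      return tasks;
  }
}
```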

Two improvements to help triangulate scan failures (DNS vs TCP vs TLS vs HTTP) when the nightly runs fire after wake-from-sleep:

- nightly-careerops.sh: pre-flight probe to a handful of representative hosts before scan.mjs runs. Retries up to 3x with 20s backoff on transient failures and logs per-host transport success/failure, so the morning log has a leading indicator of connectivity state. Opt out with --no-preflight. The probe uses GET (not HEAD) because some endpoints reject HEAD with no body.
- scan.mjs: error capture now walks err.cause and surfaces code/errno/syscall/host instead of collapsing every transport failure to "fetch failed". This distinguishes ENOTFOUND (DNS) from ECONNRESET (VPN drop / IP-reputation block) from TLS handshake errors at a glance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
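A sketch of the cause-chain walk, assuming Node's built-in `fetch` (undici), which rejects with `TypeError: fetch failed` and attaches the underlying system error as `.cause`. The function name, the depth limit, and the output format are assumptions; scan.mjs may format things differently.

```javascript
// Walk err.cause (bounded, in case of cycles) and report the first link that
// carries transport details, instead of the bare top-level message.
function describeFetchError(err, maxDepth = 5) {
  let current = err;
  for (let depth = 0; current && depth < maxDepth; depth += 1) {
    const { code, errno, syscall, hostname, host } = current;
    const fields = [];
    if (code) fields.push(`code=${code}`);
    if (errno !== undefined) fields.push(`errno=${errno}`);
    if (syscall) fields.push(`syscall=${syscall}`);
    if (hostname || host) fields.push(`host=${hostname || host}`);
    if (fields.length > 0) return `${err.message} [${fields.join(" ")}]`;
    current = current.cause;
  }
  return err.message;
}
```

With this, a DNS failure reads as `fetch failed [code=ENOTFOUND ... syscall=getaddrinfo ...]`, while a dropped connection shows `code=ECONNRESET`.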

autoCompletePendingTasksForApp called UpdateTaskStatus in a loop, and each call does a full readTasksFile + writeTasksFile. For an app with N pending tasks that meant 1+N full reads and N full rewrites of tasks.md (amplified to K×N in the bulk-status path), which made status changes visibly slow, and slower still when the data dir is synced by OneDrive.

Add data.CompletePendingTasksForApp: read tasks.md once, flip every matching pending task to done in memory, and write once (skipping the write entirely when nothing matches). Rewrite autoCompletePendingTasksForApp to call it, dropping the redundant ParseTasks read and the per-task write loop. Behavior is unchanged; per app, I/O drops from 1+N reads and N writes to 1 read and 1 write.

Adds tasks_test.go covering the target-app-only flip, preserved completion dates, untouched other-app tasks, and the no-match no-op.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
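The read-once / write-once pattern behind data.CompletePendingTasksForApp, sketched in JavaScript. The real function is Go and operates on tasks.md; here the file is abstracted as a hypothetical `store` with `read()` and `write()`, and tasks are plain `{ app, done, completedOn }` objects, so the I/O counts are easy to check.

```javascript
// Complete every pending task for one app with a single read and at most one write.
function completePendingTasksForApp(store, appNumber, today) {
  const tasks = store.read(); // exactly one full read
  let changed = 0;
  const updated = tasks.map((task) => {
    // Other apps' tasks and already-completed tasks (with their original
    // completion dates) pass through untouched.
    if (task.app !== appNumber || task.done) return task;
    changed += 1;
    return { ...task, done: true, completedOn: today };
  });
  if (changed > 0) store.write(updated); // one full write, skipped on a no-op
  return changed;
}
```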

BobbyWang0120 and others added 20 commits April 26, 2026 18:46

feat(dashboard): add rejected and discarded pipeline tabs

7d05967

Surfaces terminal pipeline states (REJECTED, DISCARDED) in the dashboard top navigation, reachable via left/right tab navigation. Includes regression test verifying each new tab isolates the matching status rows.

docs(data-contract): list fr/ja/pt/ru localized modes alongside de

a579e01

DATA_CONTRACT.md was only documenting modes/de/* among the localized mode directories. Adds the four other languages (fr, ja, pt, ru) that were already shipped in the repo. Closes santifer#338.

feat(dashboard): show tracker IDs in pipeline list

8d289c6

Surfaces the tracker ID in the dashboard pipeline list so users can reference rows by ID in DMs, issue reports, and discussions. Includes regression tests for the new column rendering.

chore(deps): bump @google/generative-ai from 0.21.0 to 0.24.1

931d692

Patch + bug fixes only in this version range, no API surface changes. Signed-off by dependabot.

docs(modes): translate batch.md from Spanish to English

0531616

Translates modes/batch.md from Spanish to English in place. Companion to santifer#403 (apply.md). Both files were the source of Spanish content leaking into report outputs on certain code paths.

chore(main): release 1.6.0 (santifer#375)

d725306

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

ageem23 marked this pull request as ready for review April 30, 2026 18:28

coderabbitai Bot reviewed Apr 30, 2026

ageem23 and others added 5 commits April 30, 2026 13:50

ageem23 and others added 10 commits May 17, 2026 14:24

Merge pull request #24 from ageem23/feature/scan-fuzzy-dedup

6115124

fix(scan): pre-evaluation fuzzy dedup against pipeline.md

Merge remote-tracking branch 'origin/main' into personal/dev-snapshot

b0bca19

Conflicts: scan.mjs

Merge pull request #25 from ageem23/feature/dedup-exact-fallback

521a981

fix(dedup): exact-normalized match before fuzzy path
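The exact-then-fuzzy order from these two merges can be sketched as follows. The normalization rules, the token-set Jaccard similarity, and the 0.8 threshold are assumptions chosen for illustration, not the values used in scan.mjs.

```javascript
// Normalize "company title" to lowercase ASCII words separated by single spaces.
function normalizeKey(company, title) {
  return `${company} ${title}`
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Jaccard similarity of the two keys' word sets.
function jaccard(a, b) {
  const sa = new Set(a.split(" "));
  const sb = new Set(b.split(" "));
  let shared = 0;
  for (const t of sa) if (sb.has(t)) shared += 1;
  return shared / (sa.size + sb.size - shared);
}

// Exact normalized match first; fall back to the best fuzzy match above threshold.
function findDuplicate(candidate, existing, threshold = 0.8) {
  const key = normalizeKey(candidate.company, candidate.title);
  const keys = existing.map((e) => normalizeKey(e.company, e.title));
  const exact = keys.indexOf(key);
  if (exact !== -1) return { match: existing[exact], kind: "exact" };
  let best = null;
  let bestScore = 0;
  keys.forEach((k, i) => {
    const score = jaccard(key, k);
    if (score > bestScore) {
      bestScore = score;
      best = existing[i];
    }
  });
  return bestScore >= threshold ? { match: best, kind: "fuzzy", score: bestScore } : null;
}
```

Trying the exact normalized key first means identical postings never depend on the similarity threshold; the fuzzy path only decides near-misses.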

Merge remote-tracking branch 'origin/main' into personal/dev-snapshot

b417f06

Conflicts: scan.mjs

Merge branch 'main' into personal/dev-snapshot

4d8b155

Conflicts: batch/batch-runner.sh, modes/contacto.md, scan.mjs, templates/portals.example.yml

ageem23 force-pushed the personal/dev-snapshot branch from 9e01163 to b111735 May 20, 2026 18:29

ageem23 and others added 2 commits May 20, 2026 13:36

github-actions Bot added 🔧 scripts 📦 dependencies ⚠️ agent-behavior 🔴 core-architecture 📄 docs 🌐 i18n 📊 dashboard ⚙️ ci labels May 20, 2026

ageem23 and others added 9 commits May 20, 2026 16:16

[review only — do not merge] Snapshot: dev state across work streams#1

ageem23 wants to merge 206 commits into feat/325-apify-plugin-architecture from personal/dev-snapshot

ageem23 commented Apr 30, 2026 (edited by coderabbitai Bot)

coderabbitai Bot commented Apr 30, 2026 (edited)

Review skipped

ageem23 commented Apr 30, 2026

coderabbitai Bot commented Apr 30, 2026

coderabbitai Bot left a comment

20 participants
