[review only — do not merge] Snapshot: dev state across work streams#1

Open
ageem23 wants to merge 206 commits into feat/325-apify-plugin-architecture from personal/dev-snapshot

ageem23 (Owner) commented Apr 30, 2026

Internal review-only PR. Not for merge.

Goal: get CodeRabbit findings on the snapshot commit (semantic title-filter phase, LinkedIn provider, triage/scoring utilities, scan.mjs round-3 edits, small system touch-ups) before splitting into focused per-feature branches for upstream PRs.

Each work stream listed in the snapshot commit message will be split into its own branch from upstream/main once findings are addressed.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added LinkedIn job source provider with authentication, session management, and automated job scraping
    • Enhanced title filtering with AI-powered semantic analysis as fallback for non-matching titles
    • New CLI tools: filter pattern analyzer, LinkedIn configuration validator, and pipeline score updater
  • Documentation

    • Extended provider contract documentation with authentication lifecycle hooks and semantic options
    • Enhanced configuration templates with LinkedIn setup and semantic filtering guidance

BobbyWang0120 and others added 20 commits April 26, 2026 18:46
Updates input keys from kebab-case to snake_case so the welcome workflow can dispatch its messages again.

The dependabot bump to first-interaction@v3 (santifer#371) silently broke the action because v3 renamed repo-token, pr-message and issue-message to repo_token, pr_message and issue_message. Restores onboarding for first-time contributors.
Surfaces terminal pipeline states (REJECTED, DISCARDED) in the dashboard top navigation, reachable via left/right tab navigation. Includes regression test verifying each new tab isolates the matching status rows.
…nd current scripts

Adds modes/{fr,ja,pt,ru}/, modes/latex.md, GEMINI.md, generate-latex.mjs, scan.mjs, doctor.mjs, check-liveness.mjs, liveness-core.mjs, analyze-patterns.mjs, followup-cadence.mjs, gemini-eval.mjs, test-all.mjs, and .gemini/commands/ to the SYSTEM_PATHS array. The list had drifted from the actual repo as languages and scripts were added incrementally. Closes santifer#337.
Auto-detects tectonic on PATH and uses it as the preferred LaTeX engine, falling back to pdflatex when tectonic is not available. Tectonic is a much lighter LaTeX install (~150MB vs ~5GB for MiKTeX/MacTeX) and is now widely available via Homebrew, apt, and direct binaries. Closes santifer#394.
…tion

Adds cv.output_format setting to profile.example.yml (html default, latex opt-in) and updates auto-pipeline.md to branch between modes/pdf.md and modes/latex.md based on the config. Also moves canva_resume_design_id under the cv: block for path consistency. Closes santifer#396.
… is stale

Adds a fallback path so that when the local VERSION file lags behind the latest GitHub release tag, the updater can still detect updates correctly. Resolves the case where users on v1.5.0 were being told they were up-to-date despite v1.6.0 being available. Closes santifer#316.
DATA_CONTRACT.md was only documenting modes/de/* among the localized mode directories. Adds the four other languages (fr, ja, pt, ru) that were already shipped in the repo. Closes santifer#338.
Surfaces the tracker ID in the dashboard pipeline list so users can reference rows by ID in DMs, issue reports, and discussions. Includes regression tests for the new column rendering.
Patch + bug fixes only in this version range, no API surface changes. Signed-off by dependabot.
… overlap ratio in roleFuzzyMatch

Reduces false matches in roleFuzzyMatch by filtering seniority tokens (junior, senior, lead, staff, principal, etc.) and location tokens before computing overlap, plus enforcing a minimum overlap ratio. Closes santifer#329 (over-matching caused duplicate detection to flag unrelated roles as the same).
Translates modes/batch.md from Spanish to English in place. Companion to santifer#403 (apply.md). Both files were the source of Spanish content leaking into report outputs on certain code paths.
… variants

Adds three regex patterns to HARD_EXPIRED_PATTERNS in liveness-core.mjs to catch closed postings that were silently slipping through liveness checks. Includes a regression test against a real Singapore mycareersfuture.gov.sg posting that exhibits the 'Applications have closed' format.
Translates modes/apply.md from Spanish to English in place. Several Discord users (most recently mavetor) saw reports come back partially in Spanish on certain code paths. This resolves the inconsistency by aligning apply.md with the rest of the system documentation language.
…le template

Adds verified Canada/Vancouver companies plus an automation companies block to templates/portals.example.yml. This is a pure data addition with no code changes, and it extends the example template to cover non-US users.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Adds .claude-plugin/marketplace.json and .claude-plugin/plugin.json so career-ops can be installed via claude plugin marketplace add ./ && claude plugin install career-ops. Pointed at the existing .claude/skills/ path — zero breaking change. Path migration deferred to follow-up RFC. Co-authored with @vl4duu (original design from santifer#275).
Adds release-please-config.json with release-type: simple + extra-files configuration to sync package.json version via JSONPath. Also bumps VERSION 1.3.0 → 1.6.0 and package.json 1.0.0 → 1.6.0 to align with the v1.6.0 published release. Resolves the version drift between VERSION, package.json, .release-please-manifest.json, and CHANGELOG that broke update-system.mjs detection. Closes santifer#336, santifer#523.
…ani2112

Adds new "Community Voices" section in CONTRIBUTORS.md for members who
landed roles using career-ops and opted in publicly via the i-got-hired
issue template. First entry: @logumani2112 (Backend Developer .NET, 50
listings, 1 month timeline, A-F scoring as most useful feature).

Permission opt-in granted in santifer#440.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ration

Adds writing-samples/ user-layer folder + DATA_CONTRACT entry + modes/_shared.md Writing Style Calibration section that scans user samples to extract tone, sentence structure, punctuation habits, vocabulary, and voice signatures. Caches result in modes/_profile.md ## Writing Style section to skip re-scan on subsequent sessions.

Privacy-first: no verbatim copying, no PII retention, only abstract style descriptors. Opt-in via samples present (skips silently if none provided).

Real-world validated: author landed an interview using this feature on 30 Apr.

Closes santifer#380.

Single-commit snapshot of in-flight work for sharing with collaborators.
Not intended for upstream merge — each work stream below should be split
into its own focused PR after review (and, where the maintainer has flagged
it, after an RFC).

Work streams included:
- Plugin architecture round-3 (scan.mjs semantic phase, portals.example,
  CLAUDE.md) — paused pending RFC per santifer's feedback on PR santifer#454
- LinkedIn provider (providers/linkedin.mjs, test-linkedin-config.mjs) —
  needs RFC per same governance pattern as PR santifer#454
- Semantic title-filter phase (scan-semantic.mjs)
- Triage / scoring utilities (triage-pending.mjs, update-pipeline-scores.mjs,
  analyze-filter-patterns.mjs)
- Small touch-ups: batch/batch-runner.sh, dashboard/main.go, modes/contacto.md,
  package.json
- .gitignore additions for article-digest.md and .claude/settings.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)
  • main

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 2f76bc54-9449-4fef-8e72-93198349bafd

@ageem23 ageem23 marked this pull request as ready for review April 30, 2026 18:28
ageem23 (Owner, Author) commented Apr 30, 2026

@coderabbitai review

coderabbitai Bot commented Apr 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
batch/batch-runner.sh (1)

73-81: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard --model before reading $2.

With set -u, invoking the script as batch-runner.sh --model aborts on an unbound $2 instead of producing a usable error. Please validate that a value is present before assigning MODEL here (ideally with the same guard for the other value-taking flags in this parser).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/batch-runner.sh` around lines 73 - 81, The flag parser currently
assigns value-taking flags (e.g., --model -> MODEL) without verifying that a
following argument exists, which causes an unbound variable error under set -u;
update the case branches that consume a value (at least the --model branch, and
ideally --parallel, --start-from, --max-retries, --min-score) to check that a
value is present (for example test that $# -ge 2 or that "${2-}" is non-empty)
before assigning MODEL="$2" and shifting, and if the value is missing emit a
clear usage/error message and exit non-zero.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Line 23: Remove the overly-broad ignore entry "batch/*" from .gitignore
because it causes system files like batch/batch-runner.sh to be ignored,
prevents negation entries such as "!batch/logs/.gitkeep" and
"!batch/tracker-additions/.gitkeep" from re-including files, and makes the
earlier "batch/bat*" pattern redundant; keep the existing specific patterns (the
entries around "batch/bat*", "!batch/logs/.gitkeep", and
"!batch/tracker-additions/.gitkeep") and delete the "batch/*" line so system and
explicitly negated files remain tracked.

In `@analyze-filter-patterns.mjs`:
- Around line 36-43: The CLI numeric flags may produce NaN via parseInt
(including when a flag is present with no value); update the parsing around
argValue, sinceDays, minFreq, and topN to validate the returned value: after
calling argValue('--since','30') etc., attempt to parseInt and then check
Number.isFinite(result) and result > 0 (or other domain constraints), and if
invalid either fall back to the default or throw/exit with a clear usage error
message; also ensure argValue detects the “flag present but no next token” case
(i.e., args.indexOf(flag) returns last index) and treat that as invalid so you
log/exit rather than propagating NaN into downstream date math and threshold
logic.

In `@dashboard/main.go`:
- Around line 29-42: Remove the fixed "i < 6" cap and instead loop until the
filesystem root by using an infinite loop that updates parent :=
filepath.Dir(cur) and breaks when parent == cur; for each filepath.Join(cur,
"applications.md") and filepath.Join(cur, "data", "applications.md") call use
os.Stat and if err == nil return cur; if err != nil and !os.IsNotExist(err)
treat it as a real filesystem error (do not silently treat it as "not found") —
for example return start (preserving existing behavior) or propagate/log the
error — and only ignore errors where os.IsNotExist(err) is true so the search
continues up to the root.

In `@modes/contacto.md`:
- Around line 65-79: The "Name verification gate" block is written in English
inside a primarily Spanish mode and should be made language-consistent: either
translate the entire "Name verification gate (MANDATORY — do this BEFORE
presenting the message to the user):" section (including checks 1–4: Exact-match
check, No-substitution check, Right-person check, Present format) into Spanish
so it matches the rest of the contacto mode, or move this gate into a
language-specific file for English; ensure the translated terms and the example
salutation line (“Sending to: {Full Name} ({linkedin URL}) — addressing them as
"{salutation in message}"”) preserve the exact verification steps and the
instruction to STOP and ask for confirmation if any check fails.
- Around line 74-77: Update the "Present format" guidance so the "Sending to"
line is explicitly described as a pre-send verification header and not part of
the outbound LinkedIn message body or its 300‑character limit; specifically
modify the "Present format" block (the one that shows: **Sending to:** {Full
Name} ({linkedin URL}) — addressing them as "{salutation in message}") to state
it is metadata only, must be displayed above the message for verification, must
not be copied into the message body, and must not be counted against the
300‑char message limit.

In `@providers/linkedin.mjs`:
- Around line 224-236: The unwrapRedirect function currently accepts any URL
scheme when decoding the nested LinkedIn redirect; update the validation so that
after parsing the decoded value (the variable decoded) into a URL object you
explicitly check the URL's protocol and only accept 'http:' or 'https:'—if the
protocol is anything else, return the original trimmed href; ensure this check
happens inside the try block after creating the URL and before returning decoded
(and keep existing fallback returns intact).
- Around line 211-217: slugify currently can return an empty string for
non-Latin titles, causing saveJd to write to jds/.md and collide; update slugify
(or the caller saveJd) to fall back to a stable non-empty identifier when
slugify(text) === '' (e.g., compute a short stable hash of the original text or
use a deterministic UID derived from the title/company and timestamp) and use
that fallback for file names/URLs; additionally, respect language modes by
detecting German/French/Japanese postings and, if config/profile.yml has
language.modes_dir set (e.g., modes/de, modes/fr, modes/ja), load modes from
that directory instead of default modes/ to suggest appropriate language modes.

In `@scan-semantic.mjs`:
- Around line 89-93: The current fallback finds braceStart and braceEnd using
lastIndexOf which can include extra trailing braces; instead scan forward from
braceStart to find the matching closing brace by tracking a depth counter
(increment on '{', decrement on '}') and stop when depth returns to zero to get
the index of the first balanced object, then call JSON.parse on
text.slice(braceStart, matchingIndex + 1); update the code around
braceStart/braceEnd and the JSON.parse call to use this matchingIndex approach
so only the first balanced JSON object is parsed.
- Around line 130-131: hasSemanticBackend currently swallows errors from
resolveBackend (hiding bad CAREER_OPS_SEMANTIC_BACKEND overrides); change it to
let invalid override errors propagate (or return a structured Error) instead of
returning null—i.e., remove the empty catch that returns null and either rethrow
the caught error from hasSemanticBackend or return an object like { error } so
callers can log it before falling back; update references to
hasSemanticBackend/resolveBackend (and any callers expecting null) to handle the
propagated error or structured error accordingly.

In `@scan.mjs`:
- Around line 271-298: The code allows a bare "--login" to slip through because
loginProviderId can be undefined; update the login handling to detect when the
flag is present but no provider id was supplied (check loginFlag !== -1 &&
!loginProviderId) and immediately print a usage/error message and exit (set
process.exitCode = 1 and return or process.exit(1)). Place this check before
calling providers.get(loginProviderId) and before invoking provider.login(),
referencing the existing loginFlag and loginProviderId symbols so a missing
value fails fast instead of continuing into the normal scan path.

In `@templates/portals.example.yml`:
- Around line 30-35: The comment in the semantic phase block is misleading about
override precedence; update the text so it explicitly states the actual
resolution order used at runtime: first check the CAREER_OPS_SEMANTIC_BACKEND
environment variable, then check for ANTHROPIC_API_KEY, then look for the
`claude` CLI in PATH, and if none are available neutrals are rejected; mention
the possible values for CAREER_OPS_SEMANTIC_BACKEND (api|cli) and reference the
semantic phase/title filtering to make intent clear.
- Around line 119-123: The comment for the archetypes block incorrectly states
it is used only when ANTHROPIC_API_KEY is set; update the text to reflect that
the archetypes are consumed by both API and CLI semantic flows
(scan-semantic.mjs passes archetypes to both backends). Edit the archetypes
description to remove the "only" qualifier and explicitly note that these
canonical role-family descriptions are used by the semantic phase for both the
API and Claude CLI paths so users know the section is not ignored when using the
CLI.

In `@test-linkedin-config.mjs`:
- Around line 136-140: After calling loadConfig(), validate that
config?.tracked_companies is an array before using .map(): if config is missing
or tracked_companies is undefined treat it as an empty array, but if
tracked_companies exists and is not an Array, log an error and exit process with
code 2; update the code around the variables config, entries and linkedinEntries
(the block that sets const config = loadConfig(); const entries =
config?.tracked_companies || []; const linkedinEntries = ...) to perform an
explicit Array.isArray check and handle the non-array case as described.
- Around line 112-120: The KNOWN set advertises delay_pages and delay_searches
as supported but the validator in test-linkedin-config.mjs never checks their
shapes, which causes providers/linkedin.mjs to later assume a [min,max] tuple
and fail; update the validation loop to validate that entry.delay_pages and
entry.delay_searches (when present) are arrays of two numeric values with
min<=max (or else push a warning/error), or remove those keys from the KNOWN set
if you choose not to validate them; reference the KNOWN constant and the entry
object in test-linkedin-config.mjs and ensure compatibility with the tuple
expectation in providers/linkedin.mjs.

In `@triage-pending.mjs`:
- Around line 188-206: The location-scoring logic in the triage code hardcodes a
Chicago-first, US-remote preference (see the location variable, the regex checks
and the score/breakdown mutations) which should instead read per-user geo
preferences from profile config; update the scorer to load the profile's geo
preferences (e.g., include allowlist/denylist, remote-preference, and
preferred-US-hubs) from config/profile.yml or the profile API and replace the
inlined regex/weights with a small policy evaluator that applies those
configured rules to entry.location, then adjust score and breakdown accordingly
(keep the same score/breakdown API but derive +5/+3/-10 etc. from config values
rather than hardcoded literals and remove the fixed Chicago/IL assumptions).

In `@update-pipeline-scores.mjs`:
- Around line 13-24: The script currently calls fs.readFileSync(STATE) and
fs.readdirSync(REPORTS) unguarded and will crash on a clean checkout; modify the
logic around STATE/REPORTS so missing artifacts are treated as empty input:
check existence (fs.existsSync or try/catch) before reading and if STATE is
missing set stateLines = [] (so urlMap stays empty) and if REPORTS is missing
set reportFiles = [] (or skip report processing), then return/exit early or
no-op the update step so the pipeline is left untouched; update references to
stateLines, urlMap, and reportFiles accordingly to handle empty arrays.
- Around line 38-40: findUrl currently only matches http(s) URLs so
"local:jds/..." links get dropped when rewriting pipeline; update the findUrl
function to recognize and return local:jds/... URIs as well (e.g. match either
https?://\S+ or local:jds/\S+ using a single regex) so callers that rewrite
entries (the code paths around the existing findUrl and the logic in the 56-76
region) will preserve those local:jds links.

---

Outside diff comments:
In `@batch/batch-runner.sh`:
- Around line 73-81: The flag parser currently assigns value-taking flags (e.g.,
--model -> MODEL) without verifying that a following argument exists, which
causes an unbound variable error under set -u; update the case branches that
consume a value (at least the --model branch, and ideally --parallel,
--start-from, --max-retries, --min-score) to check that a value is present (for
example test that $# -ge 2 or that "${2-}" is non-empty) before assigning
MODEL="$2" and shifting, and if the value is missing emit a clear usage/error
message and exit non-zero.
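
The analyze-filter-patterns.mjs finding in the prompt above (NaN from `parseInt`, plus a flag given without a value) could be addressed along these lines. This is a minimal sketch, not the script's actual code: `argValue` and `--since` are named in the review, but the explicit `args` parameter, the `parsePositiveInt` helper, and the error wording are assumptions made for illustration.

```javascript
// Return the token after `flag`, the fallback if the flag is absent,
// or throw if the flag is present but has no value after it.
// (Taking `args` as a parameter is an assumption; the real helper may read a global.)
function argValue(args, flag, fallback) {
  const i = args.indexOf(flag);
  if (i === -1) return fallback;
  const next = args[i + 1];
  if (next === undefined || next.startsWith('--')) {
    throw new Error(`${flag} requires a value`);
  }
  return next;
}

// Hypothetical helper: accept only strictly positive integers,
// so NaN, zero, negatives, and inputs like "7d" never reach the date math.
function parsePositiveInt(raw, flag) {
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${flag} must be a positive integer, got "${raw}"`);
  }
  return n;
}

const exampleArgs = ['--since', '14'];
const sinceDays = parsePositiveInt(argValue(exampleArgs, '--since', '30'), '--since');
console.log(sinceDays);
```

A caller would typically catch the error, print a usage line, and exit non-zero. Whether an invalid value should fail or fall back to the default is a policy choice the review leaves open.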

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 1ce9e49a-3afd-4c87-af73-375e7d082103

📥 Commits

Reviewing files that changed from the base of the PR and between 7f77834 and 410961c.

📒 Files selected for processing (14)
  • .gitignore
  • CLAUDE.md
  • analyze-filter-patterns.mjs
  • batch/batch-runner.sh
  • dashboard/main.go
  • modes/contacto.md
  • package.json
  • providers/linkedin.mjs
  • scan-semantic.mjs
  • scan.mjs
  • templates/portals.example.yml
  • test-linkedin-config.mjs
  • triage-pending.mjs
  • update-pipeline-scores.mjs

Comment thread .gitignore Outdated
Comment thread analyze-filter-patterns.mjs Outdated
Comment thread dashboard/main.go Outdated
Comment thread modes/contacto.md Outdated
Comment thread modes/contacto.md Outdated
Comment thread test-linkedin-config.mjs
Comment thread test-linkedin-config.mjs
Comment thread triage-pending.mjs
Comment thread update-pipeline-scores.mjs Outdated
Comment thread update-pipeline-scores.mjs
ageem23 and others added 5 commits April 30, 2026 13:50
batch/* matched the whole batch/ directory, which:
- broke the !batch/logs/.gitkeep and !batch/tracker-additions/.gitkeep
  re-includes (gitignore can't un-ignore once the parent dir is ignored)
- ignored system files like batch/batch-runner.sh that ship with the repo
- was redundant with the more specific batch/logs/*, batch/bat*,
  batch/batch-state.tsv, batch/batch-input.tsv, and
  batch/tracker-additions/**/*.tsv rules already in the file

Removing it lets the .gitkeep negations work and keeps system files
trackable while the specific TSV/log rules continue to ignore state
artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… URLs

Two fixes for the pipeline rewriter:

1. Match local:jds/... in addition to http(s):// when extracting URLs from
   pipeline lines. Without this, every LinkedIn-provider entry (which uses
   the local: convention because the search URL isn't reachable post-session)
   was silently dropped from the regenerated pipeline.

2. Treat a missing batch-state.tsv or reports/ directory as "nothing to
   refresh" instead of throwing. A clean checkout or pre-first-batch run
   no longer crashes on script startup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
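
The first fix in the commit above (matching `local:jds/...` as well as `http(s)://`) can be sketched as a single alternation. Only the name `findUrl` and the two prefixes come from the commit message and review; the exact regex and the pipeline line format shown here are assumptions.

```javascript
// Match either an http(s) URL or a local:jds/... reference.
// Illustrative pattern; the one in update-pipeline-scores.mjs may differ.
const URL_RE = /(?:https?:\/\/|local:jds\/)\S+/;

function findUrl(line) {
  const m = line.match(URL_RE);
  return m ? m[0] : null;
}

// The line layout below is a made-up example, not the real pipeline.md format.
console.log(findUrl('- [ ] https://example.com/jobs/123 | Acme | Engineer'));
console.log(findUrl('- [ ] local:jds/acme-engineer.md | Acme | Engineer'));
console.log(findUrl('- [ ] no link here'));
```

With an `https?://`-only pattern, the second line would return `null`, which is how LinkedIn-provider entries dropped out of the regenerated pipeline.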
`node scan.mjs --login` (no provider id, or followed by another flag) used
to silently fall through into a normal scan because loginProviderId would
be undefined and the login branch's `if (loginProviderId)` check skipped.
That's a surprising side effect for what's clearly a setup/auth flag.

Now rejects bare --login or --login followed by another flag with a usage
hint and exit 1, before any provider work runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
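
A rough sketch of the fail-fast check described in the commit above. The names `loginFlag` and `loginProviderId` come from the review comment; the `parseLoginFlag` wrapper, the returned object shape, the usage text, the `linkedin` provider id, and the `--other-flag` placeholder are assumptions for illustration only.

```javascript
// Classify the command line: normal scan, login for a provider, or a usage error.
function parseLoginFlag(argv) {
  const loginFlag = argv.indexOf('--login');
  if (loginFlag === -1) return { mode: 'scan' };

  const loginProviderId = argv[loginFlag + 1];
  // A missing value or another flag in its place is rejected
  // instead of silently falling through to a normal scan.
  if (!loginProviderId || loginProviderId.startsWith('-')) {
    return { mode: 'error', message: 'Usage: node scan.mjs --login <provider-id>' };
  }
  return { mode: 'login', providerId: loginProviderId };
}

console.log(parseLoginFlag(['--login', 'linkedin']));     // provider id is an assumed example
console.log(parseLoginFlag(['--login']));                 // bare flag
console.log(parseLoginFlag(['--login', '--other-flag'])); // placeholder flag, not a real option
console.log(parseLoginFlag([]));
```

In the script itself, the error branch would print the message and set `process.exitCode = 1` before any provider work runs, as the commit describes.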
…end override

Two fixes for the semantic phase:

1. extractJson() walks the response one character at a time tracking brace
   depth (and JSON string state) instead of slicing to the last '}'. Models
   sometimes emit valid JSON followed by trailing commentary that contains
   stray braces; the previous implementation would treat the whole chunk as
   malformed and reject every neutral title for that batch.

2. hasSemanticBackend() now logs the resolveBackend() error to stderr before
   returning null when CAREER_OPS_SEMANTIC_BACKEND holds an invalid value.
   The behavior contract (return null when no backend is usable) is unchanged
   so existing callers still fall back, but a config typo no longer silently
   disables the semantic phase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
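
The depth-tracking walk from the first fix above can be sketched as follows. The function name `extractJson` is from the commit; the body is an illustrative reconstruction rather than the code in scan-semantic.mjs, and the `keep`/`drop` response shape is invented for the example.

```javascript
// Return the first balanced {...} object in `text`, parsed, or null.
function extractJson(text) {
  const start = text.indexOf('{');
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Braces inside JSON strings must not affect depth.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}') {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          return null; // balanced braces but not valid JSON
        }
      }
    }
  }
  return null; // never returned to depth zero
}

// Hypothetical model reply: valid JSON followed by commentary with stray braces.
const reply = 'Here you go: {"keep": ["a}b"], "drop": []} Note: {braces} in commentary.';
console.log(extractJson(reply));
```

Slicing to the last `}` would hand `JSON.parse` the trailing commentary as well, which is the failure mode the commit describes.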
… schemes

Two fixes for the LinkedIn provider:

1. slugify() previously stripped to '' for titles/companies written entirely
   in non-Latin scripts (Japanese, Arabic, Cyrillic, etc.), causing every
   such posting to collide onto jds/.md / local:jds/.md. Falls back to a
   short SHA-1 prefix (jd-<10hex>) when the Latin slug is empty so each
   unique input still gets its own file.

2. unwrapRedirect() decoded LinkedIn /safety/go redirects with
   `new URL(decoded)` and returned the result without checking the scheme.
   javascript:, file:, data: etc. would silently land in _application_url
   and the JD frontmatter. Now drops anything outside http:/https: and
   returns ''.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
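
Both LinkedIn-provider fixes above can be illustrated with a short sketch. Several things here are assumptions: the slug rules, the `url` query-parameter name on `/safety/go`, and the hash. The commit uses a short SHA-1 prefix; to keep this example dependency-free, a 32-bit FNV-1a-style hash over code points stands in for it, so the fallback is `jd-<8hex>` here rather than `jd-<10hex>`.

```javascript
// Stand-in for the SHA-1 prefix in the commit: FNV-1a-style hash over code points.
function shortHash(text) {
  let h = 0x811c9dc5;
  for (const ch of text) {
    h ^= ch.codePointAt(0);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h.toString(16).padStart(8, '0');
}

// Latin slug, with a hash-based fallback when nothing survives stripping.
function slugify(text) {
  const slug = text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return slug || `jd-${shortHash(text)}`;
}

// Unwrap a LinkedIn /safety/go redirect, keeping only http(s) targets.
function unwrapRedirect(href) {
  const trimmed = href.trim();
  try {
    const outer = new URL(trimmed);
    if (!outer.pathname.startsWith('/safety/go')) return trimmed;
    const decoded = outer.searchParams.get('url'); // parameter name is an assumption
    if (!decoded) return trimmed;
    const inner = new URL(decoded);
    return inner.protocol === 'http:' || inner.protocol === 'https:' ? decoded : '';
  } catch {
    return trimmed; // not parseable as an absolute URL; leave it alone
  }
}

console.log(slugify('Senior Engineer'));
console.log(slugify('エンジニア'));
console.log(unwrapRedirect('https://www.linkedin.com/safety/go?url=https%3A%2F%2Fexample.com%2Fapply'));
console.log(unwrapRedirect('https://www.linkedin.com/safety/go?url=javascript%3Aalert(1)'));
```

An allowlist of `http:` and `https:` is safer than a denylist of known-bad schemes, since any scheme not anticipated is rejected by default.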
ageem23 and others added 10 commits May 17, 2026 14:24
fix(scan): pre-evaluation fuzzy dedup against pipeline.md
The fuzzy match requires ≥2 overlapping content words after stopword
stripping. That caused byte-identical short roles to slip past it:

  "VP Engineering" → tokens: [] (vp <3 chars, engineering stopword)
  "CTIO AI Engineering Manager" → tokens: ["ctio"] (engineer + manager
  stopword'd, ai <3 chars)

Two identical strings tokenizing to ≤1 content word each would yield
overlap < 2 → no match. Real applications.md had 8 such duplicate
clusters that dedup-tracker.mjs refused to merge even though the
strings were literally equal.

Fix: layer an exact normalize-and-compare in front of the fuzzy path,
in both call sites:

  dedup-utils.roleMatch     — gains the layered fallback (used by
                              dedup-tracker.mjs post-hoc cleanup).
  scan.mjs.findCompanyRoleDup
                              — adds the same fallback at scan-time so
                              future scans dedup these cases before
                              evaluation. Seen entries now also cache
                              normRole alongside tokens and exactKey
                              so the check is a single map lookup, not
                              a re-tokenize.

Measured impact on the current applications.md (447 → 432 rows):
  - dedup-tracker.mjs now removes 15 duplicates (was 5).
  - verify-pipeline.mjs reports 0 warnings (was 11).
  - Cases newly merged: Mitchell Martin Lead AI Engineer, Medidata
    SVP of Engineering, LTV.ai Head of Engineering, OP Recruiting
    VP Engineering, Veeva Systems VP Engineering (3→1), UKG Careers
    Lead AI Engineer, Motion Recruitment VP of Engineering, PwC
    CTIO AI Engineering Manager, Visionary Tech Lead AI Architect.

The fuzzy path is unchanged for all longer/substantive titles —
exact match only fires when the normalized strings are equal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
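
The layering described in the commit above (exact normalized comparison first, fuzzy token overlap second) might look roughly like this. It is a simplified sketch: the stopword list contains only the three words the commit message confirms, the normalizer is deliberately minimal, and the real `roleMatch` also applies seniority filtering and an overlap ratio that are omitted here.

```javascript
// Only these three stopwords are confirmed by the commit message; the real list is longer.
const STOPWORDS = new Set(['engineering', 'engineer', 'manager']);

// Minimal normalizer for this sketch: lowercase, trim, collapse whitespace.
function simpleNormalize(role) {
  return role.toLowerCase().trim().replace(/\s+/g, ' ');
}

// Content tokens: at least 3 characters and not a stopword.
function contentTokens(role) {
  return simpleNormalize(role)
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length >= 3 && !STOPWORDS.has(t));
}

function roleMatch(a, b) {
  // Layer 1: identical after normalization, regardless of token count.
  if (simpleNormalize(a) === simpleNormalize(b)) return true;

  // Layer 2: fuzzy path, requiring at least two shared content tokens.
  const tokensB = new Set(contentTokens(b));
  const overlap = new Set(contentTokens(a).filter((t) => tokensB.has(t)));
  return overlap.size >= 2;
}

console.log(contentTokens('VP Engineering'));               // no content tokens survive
console.log(roleMatch('VP Engineering', 'VP Engineering')); // caught by layer 1
console.log(roleMatch('VP Engineering', 'VP Marketing'));   // neither layer matches
```

Without layer 1, the second call would fail the overlap threshold even though the strings are identical, which is the case the commit fixes.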
Address CodeRabbit finding on PR #25: the exact-match fallback in
roleMatch relies on normalizeRole as the canonical key, but the
existing normalizer had two gaps that defeated the fallback:

1. Unsupported punctuation was stripped without inserting a space,
   so "VP,Engineering" collapsed to "vpengineering" — a single
   bogus token that fuzzy can't recover. Now any run of disallowed
   chars maps to a single space, then \s+ collapse normalizes.

2. Slash separators kept their surrounding whitespace, so
   "AI/ML Engineer" and "AI / ML Engineer" hashed differently.
   Now \s*\/\s* canonicalizes to a no-whitespace "/" before the
   final space collapse.

After the fix all three "VP Engineering" / "VP,Engineering" /
"VP, Engineering" produce the same key, and "AI/ML Engineer" /
"AI / ML Engineer" do too. Regression-tested against all the
existing PR #25 unit cases — no behavior change for the
already-passing fuzzy and exact-match scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
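
The two normalizer changes above can be sketched as a chain of replacements. The function name `normalizeRole` and the `\s*\/\s*` and `\s+` steps are from the commit message; the allowed character set (lowercase letters, digits, `/`, whitespace) is an assumption, since the commit does not list it.

```javascript
function normalizeRole(role) {
  return role
    .toLowerCase()
    .replace(/[^a-z0-9\/\s]+/g, ' ') // any run of disallowed chars becomes one space
    .replace(/\s*\/\s*/g, '/')       // "AI / ML" and "AI/ML" share one form
    .replace(/\s+/g, ' ')            // collapse whitespace
    .trim();
}

for (const title of ['VP Engineering', 'VP,Engineering', 'VP, Engineering']) {
  console.log(JSON.stringify(normalizeRole(title)));
}
for (const title of ['AI/ML Engineer', 'AI / ML Engineer']) {
  console.log(JSON.stringify(normalizeRole(title)));
}
```

Replacing punctuation with a space, instead of deleting it, is what keeps "VP,Engineering" from collapsing into a single token.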
fix(dedup): exact-normalized match before fuzzy path
# Conflicts:
#	dashboard/internal/ui/screens/pipeline.go
#	dashboard/internal/ui/screens/viewer.go
#	modes/contacto.md
#	modes/pipeline.md
#	providers/_http.mjs
#	providers/ashby.mjs
#	providers/greenhouse.mjs
#	providers/lever.mjs
#	scan.mjs
# Conflicts:
#	batch/batch-runner.sh
#	modes/contacto.md
#	scan.mjs
#	templates/portals.example.yml
Esc no longer quits: PR santifer#526 changed it to clear the search instead. The
upstream help bar was updated to match, but the local merge lost that edit,
so the bar advertised a quit key that did nothing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
batch-runner.sh records every evaluated offer in batch/batch-state.tsv
but never writes back to data/pipeline.md. Offers processed via batch
mode stay in the pipeline "Pendientes" inbox indefinitely -- the next
scan and the next `/career-ops pipeline` run re-surface them, producing
duplicate reports and tracker rows. For anyone running batch regularly
the inbox never drains.

Add reconcile-pipeline.mjs: for each completed/skipped entry in
batch-state.tsv whose URL is still in pipeline.md "Pendientes", move the
line to "Procesadas" with its report link, score and PDF flag. It is
idempotent -- an already-moved entry is a no-op -- so it is safe to run
after every batch.

batch-runner.sh now calls it from merge_tracker(), between the tracker
merge and the integrity check. Also exposed as `npm run reconcile` for
standalone use, and covered by test-all.mjs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ageem23 ageem23 force-pushed the personal/dev-snapshot branch from 9e01163 to b111735 Compare May 20, 2026 18:29
ageem23 and others added 2 commits May 20, 2026 13:36
- reconcile-pipeline.mjs: constrain user-supplied --state/--pipeline
  paths to the repository tree. A path that escapes via `..` or an
  absolute target outside the repo is now rejected before any read or
  write, closing a path-traversal vector.
- reconcile-pipeline.mjs: when creating a missing processed section,
  match the pending section's language -- `## Processed` for English
  pipelines, `## Procesadas` for Spanish -- instead of always writing
  the Spanish header.
- batch/README.md: note the script reconciles `skipped` entries too
  (not just `completed`), and that entries without a report file on
  disk are left in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolveInsideRepo() validated --state/--pipeline lexically only, so a
symlink inside the repo could still resolve to a target outside it.
Resolve the repo root and the target (or its parent, when the target
does not exist yet) with realpathSync before the boundary check.
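
To illustrate why the lexical check was not enough, here is a rough sketch of a boundary check done on resolved paths. It is not the real resolveInsideRepo(): the `isInsideRepo` name is hypothetical, `realpath` is injected so the logic can be exercised without touching the disk (in a Node script it would be `fs.realpathSync`), POSIX-style `/` separators are assumed, and the "resolve the parent when the target does not exist yet" case is left out.

```javascript
// Hypothetical sketch: compare *resolved* paths, so a symlink inside the repo
// that points outside it is rejected. `realpath` is an injected resolver.
function isInsideRepo(repoRoot, target, realpath) {
  // Strip trailing slashes so "/repo/" and "/repo" compare equal.
  const root = realpath(repoRoot).replace(/\/+$/, '');
  const resolved = realpath(target);
  // Require a separator after the root so "/repo-other" does not match "/repo".
  return resolved === root || resolved.startsWith(root + '/');
}
```

With a resolver that maps `/repo/data/link.md` to `/etc/passwd`, the path looks fine lexically but the check returns `false` once both sides are resolved.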

Addresses CodeRabbit review feedback on santifer#712.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ageem23 and others added 9 commits May 20, 2026 16:16
A directory passed as --state/--pipeline (e.g. `--state batch`) cleared
existsSync() and the boundary check, then crashed later with an
unhandled EISDIR from readFileSync/copyFileSync. resolveInsideRepo() now
rejects directory targets up front with a clear message.

Addresses CodeRabbit review feedback on santifer#712.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In prompt mode, contacto emitted the contacts table and add-task.mjs
block itself (MODE B Steps 5/6) -- but in prompt mode the LinkedIn
research has not run, so those came out as placeholders with no
verified names and no dates.

Move the add-task.mjs block into the research prompt as section 5, so
the web Claude (which does the research) produces it with the real
verified names. Each task line now carries a required --due date
computed from a new "Today's date" anchor in the prompt. Claude Code
resolves the App# once up front and bakes it into the prompt; it no
longer emits a table or task block itself in prompt mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add nightly-careerops.sh: a wrapper that runs the portal scan,
imports new pipeline.md offers into batch-input.tsv (URL-deduped,
capped at 30/night), and runs the batch evaluator. Intended to be
driven by a daily OS scheduler for unattended runs.
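
The import step (URL dedup plus the nightly cap) can be sketched as below. The wrapper itself is a shell script, so this JavaScript version only illustrates the logic; the `selectNewOffers` name and its inputs are assumptions, and only the cap of 30 comes from the description above.

```javascript
// Hypothetical sketch: pick pipeline URLs not already queued, up to a cap.
function selectNewOffers(pendingUrls, alreadyQueuedUrls, cap = 30) {
  const seen = new Set(alreadyQueuedUrls);
  const selected = [];
  for (const url of pendingUrls) {
    if (selected.length >= cap) break; // nightly cap reached
    if (seen.has(url)) continue;       // already queued, or a duplicate in this run
    seen.add(url);
    selected.push(url);
  }
  return selected;
}
```

Offers beyond the cap are simply left for a later run, which is why the order of `pendingUrls` decides what gets evaluated first.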

Add a candidate-agnostic Evaluation Early-Exit rule to the
_profile template: workers run Blocks A-B first and short-circuit
to an abbreviated SKIP report when an offer hits a clear
disqualifier, skipping the expensive Blocks C-F. All specifics are
read from config/profile.yml and cv.md, so the rule is portable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core.autocrlf=true on Windows would rewrite *.sh to CRLF on
checkout, which breaks the bash shebang under a scheduler and on
macOS/Linux. Pin *.sh to eol=lf so nightly-careerops.sh and
batch-runner.sh stay portable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 21:00 nightly was consistently leaving Pendientes overflow because of the
30-offer cap. Add a --no-scan flag so a second Task Scheduler entry
(career-ops-nightly-late at 03:00) can clear those leftovers without spending
the scan budget twice. The 03:00 timing lets Claude's 5-hour token bucket
reset by 08:00.

Log files are suffixed with -late under --no-scan so the two runs don't
clobber each other.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Auto-sleep 60s before scan.mjs on non-interactive runs (Task Scheduler
  after wake-from-sleep) so the VPN can settle before providers are hit.
  Configurable via --wait=SECS / --no-wait.
- Morning summary leads with a scan-error banner (count + first 5
  providers) so failures don't get buried in the verbose log.
- 03:00 late run now appends to the evening summary instead of
  overwriting it, so both runs are visible in tmp/nightly-latest-summary.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
applyStatusUpdate already cascades a thank-you task on Interview;
mirror that for terminal statuses by auto-completing every pending task
for the application when its status moves to Rejected or Discarded.
Stops stale follow-up reminders from surfacing in the dashboard once
the app is closed out.

Adds autoCompletePendingTasksForApp() and converts the cascade dispatch
to a switch on the normalized status. Bails early when app.Number <= 0,
consistent with the existing Interview path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two improvements to triangulate scan failures (DNS vs TCP vs TLS vs HTTP)
when the nightly runs fire after wake-from-sleep:

- nightly-careerops.sh: pre-flight probe to a handful of representative
  hosts before scan.mjs runs. Retries up to 3x with 20s backoff on
  transient failures; logs per-host transport success/fail so the morning
  log has a leading indicator of connectivity state. Opt out with
  --no-preflight. Probe uses GET (not HEAD) because some endpoints reject
  HEAD with no body.
- scan.mjs: error capture now walks err.cause and surfaces
  code/errno/syscall/host instead of collapsing every transport failure
  to "fetch failed". Distinguishes ENOTFOUND (DNS) from ECONNRESET (VPN
  drop / IP-reputation block) from TLS handshake errors at a glance.
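
For reference, walking `err.cause` might look roughly like the sketch below. It is an illustration rather than the code in scan.mjs: the `describeError` name, the depth limit and the output format are assumptions, while the surfaced fields (code, errno, syscall, host) are the ones listed above.

```javascript
// Hypothetical sketch: flatten an error and its cause chain into one line.
function describeError(err, maxDepth = 5) {
  const parts = [];
  let current = err;
  for (let depth = 0; current != null && depth < maxDepth; depth += 1) {
    if (typeof current !== 'object') {
      // A cause can be any value, not only an Error.
      parts.push(String(current));
      break;
    }
    const fields = ['code', 'errno', 'syscall', 'host']
      .filter((key) => current[key] !== undefined)
      .map((key) => `${key}=${current[key]}`);
    const message = String(current.message);
    parts.push(fields.length > 0 ? `${message} (${fields.join(' ')})` : message);
    current = current.cause;
  }
  return parts.join(' <- ');
}
```

The depth limit is only a guard against a cause chain that loops back on itself.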

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
autoCompletePendingTasksForApp called UpdateTaskStatus in a loop, and
each call does a full readTasksFile + writeTasksFile. For an app with N
pending tasks that was 1+N full reads and N full rewrites of tasks.md
(amplified to KxN in the bulk-status path), which made status changes
visibly slow — worse under the OneDrive-synced data dir.

Add data.CompletePendingTasksForApp: read tasks.md once, flip every
matching pending task to done in memory, write once (and skip the write
entirely when nothing matches). Rewrite autoCompletePendingTasksForApp
to call it, dropping the redundant ParseTasks read and the per-task
write loop. Behavior is unchanged; I/O drops from 1+N reads and N writes
to 1 read and 1 write per app.
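
The single-pass shape can be sketched as follows. The actual change is Go code in the dashboard's data package; this JavaScript version is only a language-neutral illustration, and the task fields (`app`, `status`, `completed`) and the function signature are assumptions.

```javascript
// Hypothetical sketch: flip every pending task for one app in memory and
// report how many changed, so the caller can write once or skip the write.
function completePendingTasksForApp(tasks, appNumber, today) {
  // Mirror the early bail for invalid application numbers.
  if (appNumber <= 0) return { tasks, changed: 0 };
  let changed = 0;
  const updated = tasks.map((task) => {
    if (task.app !== appNumber || task.status !== 'pending') return task;
    changed += 1;
    return { ...task, status: 'done', completed: today };
  });
  // Hand back the original array when nothing matched.
  return { tasks: changed > 0 ? updated : tasks, changed };
}
```

A caller would read the file once, call this function, and write back only when `changed > 0`.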

Adds tasks_test.go covering the target-app-only flip, preserved
completion dates, untouched other-app tasks, and the no-match no-op.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>