Skip to content

fix(batch): reconcile pipeline.md inbox after batch runs#712

Open
ageem23 wants to merge 3 commits into
santifer:mainfrom
ageem23:feat/batch-pipeline-reconcile
Open

fix(batch): reconcile pipeline.md inbox after batch runs#712
ageem23 wants to merge 3 commits into
santifer:mainfrom
ageem23:feat/batch-pipeline-reconcile

Conversation

@ageem23
Copy link
Copy Markdown
Contributor

@ageem23 ageem23 commented May 20, 2026

Closes #711.

Problem

batch/batch-runner.sh records every evaluated offer in batch/batch-state.tsv, but it never writes back to data/pipeline.md. Offers processed via batch mode therefore stay in the pipeline "Pendientes" inbox indefinitely.

On the next scan or /career-ops pipeline run those entries get re-surfaced and evaluated a second time — producing duplicate reports and duplicate tracker rows. For anyone running batch regularly, the inbox never drains.

Fix

New script reconcile-pipeline.mjs:

  • Reads batch-state.tsv; for every completed / skipped entry whose URL is still in pipeline.md "Pendientes", moves that line to "Procesadas" with its report link, score and PDF flag.
  • Idempotent — an already-moved entry is a no-op; an entry already in "Procesadas" is dropped from "Pendientes" without a duplicate. Safe to run after every batch.
  • Conservativefailed entries stay in "Pendientes" (they were not evaluated); an entry whose report file is missing on disk is left in place rather than writing a dead link.
  • Path-safe — user-supplied --state / --pipeline arguments are constrained to the repository tree.
  • Handles both Pendientes/Procesadas and Pending/Processed section headers.

batch-runner.sh calls it from merge_tracker(), between the tracker merge and the integrity check. Also exposed as npm run reconcile for standalone/manual use.

Testing

  • test-all.mjs covers the new script (syntax + graceful-run on empty data); full suite green.
  • Functional test: fixture batch-state.tsv + pipeline.md → the matching entry moves to "Procesadas" with the correct report link/score/PDF; a second run reports "already in sync".
  • Verified end-to-end against a real 24-offer batch run.

Notes

  • No changes to user-layer data semantics; the script backs up pipeline.md to pipeline.md.pre-reconcile.bak before writing.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Automated pipeline reconciliation moves completed/skipped offers into "Procesadas" with report links, resolved score and PDF status; runs after batch processing, supports dry-run and can be invoked manually (creates a backup before applying changes).
  • Documentation

    • Clarified batch post-processing steps and reconciliation behavior.
  • Tests

    • Reconciliation step added to the automated test harness.

Review Change Stack

batch-runner.sh records every evaluated offer in batch/batch-state.tsv
but never writes back to data/pipeline.md. Offers processed via batch
mode stay in the pipeline "Pendientes" inbox indefinitely -- the next
scan and the next `/career-ops pipeline` run re-surface them, producing
duplicate reports and tracker rows. For anyone running batch regularly
the inbox never drains.

Add reconcile-pipeline.mjs: for each completed/skipped entry in
batch-state.tsv whose URL is still in pipeline.md "Pendientes", move the
line to "Procesadas" with its report link, score and PDF flag. It is
idempotent -- an already-moved entry is a no-op -- so it is safe to run
after every batch. Conservative: failed entries and entries with no
report file on disk are left in place. User-supplied --state/--pipeline
paths are constrained to the repository tree.

batch-runner.sh now calls it from merge_tracker(), between the tracker
merge and the integrity check. Also exposed as `npm run reconcile` for
standalone use, and covered by test-all.mjs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4c6c062c-8a36-4685-b34e-7264fd7c211a

📥 Commits

Reviewing files that changed from the base of the PR and between d1c6475 and 5af279b.

📒 Files selected for processing (1)
  • reconcile-pipeline.mjs

📝 Walkthrough

Walkthrough

This PR implements automatic pipeline reconciliation for batch mode. When batch-runner.sh completes evaluation, it now calls reconcile-pipeline.mjs to move completed/skipped offers from pipeline.md "Pendientes" to "Procesadas" with report links and scores, preventing duplicate evaluation on re-runs. The reconciliation is idempotent and safe to run standalone or after every batch.

Changes

Pipeline reconciliation for batch runs

Layer / File(s) Summary
Pipeline reconciliation core implementation
reconcile-pipeline.mjs
New CLI script that reads batch/batch-state.tsv (completed/skipped), discovers reports/*.md, extracts score and PDF flags, parses pipeline.md "Pendientes"/"Procesadas", moves matching pending entries into processed with report link/score/PDF flag, supports --dry-run, creates .pre-reconcile.bak, and exits early when no changes needed.
Batch runner integration and testing
batch/batch-runner.sh, package.json, test-all.mjs
batch-runner.sh invokes reconcile-pipeline.mjs after merge-tracker.mjs (logs a warning on failure), package.json adds an reconcile npm script, and test-all.mjs runs reconcile-pipeline.mjs as part of the script harness expecting exit code 0.
Documentation of reconciliation behavior
batch/README.md
README expands the end-of-run flow to include the reconcile call and adds a "Pipeline Reconcile" section describing how completed/skipped offers are moved from "Pendientes" to "Procesadas", handling of missing reports, and idempotency.

Sequence Diagram(s)

sequenceDiagram
  participant Script as reconcile-pipeline.mjs
  participant State as batch-state.tsv
  participant Reports as reports/
  participant Pipeline as pipeline.md
  Script->>State: parse completed/skipped entries
  Script->>Reports: find report files, extract Score/PDF
  Script->>Pipeline: parse Pendientes and Procesadas, collect URLs
  Script->>Pipeline: move matching Pendientes to Procesadas with report link, score, PDF flag
  Script->>Pipeline: write updated pipeline.md (create .pre-reconcile.bak)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

📄 docs, 🌐 i18n

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a reconciliation step for pipeline.md after batch runs, which directly addresses the core problem stated in the PR objectives.
Linked Issues check ✅ Passed All coding objectives from #711 are met: reconcile-pipeline.mjs moves completed/skipped entries from Pendientes to Procesadas with report links/scores/PDF flags, handles both Spanish and English headers, is idempotent and conservative, integrates into batch-runner.sh, and includes test coverage.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing the reconciliation feature: documentation, the reconcile script, shell integration, npm script exposure, and test harness update—no unrelated modifications present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: db80901c-ee41-47e2-8734-a7b671d05073

📥 Commits

Reviewing files that changed from the base of the PR and between 82f0c2e and 332e6f8.

📒 Files selected for processing (5)
  • batch/README.md
  • batch/batch-runner.sh
  • package.json
  • reconcile-pipeline.mjs
  • test-all.mjs

Comment thread reconcile-pipeline.mjs
resolveInsideRepo() validated --state/--pipeline lexically only, so a
symlink inside the repo could still resolve to a target outside it.
Resolve the repo root and the target (or its parent, when the target
does not exist yet) with realpathSync before the boundary check.

Addresses CodeRabbit review feedback on santifer#712.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ageem23 added a commit to ageem23/career-ops that referenced this pull request May 20, 2026
resolveInsideRepo() validated --state/--pipeline lexically only, so a
symlink inside the repo could still resolve to a target outside it.
Resolve the repo root and the target (or its parent, when the target
does not exist yet) with realpathSync before the boundary check.

Addresses CodeRabbit review feedback on santifer#712.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@reconcile-pipeline.mjs`:
- Line 24: The path guard currently uses existsSync() but allows directories,
causing readFileSync()/copyFileSync() to later throw EISDIR; update
resolveInsideRepo() to also reject directory targets by using statSync/path or
fs.lstatSync to check isDirectory() and throw a clear error (or exit) when the
target is a directory. Ensure callers like the paths passed into readFileSync,
copyFileSync, and realpathSync (used when handling --state batch / --pipeline
data) receive only regular file paths and update the error message to indicate
the argument was a directory.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 36dd5770-6848-460c-9b10-1a3835007a05

📥 Commits

Reviewing files that changed from the base of the PR and between 332e6f8 and d1c6475.

📒 Files selected for processing (1)
  • reconcile-pipeline.mjs

Comment thread reconcile-pipeline.mjs Outdated
A directory passed as --state/--pipeline (e.g. `--state batch`) cleared
existsSync() and the boundary check, then crashed later with an
unhandled EISDIR from readFileSync/copyFileSync. resolveInsideRepo() now
rejects directory targets up front with a clear message.

Addresses CodeRabbit review feedback on santifer#712.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ageem23 added a commit to ageem23/career-ops that referenced this pull request May 20, 2026
A directory passed as --state/--pipeline (e.g. `--state batch`) cleared
existsSync() and the boundary check, then crashed later with an
unhandled EISDIR from readFileSync/copyFileSync. resolveInsideRepo() now
rejects directory targets up front with a clear message.

Addresses CodeRabbit review feedback on santifer#712.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

batch-runner.sh never reconciles data/pipeline.md — processed offers re-surface and get re-evaluated

1 participant