
fix(batch): reconcile pipeline.md inbox after batch runs #27

Closed
ageem23 wants to merge 3 commits into main from feat/batch-pipeline-reconcile-pr

ageem23 (Owner) commented May 20, 2026

Problem

batch-runner.sh records every evaluated offer in batch/batch-state.tsv, but it never writes back to data/pipeline.md. Offers processed via batch mode therefore stay in the pipeline "Pendientes" inbox indefinitely.

On the next scan or /career-ops pipeline run those entries get re-surfaced and evaluated a second time — producing duplicate reports and duplicate tracker rows. For anyone running batch regularly, the inbox never drains.

Fix

New script reconcile-pipeline.mjs:

  • Reads batch-state.tsv; for every completed / skipped entry whose URL is still in pipeline.md "Pendientes", moves that line to "Procesadas" with its report link, score and PDF flag.
  • Idempotent — an already-moved entry (no longer in Pendientes) is a no-op; an entry already in "Procesadas" is dropped from "Pendientes" without a duplicate. Safe to run after every batch.
  • Conservative: failed entries stay in "Pendientes" (they were not evaluated); if a report file is missing on disk, the entry is left in place rather than writing a dead link.
  • Score falls back to the report's **Score:** header when batch-state.tsv has no parsed score.
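To make the move/drop/keep rules above concrete, here is a minimal JavaScript sketch of the decision loop. It is not the code from reconcile-pipeline.mjs: the batch-state.tsv column names (url, status, report, score, pdf), the `- [ ] <url> | ...` pending-line shape, the processed-line format and the helper names parseState, urlOf and reconcile are all hypothetical. File access is injected through reportExists/readReport so the logic can be exercised without touching disk.

```javascript
// Hypothetical sketch, not the actual reconcile-pipeline.mjs implementation.
// Assumed TSV header: url, status, report, score, pdf (tab-separated).

/** Map URL -> row for every completed/skipped row; failed rows are ignored. */
function parseState(tsv) {
  const [header, ...rows] = tsv.trim().split("\n");
  const cols = header.split("\t");
  const done = new Map();
  for (const row of rows) {
    const rec = Object.fromEntries(row.split("\t").map((v, i) => [cols[i], v]));
    if (rec.status === "completed" || rec.status === "skipped") done.set(rec.url, rec);
  }
  return done;
}

/** First http(s) URL on a pipeline line, or null. */
function urlOf(line) {
  const m = line.match(/https?:\/\/\S+/);
  return m ? m[0] : null;
}

/**
 * Split pending lines into lines to keep and processed lines to append.
 * reportExists/readReport are injected so no disk access happens here.
 */
function reconcile(pendingLines, processedUrls, state, reportExists, readReport) {
  const keep = [];
  const moved = [];
  for (const line of pendingLines) {
    const url = urlOf(line);
    const rec = url ? state.get(url) : undefined;
    if (!rec) { keep.push(line); continue; }                      // not completed/skipped in batch-state
    if (processedUrls.has(url)) continue;                          // already processed: drop, no duplicate
    if (!reportExists(rec.report)) { keep.push(line); continue; }  // avoid a dead report link
    let score = rec.score;
    if (!score) {
      // Fallback: read the report's **Score:** header (e.g. "**Score:** 3.8/5").
      const m = readReport(rec.report).match(/\*\*Score:\*\*\s*([\d.]+(?:\/[\d.]+)?)/);
      score = m ? m[1] : "?";
    }
    const pdf = rec.pdf === "true" ? "PDF ✅" : "PDF ❌";
    moved.push(`- [x] ${url} | [report](${rec.report}) | ${score} | ${pdf}`);
  }
  return { keep, moved };
}
```

Running it a second time over `keep`, with the moved URLs added to processedUrls, returns the same pending lines and an empty `moved` list, which is the idempotency property the PR relies on.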

batch-runner.sh calls it from merge_tracker(), between the tracker merge and the integrity check. It is also exposed as npm run reconcile for standalone or manual use.

Testing

  • test-all.mjs now covers the new script (a syntax check plus a graceful run on empty data); the full suite is green (73 passed, 0 failed).
  • Functional test: fixture batch-state.tsv + pipeline.md → the matching entry moves to "Procesadas" with the correct report link/score/PDF; a second run reports "already in sync".
  • Verified end-to-end against a real 24-offer batch run.

Notes

  • No changes to user-layer data semantics; the script backs up pipeline.md to pipeline.md.pre-reconcile.bak before writing.
  • Handles both Pendientes/Procesadas and Pending/Processed section headers.
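Because a pipeline.md may use either the Spanish or the English section names, the section lookup has to accept both spellings. A minimal sketch, assuming the sections are level-two (`## `) headings and that a section runs until the next `## ` heading; findSection and sections are hypothetical names, not the script's API:

```javascript
// Accept either language for each section header (assumed "## " level).
const PENDING_RE = /^##\s+(Pendientes|Pending)\s*$/i;
const PROCESSED_RE = /^##\s+(Procesadas|Processed)\s*$/i;

/** Return { start, end } line indexes for a section (end exclusive), or null. */
function findSection(lines, headerRe) {
  const start = lines.findIndex((l) => headerRe.test(l));
  if (start === -1) return null;
  // The section ends at the next level-two heading ("###" does not match).
  let end = lines.findIndex((l, i) => i > start && /^##\s/.test(l));
  if (end === -1) end = lines.length;
  return { start, end };
}

/** Split the document and locate both sections. */
function sections(markdown) {
  const lines = markdown.split("\n");
  return {
    lines,
    pending: findSection(lines, PENDING_RE),
    processed: findSection(lines, PROCESSED_RE),
  };
}
```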

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added automatic pipeline reconciliation that moves completed offers from pending to processed list.
    • New npm run reconcile command available for manual pipeline synchronization.
  • Documentation

    • Updated batch documentation with Pipeline Reconcile section explaining idempotent batch re-run behavior.
  • Tests

    • Added test case for reconciliation script execution.


batch-runner.sh records every evaluated offer in batch/batch-state.tsv
but never writes back to data/pipeline.md. Offers processed via batch
mode stay in the pipeline "Pendientes" inbox indefinitely -- the next
scan and the next `/career-ops pipeline` run re-surface them, producing
duplicate reports and tracker rows. For anyone running batch regularly
the inbox never drains.

Add reconcile-pipeline.mjs: for each completed/skipped entry in
batch-state.tsv whose URL is still in pipeline.md "Pendientes", move the
line to "Procesadas" with its report link, score and PDF flag. It is
idempotent -- an already-moved entry is a no-op -- so it is safe to run
after every batch.

batch-runner.sh now calls it from merge_tracker(), between the tracker
merge and the integrity check. Also exposed as `npm run reconcile` for
standalone use, and covered by test-all.mjs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 20, 2026

Warning

Rate limit exceeded

@ageem23 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 54 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: b2785f11-9603-4231-b436-1ae58a98ff78

📥 Commits

Reviewing files that changed from the base of the PR and between bea31e1 and fe4843c.

📒 Files selected for processing (2)
  • batch/README.md
  • reconcile-pipeline.mjs
📝 Walkthrough

This PR adds a pipeline reconciliation workflow that synchronizes data/pipeline.md with batch/batch-state.tsv. A new reconcile-pipeline.mjs CLI script processes batch state to move completed offers from the "Pendientes" list to "Procesadas" with report metadata, integrated into the batch runner and exposed as an npm script.

Changes

Pipeline Reconciliation Feature

  • Reconcile script implementation (reconcile-pipeline.mjs): The entry point parses arguments and computes filesystem paths; guards check for missing state/pipeline files. batch-state.tsv is parsed into a map of processed URLs and report numbers, filtered to completed/skipped rows. Report discovery lists .md files and extracts the Score and PDF fields via regex. pipeline.md sections are parsed to detect existing Procesadas entries and prevent duplicates. For each pending item, the script decides whether to move it (in batch-state with a report file), drop it (URL already in Procesadas), or keep it (no report found). The content is rebuilt by removing the marked Pendientes lines and inserting the moved entries into Procesadas. Statistics are computed and logged, and the result is written to disk with a .pre-reconcile.bak backup unless --dry-run is set.
  • Batch pipeline integration and exposure (batch/batch-runner.sh, package.json): merge_tracker() invokes reconcile-pipeline.mjs after the tracker merge and before pipeline verification, continuing with a warning on failure. An npm reconcile script runs the script directly.
  • Documentation and test coverage (batch/README.md, test-all.mjs): The README describes the reconciliation step and notes that it is idempotent and safe to run after every batch. The test suite adds an entry verifying that reconcile-pipeline.mjs exits with code 0.
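The final step of the script, writing the result with a `.pre-reconcile.bak` backup unless `--dry-run` is set, could look like the sketch below. It uses only Node's built-in fs module and is written with `require` so it runs as a standalone CommonJS file; the function name writeReconciled, its options object and its string return values are assumptions for illustration.

```javascript
const fs = require("node:fs");

/**
 * Write reconciled pipeline.md content (hypothetical helper).
 * Returns "in-sync", "dry-run" or "written" so the caller can log the outcome.
 */
function writeReconciled(pipelineFile, newContent, { dryRun = false } = {}) {
  const current = fs.readFileSync(pipelineFile, "utf8");
  // Nothing changed: leave the file (and any earlier backup) untouched.
  if (current === newContent) return "in-sync";
  // --dry-run: report what would change, but never touch the disk.
  if (dryRun) return "dry-run";
  // Keep a copy of the pre-run state before overwriting the original.
  fs.copyFileSync(pipelineFile, `${pipelineFile}.pre-reconcile.bak`);
  fs.writeFileSync(pipelineFile, newContent, "utf8");
  return "written";
}
```

In this sketch each real write overwrites the previous backup, so the `.bak` file always holds the state from just before the most recent change.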

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • ageem23/career-ops#23: Adds an orphan sweep in batch-runner.sh that updates stuck "processing" rows in batch-state.tsv to "completed", which directly improves the input data that this reconciler consumes.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title directly and accurately summarizes the main change: adding pipeline reconciliation after batch runs to prevent duplicates and keep pipeline.md in sync with processed offers.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@batch/README.md`:
- Line 74: Update the README line describing reconcile-pipeline.mjs to reflect
that the script reconciles both "completed" and "skipped" entries (not just
"completed") when report metadata exists: mention reconcile-pipeline.mjs (run as
npm run reconcile) moves every `completed` and `skipped` offer in
`batch-state.tsv` whose URL remains in pipeline "Pendientes" to "Procesadas"
with its report link and score, and note it is idempotent and safe to run after
every batch or manually.

In `@reconcile-pipeline.mjs`:
- Around line 42-48: User-supplied --pipeline/--state paths (used to read/write
files via PIPELINE_FILE and STATE_FILE) can perform path traversal or target
files outside the repository; validate and constrain them to the repo root
(CAREER_OPS) by resolving the provided argValue with path.resolve, then verify
the resolved path is inside CAREER_OPS (e.g., via path.relative or startsWith)
and reject or error if it escapes (absolute paths that are not under CAREER_OPS
or any path containing .. that resolves outside). Also ensure missing parent
directories are handled explicitly for writes (create or fail with a clear
error) and apply the same validation logic for other occurrences noted at lines
~262-263 where argValue is used.
- Around line 233-236: The added block always inserts the Spanish header '##
Procesadas' which mixes languages; change it to choose the processed-section
header based on the language already used in the document (inspect the existing
headers in the out array). Replace the hardcoded '## Procesadas' with a computed
header (e.g., processedHeader) determined by checking for English markers like
'## Pending' (use '## Processed') or Spanish markers like '## Pendientes' (use
'## Procesadas'); then push processedHeader, '' and ...movedProcLines instead of
the fixed string. Ensure you reference procStart, movedProcLines and out when
implementing this detection and insertion.
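The containment check requested in the second comment can be done lexically with Node's path module: resolve the argument against the repository root, then inspect path.relative. A minimal sketch, where `root` stands in for the script's CAREER_OPS directory and resolveInsideRepo is a hypothetical name. It does not follow symlinks (that would need fs.realpathSync) and does not create missing parent directories.

```javascript
const path = require("node:path");

/**
 * Resolve a user-supplied --state/--pipeline value against the repo root and
 * throw if it lands outside it (hypothetical helper, lexical check only).
 */
function resolveInsideRepo(root, userPath) {
  const rootAbs = path.resolve(root);
  // An absolute userPath replaces rootAbs entirely; a relative one is joined.
  const target = path.resolve(rootAbs, userPath);
  const rel = path.relative(rootAbs, target);
  // Outside the repo if the relative path climbs up, or is still absolute
  // (e.g. a different drive on Windows).
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`refusing path outside repository: ${userPath}`);
  }
  return target;
}
```

Checking `rel` rather than `target.startsWith(rootAbs)` avoids the prefix bug where `/repo-other/x` would pass a plain string test against `/repo`, while still allowing legitimate names such as `..notes/x.md`.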

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: d990d6dd-b8c6-4b2c-b258-3b9c04a49b6a

📥 Commits

Reviewing files that changed from the base of the PR and between a289a39 and bea31e1.

📒 Files selected for processing (5)
  • batch/README.md
  • batch/batch-runner.sh
  • package.json
  • reconcile-pipeline.mjs
  • test-all.mjs

Comment thread batch/README.md Outdated
Comment thread reconcile-pipeline.mjs Outdated
Comment thread reconcile-pipeline.mjs
ageem23 and others added 2 commits May 20, 2026 13:35
- reconcile-pipeline.mjs: constrain user-supplied --state/--pipeline
  paths to the repository tree. A path that escapes via `..` or an
  absolute target outside the repo is now rejected before any read or
  write, closing a path-traversal vector.
- reconcile-pipeline.mjs: when creating a missing processed section,
  match the pending section's language -- `## Processed` for English
  pipelines, `## Procesadas` for Spanish -- instead of always writing
  the Spanish header.
- batch/README.md: note the script reconciles `skipped` entries too
  (not just `completed`), and that entries without a report file on
  disk are left in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
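The language-matching rule from this commit reduces to a small lookup over the existing lines. A sketch, assuming the English pending header is exactly `## Pending` and falling back to the Spanish `## Procesadas` (the previous hard-coded value) when no English header is found; processedHeaderFor and appendProcessedSection are hypothetical names, while `out` and `movedProcLines` follow the names used in the review comment.

```javascript
/** Pick the processed-section header matching the pending header's language. */
function processedHeaderFor(lines) {
  // "## Pending" must match exactly, so "## Pendientes" falls through to Spanish.
  if (lines.some((l) => /^##\s+Pending\s*$/i.test(l))) return "## Processed";
  return "## Procesadas";
}

/** Append a new processed section when the document has none. */
function appendProcessedSection(out, movedProcLines) {
  // The header is computed before push() runs, so it only sees existing lines.
  out.push("", processedHeaderFor(out), "", ...movedProcLines);
  return out;
}
```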
Empty commit to fire a `pull_request` (synchronize) event so the
file-based workflows — Tests and Dependency Review — run now that
workflows are enabled on the fork.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

ageem23 (Owner, Author) commented May 20, 2026

Closing — this fork PR served as the pre-merge check stage (CodeRabbit, CodeQL, and the test suite all passed here). The fix is now submitted upstream as santifer#712, and any further review feedback will be handled there.

@ageem23 ageem23 closed this May 20, 2026