fix: manual scraping crashing without existing item #1239

davidemarcoli · 2025-10-31T16:33:39Z

Pull Request Check List

Resolves: #issue-number-here

Added tests for changed code.
Updated documentation for changed code.

Description:

Summary by CodeRabbit

Refactor
- Enhanced internal data session management to improve consistency and reliability across database operations.

coderabbitai · 2025-10-31T16:34:04Z

Walkthrough

This PR refactors database session handling in the scrape router by introducing FastAPI dependency injection for session management. It consolidates database operations across two methods to use a centralized injected session, enhances update_item with explicit filesystem and stream management, and renames parameters for clarity.

Changes

Cohort / File(s)	Summary
Database Session Dependency Injection & Refactoring `src/routers/secure/scrape.py`	Added FastAPI-injected `db_session` dependency to `start_manual_session` for persisting items. Renamed `session` parameter to `db_session` in `manual_update_attributes` and refactored all DB operations to use the injected session consistently. Added explicit `item.merge()` at start of update path. Enhanced `update_item` to explicitly manage filesystem entries and item streams with provided `db_session`, adding/persisting staging entries as needed. Consolidated end-of-update persistence while preserving post-processing hooks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

Verify dependency injection contracts: Ensure the new db_session parameter doesn't inadvertently break existing callers or break session lifecycle expectations.
Validate merge & commit placement: Confirm that item.merge() and db_session.commit() calls occur at the correct points in the update flow to avoid premature flushes or missed persistence.
Inspect filesystem and stream management: Review the new logic for adding and persisting staging entries to ensure it correctly integrates with the existing database operations.
Test post-processing hooks: Verify that moving persistence logic doesn't cause post-processing hooks to execute at unexpected times or lose necessary context.

Poem

🐰 Hops of joy in the refactored code,
Sessions flow down dependency's road,
Merged with care, persisted with grace,
Database streams find their rightful place! 🌿✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "fix: manual scraping crashing without existing item" directly describes the main objective of this changeset. The raw summary shows that the changes focus on adding a dependency-injected database session to manual scraping functions (`start_manual_session` and `manual_update_attributes`) and refactoring database operations to properly handle cases where `item_id` is not provided. The title accurately captures the problem being solved—a crash condition in manual scraping when no existing item is present—and the summary of changes confirms this is the central fix being implemented. The title is concise, specific, and uses clear language without unnecessary noise.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch bugfix/manual-scrape-without-exsting-item

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/routers/secure/scrape.py (1)

524-526: Drop the unused select import.

select is no longer referenced inside update_item, so this import will trip Ruff/F401—please remove it.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7513f4a and 218ac4b.

📒 Files selected for processing (1)

src/routers/secure/scrape.py (4 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/routers/secure/scrape.py (6)

src/program/db/db.py (1)

get_db (31-36)

src/program/media/item.py (7)

MediaItem (22-711)

reset (626-641)

filesystem_entry (568-575)

get_absolute_episode (836-859)

store_state (176-197)

store_state (809-812)

store_state (889-892)

src/program/services/downloaders/models.py (1)

DebridFile (70-130)

src/program/managers/event_manager.py (1)

cancel_job (246-280)

src/program/media/media_entry.py (2)

MediaEntry (8-208)

create_virtual_entry (139-169)

src/program/services/filesystem/vfs/rivenvfs.py (1)

add (702-741)

🪛 Ruff (0.14.2)

src/routers/secure/scrape.py

313-313: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

452-452: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

coderabbitai · 2025-10-31T17:10:13Z

src/routers/secure/scrape.py

+                    if (
+                        season_number != item.parent.number
+                        and episode_number != item.number
+                    ):
+                        continue
+                    update_item(item, episode_data, session)
+                    break


⚠️ Potential issue | 🟠 Major

Fix the episode matching guard.

The guard uses and, so we still update the current episode whenever either the season or the episode number happens to match. That attaches the wrong file (e.g., same season, different episode). Use or so we skip as soon as either component diverges.

- if ( - season_number != item.parent.number - and episode_number != item.number - ): + if ( + season_number != item.parent.number + or episode_number != item.number + ): continue

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (

season_number != item.parent.number

and episode_number != item.number

):

continue

update_item(item, episode_data, session)

break

if (

season_number != item.parent.number

or episode_number != item.number

):

continue

update_item(item, episode_data, session)

break

🤖 Prompt for AI Agents

In src/routers/secure/scrape.py around lines 592 to 598, the episode matching guard currently uses AND which allows a match when either season or episode number matches; change the condition to use OR so that if either the season_number differs from item.parent.number or the episode_number differs from item.number we continue (i.e., skip this item) and only call update_item when both components match.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7513f4a and 218ac4b.

📒 Files selected for processing (1)

src/routers/secure/scrape.py (4 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/routers/secure/scrape.py (3)

src/program/db/db.py (1)

get_db (31-36)

src/program/media/item.py (6)

reset (626-641)

filesystem_entry (568-575)

get_absolute_episode (836-859)

store_state (176-197)

store_state (809-812)

store_state (889-892)

src/program/media/media_entry.py (2)

MediaEntry (8-208)

create_virtual_entry (139-169)

🪛 Ruff (0.14.2)

src/routers/secure/scrape.py

313-313: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

452-452: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

coderabbitai · 2025-10-31T17:19:14Z

src/routers/secure/scrape.py

+            fs_entry = MediaEntry.create_virtual_entry(
+                original_filename=data.filename,
+                download_url=getattr(data, "download_url", None),
+                provider=None,
+                provider_download_id=None,
+                file_size=(data.filesize or 0),
+            )
+            db_session.add(fs_entry)
+            db_session.commit()
+            db_session.refresh(fs_entry)
+


⚠️ Potential issue | 🟠 Major

Keep manual update atomic

Calling db_session.commit() inside update_item finalizes part of the transaction before the rest of the loop finishes. If any later step raises, the request returns 500 but prior commits have already persisted partial state. Flush instead so the primary key is available without breaking atomicity.

db_session.add(fs_entry) - db_session.commit() + db_session.flush() db_session.refresh(fs_entry)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fs_entry = MediaEntry.create_virtual_entry(

original_filename=data.filename,

download_url=getattr(data, "download_url", None),

provider=None,

provider_download_id=None,

file_size=(data.filesize or 0),

)

db_session.add(fs_entry)

db_session.commit()

db_session.refresh(fs_entry)

fs_entry = MediaEntry.create_virtual_entry(

original_filename=data.filename,

download_url=getattr(data, "download_url", None),

provider=None,

provider_download_id=None,

file_size=(data.filesize or 0),

)

db_session.add(fs_entry)

db_session.flush()

db_session.refresh(fs_entry)

🤖 Prompt for AI Agents

In src/routers/secure/scrape.py around lines 535 to 545, the code calls db_session.commit() while inside update_item which finalizes part of the transaction early; replace that commit with db_session.flush() so the new fs_entry gets a primary key available for subsequent operations but the overall transaction remains atomic (you can keep or call db_session.refresh(fs_entry) after the flush if you need loaded DB defaults), and ensure the actual commit is performed once at the end of the outer operation rather than here.

coderabbitai · 2025-10-31T17:19:14Z

src/routers/secure/scrape.py

+                    if (
+                        season_number != item.parent.number
+                        and episode_number != item.number
+                    ):
+                        continue
+                    update_item(item, episode_data, session)
+                    break


⚠️ Potential issue | 🔴 Critical

Fix episode guard condition

With the current and check, any payload that matches the season but not the episode (or vice‑versa) will slip through and overwrite the wrong record. Switch to or (or invert the condition) so we only update when both numbers match, preventing cross‑episode data corruption.

- if ( - season_number != item.parent.number - and episode_number != item.number - ): + if ( + season_number != item.parent.number + or episode_number != item.number + ):

🤖 Prompt for AI Agents

In src/routers/secure/scrape.py around lines 592 to 598, the guard uses an "and" so a mismatch on one field can still pass; change the condition to continue when either the season or episode mismatches (i.e., use "or" between the two inequality checks or invert the logic so you only proceed when both season_number == item.parent.number AND episode_number == item.number) so update_item is only called when both numbers match.

Gaisberg

LGTM

Gaisberg

Nvm, unfinished.

fix: manual scraping crashing without existing item

218ac4b

coderabbitai bot reviewed Oct 31, 2025

View reviewed changes

Gaisberg approved these changes Nov 1, 2025

View reviewed changes

Gaisberg requested changes Nov 2, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix: manual scraping crashing without existing item #1239

fix: manual scraping crashing without existing item #1239

Uh oh!

davidemarcoli commented Oct 31, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 31, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 31, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 31, 2025

Uh oh!

coderabbitai bot Oct 31, 2025

Uh oh!

Gaisberg left a comment

Uh oh!

Gaisberg left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

fix: manual scraping crashing without existing item #1239

Are you sure you want to change the base?

fix: manual scraping crashing without existing item #1239

Uh oh!

Conversation

davidemarcoli commented Oct 31, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Check List

Description:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 31, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 31, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 31, 2025

Choose a reason for hiding this comment

Uh oh!

Gaisberg left a comment

Choose a reason for hiding this comment

Uh oh!

Gaisberg left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

davidemarcoli commented Oct 31, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 31, 2025 •

edited

Loading