-
-
Notifications
You must be signed in to change notification settings - Fork 85
fix: manual scraping crashing without existing item #1239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis PR refactors database session handling in the scrape router by introducing FastAPI dependency injection for session management. It consolidates database operations across two methods to use a centralized injected session, enhances update_item with explicit filesystem and stream management, and renames parameters for clarity. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/routers/secure/scrape.py (1)
524-526: Drop the unusedselectimport.
selectis no longer referenced insideupdate_item, so this import will trip Ruff/F401—please remove it.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/routers/secure/scrape.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/routers/secure/scrape.py (6)
src/program/db/db.py (1)
get_db(31-36)src/program/media/item.py (7)
MediaItem(22-711)reset(626-641)filesystem_entry(568-575)get_absolute_episode(836-859)store_state(176-197)store_state(809-812)store_state(889-892)src/program/services/downloaders/models.py (1)
DebridFile(70-130)src/program/managers/event_manager.py (1)
cancel_job(246-280)src/program/media/media_entry.py (2)
MediaEntry(8-208)create_virtual_entry(139-169)src/program/services/filesystem/vfs/rivenvfs.py (1)
add(702-741)
🪛 Ruff (0.14.2)
src/routers/secure/scrape.py
313-313: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
452-452: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
| if ( | ||
| season_number != item.parent.number | ||
| and episode_number != item.number | ||
| ): | ||
| continue | ||
| update_item(item, episode_data, session) | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the episode matching guard.
The guard uses and, so we still update the current episode whenever either the season or the episode number happens to match. That attaches the wrong file (e.g., same season, different episode). Use or so we skip as soon as either component diverges.
- if (
- season_number != item.parent.number
- and episode_number != item.number
- ):
+ if (
+ season_number != item.parent.number
+ or episode_number != item.number
+ ):
continue📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if ( | |
| season_number != item.parent.number | |
| and episode_number != item.number | |
| ): | |
| continue | |
| update_item(item, episode_data, session) | |
| break | |
| if ( | |
| season_number != item.parent.number | |
| or episode_number != item.number | |
| ): | |
| continue | |
| update_item(item, episode_data, session) | |
| break |
🤖 Prompt for AI Agents
In src/routers/secure/scrape.py around lines 592 to 598, the episode matching
guard currently uses AND which allows a match when either season or episode
number matches; change the condition to use OR so that if either the
season_number differs from item.parent.number or the episode_number differs from
item.number we continue (i.e., skip this item) and only call update_item when
both components match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/routers/secure/scrape.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/routers/secure/scrape.py (3)
src/program/db/db.py (1)
get_db(31-36)src/program/media/item.py (6)
reset(626-641)filesystem_entry(568-575)get_absolute_episode(836-859)store_state(176-197)store_state(809-812)store_state(889-892)src/program/media/media_entry.py (2)
MediaEntry(8-208)create_virtual_entry(139-169)
🪛 Ruff (0.14.2)
src/routers/secure/scrape.py
313-313: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
452-452: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
| fs_entry = MediaEntry.create_virtual_entry( | ||
| original_filename=data.filename, | ||
| download_url=getattr(data, "download_url", None), | ||
| provider=None, | ||
| provider_download_id=None, | ||
| file_size=(data.filesize or 0), | ||
| ) | ||
| db_session.add(fs_entry) | ||
| db_session.commit() | ||
| db_session.refresh(fs_entry) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep manual update atomic
Calling db_session.commit() inside update_item finalizes part of the transaction before the rest of the loop finishes. If any later step raises, the request returns 500 but prior commits have already persisted partial state. Flush instead so the primary key is available without breaking atomicity.
db_session.add(fs_entry)
- db_session.commit()
+ db_session.flush()
db_session.refresh(fs_entry)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fs_entry = MediaEntry.create_virtual_entry( | |
| original_filename=data.filename, | |
| download_url=getattr(data, "download_url", None), | |
| provider=None, | |
| provider_download_id=None, | |
| file_size=(data.filesize or 0), | |
| ) | |
| db_session.add(fs_entry) | |
| db_session.commit() | |
| db_session.refresh(fs_entry) | |
| fs_entry = MediaEntry.create_virtual_entry( | |
| original_filename=data.filename, | |
| download_url=getattr(data, "download_url", None), | |
| provider=None, | |
| provider_download_id=None, | |
| file_size=(data.filesize or 0), | |
| ) | |
| db_session.add(fs_entry) | |
| db_session.flush() | |
| db_session.refresh(fs_entry) |
🤖 Prompt for AI Agents
In src/routers/secure/scrape.py around lines 535 to 545, the code calls
db_session.commit() while inside update_item which finalizes part of the
transaction early; replace that commit with db_session.flush() so the new
fs_entry gets a primary key available for subsequent operations but the overall
transaction remains atomic (you can keep or call db_session.refresh(fs_entry)
after the flush if you need loaded DB defaults), and ensure the actual commit is
performed once at the end of the outer operation rather than here.
| if ( | ||
| season_number != item.parent.number | ||
| and episode_number != item.number | ||
| ): | ||
| continue | ||
| update_item(item, episode_data, session) | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix episode guard condition
With the current and check, any payload that matches the season but not the episode (or vice‑versa) will slip through and overwrite the wrong record. Switch to or (or invert the condition) so we only update when both numbers match, preventing cross‑episode data corruption.
- if (
- season_number != item.parent.number
- and episode_number != item.number
- ):
+ if (
+ season_number != item.parent.number
+ or episode_number != item.number
+ ):🤖 Prompt for AI Agents
In src/routers/secure/scrape.py around lines 592 to 598, the guard uses an "and"
so a mismatch on one field can still pass; change the condition to continue when
either the season or episode mismatches (i.e., use "or" between the two
inequality checks or invert the logic so you only proceed when both
season_number == item.parent.number AND episode_number == item.number) so
update_item is only called when both numbers match.
Gaisberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Gaisberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm, unfinished.
Pull Request Check List
Resolves: #issue-number-here
Description:
Summary by CodeRabbit