-
Notifications
You must be signed in to change notification settings - Fork 484
changes to the parsing controller #409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe logic for reparsing projects in the parsing controller was refined to consider more specific project states and timing. The code now checks if a project should be reparsed based on commit status, error state, or prolonged non-ready status. Status responses after parsing are now fetched dynamically from the database. Unused variables were removed, and time utilities were imported. Changes
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
app/modules/parsing/graph_construction/parsing_controller.py (1)
294-302: Code duplication: Extract status fetching logic into a helper method.The status fetching logic is duplicated between handling existing and new projects. Consider extracting this into a reusable helper method.
Extract the common logic into a helper method:
@staticmethod async def get_current_project_status( project_manager: ProjectService, project_id: str ) -> str: """Get the current status of a project from the database.""" updated_project = await project_manager.get_project_from_db_by_id(project_id) return ( updated_project["status"] if updated_project else ProjectStatusEnum.SUBMITTED.value )Then replace both occurrences with:
- updated_project = await project_manager.get_project_from_db_by_id( - project_id - ) - current_status = ( - updated_project["status"] - if updated_project - else ProjectStatusEnum.SUBMITTED.value - ) + current_status = await ParsingController.get_current_project_status( + project_manager, project_id + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
app/modules/parsing/graph_construction/parsing_controller.py(6 hunks)
🔇 Additional comments (4)
app/modules/parsing/graph_construction/parsing_controller.py (4)
5-5: LGTM: Import addition is necessary for new functionality.The added imports for
datetimeandtimezoneare required for the new time-based reparse logic implemented below.
221-229: Good improvement: Dynamic status fetching enhances accuracy.Fetching the actual project status from the database instead of always returning
SUBMITTEDprovides more accurate status information to clients. The fallback logic is well-implemented.
314-317: Consistent improvement: Dynamic status handling for new projects.The change to fetch actual status instead of always returning
SUBMITTEDis consistent with the changes made for existing projects and improves accuracy.
162-202:❓ Verification inconclusive
Potential timezone compatibility issue in datetime comparison.
The enhanced reparse logic is well-structured and addresses important use cases. However, there's a potential issue with timezone handling in the datetime comparison.
The comparison between
current_time(timezone-aware) andproject_updated_atcould fail if the database field is stored as a naive datetime:current_time = datetime.now(timezone.utc) # timezone-aware project_updated_at = project.updated_at # potentially naive time_diff = (current_time - project_updated_at).total_seconds() / 3600 # TypeError if timezone mismatchRun this script to verify the timezone handling of the
updated_atfield:If
project.updated_atis naive, apply this fix:- current_time = datetime.now(timezone.utc) - project_updated_at = project.updated_at + current_time = datetime.now(timezone.utc) + project_updated_at = project.updated_at.replace(tzinfo=timezone.utc) if project.updated_at.tzinfo is None else project.updated_at
🏁 Script executed:
#!/bin/bash # Description: Check how the updated_at field is defined in the Project model and verify timezone handling # Search for the Project model definition ast-grep --pattern 'class Project($$$) { $$$ updated_at = $$$ $$$ }' # Search for any timezone-related configurations or datetime field definitions rg -A 3 -B 3 "updated_at.*DateTime" # Look for timezone configurations in the project rg -i "timezone|utc" --type pyLength of output: 9250
🏁 Script executed:
#!/bin/bash # Locate the Project model file and inspect its updated_at definition # Find the file containing the Project model project_file=$(rg -l --type py "class Project" | head -n 1) echo "Project model file: $project_file" # Show lines where updated_at is defined in that file grep -n "updated_at" "$project_file"Length of output: 172
🏁 Script executed:
#!/bin/bash # Search for the Project model by __tablename__ and class name rg -n "__tablename__.*projects" --type py rg -n "class .*Project" --type pyLength of output: 79
Verify timezone-awareness of
project.updated_atI wasn’t able to locate the
Projectmodel definition—please confirm whether itsupdated_atcolumn is stored as a timezone-aware timestamp. If it’s defined likeColumn(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)(naive), you’ll get a
TypeErrorwhen doing:current_time = datetime.now(timezone.utc) # tz-aware project_updated_at = project.updated_at # tz-naive time_diff = (current_time - project_updated_at).total_seconds() / 3600To guard against that, wrap the field in UTC if it’s naive:
current_time = datetime.now(timezone.utc) -project_updated_at = project.updated_at +project_updated_at = ( + project.updated_at.replace(tzinfo=timezone.utc) + if project.updated_at.tzinfo is None + else project.updated_at +) time_diff = (current_time - project_updated_at).total_seconds() / 3600Please verify your model’s
updated_atand apply this fix if needed to avoid timezone mismatches.



Summary by CodeRabbit
Bug Fixes
Chores