-
Notifications
You must be signed in to change notification settings - Fork 113
Download Datatsets Hosted on Google Drive #1262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: B Hashemian <[email protected]>
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a CMake option to conditionally download datasets during configuration, installs gdown in the application Dockerfile, and extends the download utility to support Google Drive URLs via gdown alongside existing https/ngc download flows. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
utilities/download_ngc_data (1)
294-303: Bug: GXF entities generation runs before download completes.This block at lines 294-303 attempts to convert video files in
${download_dir_fullname}before the download logic (starting at line 306) has executed. The directory likely won't exist or will be empty at this point, making this code ineffective.This appears to be pre-existing code, but worth noting since the download logic was modified.
🧹 Nitpick comments (2)
applications/surgical_scene_recon/Dockerfile (1)
90-91: Add--no-cache-dirfor consistency and smaller image size.The previous
pip installon line 88 uses--no-cache-dir, but this one doesn't. For consistency and to reduce image size:# Install gdown for downloading datasets from google drive -RUN pip install gdown +RUN pip install --no-cache-dir gdownutilities/download_ngc_data (1)
383-402: Potential issue:$?may not capture the intended exit status.
run_commandreturns its own status, but if the shell modifies$?betweenrun_commandand the check (e.g., via command substitution or other side effects), this could be unreliable. Consider capturing the exit status immediately:if [[ ${url} == *"/folders/"* ]]; then # Download folder directly c_echo G "Downloading folder ${url} to ${download_dir_fullname}" - run_command gdown --folder ${url} -O ${download_dir_fullname} --remaining-ok - - if [ $? -ne 0 ]; then + run_command gdown --folder "${url}" -O "${download_dir_fullname}" --remaining-ok + local status=$? + if [ $status -ne 0 ]; then fatal R "Unable to download ${url} via gdown." else c_echo G "Successfully downloaded folder ${url} to ${download_dir_fullname}" fi else # Download file directly run_command mkdir -p ${download_dir_fullname} - run_command gdown ${url} -O ${download_dir_fullname}/ --fuzzy - if [ $? -ne 0 ]; then + run_command gdown "${url}" -O "${download_dir_fullname}/" --fuzzy + local status=$? + if [ $status -ne 0 ]; then fatal R "Unable to download ${url} via gdown." else c_echo G "Successfully downloaded file ${url} to ${download_dir_fullname}" fi fiAlso added quotes around variables to handle paths with spaces correctly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
applications/surgical_scene_recon/CMakeLists.txt(1 hunks)applications/surgical_scene_recon/Dockerfile(1 hunks)utilities/download_ngc_data(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
{applications,workflows}/**/CMakeLists.txt
📄 CodeRabbit inference engine (CONTRIBUTING.md)
CMakeLists.txt must include build system integration using add_holohub_application() for applications and workflows
Files:
applications/surgical_scene_recon/CMakeLists.txt
applications/**/CMakeLists.txt
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Applications should include a testing section in CMakeLists.txt for functional testing using CTest
Files:
applications/surgical_scene_recon/CMakeLists.txt
🧠 Learnings (4)
📚 Learning: 2025-11-24T16:28:06.280Z
Learnt from: CR
Repo: nvidia-holoscan/holohub PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T16:28:06.280Z
Learning: Applies to {applications,workflows}/**/CMakeLists.txt : CMakeLists.txt must include build system integration using add_holohub_application() for applications and workflows
Applied to files:
applications/surgical_scene_recon/CMakeLists.txt
📚 Learning: 2025-11-24T16:28:06.280Z
Learnt from: CR
Repo: nvidia-holoscan/holohub PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T16:28:06.280Z
Learning: Applies to operators/**/CMakeLists.txt : CMakeLists.txt must include build system integration using add_holohub_operator() for operators
Applied to files:
applications/surgical_scene_recon/CMakeLists.txt
📚 Learning: 2025-11-24T16:28:06.280Z
Learnt from: CR
Repo: nvidia-holoscan/holohub PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T16:28:06.280Z
Learning: Applies to gxf_extensions/**/CMakeLists.txt : CMakeLists.txt must include build system integration using add_holohub_extension() for GXF extensions
Applied to files:
applications/surgical_scene_recon/CMakeLists.txt
📚 Learning: 2025-11-24T16:28:06.281Z
Learnt from: CR
Repo: nvidia-holoscan/holohub PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T16:28:06.281Z
Learning: Applies to **CMakeLists.txt : Use CMake target dependencies: add DEPENDS EXTENSIONS for operators wrapping GXF extensions, add DEPENDS OPERATORS for applications/workflows
Applied to files:
applications/surgical_scene_recon/CMakeLists.txt
🔇 Additional comments (4)
utilities/download_ngc_data (2)
308-312: LGTM on gdown detection logic.The detection of Google Drive URLs and the gdown availability check are correctly implemented with an appropriate error message guiding users to install gdown.
405-412: Good: MD5 warning and stamp file handling.The warning about MD5 not being supported for Google Drive downloads is helpful, and the stamp file is correctly set after successful download to prevent re-downloading.
applications/surgical_scene_recon/CMakeLists.txt (2)
16-41: The review comment is incorrect. Theadd_holohub_application()integration is already present in the parentapplications/CMakeLists.txtfile.The parent
applications/CMakeLists.txtcontains the line:add_holohub_application(surgical_scene_recon)This properly registers the application with the build system. The subdirectory
CMakeLists.txt(lines 16-41) does not need to calladd_holohub_application()itself; instead, it configures the subdirectory-specific build tasks (data downloads and tests). This pattern is consistent with other applications likevideo_streaming_server, which includes a comment: "This application is built by the parentadd_holohub_applicationmacro."
36-40:HOLOHUB_DATA_DIRis properly defined in the root CMakeLists.txt and will be available when this file is processed.The variable is set at root CMakeLists.txt line 77 (
set(HOLOHUB_DATA_DIR "${CMAKE_BINARY_DIR}/data" CACHE PATH "Data Download directory")), making it available in this subdirectory's scope. No action needed.Note: The target name
pullingis semantically meaningful—it directly corresponds to the dataset directoryEndoNeRF/pullingused throughout the codebase (Python scripts, tests, documentation). The suggested rename toendonerf_datasetwould be less specific and contradict established naming conventions.
Greptile OverviewGreptile SummaryThis PR adds support for downloading datasets from Google Drive during build configuration, enabling automatic dataset retrieval as an alternative to manual downloads. Major Changes:
Issues Found:
Confidence Score: 3/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant CMake as CMakeLists.txt
participant Script as download_ngc_data
participant GDown as gdown CLI
participant GDrive as Google Drive
Note over CMake: Build time (if HOLOHUB_DOWNLOAD_DATASETS=ON)
CMake->>CMake: Check option HOLOHUB_DOWNLOAD_DATASETS
CMake->>Script: holoscan_download_data(depth, URL, DOWNLOAD_DIR)
CMake->>Script: holoscan_download_data(gt_masks, URL, DOWNLOAD_DIR)
CMake->>Script: holoscan_download_data(images, URL, DOWNLOAD_DIR)
CMake->>Script: holoscan_download_data(images_right, URL, DOWNLOAD_DIR)
CMake->>Script: holoscan_download_data(masks, URL, DOWNLOAD_DIR)
CMake->>Script: holoscan_download_data(poses_bounds, URL, DOWNLOAD_DIR)
Note over Script: For each download request
Script->>Script: Check if .stamp file exists
alt Stamp exists
Script-->>CMake: Already downloaded, skip
else No stamp
Script->>Script: Detect URL contains "drive.google.com"
Script->>Script: Set download_command=gdown
Script->>Script: Check if gdown installed
alt URL contains "/folders/"
Script->>GDown: gdown --folder URL -O DIR --remaining-ok
GDown->>GDrive: Download folder contents
GDrive-->>GDown: Folder files
GDown-->>Script: Files saved to DIR
else URL is file
Script->>GDown: gdown URL -O DIR/ --fuzzy
GDown->>GDrive: Download file
GDrive-->>GDown: File content
GDown-->>Script: File saved to DIR
end
Script->>Script: Create .stamp file
Script-->>CMake: Download complete
end
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 2 comments
Signed-off-by: B Hashemian <[email protected]>
Signed-off-by: B Hashemian <[email protected]>
Signed-off-by: B Hashemian <[email protected]>
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 1 comment
Signed-off-by: B Hashemian <[email protected]>
Signed-off-by: B Hashemian <[email protected]>
Signed-off-by: B Hashemian <[email protected]>
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.