
Conversation

@bhashemian (Member) commented Nov 26, 2025

Summary by CodeRabbit

  • New Features

    • Support for downloading required datasets from Google Drive during build/configuration.
  • New option (enabled by default) to turn automatic dataset downloads on or off.
    • Improved downloader that handles Drive folders and files and falls back to alternative methods when needed.
  • Chores

    • Container/build now installs the Google Drive download tool to enable Drive-based downloads.


@bhashemian bhashemian marked this pull request as ready for review November 26, 2025 17:47
coderabbitai bot commented Nov 26, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a CMake option to conditionally download datasets during configuration, installs gdown in the application Dockerfile, and extends the download utility to support Google Drive URLs via gdown alongside existing https/ngc download flows.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CMake dataset download option<br>applications/surgical_scene_recon/CMakeLists.txt | Adds HOLOHUB_DOWNLOAD_DATASETS option (default ON) and a conditional data-download block that calls holoscan_download_data to fetch six datasets (depth, gt_masks, images, images_right, masks, poses_bounds) into a shared DOWNLOAD_DIR using ALL. |
| Application Dockerfile<br>applications/surgical_scene_recon/Dockerfile | Installs the Python package gdown in the application-dependencies stage (after installing requirements) to enable Google Drive downloads. |
| Download utility<br>utilities/download_ngc_data | Adds Google Drive handling: detects drive.google.com URLs and uses gdown (supports --folder for folders and fuzzy matching for files), creates target directories, emits a warning that MD5 checks are unreliable for Drive, sets a stamp file on success, and integrates with existing logic (https uses wget or falls back to curl; non-https uses ngc). Minor formatting/spacing adjustments retained. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review gdown folder vs file branches in utilities/download_ngc_data for correct directory creation and target path handling.
  • Verify md5-warning messaging and that no silent skips leave corrupted or incomplete downloads.
  • Confirm fallback to curl for https (see the sketch after this list) and unchanged ngc behavior for non-https URLs.
  • Check the conditional CMake download block is correctly gated by HOLOHUB_DOWNLOAD_DATASETS and does not run when disabled.
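For the curl-fallback bullet, the https path the summary describes would look roughly like the following. This is a minimal sketch under stated assumptions: the function name download_https and the exact wget/curl flags are illustrative, not the script's own.

```bash
# Minimal sketch of the https path: prefer wget, fall back to curl.
download_https() {
    local url="$1" out="$2"
    if command -v wget > /dev/null 2>&1; then
        wget -q "${url}" -O "${out}"
    else
        # curl fallback when wget is unavailable
        curl -fsSL "${url}" -o "${out}"
    fi
}
```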

Pre-merge checks

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Download Datatsets Hosted on Google Drive' accurately summarizes the primary change in the pull request, which adds support for downloading datasets from Google Drive via gdown across multiple files (CMakeLists.txt, Dockerfile, and download utility script). |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f560be5 and 2361088.

📒 Files selected for processing (2)
  • applications/surgical_scene_recon/CMakeLists.txt (1 hunks)
  • applications/surgical_scene_recon/Dockerfile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • applications/surgical_scene_recon/Dockerfile
  • applications/surgical_scene_recon/CMakeLists.txt


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
utilities/download_ngc_data (1)

294-303: Bug: GXF entities generation runs before download completes.

This block at lines 294-303 attempts to convert video files in ${download_dir_fullname} before the download logic (starting at line 306) has executed. The directory likely won't exist or will be empty at this point, making this code ineffective.

This appears to be pre-existing code, but worth noting since the download logic was modified.

🧹 Nitpick comments (2)
applications/surgical_scene_recon/Dockerfile (1)

90-91: Add --no-cache-dir for consistency and smaller image size.

The previous pip install on line 88 uses --no-cache-dir, but this one doesn't. For consistency and to reduce image size:

```diff
 # Install gdown for downloading datasets from google drive
-RUN pip install gdown
+RUN pip install --no-cache-dir gdown
```
utilities/download_ngc_data (1)

383-402: Potential issue: $? may not capture the intended exit status.

run_command returns its own status, but if the shell modifies $? between run_command and the check (e.g., via command substitution or other side effects), this could be unreliable. Consider capturing the exit status immediately:

```diff
     if [[ ${url} == *"/folders/"* ]]; then
         # Download folder directly
         c_echo G "Downloading folder ${url} to ${download_dir_fullname}"
-        run_command gdown --folder ${url} -O ${download_dir_fullname} --remaining-ok
-
-        if [ $? -ne 0 ]; then
+        run_command gdown --folder "${url}" -O "${download_dir_fullname}" --remaining-ok
+        local status=$?
+        if [ $status -ne 0 ]; then
             fatal R "Unable to download ${url} via gdown."
         else
           c_echo G "Successfully downloaded folder ${url} to ${download_dir_fullname}"
         fi

     else
         # Download file directly
         run_command mkdir -p ${download_dir_fullname}
-        run_command gdown ${url} -O ${download_dir_fullname}/ --fuzzy
-        if [ $? -ne 0 ]; then
+        run_command gdown "${url}" -O "${download_dir_fullname}/" --fuzzy
+        local status=$?
+        if [ $status -ne 0 ]; then
             fatal R "Unable to download ${url} via gdown."
         else
           c_echo G "Successfully downloaded file ${url} to ${download_dir_fullname}"
         fi
     fi
```

Also added quotes around variables to handle paths with spaces correctly.
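As a side note, an equivalent pattern avoids inspecting $? at all by testing the command directly. This sketch assumes run_command propagates its command's exit status, which the review above implies but does not show:

```bash
# Alternative: branch on the command itself instead of checking $? afterward.
if ! run_command gdown --folder "${url}" -O "${download_dir_fullname}" --remaining-ok; then
    fatal R "Unable to download ${url} via gdown."
fi
c_echo G "Successfully downloaded folder ${url} to ${download_dir_fullname}"
```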

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19a457b and f560be5.

📒 Files selected for processing (3)
  • applications/surgical_scene_recon/CMakeLists.txt (1 hunks)
  • applications/surgical_scene_recon/Dockerfile (1 hunks)
  • utilities/download_ngc_data (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
{applications,workflows}/**/CMakeLists.txt

📄 CodeRabbit inference engine (CONTRIBUTING.md)

CMakeLists.txt must include build system integration using add_holohub_application() for applications and workflows

Files:

  • applications/surgical_scene_recon/CMakeLists.txt
applications/**/CMakeLists.txt

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Applications should include a testing section in CMakeLists.txt for functional testing using CTest

Files:

  • applications/surgical_scene_recon/CMakeLists.txt
🧠 Learnings (4)
📚 Learning: 2025-11-24T16:28:06.280Z
Learnt from: CR
Repo: nvidia-holoscan/holohub PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T16:28:06.280Z
Learning: Applies to {applications,workflows}/**/CMakeLists.txt : CMakeLists.txt must include build system integration using add_holohub_application() for applications and workflows

Applied to files:

  • applications/surgical_scene_recon/CMakeLists.txt
📚 Learning: 2025-11-24T16:28:06.280Z
Learnt from: CR
Repo: nvidia-holoscan/holohub PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T16:28:06.280Z
Learning: Applies to operators/**/CMakeLists.txt : CMakeLists.txt must include build system integration using add_holohub_operator() for operators

Applied to files:

  • applications/surgical_scene_recon/CMakeLists.txt
📚 Learning: 2025-11-24T16:28:06.280Z
Learnt from: CR
Repo: nvidia-holoscan/holohub PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T16:28:06.280Z
Learning: Applies to gxf_extensions/**/CMakeLists.txt : CMakeLists.txt must include build system integration using add_holohub_extension() for GXF extensions

Applied to files:

  • applications/surgical_scene_recon/CMakeLists.txt
📚 Learning: 2025-11-24T16:28:06.281Z
Learnt from: CR
Repo: nvidia-holoscan/holohub PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T16:28:06.281Z
Learning: Applies to **CMakeLists.txt : Use CMake target dependencies: add DEPENDS EXTENSIONS for operators wrapping GXF extensions, add DEPENDS OPERATORS for applications/workflows

Applied to files:

  • applications/surgical_scene_recon/CMakeLists.txt
🔇 Additional comments (4)
utilities/download_ngc_data (2)

308-312: LGTM on gdown detection logic.

The detection of Google Drive URLs and the gdown availability check are correctly implemented with an appropriate error message guiding users to install gdown.


405-412: Good: MD5 warning and stamp file handling.

The warning about MD5 not being supported for Google Drive downloads is helpful, and the stamp file is correctly set after successful download to prevent re-downloading.
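A stamp-file guard like the one described typically follows the pattern sketched below; the stamp filename here is hypothetical, since the script's actual naming is not shown in this review:

```bash
# Sketch of a stamp-file guard (stamp filename assumed, not from the script).
stamp_file="${download_dir_fullname}/.gdown_download.stamp"
if [ -f "${stamp_file}" ]; then
    c_echo G "Stamp file found; skipping re-download of ${url}"
else
    # ... gdown download logic runs here ...
    touch "${stamp_file}"  # record success so later builds skip the download
fi
```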

applications/surgical_scene_recon/CMakeLists.txt (2)

16-41: The review comment is incorrect. The add_holohub_application() integration is already present in the parent applications/CMakeLists.txt file.

The parent applications/CMakeLists.txt contains the line:

add_holohub_application(surgical_scene_recon)

This properly registers the application with the build system. The subdirectory CMakeLists.txt (lines 16-41) does not need to call add_holohub_application() itself; instead, it configures the subdirectory-specific build tasks (data downloads and tests). This pattern is consistent with other applications like video_streaming_server, which includes a comment: "This application is built by the parent add_holohub_application macro."


36-40: HOLOHUB_DATA_DIR is properly defined in the root CMakeLists.txt and will be available when this file is processed.

The variable is set at root CMakeLists.txt line 77 (set(HOLOHUB_DATA_DIR "${CMAKE_BINARY_DIR}/data" CACHE PATH "Data Download directory")), making it available in this subdirectory's scope. No action needed.

Note: The target name `pulling` is semantically meaningful: it directly corresponds to the dataset directory `EndoNeRF/pulling` used throughout the codebase (Python scripts, tests, documentation). The suggested rename to `endonerf_dataset` would be less specific and contradict established naming conventions.

@greptile-apps greptile-apps bot (Contributor) commented Nov 26, 2025

Greptile Overview

Greptile Summary

This PR adds support for downloading datasets from Google Drive during build configuration, enabling automatic dataset retrieval as an alternative to manual downloads.

Major Changes:

  • Added HOLOHUB_DOWNLOAD_DATASETS option (enabled by default) to control automatic dataset downloads in CMakeLists.txt
  • Enhanced utilities/download_ngc_data script to detect and handle Google Drive URLs using the gdown tool
  • Added gdown package installation to Dockerfile for Google Drive download support
  • Configured 6 dataset components (depth, gt_masks, images, images_right, masks, poses_bounds) to download from Google Drive folders/files

Issues Found:

  • The directory structure issue noted in previous comments remains valid: folder downloads may create nested subdirectories
  • Minor conditional logic issue in the download_ngc_data script where an `if` should be an `elif` for proper control flow

Confidence Score: 3/5

  • This PR has good intent but contains a critical directory structure issue that will cause runtime failures
  • Score reflects one critical logical error (conditional should use elif) and one pre-existing directory structure issue flagged in previous comments. The Dockerfile changes are clean, but the download logic needs attention before merge
  • Pay close attention to utilities/download_ngc_data for the conditional logic fix and verify the Google Drive folder structure matches expected dataset layout

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| applications/surgical_scene_recon/CMakeLists.txt | 3/5 | Adds Google Drive dataset download support with 6 holoscan_download_data calls; implementation has a directory structure issue noted in previous comments |
| applications/surgical_scene_recon/Dockerfile | 5/5 | Adds gdown package installation for Google Drive downloads; clean change with no issues |
| utilities/download_ngc_data | 4/5 | Adds Google Drive download support via gdown with folder/file detection; has a minor logic issue with conditional order |

Sequence Diagram

```mermaid
sequenceDiagram
    participant CMake as CMakeLists.txt
    participant Script as download_ngc_data
    participant GDown as gdown CLI
    participant GDrive as Google Drive

    Note over CMake: Build time (if HOLOHUB_DOWNLOAD_DATASETS=ON)

    CMake->>CMake: Check option HOLOHUB_DOWNLOAD_DATASETS
    CMake->>Script: holoscan_download_data(depth, URL, DOWNLOAD_DIR)
    CMake->>Script: holoscan_download_data(gt_masks, URL, DOWNLOAD_DIR)
    CMake->>Script: holoscan_download_data(images, URL, DOWNLOAD_DIR)
    CMake->>Script: holoscan_download_data(images_right, URL, DOWNLOAD_DIR)
    CMake->>Script: holoscan_download_data(masks, URL, DOWNLOAD_DIR)
    CMake->>Script: holoscan_download_data(poses_bounds, URL, DOWNLOAD_DIR)

    Note over Script: For each download request

    Script->>Script: Check if .stamp file exists
    alt Stamp exists
        Script-->>CMake: Already downloaded, skip
    else No stamp
        Script->>Script: Detect URL contains "drive.google.com"
        Script->>Script: Set download_command=gdown
        Script->>Script: Check if gdown installed

        alt URL contains "/folders/"
            Script->>GDown: gdown --folder URL -O DIR --remaining-ok
            GDown->>GDrive: Download folder contents
            GDrive-->>GDown: Folder files
            GDown-->>Script: Files saved to DIR
        else URL is file
            Script->>GDown: gdown URL -O DIR/ --fuzzy
            GDown->>GDrive: Download file
            GDrive-->>GDown: File content
            GDown-->>Script: File saved to DIR
        end

        Script->>Script: Create .stamp file
        Script-->>CMake: Download complete
    end
```

@greptile-apps greptile-apps bot left a comment


3 files reviewed, 2 comments


@bhashemian bhashemian marked this pull request as draft November 26, 2025 17:52
@bhashemian bhashemian marked this pull request as ready for review November 26, 2025 18:01
@bhashemian bhashemian marked this pull request as draft November 26, 2025 18:02
@bhashemian (Member, Author) commented Nov 26, 2025

gdown can't download folders with more than 50 files!
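If that cap becomes a blocker, one possible workaround is to fetch files individually by their Drive IDs rather than through the folder link. A minimal sketch; the file_ids array below is a hypothetical placeholder, not data from this PR:

```bash
# Hypothetical workaround for gdown's folder-size cap: download each file by ID.
file_ids=("FILE_ID_1" "FILE_ID_2")  # placeholder IDs, not from this PR
for file_id in "${file_ids[@]}"; do
    gdown "https://drive.google.com/uc?id=${file_id}" -O "${download_dir_fullname}/"
done
```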

@greptile-apps greptile-apps bot left a comment


3 files reviewed, 1 comment


@bhashemian bhashemian requested a review from jjomier November 26, 2025 21:27