Conversation

@nickmccarty nickmccarty commented Jun 13, 2025

Pushed changes to pass the DOI checks; please advise if anything else is required...

github-actions bot commented Jun 13, 2025

Curvenote Preview

| Directory | Preview | Checks | Updated (UTC) |
| --- | --- | --- | --- |
| papers/nicholas_mccarty | 🔍 Inspect | 41 checks passed (6 optional) | Aug 6, 2025, 1:07 AM |

@rowanc1 added the `paper` label (indicates the PR in question is a paper) and removed the `draft` label (triggers Curvenote Preview actions) on Jun 14, 2025
Contributor

ameyxd commented Jun 23, 2025

Inviting reviewers: @[email protected] and @[email protected]

@nickmccarty
Author

Is the accompanying poster to be submitted using MyST/Curvenote as well, @scipy-conference/2025-proceedings? If so, are you able to point me toward any guidance on that front?

@nickmccarty

This comment was marked as resolved.

@ameyxd

This comment was marked as resolved.

Contributor

rowanc1 commented Jul 1, 2025

Please open another PR with your poster and follow the instructions in the readme!

@nickmccarty
Author

> Please open another PR with your poster and follow the instructions in the readme!

Will do, thanks a bunch!

Contributor

ameyxd commented Jul 2, 2025

@anacomesana will serve as editor for this paper.


naomity commented Jul 3, 2025

Hi Nick, it's my honor to be assigned to review this paper. The paper is well-structured overall; I will be taking a closer look soon!


@naomity naomity left a comment


Impressive dataset and application domain! The paper is well written; just a few minor edits.

# Ensure your title is the same as in your `main.md`
title: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)
# subtitle:
description: This article presents a workflow that uses SAM's automatic mask generation capability to perform zero-shot object detection on a high-resolution drone orthomosaic. The generated output is 20% more spatially accurate than that produced using proprietary software, with 400% greater IoU.


> 20% more spatially accurate, with 400% greater IoU

I assume this is comparing with vanilla SAM?

Author


Good question, which indicates that I need to add clarity on this front. We benchmarked our results against the output produced using proprietary software; our output is more spatially accurate (the centers of our detected objects are closer to the QC points) and our polygons cover the actual objects better (our generated mask polygons have 400% greater IoU than the bounding boxes generated using the proprietary software). Will add this to my list of edits.

# Ensure that this title is the same as the one in `myst.yml`
title: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)
abstract: |
Accurate and efficient object detection and spatial localization in remote sensing imagery is a persistent challenge. In the context of precision agriculture, the extensive data annotation required by conventional deep learning models poses additional challenges. This paper presents a fully open source workflow leveraging Meta AI's Segment Anything Model (SAM) for zero-shot segmentation, enabling scalable object detection and spatial localization in high-resolution drone orthomosaics without the need for annotated image datasets. Model training and/or fine-tuning is rendered unnecessary in our precision agriculture-focused use case. The presented end-to-end workflow takes high-resolution images and quality control (QC) check points as inputs, automatically generates masks corresponding to the objects of interest (empty plant pots, in our given context), and outputs their spatial locations in real-world coordinates. Detection accuracy (required in the given context to be within 3 cm) is then quantitatively evaluated using the ground truth QC check points and benchmarked against object detection output generated using commercially available software. Results demonstrate that the open source workflow achieves superior spatial accuracy — producing output `20% more spatially accurate`, with `400% greater IoU` — while providing a scalable way to perform spatial localization on high-resolution aerial imagery (with ground sampling distance, or GSD, < 30 cm).


Ditto: "20% more spatially accurate, with 400% greater IoU." I assume this is comparing with vanilla SAM?

Author


See above comment for clarification.


## Approach

Our approach integrates SAM’s segmentation strengths with traditional geospatial data processing techniques, which lends itself to our precision agriculture use case. The workflow, like any other, can be thought of as a sequence of steps (visualized above and described below), each with its own set of substeps:
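To make the step sequence concrete, here is a minimal runnable sketch of such a pipeline. Every function name, the toy mask representation, the geotransform layout, and the filter thresholds are illustrative assumptions, not the paper's actual implementation:

```python
import math

def passes_filters(mask, min_area=50.0, min_compactness=0.5):
    # Polsby-Popper compactness: 4*pi*A / P^2 (1.0 for a perfect circle).
    # Thresholds here are placeholders, not the paper's tuned values.
    compactness = 4 * math.pi * mask["area"] / mask["perimeter"] ** 2
    return mask["area"] >= min_area and compactness >= min_compactness

def pixel_to_world(col_row, transform):
    # GDAL-style affine geotransform:
    # (x_origin, pixel_width, 0, y_origin, 0, pixel_height).
    x0, dx, _, y0, _, dy = transform
    col, row = col_row
    return (x0 + col * dx, y0 + row * dy)

def run_workflow(tiles, transform):
    # 1) take candidate masks per tile (SAM automatic mask generation
    #    would produce these; this sketch assumes they are precomputed),
    # 2) filter by geometry, 3) convert centroids to world coordinates.
    detections = []
    for tile in tiles:
        for mask in tile["candidate_masks"]:
            if passes_filters(mask):
                detections.append(pixel_to_world(mask["centroid"], transform))
    return detections
```

A real run would replace the precomputed `candidate_masks` with SAM's automatic mask generator output and use the orthomosaic's actual geotransform.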


nit: visualized above, missing / mis-placed visualization?

Author


Good catch, will edit!



Curious why 80% is used as the threshold. Is it a hyperparameter tuned for performance, or standard practice?

Author


Empirically, through iteration, we found that this threshold produced more useful results.


### Key Findings

The open-source workflow using Meta AI’s Segment Anything Model (SAM) outperformed a commercial alternative in object detection and spatial localization on high-resolution drone imagery. It achieved `20% higher spatial accuracy` (1.20 cm vs 1.39 cm deviation) and a `400% higher Intersection-over-Union (IoU)` (0.74 vs 0.18), indicating stronger alignment with object boundaries. Both methods had near-perfect precision, but the open-source approach showed slightly lower recall due to 65 false negatives. It should be noted, however, that these FN were a direct result of the filtering substep in our workflow, which filtered out detections (based on arbitrary geometry area and compactness thresholds; see [code](https://colab.research.google.com/drive/1pwnb14s2i7n_VAlfwhBqzDQ0cOb9oGs-?usp=sharing#sandboxMode=true&scrollTo=240nXaT5-EqM)) that would otherwise be present in the output.
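For illustration, a deviation figure like the 1.20 cm above could be computed as the mean distance from each detected object center to its nearest QC check point. This is an editor's sketch; the nearest-neighbor pairing and the function name are assumptions, not necessarily the paper's exact evaluation procedure:

```python
import math

def mean_deviation(detections, qc_points):
    # Mean Euclidean distance (in the inputs' units, e.g. cm) from each
    # detected center to its nearest QC check point. Nearest-neighbor
    # pairing is an assumption of this sketch.
    total = sum(min(math.dist(d, q) for q in qc_points) for d in detections)
    return total / len(detections)
```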


Could the author elaborate, from a domain expert's point of view, on which metric(s) are most important and why?

Author


Spatial accuracy was a metric imposed by the client, and IoU is a standard evaluation metric in computer vision/object detection.
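For readers unfamiliar with the metric, IoU is simply intersection area divided by union area. Below is a minimal pure-Python version for axis-aligned boxes (an editor's sketch; comparing mask polygons, as the paper does, would use polygon geometry instead of box arithmetic):

```python
def box_iou(a, b):
    # Boxes given as (xmin, ymin, xmax, ymax).
    # IoU = intersection area / union area, in [0, 1].
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```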


[^footnote-3]: Inference was accelerated using `CUDA 12` (`cuDF 25.2.1`) on a `T4` GPU within our Colab notebook environment.

### Workflow


For the Workflow section, could we replace the figure with, or add, a pseudocode-style algorithm block? That would be a more formal and technical representation.

Author


Good callout, will do!

Member

@fwkoch fwkoch left a comment


Thanks for coming by our booth 🙂 Let's see if these changes get the PDF building...

@nickmccarty
Author

> Thanks for coming by our booth 🙂 Let's see if these changes get the PDF building...

Thanks a bunch -- you rock! Great chatting with you, Franklin 🤘🤩

Member

fwkoch commented Jul 10, 2025

I noticed there are a few sections where you only have a figure, and in the PDF, these figures get displaced and the sections look empty (see the image below).

We can also get this fixed up for you - there is an option to anchor figures in place. I need to determine if we should make this fix across all papers or just yours. Don't worry about it for now!

[image: screenshot of a PDF page where the figure is displaced and the section appears empty]

@nickmccarty
Author

> I noticed there are a few sections where you only have a figure, and in the PDF, these figures get displaced and the sections look empty (see the image below).
>
> We can also get this fixed up for you - there is an option to anchor figures in place. I need to determine if we should make this fix across all papers or just yours. Don't worry about it for now!

Thank you for the consideration! The reviewer mentioned that replacing the figures with pseudocode would be better, so I'll likely do that. Will circle back if I decide otherwise -- thanks again, @fwkoch!

@nickmccarty
Author

I pushed the suggested revisions and added labels to the various sections (when linking) like you did, but the changes are somehow preventing the PDF from building again, @fwkoch ... a million thanks in advance for any help or guidance you could provide!

## Discussion

### Key Findings

Collaborator


It would be helpful to clarify that the large IoU improvement arises not only from quantitative differences but also from the methodological distinction between mask-based polygons and bounding boxes. A brief reminder of the two output types would make this clearer.
In addition, since the false negatives stem from the filtering substep, consider noting whether you observed practical tradeoffs between false positives and false negatives, and that these thresholds were chosen empirically. A short clarification would strengthen the interpretation of the results.

Author

@nickmccarty nickmccarty Aug 21, 2025


Thanks for your input, @anacomesana -- I'll be sure to try to get those points added ahead of tomorrow's deadline (I'm traveling) 🤞 Also, while I have you here, do I need to alter the Methodology pseudocode by instead using the LaTeX algpseudocode package, or anything? Thanks again!


### Precision Agriculture Challenges

Our work began with an eye toward tackling a major challenge in agricultural remote sensing: the need for extensive manual annotation. SAM’s zero-shot segmentation enables accurate object detection without domain-specific training, making it scalable and adaptable for new use cases with minimal setup.
Collaborator


Consider adding a short reflection here on the workflow's potential generalizability (for example, if similar performance could be expected in other agricultural imagery like crop rows or tree canopies), or if additional challenges might arise in different applications.

Author

@nickmccarty nickmccarty Aug 21, 2025


Again, thanks -- I'll try ahead of tomorrow's deadline (not a lot of advance notice)

Also, these are trade secrets -- this is not an academic project. I have a responsibility to my client to say as little as possible (I've written and submitted this with their permission and their extended grace)...

@nickmccarty
Author

Will I be penalized if I don't make the reviewer's last-minute suggested revisions, @scipy-conference/2025-proceedings? If so, I'm traveling and would like to request some grace, given the lack of notice w.r.t. these oddly-timed (questionable) revision requests...

@nickmccarty
Author

Also, can I please get the requested guidance about the pseudocode, @anacomesana?
