Paper: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM) #1106
Conversation
Curvenote Preview
Inviting reviewers: @[email protected] and @[email protected]
Is the accompanying poster to be submitted using MyST/Curvenote as well, @scipy-conference/2025-proceedings? If so, are you able to point me toward any guidance on that front?
Please open another PR with your poster and follow the instructions in the readme!
Will do, thanks a bunch!
@anacomesana will serve as editor for this paper.
Hi Nick, it's my honor to be assigned to review this paper. The paper is overall well-structured, will be taking a closer look soon! |
Impressive dataset and application domain! Paper is well-written, just a few minor edits.
# Ensure your title is the same as in your `main.md`
title: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)
# subtitle:
description: This article presents a workflow that utilizes SAM's automatic mask generation capability to perform zero-shot object detection on a high-resolution drone orthomosaic. The generated output is 20% more spatially accurate than that produced using proprietary software, with 400% greater IoU.
"20% more spatially accurate, with 400% greater IoU" -- I assume this is comparing with vanilla SAM?
Good question, which indicates that I need to add clarity on this front. We benchmarked our results against the output produced using proprietary software; our output is more spatially accurate (the centers of our detected objects are closer to the QC points) and our polygons cover the actual objects better (our generated mask polygons have 400% greater IoU than the bounding boxes generated using the proprietary software). Will add this to my list of edits.
# Ensure that this title is the same as the one in `myst.yml`
title: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)
abstract: |
  Accurate and efficient object detection and spatial localization in remote sensing imagery is a persistent challenge. In the context of precision agriculture, the extensive data annotation required by conventional deep learning models poses additional challenges. This paper presents a fully open source workflow leveraging Meta AI's Segment Anything Model (SAM) for zero-shot segmentation, enabling scalable object detection and spatial localization in high-resolution drone orthomosaics without the need for annotated image datasets. Model training and/or fine-tuning is rendered unnecessary in our precision agriculture-focused use case. The presented end-to-end workflow takes high-resolution images and quality control (QC) check points as inputs, automatically generates masks corresponding to the objects of interest (empty plant pots, in our given context), and outputs their spatial locations in real-world coordinates. Detection accuracy (required in the given context to be within 3 cm) is then quantitatively evaluated using the ground truth QC check points and benchmarked against object detection output generated using commercially available software. Results demonstrate that the open source workflow achieves superior spatial accuracy — producing output `20% more spatially accurate`, with `400% greater IoU` — while providing a scalable way to perform spatial localization on high-resolution aerial imagery (with ground sampling distance, or GSD, < 30 cm).
Ditto -- "20% more spatially accurate, with 400% greater IoU". I assume this is comparing with vanilla SAM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment for clarification.
## Approach

Our approach integrates SAM’s segmentation strengths with traditional geospatial data processing techniques, which lends itself to our precision agriculture use case. The workflow, like any other, can be thought of as a sequence of steps (visualized above and described below), each with its own set of substeps:
nit: "visualized above" -- missing or misplaced visualization?
Good catch, will edit!
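For readers following along, a minimal sketch of the core segmentation-and-polygonization step described in the Approach might look like the following. The file name, checkpoint, and parameters here are illustrative assumptions, not the paper's exact settings (those live in the linked Colab notebook).

```python
# Minimal sketch: SAM automatic mask generation on an orthomosaic tile,
# then conversion of each mask to a georeferenced polygon.
# Paths, checkpoint, and assumptions (8-bit RGB imagery) are illustrative.
import numpy as np
import rasterio
from rasterio.features import shapes
from shapely.geometry import shape
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load one tile of the orthomosaic as an HxWx3 uint8 RGB array (what SAM expects).
with rasterio.open("orthomosaic_tile.tif") as src:        # hypothetical file name
    image = np.moveaxis(src.read([1, 2, 3]), 0, -1).astype(np.uint8)
    transform = src.transform                              # pixel -> world coordinates

# Zero-shot automatic mask generation with a pretrained SAM checkpoint.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device="cuda")                                      # e.g. a T4 GPU in Colab
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(image)                     # list of dicts: 'segmentation', 'area', ...

# Convert each boolean mask to a polygon in real-world coordinates.
polygons = []
for m in masks:
    seg = m["segmentation"].astype(np.uint8)
    for geom, value in shapes(seg, mask=seg.astype(bool), transform=transform):
        polygons.append(shape(geom))
```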
Curious why 80% is used as the threshold -- is it a hyperparameter tuned for performance, or standard practice?
Empirically, through iteration, we found that this threshold produced more useful results.
papers/nicholas_mccarty/main.md
### Key Findings

The open-source workflow using Meta AI’s Segment Anything Model (SAM) outperformed a commercial alternative in object detection and spatial localization on high-resolution drone imagery. It achieved `20% higher spatial accuracy` (1.20 cm vs 1.39 cm deviation) and a `400% higher Intersection-over-Union (IoU)` (0.74 vs 0.18), indicating stronger alignment with object boundaries. Both methods had near-perfect precision, but the open-source approach showed slightly lower recall due to 65 false negatives. It should be noted, however, that these FN were a direct result of the filtering substep in our workflow, which removed detections that fell outside empirically chosen geometry-area and compactness thresholds (see [code](https://colab.research.google.com/drive/1pwnb14s2i7n_VAlfwhBqzDQ0cOb9oGs-?usp=sharing#sandboxMode=true&scrollTo=240nXaT5-EqM)).
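As a rough illustration of that filtering substep, one possible shape-based filter using `shapely` is sketched below. The threshold values are placeholders, not the values used in the paper (those are in the linked notebook).

```python
# Illustrative shape-based filtering of detected polygons; thresholds are placeholders.
import math
from shapely.geometry import Polygon

MIN_AREA, MAX_AREA = 0.01, 0.25   # m^2, hypothetical size range for an empty plant pot
MIN_COMPACTNESS = 0.6             # Polsby-Popper style roundness; 1.0 is a perfect circle

def keep(poly: Polygon) -> bool:
    """Keep detections whose area and compactness fall inside the expected range."""
    area = poly.area
    compactness = 4 * math.pi * area / (poly.length ** 2)
    return MIN_AREA <= area <= MAX_AREA and compactness >= MIN_COMPACTNESS

filtered = [p for p in polygons if keep(p)]   # `polygons` from the earlier sketch
```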
Can the author elaborate on which metric(s) matter most from a domain expert's point of view, and why?
Spatial accuracy was a metric imposed by the client, and IoU is a standard evaluation metric in computer vision/object detection.
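For concreteness, both metrics can be computed with standard geometry operations. A minimal sketch, assuming the detections and ground-truth geometries are `shapely` objects in the same projected CRS (units of metres):

```python
# Sketch of the two evaluation metrics; assumes projected coordinates in metres.
from shapely.geometry import Point, Polygon

def centroid_deviation_cm(detected: Polygon, qc_point: Point) -> float:
    """Spatial accuracy: distance from the detection centroid to the QC check point."""
    return detected.centroid.distance(qc_point) * 100.0   # metres -> centimetres

def iou(a: Polygon, b: Polygon) -> float:
    """Intersection-over-Union between a detected polygon and a reference geometry."""
    inter = a.intersection(b).area
    union = a.union(b).area
    return inter / union if union > 0 else 0.0
```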
[^footnote-3]: Inference was accelerated using `CUDA 12` (`cuDF 25.2.1`) on a `T4` GPU within our Colab notebook environment.
### Workflow |
For the Workflow section, can we replace the figures with (or add) a pseudocode/algorithm-style block? It would be a more formal and technical representation.
Good callout, will do!
Thanks for coming by our booth 🙂 Let's see if these changes get the PDF building...
Thanks a bunch -- you rock! Great chatting with you, Franklin 🤘🤩
Thank you for the consideration! The reviewer mentioned that replacing the figures with pseudocode would be better, so I'll likely do that. Will circle back if I decide otherwise -- thanks again, @fwkoch!
I pushed the suggested revisions and added labels to the various sections (when linking) like you did, but the changes are somehow preventing the PDF from building again, @fwkoch ... a million thanks in advance for any help or guidance you could provide!
## Discussion

### Key Findings
It would be helpful to clarify that the large IoU improvement arises not only from quantitative differences but also from the methodological distinction between mask-based polygons and bounding boxes. A brief reminder of the different output types would make this clearer.
In addition, since the FN stem from the filtering substep, consider noting whether you observed practical tradeoffs between FP and FN, and that these thresholds were empirically chosen. A short clarification would strengthen the interpretation of the results.
Thanks for your input, @anacomesana -- I'll be sure to try to get those points added ahead of tomorrow's deadline (I'm traveling) 🤞 Also, while I have you here, do I need to alter the Methodology pseudocode by instead using the LaTeX `algpseudocode` package, or anything? Thanks again!
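For reference, a minimal `algpseudocode` skeleton would look something like the sketch below; the step names are placeholders standing in for the paper's actual Methodology, not the submitted pseudocode itself.

```latex
% Preamble (placeholders for the actual algorithm steps):
\usepackage{algorithm}
\usepackage{algpseudocode}

% Document body:
\begin{algorithm}
  \caption{Zero-shot object detection on a drone orthomosaic with SAM}
  \begin{algorithmic}[1]
    \State Load orthomosaic tiles and QC check points
    \For{each tile $t$}
      \State $M \gets \Call{AutomaticMaskGeneration}{t}$ \Comment{SAM, zero-shot}
      \For{each mask $m \in M$}
        \If{area and compactness of $m$ are within thresholds}
          \State convert $m$ to a georeferenced polygon and keep it
        \EndIf
      \EndFor
    \EndFor
    \State Report centroid locations; evaluate deviation from QC points and IoU
  \end{algorithmic}
\end{algorithm}
```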
### Precision Agriculture Challenges

Our work began with an eye toward tackling a major challenge in agricultural remote sensing: the need for extensive manual annotation. SAM’s zero-shot segmentation enables accurate object detection without domain-specific training, making it scalable and adaptable for new use cases with minimal setup.
Consider adding a short reflection here on the workflow's potential generalizability (for example, if similar performance could be expected in other agricultural imagery like crop rows or tree canopies), or if additional challenges might arise in different applications.
Again, thanks -- I'll try ahead of tomorrow's deadline (not a lot of advance notice).
Also, these are trade secrets -- this is not an academic project. I have a responsibility to my client to say as little as possible (I've written and submitted this with their permission and their extended grace)...
Will I be penalized if I don't make the reviewer's last-minute suggested revisions, @scipy-conference/2025-proceedings? If so, I'm traveling and would like to request some grace, given the lack of notice w.r.t. these oddly-timed (questionable) revision requests...
Also, can I please get the requested guidance about the pseudocode, @anacomesana?
Pushed changes to pass DOI checks -- please advise if anything else is required...