Paper: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM) #1106
Conversation
Curvenote Preview
Inviting reviewers: @[email protected] and @[email protected]
Is the accompanying poster to be submitted using MyST/Curvenote as well, @scipy-conference/2025-proceedings? If so, are you able to point me toward any guidance on that front?
Please open another PR with your poster and follow the instructions in the readme!
Will do, thanks a bunch!
@anacomesana will serve as editor for this paper.
Hi Nick, it's my honor to be assigned to review this paper. The paper is overall well-structured, will be taking a closer look soon! |
Impressive dataset and application domain! Paper is well-written, just a few minor edits.
# Ensure your title is the same as in your `main.md`
title: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)
# subtitle:
description: This article presents a workflow that utilizes SAM's automatic mask generation capability to perform zero-shot object detection on a high-resolution drone orthomosaic. The generated output is 20% more spatially accurate than that produced using proprietary software, with 400% greater IoU.
"20% more spatially accurate, with 400% greater IoU" -- I assume this is comparing with vanilla SAM?
Good question, which indicates that I need to add clarity on this front. We benchmarked our results against the output produced using proprietary software; our output is more spatially accurate (the centers of our detected objects are closer to the QC points) and our polygons cover the actual objects better (our generated mask polygons have 400% greater IoU than the bounding boxes generated using the proprietary software). Will add this to my list of edits.
# Ensure that this title is the same as the one in `myst.yml`
title: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)
abstract: |
  Accurate and efficient object detection and spatial localization in remote sensing imagery is a persistent challenge. In the context of precision agriculture, the extensive data annotation required by conventional deep learning models poses additional challenges. This paper presents a fully open source workflow leveraging Meta AI's Segment Anything Model (SAM) for zero-shot segmentation, enabling scalable object detection and spatial localization in high-resolution drone orthomosaics without the need for annotated image datasets. Model training and/or fine-tuning is rendered unnecessary in our precision agriculture-focused use case. The presented end-to-end workflow takes high-resolution images and quality control (QC) check points as inputs, automatically generates masks corresponding to the objects of interest (empty plant pots, in our given context), and outputs their spatial locations in real-world coordinates. Detection accuracy (required in the given context to be within 3 cm) is then quantitatively evaluated using the ground truth QC check points and benchmarked against object detection output generated using commercially available software. Results demonstrate that the open source workflow achieves superior spatial accuracy — producing output `20% more spatially accurate`, with `400% greater IoU` — while providing a scalable way to perform spatial localization on high-resolution aerial imagery (with ground sampling distance, or GSD, < 30 cm).
Ditto -- "20% more spatially accurate, with 400% greater IoU". I assume this is comparing with vanilla SAM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment for clarification.
## Approach

Our approach integrates SAM’s segmentation strengths with traditional geospatial data processing techniques, which lends itself to our precision agriculture use case. The workflow, like any other, can be thought of as a sequence of steps (visualized above and described below), each with its own set of substeps:
nit: "visualized above" -- missing or misplaced visualization?
Good catch, will edit!
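For readers following along, a minimal sketch of the core segmentation-and-polygonization step described in the Approach might look like the following. The file name, checkpoint, and parameters here are illustrative assumptions, not the paper's exact settings (those live in the linked Colab notebook).

```python
# Minimal sketch: SAM automatic mask generation on an orthomosaic tile,
# then conversion of each mask to a georeferenced polygon.
# Paths, checkpoint, and assumptions (8-bit RGB imagery) are illustrative.
import numpy as np
import rasterio
from rasterio.features import shapes
from shapely.geometry import shape
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load one tile of the orthomosaic as an HxWx3 uint8 RGB array (what SAM expects).
with rasterio.open("orthomosaic_tile.tif") as src:        # hypothetical file name
    image = np.moveaxis(src.read([1, 2, 3]), 0, -1).astype(np.uint8)
    transform = src.transform                              # pixel -> world coordinates

# Zero-shot automatic mask generation with a pretrained SAM checkpoint.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device="cuda")                                      # e.g. a T4 GPU in Colab
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(image)                     # list of dicts: 'segmentation', 'area', ...

# Convert each boolean mask to a polygon in real-world coordinates.
polygons = []
for m in masks:
    seg = m["segmentation"].astype(np.uint8)
    for geom, value in shapes(seg, mask=seg.astype(bool), transform=transform):
        polygons.append(shape(geom))
```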
Curious why 80% is used as the threshold -- is it a hyperparameter tuned for performance, or standard practice?
Empirically, through iteration, we found that this threshold produced more useful results.
papers/nicholas_mccarty/main.md
### Key Findings

The open-source workflow using Meta AI’s Segment Anything Model (SAM) outperformed a commercial alternative in object detection and spatial localization on high-resolution drone imagery. It achieved `20% higher spatial accuracy` (1.20 cm vs 1.39 cm deviation) and a `400% higher Intersection-over-Union (IoU)` (0.74 vs 0.18), indicating stronger alignment with object boundaries. Both methods had near-perfect precision, but the open-source approach showed slightly lower recall due to 65 false negatives. It should be noted, however, that these FN were a direct result of the filtering substep in our workflow, which removed detections that fell outside empirically chosen geometry-area and compactness thresholds (see [code](https://colab.research.google.com/drive/1pwnb14s2i7n_VAlfwhBqzDQ0cOb9oGs-?usp=sharing#sandboxMode=true&scrollTo=240nXaT5-EqM)).
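As a rough illustration of that filtering substep, one possible shape-based filter using `shapely` is sketched below. The threshold values are placeholders, not the values used in the paper (those are in the linked notebook).

```python
# Illustrative shape-based filtering of detected polygons; thresholds are placeholders.
import math
from shapely.geometry import Polygon

MIN_AREA, MAX_AREA = 0.01, 0.25   # m^2, hypothetical size range for an empty plant pot
MIN_COMPACTNESS = 0.6             # Polsby-Popper style roundness; 1.0 is a perfect circle

def keep(poly: Polygon) -> bool:
    """Keep detections whose area and compactness fall inside the expected range."""
    area = poly.area
    compactness = 4 * math.pi * area / (poly.length ** 2)
    return MIN_AREA <= area <= MAX_AREA and compactness >= MIN_COMPACTNESS

filtered = [p for p in polygons if keep(p)]   # `polygons` from the earlier sketch
```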
Can the author elaborate on which metric(s) matter most from a domain expert's point of view, and why?
Spatial accuracy was a metric imposed by the client, and IoU is a standard evaluation metric in computer vision/object detection.
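For concreteness, both metrics can be computed with standard geometry operations. A minimal sketch, assuming the detections and ground-truth geometries are `shapely` objects in the same projected CRS (units of metres):

```python
# Sketch of the two evaluation metrics; assumes projected coordinates in metres.
from shapely.geometry import Point, Polygon

def centroid_deviation_cm(detected: Polygon, qc_point: Point) -> float:
    """Spatial accuracy: distance from the detection centroid to the QC check point."""
    return detected.centroid.distance(qc_point) * 100.0   # metres -> centimetres

def iou(a: Polygon, b: Polygon) -> float:
    """Intersection-over-Union between a detected polygon and a reference geometry."""
    inter = a.intersection(b).area
    union = a.union(b).area
    return inter / union if union > 0 else 0.0
```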
[^footnote-3]: Inference was accelerated using `CUDA 12` (`cuDF 25.2.1`) on a `T4` GPU within our Colab notebook environment.
### Workflow |
For the Workflow section, can we replace the figures with (or add) a pseudocode/algorithm-style block? It would be a more formal and technical representation.
Good callout, will do!
Thanks for coming by our booth 🙂 Let's see if these changes get the PDF building...
Thanks a bunch -- you rock! Great chatting with you, Franklin 🤘🤩
Thank you for the consideration! The reviewer mentioned that replacing the figures with pseudocode would be better, so I'll likely do that. Will circle back if I decide otherwise -- thanks again, @fwkoch!
I pushed the suggested revisions and added labels to the various sections (when linking) like you did, but the changes are somehow preventing the PDF from building again, @fwkoch ... a million thanks in advance for any help or guidance you could provide!
## Discussion

### Key Findings
It would be helpful to clarify that the large IoU improvement arises not only from quantitative differences but also from the methodological distinction between mask-based polygons and bounding boxes. A brief reminder of the different output types would make this clearer.
In addition, since the FN stem from the filtering substep, consider noting whether you observed practical tradeoffs between FP and FN, and that these thresholds were empirically chosen. A short clarification would strengthen the interpretation of the results.
Thanks for your input, @anacomesana -- I'll be sure to try to get those points added ahead of tomorrow's deadline (I'm traveling) 🤞 Also, while I have you here, do I need to alter the Methodology pseudocode by instead using the LaTeX `algpseudocode` package, or anything? Thanks again!
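For reference, a minimal `algpseudocode` skeleton would look something like the sketch below; the step names are placeholders standing in for the paper's actual Methodology, not the submitted pseudocode itself.

```latex
% Preamble (placeholders for the actual algorithm steps):
\usepackage{algorithm}
\usepackage{algpseudocode}

% Document body:
\begin{algorithm}
  \caption{Zero-shot object detection on a drone orthomosaic with SAM}
  \begin{algorithmic}[1]
    \State Load orthomosaic tiles and QC check points
    \For{each tile $t$}
      \State $M \gets \Call{AutomaticMaskGeneration}{t}$ \Comment{SAM, zero-shot}
      \For{each mask $m \in M$}
        \If{area and compactness of $m$ are within thresholds}
          \State convert $m$ to a georeferenced polygon and keep it
        \EndIf
      \EndFor
    \EndFor
    \State Report centroid locations; evaluate deviation from QC points and IoU
  \end{algorithmic}
\end{algorithm}
```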
### Precision Agriculture Challenges

Our work began with an eye toward tackling a major challenge in agricultural remote sensing: the need for extensive manual annotation. SAM’s zero-shot segmentation enables accurate object detection without domain-specific training, making it scalable and adaptable for new use cases with minimal setup.
Consider adding a short reflection here on the workflow's potential generalizability (for example, if similar performance could be expected in other agricultural imagery like crop rows or tree canopies), or if additional challenges might arise in different applications.
Again, thanks -- I'll try ahead of tomorrow's deadline (not a lot of advance notice).
Also, these are trade secrets -- this is not an academic project. I have a responsibility to my client to say as little as possible (I've written and submitted this with their permission and their extended grace)...
Will I be penalized if I don't make the reviewer's last-minute suggested revisions, @scipy-conference/2025-proceedings? If so, I'm traveling and would like to request some grace, given the lack of notice w.r.t. these oddly-timed (questionable) revision requests...
Also, can I please get the requested guidance about the pseudocode, @anacomesana?
Pushed changes to pass DOI checks -- please advise if anything else is required...