Reviewer's Guide
Enhance HTML fetching by adding a fallback to the system’s curl binary whenever reqwest fails (due to errors, timeouts, or unexpected content types), and encapsulating that logic in a new helper function.
Sequence diagram for HTML fetch with reqwest and curl fallback
sequenceDiagram
participant Caller
participant fetch_html
participant reqwest
participant fetch_html_with_curl
participant curl
Caller->>fetch_html: fetch_html(client, url)
fetch_html->>reqwest: client.get(url).send()
alt reqwest succeeds and returns HTML
reqwest-->>fetch_html: HTML response
fetch_html-->>Caller: HTML body
else reqwest fails (error, timeout, or non-HTML)
fetch_html->>fetch_html_with_curl: fetch_html_with_curl(url)
fetch_html_with_curl->>curl: spawn curl process
curl-->>fetch_html_with_curl: curl output
alt curl returns HTML
fetch_html_with_curl-->>fetch_html: HTML body
fetch_html-->>Caller: HTML body
else curl fails or non-HTML
fetch_html_with_curl-->>fetch_html: error
fetch_html-->>Caller: error
end
end
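The control flow in the sequence diagram can be sketched in plain synchronous Rust. This is only an illustration of the fallback decision, not the PR's code: `Fetched`, `is_html`, and `fetch_with_fallback` are hypothetical names, the two fetchers are passed in as closures instead of calling reqwest or curl, and treating `text/html` and `application/xhtml+xml` as "HTML" is an assumption.

```rust
/// Hypothetical reduction of a response to what the fallback decision needs.
pub struct Fetched {
    pub content_type: Option<String>,
    pub body: String,
}

/// Returns true when the Content-Type names an HTML document.
/// Assumption: only these two MIME types count as HTML.
pub fn is_html(content_type: Option<&str>) -> bool {
    match content_type {
        Some(ct) => {
            // Drop parameters such as "; charset=utf-8" before comparing.
            let mime = ct.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
            mime == "text/html" || mime == "application/xhtml+xml"
        }
        None => false,
    }
}

/// Try `primary`; if it errors or returns non-HTML, try `fallback`.
/// The primary error is discarded here; real code would probably log it.
pub fn fetch_with_fallback<P, F>(primary: P, fallback: F) -> Result<String, String>
where
    P: FnOnce() -> Result<Fetched, String>,
    F: FnOnce() -> Result<Fetched, String>,
{
    if let Ok(resp) = primary() {
        if is_html(resp.content_type.as_deref()) {
            return Ok(resp.body);
        }
    }
    // Primary failed or returned something other than HTML.
    let resp = fallback()?;
    if is_html(resp.content_type.as_deref()) {
        Ok(resp.body)
    } else {
        Err(format!(
            "fallback returned non-HTML content type: {:?}",
            resp.content_type
        ))
    }
}
```

The fallback runs at most once, and an error from it is returned to the caller unchanged, matching the inner `alt` branch of the diagram.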
Class diagram for new HTML fetch logic with curl fallback
classDiagram
class fetch_html {
+async fn fetch_html(client: &Client, url: &str) -> Result<String>
}
class fetch_html_with_curl {
+async fn fetch_html_with_curl(url: &str) -> Result<String>
}
fetch_html --> fetch_html_with_curl : fallback on error/timeout/non-HTML
CI Feedback 🧐
A test triggered by this PR failed.
Hey @twardoch - I've reviewed your changes - here's some feedback:
- Check for the `curl` binary’s availability before falling back and return a clear error if it’s missing.
- Wrap the spawned `curl` process in a timeout (e.g., via `tokio::time::timeout`) to avoid hanging indefinitely.
- Improve HTTP response parsing in `fetch_html_with_curl` to handle LF-only delimiters or consider using a proper HTTP parser.
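For the third point, one way to tolerate LF-only delimiters is to split at whichever blank-line separator comes first. The sketch below is a minimal illustration under assumptions: `split_headers_and_body` and `header_value` are hypothetical helpers, and it assumes the raw curl output contains exactly one header block followed by the body, which the PR's actual invocation may not guarantee.

```rust
/// Split a raw HTTP response into (header block, body) at the first blank
/// line, accepting either CRLF CRLF or bare LF LF as the separator.
pub fn split_headers_and_body(raw: &str) -> Option<(&str, &str)> {
    let crlf = raw.find("\r\n\r\n").map(|i| (i, 4));
    let lf = raw.find("\n\n").map(|i| (i, 2));
    // Pick whichever separator occurs first in the input.
    let (idx, len) = match (crlf, lf) {
        (Some(a), Some(b)) => {
            if a.0 <= b.0 {
                a
            } else {
                b
            }
        }
        (Some(a), None) => a,
        (None, Some(b)) => b,
        (None, None) => return None,
    };
    Some((&raw[..idx], &raw[idx + len..]))
}

/// Case-insensitive lookup of a header value inside a header block.
/// `str::lines` already strips both "\n" and "\r\n" line endings.
pub fn header_value<'a>(headers: &'a str, name: &str) -> Option<&'a str> {
    headers.lines().find_map(|line| {
        // Lines without a colon (such as the status line) are skipped.
        let (key, value) = line.split_once(':')?;
        if key.trim().eq_ignore_ascii_case(name) {
            Some(value.trim())
        } else {
            None
        }
    })
}
```

A dedicated HTTP parser, as the reviewer suggests, would also cover cases this sketch ignores, such as folded headers or mixed line endings.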
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Check for the `curl` binary’s availability before falling back and return a clear error if it’s missing.
- Wrap the spawned `curl` process in a timeout (e.g., via `tokio::time::timeout`) to avoid hanging indefinitely.
- Improve HTTP response parsing in `fetch_html_with_curl` to handle LF-only delimiters or consider using a proper HTTP parser.
## Individual Comments
### Comment 1
<location> `src/html.rs:401` </location>
<code_context>
+/// platforms where reqwest/rustls has trouble negotiating TLS. The function
+/// attempts to fetch the given URL and returns the HTML body if successful.
+async fn fetch_html_with_curl(url: &str) -> Result<String> {
+ let output = Command::new("curl")
+ .arg("-L")
+ .arg("-s")
</code_context>
<issue_to_address>
No timeout is set for the curl subprocess, risking indefinite hangs.
Add a timeout to the subprocess to avoid indefinite waits if curl hangs.
</issue_to_address>
issue (bug_risk): No timeout is set for the curl subprocess, risking indefinite hangs.
Add a timeout to the subprocess to avoid indefinite waits if curl hangs.
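A std-only sketch of what such a guard could look like follows. It is not the PR's code: the review suggests `tokio::time::timeout` for the async version, whereas this blocking variant polls `Child::try_wait` against a deadline so it runs without extra crates. `RunError` and `run_with_timeout` are hypothetical names, and the 20 ms poll interval is arbitrary. It also maps a missing binary to a distinct error, which addresses the availability concern from the overall comments.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical error type separating the failure modes the review mentions.
#[derive(Debug)]
pub enum RunError {
    /// The program could not be found on this system.
    NotInstalled(String),
    /// The child did not exit within the allowed time and was killed.
    TimedOut(Duration),
    Io(io::Error),
}

/// Run `program` with `args`, returning its stdout, or kill it after `limit`.
/// The exit status is not inspected here; real code would likely check it.
pub fn run_with_timeout(
    program: &str,
    args: &[&str],
    limit: Duration,
) -> Result<Vec<u8>, RunError> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
        .map_err(|e| {
            if e.kind() == io::ErrorKind::NotFound {
                RunError::NotInstalled(program.to_string())
            } else {
                RunError::Io(e)
            }
        })?;

    // Drain stdout on a helper thread so a large body cannot fill the pipe
    // and block the child while we are polling for its exit.
    let mut stdout = child.stdout.take().expect("stdout was configured as piped");
    let reader = thread::spawn(move || {
        let mut buf = Vec::new();
        stdout.read_to_end(&mut buf).map(|_| buf)
    });

    let deadline = Instant::now() + limit;
    loop {
        match child.try_wait().map_err(RunError::Io)? {
            Some(_status) => break,
            None if Instant::now() >= deadline => {
                // Best effort: kill the child and reap it to avoid a zombie.
                let _ = child.kill();
                let _ = child.wait();
                return Err(RunError::TimedOut(limit));
            }
            None => thread::sleep(Duration::from_millis(20)),
        }
    }
    reader
        .join()
        .expect("reader thread panicked")
        .map_err(RunError::Io)
}
```

A call such as `run_with_timeout("curl", &["-L", "-s", url], Duration::from_secs(30))` would reuse the flags shown in the code context; the 30-second limit is an illustrative value, not one taken from the PR.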
Summary
- fall back to the system `curl` when reqwest fails to fetch a page or returns an unexpected MIME type
Testing
- `cargo test --all-features`
https://chatgpt.com/codex/tasks/task_e_685b338d8af8832aa54a598b37a01850
Summary by Sourcery
Provide a system curl fallback for HTML fetching to handle reqwest failures, timeouts, and unexpected MIME types