feat: add HTML to markdown conversion for http_request tool #63

mkmeral · 2025-05-30T22:23:02Z

Description

This PR enhances the http_request tool with HTML to markdown conversion capabilities, making web content more readable and suitable for AI processing.

Key Features:

New Parameter: convert_to_markdown boolean parameter to enable conversion
Smart Detection: Automatically detects HTML content by checking Content-Type headers and document structure
Clean Conversion: Uses readabilipy to extract main content and markdownify to convert to clean markdown
Graceful Fallback: Returns original content if conversion fails
User Feedback: Shows success notification when conversion occurs
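The detection, conversion, and fallback steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `looks_like_html` and `maybe_convert_to_markdown` are hypothetical, and the exact heuristics (which Content-Type values count, how much of the body is inspected) are assumptions.

```python
def looks_like_html(content_type: str, body: str) -> bool:
    """Heuristic (assumed): trust the Content-Type header, else peek at the body."""
    lowered = content_type.lower()
    if "text/html" in lowered or "application/xhtml" in lowered:
        return True
    head = body.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def maybe_convert_to_markdown(content_type: str, body: str) -> str:
    """Return markdown for HTML bodies; return the original body otherwise."""
    if not looks_like_html(content_type, body):
        return body
    try:
        # Imported lazily so a missing package falls through to the fallback.
        from markdownify import ATX, markdownify
        from readabilipy import simple_json_from_html_string

        # readabilipy extracts the main article; "content" holds simplified HTML.
        article = simple_json_from_html_string(body)
        main_html = article.get("content") or body
        return markdownify(main_html, heading_style=ATX)
    except Exception:
        # Graceful fallback: any import or parsing failure returns the original.
        return body


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Hello</p></body></html>"
    print(maybe_convert_to_markdown("text/html; charset=utf-8", page))
```

Catching a broad `Exception` mirrors the "return original content if conversion fails" behavior described above; a real implementation might log the failure as well.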

Use Cases:

Scraping articles and blog posts for better readability
Converting HTML documentation to markdown format
Processing web content for AI analysis
Creating clean text versions of web pages

Example Usage:

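# Convert an HTML webpage to markdown
from strands import Agent
from strands_tools import http_request

agent = Agent(tools=[http_request])

response = agent.tool.http_request(
    method="GET",
    url="https://example.com/article",
    convert_to_markdown=True,
)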

Related Issues

N/A

Documentation PR

N/A - Documentation updated in this PR

Type of Change

Testing

Automated Testing:

hatch fmt --linter ✅
hatch fmt --formatter ✅
hatch test --all ✅ (540 passed, 5 skipped)

Test Coverage:

Added unit tests for HTML conversion functionality
Manually tested by requesting the same page with and without conversion. Claude's summary of the two responses:

## Results Summary:

**First request (without markdown conversion):**
- Retrieved the raw HTML content of the blog post
- Shows the complete HTML structure with all tags, CSS, JavaScript, and metadata
- Content is in its original HTML format with full page structure

**Second request (with markdown conversion):**
- Retrieved the same content but converted to clean, readable markdown format
- Stripped out all the HTML boilerplate, navigation, headers, footers, and styling
- Focused only on the main article content in an easy-to-read markdown format

Checklist

I have read the CONTRIBUTING document
I have added tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature
My changes generate no new warnings
Any dependent changes have been merged and published
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

- Add markdownify and readabilipy dependencies
- Add convert_to_markdown parameter to http_request tool
- Automatically detect and convert HTML responses to markdown
- Add tests and documentation with usage examples

mkmeral · 2025-05-31T09:33:53Z

See example trace https://cloud.langfuse.com/project/cmb3mu0el0017ad075dgtxwip/traces/9aa133fb1499319013ee4b9043d27932?timestamp=2025-05-30T22%3A17%3A23.772Z&display=details&observation=c08cd5ef10e23b1a

awsarron

Awesome PR, thank you @mkmeral!

One small comment on the test coverage and then I think this is good to merge.

awsarron · 2025-06-10T08:13:33Z

tests/test_http_request.py

+    result_text = extract_result_text(result)
+    assert "Status Code: 200" in result_text
+    # The exact markdown format depends on whether the optional packages are installed
+    # So we just verify that the request succeeded with the parameter


I think it would be worth testing that the markdown conversion actually worked. Could we assert that HTML tags are absent and that the expected text content is present?

Which optional packages are being referred to in this comment?
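One way the requested assertions could look is sketched below. The helper `assert_converted` and the tag list are hypothetical, not part of the PR's test suite; in the actual test, the string passed in would be the `result_text` from `extract_result_text(result)`.

```python
import re

# Matches common structural HTML tags. Markdown autolinks such as
# <https://example.com> are deliberately not matched.
HTML_TAG_PATTERN = re.compile(
    r"</?(?:html|head|body|div|span|p|h[1-6]|a|ul|ol|li|script|style|nav|header|footer)\b[^>]*>",
    re.IGNORECASE,
)


def assert_converted(result_text: str, expected_snippets: list[str]) -> None:
    """Fail if HTML tags survived conversion or expected text is missing."""
    leftover = HTML_TAG_PATTERN.findall(result_text)
    assert not leftover, f"HTML tags survived conversion: {leftover}"
    for snippet in expected_snippets:
        assert snippet in result_text, f"Expected text not found: {snippet!r}"


if __name__ == "__main__":
    # Stand-in for the tool output of a converted response.
    sample = "Status Code: 200\n# Title\n\nHello world"
    assert_converted(sample, ["Status Code: 200", "Title", "Hello world"])
```

Checking against a fixed tag list rather than any `<...>` sequence avoids false failures on markdown that legitimately contains angle brackets.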

feat: add HTML to markdown conversion for http_request tool

0c640f9

mkmeral requested a review from a team as a code owner May 30, 2025 22:23

awsarron requested changes Jun 10, 2025


Murat Kaan Meral added 2 commits June 10, 2025 14:20

Merge branch 'main' into feature/html-to-markdown-conversion

fedc5c6

fix: extend test coverage for markdownify in http request tool

095b39c
