feat(scrapegraph): migrate tool to scrapegraph-py v2 SDK #135

Open
VinciGit00 wants to merge 1 commit into CelestoAI:main from VinciGit00:feat/scrapegraph-sdk-v2

@VinciGit00 VinciGit00 commented Apr 22, 2026

Summary

  • Migrates the ScrapeGraphAI tool to the new scrapegraph-py 2.x SDK. The old Client class and its methods (smartscraper, markdownify, searchscraper, smartcrawler, sitemap) no longer exist upstream, so this is a full rewrite against the new endpoint surface.
  • Bumps the scrapegraph extra to scrapegraph-py>=2.1.0.
  • Accepts SGAI_API_KEY (the new SDK default) and falls back to the legacy SCRAPEGRAPH_API_KEY so existing users aren't broken.

New capabilities

| Capability | Maps to |
| --- | --- |
| scrape(url, format) | client.scrape(...) with Markdown/Html/Links/SummaryFormatConfig |
| extract(prompt, url, schema) | client.extract(...) (AI structured extraction) |
| search(query, num_results, prompt) | client.search(...) |
| crawl(url, max_pages, max_depth, include/exclude) | client.crawl.start(...) |
| get_crawl_result(crawl_id) | client.crawl.get(...) |
| monitor(url, interval, name, webhook_url) | client.monitor.create(...) |
| credits() | client.credits() |
| health() | client.health() |

Responses from the SDK are ApiResult objects; the tool turns successful results into a JSON string and surfaces result.error as "Error in <capability>: ..." so the LLM gets a consistent string return.

Breaking change note

The previous capability names (smartscraper, markdownify, etc.) are removed. Any agent prompt that hard-coded those names needs to be updated — see examples/scrapegraphai_example.py for the new surface.

Test plan

  • pytest tests/tools/test_scrapegraphai.py — 16/16 pass (success paths, API error paths, exception paths, missing-dep guard, env-var resolution including legacy fallback)
  • Live-tested against the ScrapeGraphAI API: health, credits, scrape, extract, search, crawl (start + get_crawl_result poll to completion), and error path for invalid URL
  • Reviewer sanity-check on the example

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced ScrapeGraphAI integration with new v2 API capabilities: scrape, extract, search, crawl, monitor, credits, and health.
    • Added output format options (markdown, HTML, links, summary) for scrape operations.
  • Documentation

    • Updated examples demonstrating v2 API capabilities.
    • Environment variable name changed from SCRAPEGRAPH_API_KEY to SGAI_API_KEY.
  • Chores

    • Updated ScrapeGraphAI SDK dependency to v2.1.0+.

Greptile Summary

This PR rewrites the ScrapeGraphAI tool against the scrapegraph-py v2 SDK, replacing the five v1 capabilities (smartscraper, markdownify, etc.) with eight new ones (scrape, extract, search, crawl, get_crawl_result, monitor, credits, health). The new env-var fallback chain and _format_result helper are well-structured, and the test suite is thorough.

  • JsonFormatConfig is imported from scrapegraph_py but never referenced in _FORMAT_BUILDERS or any capability method. This will fail the project's ruff linter (F401) in CI. The import (and its None assignment in the fallback block) should be removed unless "json" format support is intended.

Confidence Score: 4/5

Safe to merge after fixing the unused JsonFormatConfig import, which will fail ruff linting in CI.

One P1 finding (unused import that breaks ruff/CI linting) keeps this at 4. The remaining findings are P2 style/consistency items that do not affect runtime correctness.

src/agentor/tools/scrapegraphai.py — remove or wire up the JsonFormatConfig import and keep the fallback block in sync.

Important Files Changed

Filename Overview
src/agentor/tools/scrapegraphai.py Full rewrite against scrapegraph-py v2 SDK; two issues: JsonFormatConfig is imported but unused (ruff F401/CI fail), and the fallback except block must stay in sync with the try-block imports. Additionally self.api_key is set to the unresolved value before env-var lookup.
tests/tools/test_scrapegraphai.py Comprehensive test rewrite covering all 8 new capabilities, error paths, exception paths, missing-dep guard, and env-var resolution including legacy fallback.
pyproject.toml Bumps scrapegraph-py from >=1.46.0 to >=2.1.0 in both the scrapegraph extra and the all extra — correctly paired.
examples/scrapegraphai_example.py Updated to new capability names and SGAI_API_KEY env var; covers scrape, extract, search, crawl/get_crawl_result, and credits.

Sequence Diagram

sequenceDiagram
    participant Agent as LLM Agent
    participant Tool as ScrapeGraphAI Tool
    participant SDK as scrapegraph-py v2 SDK
    participant API as ScrapeGraphAI API

    Agent->>Tool: scrape(url, format)
    Tool->>Tool: _FORMAT_BUILDERS[format]()
    Tool->>SDK: client.scrape(url, formats=[...])
    SDK->>API: HTTP POST /scrape
    API-->>SDK: ApiResult
    SDK-->>Tool: ApiResult
    Tool->>Tool: _format_result(result, scrape)
    Tool-->>Agent: JSON string or Error in scrape

    Agent->>Tool: extract(prompt, url, schema)
    Tool->>SDK: client.extract(prompt, url, schema)
    SDK->>API: HTTP POST /extract
    API-->>SDK: ApiResult
    SDK-->>Tool: ApiResult
    Tool-->>Agent: JSON string or Error in extract

    Agent->>Tool: crawl(url, max_pages, max_depth)
    Tool->>SDK: client.crawl.start(url, formats, ...)
    SDK->>API: HTTP POST /crawl
    API-->>SDK: ApiResult with crawl_id
    SDK-->>Tool: ApiResult
    Tool-->>Agent: JSON string with crawl_id

    Agent->>Tool: get_crawl_result(crawl_id)
    Tool->>SDK: client.crawl.get(crawl_id)
    SDK->>API: HTTP GET /crawl/id
    API-->>SDK: ApiResult
    SDK-->>Tool: ApiResult
    Tool-->>Agent: JSON string with status and results
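
The crawl and get_crawl_result steps in the diagram imply a start-then-poll loop on the caller's side. A minimal sketch of such a loop is below, assuming the tool returns JSON strings on success and "Error in ..." strings on failure. The `id` and `status` field names and the `"running"` status value are assumptions for illustration and may differ from the real API payload; `wait_for_crawl` is a hypothetical helper, not part of the PR.

```python
import json
import time
from typing import Callable


def wait_for_crawl(
    start: Callable[[], str],
    poll: Callable[[str], str],
    interval_s: float = 2.0,
    max_polls: int = 30,
) -> dict:
    """Start a crawl, then poll until it is no longer reported as running."""
    started = start()
    if started.startswith("Error in"):
        raise RuntimeError(started)
    crawl_id = json.loads(started)["id"]  # assumed field name

    for _ in range(max_polls):
        raw = poll(crawl_id)
        if raw.startswith("Error in"):
            raise RuntimeError(raw)
        payload = json.loads(raw)
        if payload.get("status") != "running":  # assumed status value
            return payload
        time.sleep(interval_s)
    raise TimeoutError(f"crawl {crawl_id} still running after {max_polls} polls")
```

With the tool from this PR, a call might look like `wait_for_crawl(lambda: tool.crawl("https://example.com"), tool.get_crawl_result)`. The cap on polls keeps an agent from looping forever on a stuck job.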

Reviews (1): Last reviewed commit: "feat(scrapegraph): migrate to scrapegrap..."

Greptile also left 3 inline comments on this PR.

The scrapegraph-py 2.x SDK replaces the old `Client` with `ScrapeGraphAI`
and returns `ApiResult` objects instead of raising exceptions. The old
capability surface (smartscraper, markdownify, searchscraper, smartcrawler,
sitemap) no longer exists upstream, so this is a full rewrite of the tool
against the new endpoints.

Capabilities exposed:
  - scrape(url, format)           — markdown/html/links/summary
  - extract(prompt, url, schema)  — AI structured extraction
  - search(query, num_results)    — web search + optional extraction
  - crawl(url, max_pages, ...)    — start a crawl job
  - get_crawl_result(crawl_id)    — poll crawl status/result
  - monitor(url, interval, ...)   — schedule a page monitor (cron)
  - credits()                     — plan / remaining credits
  - health()                      — API health check

Also:
  - Bump `scrapegraph-py` optional dep to `>=2.1.0`
  - Accept `SGAI_API_KEY` (new SDK default), with fallback to legacy
    `SCRAPEGRAPH_API_KEY` so existing users aren't broken
  - Tests cover success, API-level error (ApiResult.status=="error"),
    exception paths, missing-dep guard, and env-var resolution
  - Example rewritten to exercise the new surface

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

The ScrapeGraphAI integration has undergone a comprehensive upgrade from SDK v1 to v2. The tool's capability set has been reorganized—replacing legacy methods (smartscraper, searchscraper, markdownify, smartcrawler) with a streamlined v2 API featuring scrape, extract, search, crawl, monitor, credits, and health. Dependency constraints and examples have been updated accordingly.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| SDK Version Constraint (pyproject.toml) | Updated scrapegraph-py dependency from >=1.46.0 to >=2.1.0 in both scrapegraph and all optional-dependency groups. |
| Tool Implementation (src/agentor/tools/scrapegraphai.py) | Replaced six legacy capability methods with eight new v2 API methods. Introduced ScrapeFormat type alias, format configuration builder, result serialization helpers (_serialize(), _format_result()), and API key resolution logic supporting both SGAI_API_KEY and legacy SCRAPEGRAPH_API_KEY environment variables. Updated class description and error handling. |
| Example & Tests (examples/scrapegraphai_example.py, tests/tools/test_scrapegraphai.py) | Updated example to demonstrate v2 capabilities with new method names and SGAI_API_KEY environment variable. Refactored test suite from mocking legacy SDK methods to testing _SGAIClient interface, introduced structured error responses, and updated registered-tools assertions to expect eight new method names. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The refactoring spans multiple files with heterogeneous logic changes—new method signatures across the tool class, SDK integration details, format configuration builders, and corresponding test suite updates. While changes follow a consistent pattern, each method requires separate verification of argument forwarding, result handling, and error management.

Poem

The old guard steps aside, the new order takes the stage,
Eight methods rise where six once played their game,
ScrapeGraphAI v2—sharper, cleaner, poised for war,
API keys renamed, dependencies refined once more, 🎯
Business moves like clockwork; the future's here to stay.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.48% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(scrapegraph): migrate tool to scrapegraph-py v2 SDK' directly summarizes the primary change: migrating the ScrapeGraphAI tool to the new v2 SDK with updated capabilities and dependencies. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Comment on lines +127 to +133
+            prompt: Optional extraction prompt applied to the results.
         """
         try:
             result = self.client.search(query, num_results=num_results, prompt=prompt)
             return _format_result(result, "search")
         except Exception as e:
-            logger.exception("ScrapeGraphAI SearchScraper Error")
-            return f"Error in searchscraper: {str(e)}"
+            logger.exception("ScrapeGraphAI search error")

P1 JsonFormatConfig imported but never used

JsonFormatConfig is imported (and nulled out in the fallback) but never referenced in _FORMAT_BUILDERS, in crawl/monitor, or anywhere else. Ruff will flag this as F401 and fail the lint pre-commit hook / CI linter step. Either add "json" as a supported format in _FORMAT_BUILDERS/ScrapeFormat, or drop the import entirely.

Suggested change
from scrapegraph_py import (
    HtmlFormatConfig,
    LinksFormatConfig,
    MarkdownFormatConfig,
    SummaryFormatConfig,
)

Comment on lines +139 to +144
url: str,
max_pages: int = 10,
max_depth: int = 2,
include_patterns: Optional[List[str]] = None,
exclude_patterns: Optional[List[str]] = None,
) -> str:

P1 JsonFormatConfig also missing from fallback

The except ImportError fallback block does not assign JsonFormatConfig = None. If the import is kept and the library is absent, any reference to JsonFormatConfig would raise a NameError rather than degrade gracefully. If the import is removed from the try-block (see above), also remove it from the fallback.

Suggested change
    _SGAIClient = None
    MarkdownFormatConfig = None
    HtmlFormatConfig = None
    LinksFormatConfig = None
    SummaryFormatConfig = None

Comment on lines 188 to +195
         Args:
-            website_url: The URL of the website to crawl
-            user_prompt: Prompt describing what to extract (used when extraction_mode=True)
-            max_depth: Maximum depth of crawling (default: 1)
-            max_pages: Maximum number of pages to crawl (default: 3)
-            sitemap: Whether to use sitemap for crawling (default: True)
-            extraction_mode: Whether to use extraction mode (requires data_schema if True, default: False)
-            data_schema: Data schema for extraction (required if extraction_mode=True)
+            url: Page to monitor.
+            interval: Cron expression, e.g. "0 * * * *" for hourly.
+            name: Optional monitor name.
+            webhook_url: Optional webhook to receive change notifications.
         """
         try:
-            crawl_params = {
-                "url": website_url,
-                "depth": max_depth,
-                "max_pages": max_pages,
-                "sitemap": sitemap,
-                "extraction_mode": extraction_mode,
-            }
-
-            # Include prompt and data_schema only when extraction_mode=True
-            if extraction_mode:
-                if data_schema is None:
-                    raise ValueError(
-                        "data_schema is required when extraction_mode=True"
-                    )
-                crawl_params["prompt"] = user_prompt
-                crawl_params["data_schema"] = data_schema
-            response = self.client.crawl(**crawl_params)
-            return str(response)
+            result = self.client.monitor.create(

P2 self.api_key stores unresolved value

super().__init__(api_key) is called before resolved_key is computed, so self.api_key ends up holding None (or the raw, unresolved argument) even when the key was actually read from an environment variable. Any code that later reads tool.api_key to inspect the active credential will see None. Computing resolved_key first and then passing it to super().__init__ would keep the stored attribute consistent with what self.client was initialised with.
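
The reordering suggested here can be sketched as follows. `BaseTool` and `ScrapeGraphAITool` are hypothetical stand-ins for the project's classes, and the SDK client construction is left as a comment since its signature is not shown in this thread.

```python
import os
from typing import Optional


class BaseTool:
    """Hypothetical stand-in for the project's tool base class."""

    def __init__(self, api_key: Optional[str] = None) -> None:
        self.api_key = api_key


class ScrapeGraphAITool(BaseTool):
    """Hypothetical tool showing only the key-resolution order."""

    def __init__(self, api_key: Optional[str] = None) -> None:
        # Resolve first: explicit argument, then new env var, then legacy env var.
        resolved_key = (
            api_key
            or os.environ.get("SGAI_API_KEY")
            or os.environ.get("SCRAPEGRAPH_API_KEY")
        )
        # Pass the resolved value up so self.api_key matches what the client uses.
        super().__init__(resolved_key)
        # The real tool would construct the SDK client here with resolved_key.
```

With this ordering, `tool.api_key` reports the credential that is actually in use, whichever source it came from.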

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
examples/scrapegraphai_example.py (1)

24-24: Small thing — the explicit os.environ.get(...) is redundant.

The constructor already resolves SGAI_API_KEY (and the legacy one) on its own, so passing it in from os.environ.get is belt-and-braces. Not wrong, just noise. You could simply write ScrapeGraphAI() here and let the tool do its job. Keep it if you prefer explicitness — no harm done.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/scrapegraphai_example.py` at line 24, The example instantiates
ScrapeGraphAI by explicitly passing os.environ.get("SGAI_API_KEY"), which is
redundant because the ScrapeGraphAI constructor already resolves SGAI_API_KEY
(and the legacy key) internally; update the instantiation to call
ScrapeGraphAI() with no arguments (i.e., remove the os.environ.get(...)
argument) so the constructor handles env var resolution itself, leaving the rest
of the example unchanged.
src/agentor/tools/scrapegraphai.py (3)

74-92: The parameter name format is shadowing a Python builtin — tidy it up.

Ruff's already whistling at us (A002) on line 74. It won't break anything today, but any code inside scrape that reaches for the builtin format() will be in for a surprise. Rename it, and the Literal type stays just as tight.

♻️ Proposed rename
-    def scrape(self, url: str, format: ScrapeFormat = "markdown") -> str:
+    def scrape(self, url: str, output_format: ScrapeFormat = "markdown") -> str:
         """Fetch a webpage and return its content in the requested format.
 
         Args:
             url: The URL to scrape.
-            format: One of "markdown", "html", "links", "summary". Defaults to markdown.
+            output_format: One of "markdown", "html", "links", "summary". Defaults to markdown.
         """
         try:
-            builder = _FORMAT_BUILDERS.get(format)
+            builder = _FORMAT_BUILDERS.get(output_format)
             if builder is None:
                 return (
-                    f"Error in scrape: unsupported format '{format}'. "
+                    f"Error in scrape: unsupported format '{output_format}'. "
                     "Use one of: markdown, html, links, summary."
                 )
             result = self.client.scrape(url, formats=[builder()])
             return _format_result(result, "scrape")

Mind you — this is a public capability signature, and the tests in tests/tools/test_scrapegraphai.py (lines 32, 43) and the example docstring currently call it as format=.... If you take this route, update those too, or slap a # noqa: A002 on the line and leave the signature alone. Your call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agentor/tools/scrapegraphai.py` around lines 74 - 92, The method scrape
currently uses the parameter name format which shadows the built-in format()
causing linter A002; rename the parameter (for example to out_format or fmt) in
the scrape signature (def scrape(self, url: str, out_format: ScrapeFormat =
"markdown") -> str), update all internal uses (the lookup
_FORMAT_BUILDERS.get(format) -> _FORMAT_BUILDERS.get(out_format) and any
references to format within the function such as the client.scrape call and
result formatting), and update the public/API callers and tests
(tests/tools/test_scrapegraphai.py, example docstring) to pass the new parameter
name or call positionally; if you prefer to keep the public name, remove the
change and instead add a `# noqa: A002` comment to the original parameter to
silence the linter.

81-225: Same try/except/log/format dance repeated eight times — worth a little decorator.

Every capability does the same thing: call the SDK, format the result, catch Exception, log, return f"Error in <name>: ...". It's clean enough now, but the next capability you add will copy-paste the same seven lines. A thin wrapper keeps the intent obvious and the surface tight.

♻️ Sketch of a wrapper
from functools import wraps

def _safe_capability(name: str):
    def deco(fn):
        @wraps(fn)
        def inner(self, *args, **kwargs):
            try:
                result = fn(self, *args, **kwargs)
                return _format_result(result, name)
            except Exception as e:
                logger.exception("ScrapeGraphAI %s error", name)
                return f"Error in {name}: {e}"
        return inner
    return deco

Then each capability just returns the raw SDK result (or the unsupported-format string, which would need a small tweak). Not a blocker — file as "next time you're in here."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agentor/tools/scrapegraphai.py` around lines 81 - 225, Introduce a small
decorator (e.g. _safe_capability) and apply it to each capability method
(scrape, extract, search, crawl, get_crawl_result, monitor, credits, health) to
centralize the try/except/log/_format_result pattern: the decorator should call
the wrapped method, if the return is a string (existing error message like the
unsupported format case in scrape) return it unchanged, otherwise call
_format_result(result, name); on exception log with
logger.exception("ScrapeGraphAI %s error", name) and return f"Error in {name}:
{e}". Update each capability to return the raw SDK result (or the existing
string error) and remove the repeated try/except blocks so the decorator handles
them.
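
A self-contained variant of the wrapper described above, including the string pass-through for pre-built error messages, might look like this. It is a sketch only: `_format_result` here is a placeholder that just dumps to JSON, and `DemoTool` is a hypothetical class used to exercise the decorator.

```python
import json
import logging
from functools import wraps
from typing import Any, Callable

logger = logging.getLogger(__name__)


def _format_result(result: Any, name: str) -> str:
    """Placeholder for the tool's real helper; here it only dumps to JSON."""
    return json.dumps(result, default=str)


def _safe_capability(name: str) -> Callable:
    """Wrap a capability so it always returns a string and never raises."""

    def deco(fn: Callable) -> Callable:
        @wraps(fn)
        def inner(self, *args, **kwargs) -> str:
            try:
                result = fn(self, *args, **kwargs)
                # Pre-built error strings (e.g. unsupported format) pass through unchanged.
                if isinstance(result, str):
                    return result
                return _format_result(result, name)
            except Exception as e:
                logger.exception("ScrapeGraphAI %s error", name)
                return f"Error in {name}: {e}"

        return inner

    return deco


class DemoTool:
    """Hypothetical tool used only to exercise the decorator."""

    @_safe_capability("health")
    def health(self) -> dict:
        return {"status": "ok"}

    @_safe_capability("credits")
    def credits(self) -> dict:
        raise RuntimeError("network down")
```

One caveat of the pass-through rule: a capability whose raw SDK result is itself a plain string would skip `_format_result`, so the check may need tightening if such a capability is ever added.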

186-205: The monitor method locks formats to Markdown—same concern as crawl. Worth discussing.

Right, listen. You've spotted something worth noting here. The scrape method lets callers pick their format using that ScrapeFormat knob. Markdown, HTML, links, summary—the lot. But monitor and crawl both hardcode MarkdownFormatConfig() with no way round it. The web confirms monitor.create supports the formats parameter, so the capability's there—it's just not wired up.

It's workable as is, mind you. Markdown's a sensible default for scheduled monitors. But if an agent needs to ask for HTML or a summary on a scheduled run, they've got nothing. The _FORMAT_BUILDERS mapping already exists and handles all four formats cleanly.

The suggested implementation follows the scrape pattern directly: add format: ScrapeFormat = "markdown" to the signature, use the builder, handle unsupported formats properly. Straightforward piece of work, no complications. Low priority for now, but worth considering when you next touch this code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agentor/tools/scrapegraphai.py` around lines 186 - 205, The monitor
method currently hardcodes Markdown by passing MarkdownFormatConfig() to
self.client.monitor.create; change the signature of monitor (the monitor method)
to accept a format: ScrapeFormat = "markdown" parameter, use the existing
_FORMAT_BUILDERS mapping to build the appropriate format config (like scrape
does), replace the hardcoded MarkdownFormatConfig() with the builder output, and
raise/handle an error if the provided format is unsupported before calling
self.client.monitor.create so monitors can be scheduled in HTML/links/summary as
well as markdown.
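
The format lookup that scrape already performs could be factored into a shared helper so that monitor and crawl reuse it. The sketch below assumes this shape: `FormatConfig` is a stand-in for the SDK's format config classes, `build_formats` is a hypothetical helper, and the `_FORMAT_BUILDERS` contents mirror only what the review describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class FormatConfig:
    """Stand-in for the SDK's Markdown/Html/Links/Summary format config classes."""

    kind: str


# Mirrors the role of _FORMAT_BUILDERS in the tool: format name -> zero-arg builder.
_FORMAT_BUILDERS: Dict[str, Callable[[], FormatConfig]] = {
    "markdown": lambda: FormatConfig("markdown"),
    "html": lambda: FormatConfig("html"),
    "links": lambda: FormatConfig("links"),
    "summary": lambda: FormatConfig("summary"),
}


def build_formats(fmt: str, capability: str) -> Tuple[Optional[list], Optional[str]]:
    """Return ([config], None) for a known format, or (None, error string) otherwise."""
    builder = _FORMAT_BUILDERS.get(fmt)
    if builder is None:
        supported = ", ".join(_FORMAT_BUILDERS)
        return None, (
            f"Error in {capability}: unsupported format '{fmt}'. "
            f"Use one of: {supported}."
        )
    return [builder()], None
```

A capability would then call `formats, err = build_formats(fmt, "monitor")`, return `err` if it is set, and otherwise forward `formats` to the SDK call.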

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: daf1abfc-4df7-4ad7-8d54-5e3b9ce95b6c

📥 Commits

Reviewing files that changed from the base of the PR and between 8eab1f4 and e9732e8.

📒 Files selected for processing (4)
  • examples/scrapegraphai_example.py
  • pyproject.toml
  • src/agentor/tools/scrapegraphai.py
  • tests/tools/test_scrapegraphai.py

Comment on lines +66 to +71
resolved_key = (
api_key
or os.environ.get("SGAI_API_KEY")
or os.environ.get("SCRAPEGRAPH_API_KEY")
)
self.client = _SGAIClient(api_key=resolved_key)

⚠️ Potential issue | 🟡 Minor

Silent fallback to None when no API key is resolved.

If the caller passes nothing and neither env var is set, resolved_key quietly becomes None and gets shovelled into the SDK. The SDK will eventually bark, but the error won't be as clean as the one we raise for missing deps just above. A quick guard here saves a confusing stack trace down the road.

🛡️ Proposed guard
         resolved_key = (
             api_key
             or os.environ.get("SGAI_API_KEY")
             or os.environ.get("SCRAPEGRAPH_API_KEY")
         )
+        if not resolved_key:
+            raise ValueError(
+                "ScrapeGraphAI API key not provided. Pass `api_key=...` or set "
+                "SGAI_API_KEY (or legacy SCRAPEGRAPH_API_KEY) in the environment."
+            )
         self.client = _SGAIClient(api_key=resolved_key)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agentor/tools/scrapegraphai.py` around lines 66 - 71, The code silently
passes a None API key into _SGAIClient by assigning resolved_key from api_key or
env vars; add a guard after computing resolved_key (before calling _SGAIClient)
to check if resolved_key is falsy and raise a clear exception (e.g., ValueError
or RuntimeError) with a message instructing the caller to provide api_key or set
SGAI_API_KEY/SCRAPEGRAPH_API_KEY; update the instantiation site where
self.client = _SGAIClient(api_key=resolved_key) to run only after the check so
the error is explicit and not a downstream SDK stack trace.
