feat: AsyncPlasmateCrawlerStrategy - lightweight alternative to Playwright (no Chrome) #1906
dbhurley wants to merge 1 commit into unclecode:develop from
…laywright

Closes unclecode#1256 (memory leak in Docker from Chrome)
Related to unclecode#1874 (token usage tracking)

Plasmate (https://github.com/plasmate-labs/plasmate) is an open-source Rust browser engine that replaces Chrome/Playwright for static pages. It runs no browser process, uses ~64MB RAM vs ~300MB for Chrome/Playwright, and produces 10-100x fewer tokens per page.

Changes:
- crawl4ai/async_plasmate_strategy.py: AsyncPlasmateCrawlerStrategy
  - Implements the AsyncCrawlerStrategy ABC (drop-in replacement)
  - Supports output_format: text (default), markdown, som, links
  - Supports --selector, --header, --timeout flags
  - Optional fallback_to_playwright=True for JS-heavy SPAs
  - Subprocess runs in an asyncio executor, so it is safe for concurrent use
- crawl4ai/__init__.py: export AsyncPlasmateCrawlerStrategy
- tests/general/test_plasmate_strategy.py: 20 unit tests

Install: pip install plasmate

Usage:

    import asyncio

    from crawl4ai import AsyncWebCrawler
    from crawl4ai.async_plasmate_strategy import AsyncPlasmateCrawlerStrategy

    async def main():
        strategy = AsyncPlasmateCrawlerStrategy(
            output_format="markdown",
            fallback_to_playwright=True,  # SPA safety net
        )
        # The crawler delegates fetching to whichever strategy it is given
        async with AsyncWebCrawler(crawler_strategy=strategy) as crawler:
            result = await crawler.arun("https://docs.python.org/3/")
            print(result.success)

    asyncio.run(main())
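According to the commit message, the Plasmate subprocess runs in an asyncio executor, which makes the strategy safe for concurrent use. The sketch below shows that general pattern using only the standard library. It is not the code from this PR. The command is a stand-in, the current Python interpreter echoing the URL, because the real plasmate CLI invocation is not shown here. fetch_via_cli and run_cli are hypothetical names.

```python
import asyncio
import subprocess
import sys


def run_cli(args: list[str], timeout: float) -> str:
    """Blocking call: run a CLI tool and return its stdout as text."""
    completed = subprocess.run(
        args,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


async def fetch_via_cli(url: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: offload the blocking subprocess call to the
    default thread-pool executor so the event loop stays free."""
    # Stand-in command (assumption): echoes the URL instead of invoking plasmate
    args = [sys.executable, "-c", "import sys; print(sys.argv[1])", url]
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, run_cli, args, timeout)


async def main() -> list[str]:
    urls = ["https://example.com/a", "https://example.com/b"]
    # Each call runs in a worker thread; gather awaits them concurrently
    outputs = await asyncio.gather(*(fetch_via_cli(u) for u in urls))
    return [out.strip() for out in outputs]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

asyncio.create_subprocess_exec is an alternative that does not occupy a thread for each call. The executor approach is shown here because it is the one the commit message describes.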
A Chrome-free crawling strategy is a big deal for containerized environments where Playwright's Chromium binary is a pain to manage: image sizes, security patches, and arm64 compatibility all improve significantly. The drop-in interface matching …
Hi @unclecode, gently bumping this three weeks in. Picking up on @mshi-hacks's note above: the Chrome-free angle is a bigger story than the integration itself. Container size, ARM64 support, and the security-patch treadmill on Chromium binaries are real pain points for anyone running Crawl4AI on Cloud Run / Lambda / Fly / Pi-class hardware. Happy to:
No rush; I just want to make sure it's not stuck behind something I can clear. Thanks for the project.
Summary
Adds AsyncPlasmateCrawlerStrategy, a drop-in alternative to AsyncPlaywrightCrawlerStrategy that uses Plasmate instead of Chrome.

Directly addresses:
What Plasmate is
Open-source Rust browser engine (Apache 2.0). It fetches pages and returns them as a Structured Object Model (SOM), a compact, semantically clean representation with navigation, ads, cookie banners, and other boilerplate stripped. Install: pip install plasmate.

Compression measured across 45 real sites: 17.7× average, 77× peak. Every token saved before the LLM is a direct cost reduction.
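A short worked calculation helps put the compression figures in perspective. It assumes the ratio means original size divided by SOM size. Under that assumption, a ratio r leaves 1/r of the tokens, so the fraction saved is 1 - 1/r. On that reading, 17.7× means roughly 94% fewer tokens and 77× means roughly 98.7% fewer. token_reduction is a hypothetical helper used only for this illustration.

```python
def token_reduction(ratio: float) -> float:
    """Fraction of tokens saved when output is `ratio` times smaller
    (assumes ratio = original size / compressed size)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    for label, r in [("average", 17.7), ("peak", 77.0)]:
        # e.g. 17.7x leaves 1/17.7 of the tokens, i.e. about 94.4% saved
        print(f"{label}: {r}x -> {token_reduction(r):.1%} fewer tokens")
```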
Drop-in usage
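"Drop-in" means the new class implements the same abstract interface the crawler already depends on. Swapping engines therefore only requires passing a different constructor argument, with no other code changes. The toy sketch below illustrates that idea with a self-contained stand-in ABC. FetchStrategy, BrowserLikeStrategy, LightweightStrategy, and Crawler are hypothetical names. They are not crawl4ai's real AsyncCrawlerStrategy interface, whose exact method signatures are not shown in this PR.

```python
import asyncio
from abc import ABC, abstractmethod


class FetchStrategy(ABC):
    """Hypothetical stand-in for a crawler-strategy ABC."""

    @abstractmethod
    async def crawl(self, url: str) -> str:
        """Return page content for `url`."""


class BrowserLikeStrategy(FetchStrategy):
    async def crawl(self, url: str) -> str:
        # Pretend this drove a full browser
        return f"<html>browser render of {url}</html>"


class LightweightStrategy(FetchStrategy):
    async def crawl(self, url: str) -> str:
        # Pretend this called a lightweight engine instead
        return f"# lightweight render of {url}"


class Crawler:
    """Depends only on the ABC, so any conforming strategy can be passed in."""

    def __init__(self, crawler_strategy: FetchStrategy) -> None:
        self.strategy = crawler_strategy

    async def arun(self, url: str) -> str:
        return await self.strategy.crawl(url)


async def demo() -> list[str]:
    url = "https://docs.python.org/3/"
    results = []
    # Same crawler code, two interchangeable strategies
    for strategy in (BrowserLikeStrategy(), LightweightStrategy()):
        results.append(await Crawler(crawler_strategy=strategy).arun(url))
    return results


if __name__ == "__main__":
    for line in asyncio.run(demo()):
        print(line)
```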
What changed
- crawl4ai/async_plasmate_strategy.py: AsyncPlasmateCrawlerStrategy implementing the AsyncCrawlerStrategy ABC
- crawl4ai/__init__.py: exports AsyncPlasmateCrawlerStrategy
- tests/general/test_plasmate_strategy.py: 20 unit tests

Comparison
- … fallback_to_playwright=True)
- Install: playwright install (~300MB browser) vs. pip install plasmate

Notes
- Existing AsyncPlaywrightCrawlerStrategy usage is untouched
- fallback_to_playwright=True makes it safe for mixed static/SPA crawls
- The subprocess runs in an asyncio executor, so concurrent gather() calls are safe
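The fallback note describes a flow for mixed static/SPA crawls: try the lightweight engine first and use a browser only when needed. The PR does not say how the strategy decides that a page needs JavaScript. The heuristic below is purely an assumption: it flags pages with too little visible text or an empty SPA mount point such as div id="root". needs_browser and crawl_with_fallback are hypothetical names, and the two fetchers are fakes so the sketch runs without Plasmate or Playwright.

```python
import asyncio
import re
from typing import Awaitable, Callable

Fetcher = Callable[[str], Awaitable[str]]

# Assumed heuristic: an empty SPA mount point suggests client-side rendering
_EMPTY_MOUNT = re.compile(r'<div id="(?:root|app|__next)">\s*</div>')
_TAGS = re.compile(r"<[^>]+>")


def needs_browser(html: str, min_text_chars: int = 200) -> bool:
    """Hypothetical check: guess whether a page relies on JavaScript."""
    text = " ".join(_TAGS.sub(" ", html).split())  # crude visible-text estimate
    return bool(_EMPTY_MOUNT.search(html)) or len(text) < min_text_chars


async def crawl_with_fallback(url: str, light: Fetcher, heavy: Fetcher) -> tuple[str, str]:
    """Try the lightweight fetcher first; fall back to the browser if needed."""
    html = await light(url)
    if needs_browser(html):
        return "browser", await heavy(url)
    return "lightweight", html


async def fake_light(url: str) -> str:
    # Fake lightweight engine: SPA pages come back as an empty shell
    if url.endswith("/spa"):
        return '<html><body><div id="root"></div></body></html>'
    return "<html><body><p>" + "static docs text " * 20 + "</p></body></html>"


async def fake_heavy(url: str) -> str:
    # Fake browser: always returns rendered content
    return "<html><body><p>rendered by a real browser</p></body></html>"


async def demo() -> list[tuple[str, str]]:
    urls = ["https://example.com/docs", "https://example.com/spa"]
    return await asyncio.gather(
        *(crawl_with_fallback(u, fake_light, fake_heavy) for u in urls)
    )


if __name__ == "__main__":
    for engine, _ in asyncio.run(demo()):
        print(engine)
```

A real implementation would probably base the decision on signals the engine itself reports instead of regex-matching HTML. The regex here only keeps the sketch self-contained.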