An open, actively maintained collection of Scrapy spiders for harvesting structured product data from major retailers. Spiders are purpose-built per retailer with bootstrap/API/HTML fallback logic where needed.
This repository is actively maintained by OpenClaw AI Agents (with human oversight).
pip install common-scrapy
`pip install common-scrapy` adds a `common-scrapy` console script so you can work with the packaged spiders without cloning the repo.
PROXY=http://user:pass@host:1234 common-scrapy crawl amazon_listing -a category=fashion
All spiders honor `PROXY` via project-wide middleware.
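As a rough illustration of how an environment-driven proxy can be wired in, here is a minimal sketch of a downloader-middleware-style class. The class name `EnvProxyMiddleware` is hypothetical and this is not the project's actual middleware. The one real Scrapy detail it relies on is that the built-in `HttpProxyMiddleware` honors `request.meta["proxy"]`. The request and spider objects are stand-ins so the snippet runs without Scrapy installed.

```python
import os
from types import SimpleNamespace


class EnvProxyMiddleware:
    """Hypothetical middleware: route requests through the PROXY env var."""

    def __init__(self, proxy_url=None):
        # Fall back to the PROXY environment variable when no URL is passed in.
        self.proxy_url = proxy_url or os.environ.get("PROXY")

    def process_request(self, request, spider):
        # A spider may opt out, e.g. via an assumed `disable_proxy` attribute.
        if not self.proxy_url or getattr(spider, "disable_proxy", False):
            return None
        # Scrapy's built-in HttpProxyMiddleware reads request.meta["proxy"].
        request.meta.setdefault("proxy", self.proxy_url)
        return None


# Stand-in objects; real Scrapy would pass Request and Spider instances.
request = SimpleNamespace(meta={})
spider = SimpleNamespace(name="amazon_listing")
EnvProxyMiddleware("http://user:pass@host:1234").process_request(request, spider)
print(request.meta)
```

Using `setdefault` keeps any per-request proxy that a spider sets explicitly.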
common-scrapy list
common-scrapy crawl <identifier> [--category <category>] [additional Scrapy args]
`--category` is required for listing spiders. If omitted, the CLI prints available categories for that spider.
Examples:
common-scrapy crawl target_search --category 5xtc0 -a max_pages=2 -O target.jsonl
common-scrapy crawl kohls_listing --category women -a max_pages=1 -O kohls_listing.jsonl
common-scrapy crawl sephora_listing --category makeup -a max_pages=1 -O sephora_listing.jsonl
All extra args are forwarded to scrapy crawl unchanged (feeds, settings overrides, etc.).
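To make the forwarding behaviour concrete, here is a minimal sketch of how a wrapper could translate `--category` into a spider argument and pass everything else through. The function `build_scrapy_argv` is hypothetical and the packaged CLI may be implemented differently. The only assumption taken from this README is that `--category <x>` and `-a category=<x>` are equivalent.

```python
def build_scrapy_argv(argv):
    """Translate `crawl <spider> [--category X] [extra...]` into scrapy argv."""
    if len(argv) < 2 or argv[0] != "crawl":
        raise ValueError("usage: crawl <identifier> [--category <category>] [scrapy args]")
    spider, rest = argv[1], argv[2:]
    out = ["scrapy", "crawl", spider]
    i = 0
    while i < len(rest):
        if rest[i] == "--category" and i + 1 < len(rest):
            # Rewrite the convenience flag into a spider argument.
            out += ["-a", f"category={rest[i + 1]}"]
            i += 2
        else:
            # Everything else (feeds, settings overrides, ...) passes through.
            out.append(rest[i])
            i += 1
    return out


print(build_scrapy_argv(
    ["crawl", "target_search", "--category", "5xtc0", "-a", "max_pages=2", "-O", "target.jsonl"]
))
```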
These live under common/spiders/*_listing_spider.py and are purpose-built per retailer.
Spiders below are returning items in recent smoke runs:
| Spider Name | Status | Method | Antibot | Description | Number of items output | Spider Categories | Sample output |
|---|---|---|---|---|---|---|---|
| ae_listing | Experimental | html | Akamai (signals in headers) | American Eagle listing spider via category-page product cards. | 30 (ok) | women-tops, women-jeans, men-tops | {"item_id":"1457_2980_808","title":null,"url":"https://www.ae.com/us/en/p/women/hoodies-sweatshirts/crew-neck-sweatshirts/ae-big-hug-v-neck-sweatshirt/1457_2980_808","price":nul... |
| amazon_listing | Active | html | none detected | Amazon category listing spider (category shortcuts). | 22 (ok) | electronics, fashion, beauty, home-kitchen, toys-games, sports-outdoors, grocery, books | {"asin":"B0DKDTBBF7","title":"2 Packs Electric Candle Lighters, Windproof Flameless USB Rechargeable Plasma Arc Long Lighter for Grill Fi... |
| amazon_search | Active | html | none detected | Amazon keyword search spider. | 22 (ok) | - | {"asin":"B0GHQRV71M","title":"16\" FHD IPS Laptop Computer - 16GB RAM 512GB SSD, Pentium N100(Beat to i3-1115G4, 4 Cores Up to 3.4GHz), B... |
| bestbuy_listing | Flaky | bootstrap + html | unknown (timeout/no verdict) | Best Buy listing via Playwright + Apollo cache extract. | 10 (skipped2) | laptops, tvs, headphones, monitors, cell-phones | {"item_id":"6572184","title":"Samsung - Galaxy Book4 15.6\" FHD Laptop - Intel Core 7- 16GB Memory - 512GB SSD - Silver","url":"https://www.bestbuy.com/product/samsung-galaxy-bo... |
| bestbuy_search | Flaky | bootstrap + html | unknown (timeout/no verdict) | Best Buy search via Playwright + Apollo cache extract. | 4 (skipped2) | - | {"item_id":"6613879","title":"HP - 14\" Laptop - Intel Processor N150 2025 - 4GB Memory - 128GB UFS - Willow Green","url":"https://www.bestbuy.com/product/hp-14-laptop-intel-pro... |
| bloomingdales_listing | Experimental | html + nuxt-state | Akamai | Bloomingdale's listing spider via direct HTML/state extraction (resilient parser). | 8 (ok) | women, men, shoes, beauty, home | {"item_id":"5973765","title":"Tumbled Woven Verne Pants","url":"https://www.bloomingdales.com/shop/product/cinq-a-sept-tumbled-woven-vern... |
| costco_listing | Active | bootstrap + html | Akamai | Costco category listing with state extraction + fallback. | 24 (ok) | coffee, water, snacks, vitamins, laundry, paper-products | {"item_id":"100501081","title":null,"url":"https://www.costco.com/starbucks-pike-place-medium-roast-k-cup-72-count.product.100501081.html","price":null,"currency":null,"brand":n... |
| ebay_listing | Flaky | bootstrap + html | Akamai | eBay category listing via `__NEXT_DATA__` + fallback. | 4 (ok) | laptops, cell-phones, headphones, watches, video-games | {"item_id":null,"title":"Apple MacBook Air 13.3'' (256GB SSD, Apple M1, 8GB RAM) Laptop - Space Gray - MGN63LL/A (2020)","url":"https://w... |
| ebay_search | Flaky | bootstrap + html | Akamai | eBay keyword search via `__NEXT_DATA__` + JSON-LD + HTML fallback (filters promo/non-item cards). | 60 (ok, VPN-dependent) | - | {'item_id':'286393092388','title':'Dell Latitude Laptop Computer PC Intel i5 Up To 32GB RAM 1TB SSD Windows 11',...} |
| elfcosmetics_listing | Experimental | api + bootstrap + html | none detected (CloudFront CDN only) | e.l.f. Cosmetics multi-mode listing spider. | 6 (ok) | face, eyes, lips | {'item_id':'300261','title':'Soft Glam Satin Concealer','url':'https://www.elfcosmetics.com/soft-glam-satin-concealer/300262.html','price':9.0,'brand':'e.l.f. Cosmetics','source':'elfcosmetics_preloaded_state'...} |
| fashionnova_listing | Active | api + html | Cloudflare | Fashion Nova listing via Shopify Storefront GraphQL with HTML fallback. | 48 (ok) | women, new, dresses, jeans, sale | {"item_id":"175898317","title":"Classic High Waist Skinny Jeans - Dark Denim","url":"https://www.fashionnova.com/products/dark-blue-class... |
| homedepot_search | Active | bootstrap + html | Akamai | Home Depot keyword search via Apollo state. | 24 (ok) | - | {"item_id":"336787835","sku":"1014334650","brand":"Lukyamzn","title":"14 in. Dual-Core Celeron N4000 Laptop 6 GB RAM 128 GB SSD IPS Displ... |
| jcpenney_listing | Active | api | Akamai (+ reCAPTCHA scripts observed) | JCPenney listing spider via search API bootstrap endpoint. | 48 (ok) | womens_tops, mens_shirts | {"item_id":"ppr5008584232","title":"St. John's Bay Womens Boat Neck Elbow Sleeve T-Shirt","brand":"st. john's bay","url":"https://www.jcp... |
| kroger_listing | Active | bootstrap + html | unknown (timeout/no verdict) | Kroger category listing with search fallback path. | 31 (ok) | cereal, milk, eggs, bread, coffee, snacks | {'item_id':'kroger-vitamin-d-whole-milk-gallon','url':'https://www.kroger.com/p/kroger-vitamin-d-whole-milk-gallon/0001111040101','source':'kroger_html_links_fallback'} |
| kroger_search | Active | bootstrap + html | unknown (timeout/no verdict) | Kroger keyword search with state extraction + fallback. | 27 (ok) | - | {'item_id':'kroger-2-reduced-fat-milk-gallon','url':'https://www.kroger.com/p/kroger-2-reduced-fat-milk-gallon/0001111041700','source':'kroger_html_links_fallback'} |
| lululemon_listing | Active | bootstrap | Akamai | lululemon listing spider via Next.js `__NEXT_DATA__`. | 40 (ok) | women-shorts, women-leggings, men-shorts, bags | {"category":"women-shorts","product_id":"prod11860112","name":"Shake It Out High-Rise Running Short 2.5\"","brand":"lululemon","price":["... |
| maccosmetics_listing | Experimental | api + bootstrap + html | Akamai | MAC Cosmetics multi-mode listing spider. | 66 (ok) | face, lips, eyes | {"item_id":"13854","title":"4.8/5 ( 452 ) Lustreglass Sheer-Shine Lipstick Sheer Coverage, Glossy/High-Shine Finish, Infused With Raspberry Seed/Organic Extra Virgin Olive Oils ... |
| macys_listing | Active | api | Akamai | Macy’s listing via xapi endpoint (with fallback routing). | 60 (ok) | laptops, shoes, dresses, fragrance, bedding | {"item_id":"17595303","title":"5Core AC Power Cord 6Ft 3 Prong US Male to Female Extension Adapter 18AWG 10A 7A 125V","brand":"5 Core","u... |
| poshmark_listing | Experimental | bootstrap | none detected | Poshmark listing spider via `window.__INITIAL_STATE__` category grid data. | 48 (ok) | women, men, kids, home, electronics, pets | {"category":"women","item_id":"6989d90ac4e7b4d4de556bac","title":"🔥Stunning Farm Rio NWT Size Large Tropical Midi Dress with Sleeves – V... |
| qvc_listing | Experimental | html | Akamai | QVC listing spider via direct category HTML parsing. | 102 (ok) | beauty, fashion, home, kitchen | {"item_id":"A711188","title":"lwya by kim gravel balm bae center core lip balm quad","url":"https://www.qvc.com/lwya-by-kim-gravel-balm-b... |
| saksfifthavenue_listing | Experimental | html | DataDome | Saks Fifth Avenue listing spider via direct category HTML cards. | 24 (ok) | women, men, shoes, beauty, handbags | {"item_id":"0400026449047","title":"Prada Washed Re Nylon Rain Jacket","url":"https://www.saksfifthavenue.com/product/prada-washed-re-nyl... |
| sallybeauty_listing | Experimental | api + bootstrap + html | PerimeterX / HUMAN (px-captcha signals) | Sally Beauty multi-mode listing spider. | 1 (ok) | hair-color, hair-care, nails | {"item_id":null,"title":"What's the issue? We’re dedicated to keeping SallyBeauty.com safe from bots and other malicious software. Sometimes a technical issue with your internet... |
| sephora_listing | Experimental | api | Akamai | Sephora listing via `/api/v2/catalog/categories/<slug>/seo`. | 60 (ok) | makeup, skincare, gifts, fragrance | {"item_id":"P517483","title":"Pocket Blush Buildable Hydrating Cream Blush","url":"https://www.sephora.com/product/pocket-blush-P517483?s... |
| stockx_listing | Experimental | bootstrap + html | Cloudflare | StockX listing via `__NEXT_DATA__` bootstrap. | 41 (ok) | sneakers, apparel, electronics, trading-cards, collectibles | {"item_id":"brands","title":"Brands","url":"https://stockx.com/brands","price":null,"currency":null} |
| target_listing | Active (alias) | api | PerimeterX / HUMAN (cookie signals) | Deprecated alias of `target_search`. | 24 (ok) | - | {"product_id":"90600286","name":"Women's Waffle Short Robe - Auden™ Light Gray M/L: Front Tie, Long Sleeve","price":"$35.00","u... |
| target_search | Active | api | PerimeterX / HUMAN (cookie signals) | Target RedSky search API spider. | 24 (ok) | - | {"product_id":"90600286","name":"Women's Waffle Short Robe - Auden™ Light Gray M/L: Front Tie, Long Sleeve","price":"$35.00","u... |
| ulta_search | Active | api + html | Akamai | Ulta keyword search via GraphQL (with unsorted retry + HTML fallback). | 64 (ok) | - | {"item_id":"xlsImpprod15511061","title":"All Soft Shampoo","source":"ulta_dxl_graphql"...} |
| walmart_listing | Active | api + html | Akamai (+ PerimeterX/HUMAN signals) | Walmart category listing spider (direct API+HTML flow). | 45 (ok) | electronics, home, clothing, beauty, toys, sports-and-outdoors, grocery | {"item_id":"375041225","title":"Restaurado Apple iPhone 12 Restaurado - Desbloqueado para Cualquier Operador - 64GB Negro (Reacondicionad... |
| walmart_search | Active | api + html | Akamai (+ PerimeterX/HUMAN signals) | Walmart keyword search spider. | 12 (ok) | - | {"item_id":"13542163431","title":"ASUS Vivobook Go 15.6” Laptop, Intel i3-N305, 8GB, 256GB, Windows 11 Home in S mode, Cool Silver, E1504... |
These spiders are still being worked on and returned 0 items in recent smoke runs:
| Spider Name | Status | Method | Antibot | Description | Number of items output | Spider Categories | Sample output |
|---|---|---|---|---|---|---|---|
| anthropologie_listing | Experimental | api + html | PerimeterX / HUMAN | Anthropologie listing spider (API + HTML fallback). | 0 (ok) | women, dresses, sale | n/a |
| bathandbodyworks_listing | Experimental | api + bootstrap + html | PerimeterX / HUMAN (px-captcha) | Bath & Body Works multi-mode listing spider. | 0 (ok) | body-care, home-fragrance, hand-soaps | {} |
| costco_search | Active | bootstrap + html | Akamai | Costco keyword search with state extraction + fallback. | 0 (skipped2) | - | {} |
| dillards_listing | Experimental | bootstrap | Akamai | Dillard's listing spider via `window.__INITIAL_STATE__`. | 0 (ok) | women, men, shoes, handbags, beauty, juniors, home | n/a |
| homedepot_listing | Flaky | bootstrap + html | Akamai | Home Depot category listing via Apollo state. | 0 (ok) | screwdrivers, drills, paint, light-bulbs, lumber | n/a |
| kohls_listing | Experimental | api | Akamai (Cloudflare challenge assets also observed) | Kohl’s listing via `/web/catalog/...` API. | 0 (ok) | women, men, sale | n/a |
| nordstrom_listing | Experimental | bootstrap + html | PerimeterX / HUMAN | Nordstrom listing parser; often blocked/changed. | 0 (timeout2) | women, men, kids, beauty, home, designer, sale | {} |
| nordstromrack_listing | Experimental | playwright + html | PerimeterX / HUMAN | Nordstrom Rack listing spider via rendered category pages. | 0 (skipped2) | dresses, women, men, shoes | {} |
| ulta_listing | Active | api + html | Akamai | Ulta category listing (GraphQL default, HTML fallback mode). | 0 (ok) | shampoo, conditioner, cleanser, mascara, moisturizer | n/a |
Number of items output reflects recent local smoke runs (typically max_pages=1) and can vary by location, anti-bot behavior, and site changes.
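Because counts vary between runs, it can help to check a feed yourself. The following is a minimal sketch that counts items in a JSONL feed and tallies which fields came back as `null`. The helper `summarize_feed` is hypothetical and not part of the package, and the two sample lines are trimmed illustrations rather than real run output.

```python
import json
from collections import Counter


def summarize_feed(lines):
    """Return (item_count, Counter of field names whose value is null)."""
    count, nulls = 0, Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the feed
        item = json.loads(line)
        count += 1
        nulls.update(key for key, value in item.items() if value is None)
    return count, nulls


# Trimmed, illustrative lines in the shape of the samples in this README.
sample = [
    '{"item_id": "100501081", "title": null, "price": null}',
    '{"item_id": "286393092388", "title": "Dell Latitude Laptop", "price": 237.36}',
]
count, nulls = summarize_feed(sample)
print(count, dict(nulls))
```

For a real run, pass an open file instead of the list, for example `summarize_feed(open("target.jsonl", encoding="utf-8"))`.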
Many listing spiders accept -a category=<name> shortcuts (in addition to -a category_url=<url>), including Amazon, Walmart, eBay, Home Depot, Best Buy, Costco, and Kroger.
Below are trimmed examples from recent local test runs (JSONL output, 1 item shown).
{
"asin": "B08NF2W2V2",
"title": "INZCOU",
"price": 36.98,
"url": "https://www.amazon.com/s?k=sneakers",
"image_url": "https://m.media-amazon.com/images/I/71Akg8OEbXL._AC_UL320_.jpg"
}
Run example:
common-scrapy crawl amazon_search -a q=sneakers -a max_pages=1 -O amazon_search.jsonl
Supported built-in categories:
electronics, fashion, beauty, home-kitchen, toys-games, sports-outdoors, grocery, books.
Notes:
- Uses Amazon search query URLs (`/s?k=...`) for category shortcuts.
- If a page returns no cards, spider logs a warning with URL/title to help diagnose layout/response changes.
{
"asin": "B00008BFZH",
"title": "Snap Circuits Jr. SC-100 Electronics Exploration Kit, Over 100 Projects, Full Color Project Manual, 28 Parts, STEM Educational Toy for Kids 8 +",
"url": "https://www.amazon.com/Snap-Circuits-SC-100-Electronics-Exploration/dp/B00008BFZH/ref=sr_1_1?...",
"image_url": "https://m.media-amazon.com/images/I/91THy3rMlCL._AC_UY218_.jpg",
"price": 29.98,
"rating": 4.8,
"reviews_count": 28851,
"is_prime": false,
"is_sponsored": false
}
Run example:
common-scrapy crawl amazon_listing -a category=electronics -a max_pages=1 -O amazon_cat.jsonl
{
"item_id": null,
"title": "Restored Dell Latitude 3190 | 11.6\" Touchscreen Laptop PC | Intel Core Pentium Silver N5030 (1.1 GHz) | 8GB RAM | 128GB SSD | Windows 11 Pro $178.00",
"price": 178.0,
"url": "https://www.walmart.com/sp/track?...",
"image_url": "https://i5.walmartimages.com/seo/...jpeg?odnHeight=576&odnWidth=576&odnBg=FFFFFF"
}
Run example:
common-scrapy crawl walmart_listing -a category=electronics -a max_pages=1 -O walmart.jsonl
{
"item_id": "13542163431",
"title": "ASUS Vivobook Go 15.6” Laptop, Intel i3-N305, 8GB, 256GB, Windows 11 Home in S mode, Cool Silver, E1504GA-WS35",
"url": "https://www.walmart.com/sp/track?.../ip/.../13542163431",
"image_url": "https://i5.walmartimages.com/seo/...jpeg?odnHeight=288&odnWidth=288&odnBg=FFFFFF",
"price": 269.0,
"rating": null,
"reviews_count": null,
"is_sponsored": false,
"source": "walmart_html"
}
Run example:
common-scrapy crawl walmart_search -a q=laptop -a max_pages=1 -O walmart_search.jsonl
Notes:
- Uses the same HTML parser as `walmart_listing`.
- Walmart frequently serves a "Robot or human?" challenge depending on IP/proxy reputation; when blocked, no items are emitted and the spider logs a warning.
- Browser inspection on `https://www.walmart.com/search?q=laptop` confirmed product cards + price blocks are present in rendered HTML in this runtime.
- NordVPN US city checks (`max_pages=1`, `q=laptop`) returned stable output across Ashburn (us11646), Dallas (us9147), and Los Angeles (us5381) with 13 items each.
{
"item_id": "286393092388",
"title": "Dell Latitude Laptop Computer PC Intel i5 Up To 32GB RAM 1TB SSD Windows 11",
"url": "https://www.ebay.com/itm/286393092388?...",
"price": 237.36,
"currency": "USD",
"seller": "discountcomputerdepot 99.2% positive (153.3K)",
"source": "ebay_html_cards_fallback"
}
Run example:
common-scrapy crawl ebay_search -a q=laptop -a max_pages=1 -O ebay_search.jsonl
Notes:
- Added filtering for non-listing promo cards (e.g. "Shop on eBay") so fallback HTML parsing only yields real `/itm/<id>` products.
- US NordVPN city test (Chicago `us11915`) returned 60 items with `max_pages=1`.
- No-VPN run in this environment often returns 0 items (HTTP 500/anti-bot), so eBay spiders remain marked Flaky.
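The promo-card filtering described in the notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the spider's actual code: the helper `keep_card`, the `PROMO_TITLES` set, and the card dict shape (`title`, `url`) are all hypothetical.

```python
import re

# A listing URL contains /itm/<numeric id>, optionally preceded by a title slug.
ITEM_URL_RE = re.compile(r"/itm/(?:[^/?#]+/)?(\d+)(?:[/?#]|$)")
# Card titles treated as promotional placeholders (assumed list; extend as needed).
PROMO_TITLES = {"shop on ebay"}


def keep_card(card):
    """Return the card with item_id filled in, or None for promo/non-item cards."""
    title = (card.get("title") or "").strip().lower()
    if title in PROMO_TITLES:
        return None
    match = ITEM_URL_RE.search(card.get("url") or "")
    if match is None:
        return None
    return {**card, "item_id": match.group(1)}


cards = [
    {"title": "Shop on eBay", "url": "https://ebay.com/itm/123456"},
    {"title": "Dell Latitude Laptop", "url": "https://www.ebay.com/itm/286393092388?hash=abc"},
    {"title": "Apple MacBook Air", "url": "https://www.ebay.com/p/9055828714"},
]
kept = [kept_card for kept_card in map(keep_card, cards) if kept_card]
print([c["item_id"] for c in kept])
```

Checking both the title and the URL matters because a placeholder card can still link to an `/itm/` path.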
{
"item_id": null,
"title": "Apple MacBook Air 13.3'' (256GB SSD, Apple M1, 8GB RAM) Laptop - Space Gray - MGN63LL/A (2020)",
"url": "https://www.ebay.com/p/9055828714",
"price": null,
"currency": null,
"image_url": "https://i.ebayimg.com/images/g/vtYAAOSwA3tnk8aQ/s-l400.webp",
"source": "ebay_jsonld_fallback",
"mode": "category",
"category_url": "https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276",
"page": 1,
"source_url": "https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_ipg=60&_pgn=1"
}
Run example:
common-scrapy crawl ebay_listing -a category='laptops' -a max_pages=1 -O ebay_listing.jsonl
Notes:
- Proxy-rendered category HTML may omit `__NEXT_DATA__` and serve unquoted `type=application/ld+json` blocks.
- Spider now parses `ItemList`/`Product` JSON-LD fallback in that HTML mode.
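To illustrate the JSON-LD fallback mentioned in the notes above, here is a minimal sketch that tolerates both quoted and unquoted `type` attributes and walks `ItemList` and `Product` nodes. The function name `iter_jsonld_products` is hypothetical and the regex-based script extraction is an assumption made for brevity; the spider's real parser may differ.

```python
import json
import re

# The type attribute may be quoted or unquoted in proxy-rendered HTML.
JSONLD_RE = re.compile(
    r"<script\b[^>]*\btype\s*=\s*[\"']?application/ld\+json[\"']?[^>]*>(.*?)</script>",
    re.IGNORECASE | re.DOTALL,
)


def iter_jsonld_products(html):
    """Yield Product dicts found directly or inside ItemList.itemListElement."""
    for raw in JSONLD_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the page
        for node in data if isinstance(data, list) else [data]:
            if not isinstance(node, dict):
                continue
            if node.get("@type") == "Product":
                yield node
            elif node.get("@type") == "ItemList":
                for element in node.get("itemListElement", []):
                    # ListItem entries usually wrap the product under "item".
                    item = element.get("item", element) if isinstance(element, dict) else None
                    if isinstance(item, dict) and item.get("@type") == "Product":
                        yield item


html = (
    '<script type=application/ld+json>{"@type":"ItemList","itemListElement":'
    '[{"@type":"ListItem","item":{"@type":"Product","name":"A"}}]}</script>'
    '<script type="application/ld+json">{"@type":"Product","name":"B"}</script>'
)
print([product["name"] for product in iter_jsonld_products(html)])
```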
{
"item_id": "204663533",
"sku": "1000024249",
"brand": "Husky",
"title": "Screwdriver Set (2-Piece)",
"model": "246340020",
"url": "https://www.homedepot.com/p/Husky-Screwdriver-Set-2-Piece-246340020/204663533",
"image_url": "https://images.thdstatic.com/productImages/08052130-f21b-4366-93a8-9faecad0ba34/svn/husky-screwdriver-sets-246340020-64_300.jpg",
"price": 6.97,
"original_price": 6.97,
"rating": 4.63,
"reviews_count": 227,
"source": "homedepot_apollo_bootstrap",
"mode": "keyword",
"query": "screwdriver",
"category_url": null,
"page": 1
}
Run example:
common-scrapy crawl homedepot_search -a q='screwdriver' -a max_pages=1 -O homedepot_search.jsonl
Sample output (2026-03-01, no VPN):
[]
Run example:
common-scrapy crawl homedepot_listing -a category='screwdrivers' -a max_pages=1 -O homedepot_listing.jsonl
Observed logs:
- `HomeDepot listing fallback found 0 items (status=403)`
- feed written with `0 items`
Notes:
- Category and search/listing pages are currently blocked in this environment with Home Depot anti-bot (`403`, `Oops/Error Page`, or `Access Denied`) across multiple US NordVPN cities and sort orders.
- Current fallback paths (`__APOLLO_STATE__`, JSON-LD, and HTML product links) cannot extract items when the origin serves only error/deny responses.
- If this target is business-critical, use a dedicated unblocker/browser-rendering pipeline or provide an approved alternative data source.
{
"item_id": "25092672",
"title": "Floral Stickers Laptop, 74 Pcs, Stickers for Water Bottles,",
"brand": "Mr. Pen",
"price": 6.99,
"price_text": "$6.99",
"url": "https://www.macys.com/shop/product/floral-stickers-laptop-74-pcs-stickers-for-water-bottles?ID=25092672",
"image_url": "7/optimized/34925717_fpx.tif",
"source": "macys_xapi_discover_v1_page"
}
{
"item_id": "2565096",
"sku_id": "2565096",
"brand": null,
"title": "3 sizes Hydrate Shampoo for Dry Hair $12.00 - $90.00 Add to bag",
"list_price": "$12.00 - $90.00",
"sale_price": null,
"url": "https://www.ulta.com/p/hydrate-shampoo-dry-hair-pimprod2017791?sku=2565096",
"image_url": "https://media.ultainc.com/i/ulta/2565096?w=200&$ProductCardNeutralBGLight$&h=200&fmt=auto",
"source": "ulta_direct_html",
"mode": "category_html"
}
Run examples:
- GraphQL mode (default): `common-scrapy crawl ulta_listing -a category='shampoo' -a max_pages=1 -O ulta.jsonl`
- HTML mode: `common-scrapy crawl ulta_listing -a category='shampoo' -a mode=html -a max_pages=1 -O ulta_html.jsonl`
Notes:
- GraphQL mode now retries once without `sort` when Ulta blocks sorted requests (e.g. `sort=new`, `sort=price_low`) before falling back to HTML.
- `mode=html` is a fallback parser from rendered product cards and is useful when GraphQL responses are unstable.
- HTML mode typically returns URL/title/image/price text first; GraphQL mode gives richer normalized fields (brand/rating/reviews/sponsored).
- Validation (2026-03-01): GraphQL mode returned `64` items across NordVPN US cities (Dallas, Atlanta, Chicago) for `q=shampoo`, `max_pages=1`; with NordVPN disconnected Ulta returned `403` and `0` items (including HTML fallback).
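The retry order described in the notes above (sorted GraphQL, then unsorted GraphQL, then HTML) can be sketched as a small fallback chain. Everything here is hypothetical scaffolding: `fetch_with_fallback`, the `Blocked` exception, and the fake fetchers stand in for real network calls and are not the spider's API.

```python
class Blocked(Exception):
    """Raised by a fetcher when the site answers with a block page or 403."""


def fetch_with_fallback(fetch_graphql, fetch_html, query, sort=None):
    """Try GraphQL with sort, then GraphQL without sort, then HTML."""
    attempts = [("graphql", lambda: fetch_graphql(query, sort=sort))]
    if sort is not None:
        # Retry exactly once without the sort parameter.
        attempts.append(("graphql_unsorted", lambda: fetch_graphql(query, sort=None)))
    attempts.append(("html", lambda: fetch_html(query)))
    for label, attempt in attempts:
        try:
            items = attempt()
        except Blocked:
            items = []
        if items:
            return label, items
    return "none", []


# Fake fetchers that mimic "sorted requests are blocked, unsorted ones work".
def fake_graphql(query, sort=None):
    if sort is not None:
        raise Blocked("sorted request rejected")
    return [{"title": "All Soft Shampoo", "source": "ulta_dxl_graphql"}]


def fake_html(query):
    return []


print(fetch_with_fallback(fake_graphql, fake_html, "shampoo", sort="new"))
```

Returning the label alongside the items makes it easy to log which path produced the output.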
{
"item_id": "xlsImpprod15511061",
"sku_id": "2580410",
"brand": "Redken",
"title": "All Soft Shampoo",
"url": "https://www.ulta.com/p/all-soft-shampoo-xlsImpprod15511061?sku=2580410",
"image_url": "https://media.ultainc.com/i/ulta/2580410",
"list_price": "$11.00 - $56.00",
"rating": 4.1,
"reviews_count": 1601,
"is_sponsored": false,
"source": "ulta_dxl_graphql"
}
Run examples:
- GraphQL mode (recommended): `common-scrapy crawl ulta_search -a q=shampoo -a mode=graphql -a max_pages=1 -O ulta_search.jsonl`
- HTML fallback mode: `common-scrapy crawl ulta_search -a q=shampoo -a mode=html -a max_pages=1 -O ulta_search_html.jsonl`
Notes:
- `mode=graphql` is the stable path for normalized fields.
- For reliability, run via US residential egress/VPN (validated from NordVPN US Dallas).
{
"item_id": "12345678",
"title": "Women's ...",
"url": "https://www.kohls.com/product/prd-...",
"price": 29.99,
"regular_price": 39.99,
"sale_price": 29.99,
"brand": "SONOMA Goods for Life",
"source": "kohls_web_catalog_api"
}
Run example:
common-scrapy crawl kohls_listing -a category=women -a max_pages=1 -O kohls_listing.jsonl
{
"item_id": "P517483",
"title": "Pocket Blush Buildable Hydrating Cream Blush",
"url": "https://www.sephora.com/product/pocket-blush-P517483?skuId=2895845",
"brand": "rhode",
"rating": 4.0598,
"reviews_count": 1153,
"source": "sephora_catalog_api"
}
Run example:
common-scrapy crawl sephora_listing -a category=makeup -a max_pages=1 -O sephora_listing.jsonl
{
"item_id": "air-jordan-1-retro-high-og-chicago-lost-and-found",
"title": "Air Jordan 1 Retro High OG Chicago Lost and Found",
"url": "https://stockx.com/air-jordan-1-retro-high-og-chicago-lost-and-found",
"price": null,
"source": "stockx_next_data|stockx_html_links_fallback"
}
Run example:
common-scrapy crawl stockx_listing -a category=sneakers -a max_pages=1 -O stockx_listing.jsonl
{
"item_id": "123456789",
"title": "Curve Appeal Maxi Dress - Black",
"url": "https://www.fashionnova.com/products/curve-appeal-maxi-dress-black",
"price": 39.99,
"currency": "USD",
"brand": "Fashion Nova",
"source": "fashionnova_storefront_graphql"
}
Run examples:
common-scrapy crawl fashionnova_listing -a category=women -a max_pages=1 -O fashionnova_listing.jsonl
common-scrapy crawl fashionnova_listing -a category=women -a mode=html -a max_pages=1 -O fashionnova_listing_html.jsonl
{
"item_id": "by-anthropologie-cotton-floral-cutwork-barn-jacket",
"title": "By Anthropologie Cotton Floral Cutwork Barn Jacket",
"url": "https://www.anthropologie.com/shop/by-anthropologie-cotton-floral-cutwork-barn-jacket?color=016&type=STANDARD",
"price": 198.0,
"currency": "USD",
"brand": "Anthropologie",
"source": "anthropologie_html",
"category_url": "https://www.anthropologie.com/womens-clothing",
"page": 1
}
Run example:
common-scrapy crawl anthropologie_listing -a category=women -a max_pages=1 -O anthropologie_listing.jsonl
Notes:
- Verified after connecting via NordVPN US endpoints (Seattle, Chicago, Miami) and again with NordVPN disabled.
- HTML parsing is enabled by default; API/bootstrap was not required once the spider ignored recaptcha config noise.
Run example:
common-scrapy crawl lululemon_listing -a category=women-shorts -a max_pages=1 -O lululemon_listing.jsonl
Run example:
common-scrapy crawl jcpenney_listing -a category=womens_tops -a max_pages=1 -O jcpenney_listing.jsonl
Run example:
common-scrapy crawl dillards_listing -a category=women -a max_pages=1 -O dillards_listing.jsonl
{
"category": "women",
"item_id": "62bdd4097028ec9dd68ee867",
"title": "Size Large solid black yoga pants by Canta Bella",
"brand": "Canta Bella",
"url": "https://poshmark.com/listing/Size-Large-solid-black-yoga-pants-by-Canta-Bella-62bdd4097028ec9dd68ee867",
"price": 11.0,
"currency": "USD",
"source": "poshmark_bootstrap_initial_state"
}
Run example:
common-scrapy crawl poshmark_listing -a category=women -a max_pages=1 -O poshmark_listing.jsonl
Notes:
- Verified while connected to NordVPN US endpoints (Seattle and Los Angeles).
- Category pages expose `window.__INITIAL_STATE__` with listing records at `$_category.gridData.data`.
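Reading listing records from an embedded state object can be sketched as below. This assumes the assignment holds plain JSON (no `undefined` or other JavaScript-only literals), which may not hold on every page. The function name `extract_grid_items` is hypothetical, and the sample HTML is a trimmed illustration.

```python
import json
import re

STATE_RE = re.compile(r"window\.__INITIAL_STATE__\s*=\s*")


def extract_grid_items(html):
    """Decode window.__INITIAL_STATE__ and return $_category.gridData.data."""
    match = STATE_RE.search(html)
    if not match:
        return []
    try:
        # raw_decode parses one JSON value and ignores the trailing JavaScript.
        state, _ = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return []
    node = state
    for key in ("$_category", "gridData", "data"):
        if not isinstance(node, dict):
            return []
        node = node.get(key)
    return node if isinstance(node, list) else []


html = (
    '<script>window.__INITIAL_STATE__ = {"$_category": {"gridData": {"data": '
    '[{"id": "62bdd4097028ec9dd68ee867", "title": "Size Large solid black yoga pants"}]}}};'
    "</script>"
)
print(len(extract_grid_items(html)))
```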
{
"item_id": "1234567",
"title": "AQUA ...",
"url": "https://www.bloomingdales.com/shop/product/...",
"price": 198.0,
"price_text": "$198.00",
"source": "bloomingdales_direct_html"
}
Run example:
common-scrapy crawl bloomingdales_listing -a category=women -a max_pages=1 -O bloomingdales_listing.jsonl
{
"item_id": "A711188",
"title": "lwya by kim gravel balm bae center core lip balm quad",
"url": "https://www.qvc.com/lwya-by-kim-gravel-balm-bae-center-core-lip-balm-quad.product.A711188.html?sc=PRODFEED",
"price": 29.98,
"source": "qvc_direct_html"
}
Run example:
common-scrapy crawl qvc_listing -a category=beauty -a max_pages=1 -O qvc_listing.jsonl
{
"item_id": "0400026449047",
"title": "Prada Washed Re Nylon Rain Jacket",
"url": "https://www.saksfifthavenue.com/product/prada-washed-re-nylon-rain-jacket-0400026449047.html?dwvar_0400026449047_color=GREY",
"price": 6200.0,
"price_text": "$6,200",
"source": "saksfifthavenue_direct_html"
}
Run example:
common-scrapy crawl saksfifthavenue_listing -a category=women -a max_pages=1 -O saksfifthavenue_listing.jsonl
Notes:
- Saks is heavily anti-bot protected (DataDome). Direct HTTP requests may return `403` challenge pages from some runtimes/IPs.
- Treat this spider as best-effort/experimental; verify output quality in your target environment before relying on unattended runs.
{
"product_id": "xxxxx",
"name": "…",
"price": "$…",
"url": "https://www.target.com/p/...",
"image": "https://target.scene7.com/is/image/Target/..."
}
Run example:
common-scrapy crawl target_search -a category=5xtc0 -a max_pages=1 -O target.jsonl
target_listing is a backwards-compatible alias of target_search.
Sample output:
{
"product_id": "81127431",
"name": "Women's Perfectly Cozy Jogger Pants - Stars Above™ Black M",
"price": "$22.00",
"url": "https://www.target.com/p/women-s-perfectly-cozy-jogger-pants-stars-above-black/-/A-81127431",
"image": "https://target.scene7.com/is/image/Target/GUEST_9f95ecf4-59f7-4008-b854-95380a6b6f89"
}
Run example:
.venv/bin/scrapy crawl target_listing -a category=5xtc0 -a max_pages=1 -O target_listing.jsonl
Validation notes (2026-03-01):
- Browser-control tool was unavailable during this run, so sorting behavior was validated via direct RedSky API probes (`sortBy`: `relevance`, `newest`, `PriceHigh`, `PriceLow`), all returning HTTP 200.
- Fixed Target key extraction to parse escaped `apiKey` from bootstrap payload and use the first 32-hex chars for `plp_search_v2`.
- Disabled proxy routing for Target spider requests (`disable_proxy`) because the configured proxy path returned RedSky 404 for this endpoint.
- Verified `target_listing` (`category=5xtc0`, `max_pages=1`) returns 24 items with NordVPN US cities Ashburn and Dallas, and also while NordVPN is disconnected.
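The key-extraction fix in the notes above can be illustrated with a small regex sketch. The exact shape of the bootstrap payload is an assumption here (a JSON string where quotes may appear as `\"`). `extract_api_key` is a hypothetical helper, and the key in the example is a made-up placeholder, not a real credential.

```python
import re

# Accepts both "apiKey":"..." and the escaped form \"apiKey\":\"...\".
# Only the first 32 hex characters after the opening quote are captured.
API_KEY_RE = re.compile(r'\\?"apiKey\\?"\s*:\s*\\?"([0-9a-fA-F]{32})')


def extract_api_key(payload):
    """Return the first 32 hex characters of apiKey, or None if absent."""
    match = API_KEY_RE.search(payload)
    return match.group(1) if match else None


# Placeholder payload: an escaped JSON fragment with a longer-than-32-char value.
payload = r'{\"apiKey\":\"0123456789abcdef0123456789abcdef99\",\"x\":1}'
print(extract_api_key(payload))
```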
HTML-first Nordstrom listing spider that extracts products from embedded hydration data (window.__INITIAL_CONFIG__, with __NEXT_DATA__/generic JSON fallback).
Run example:
common-scrapy crawl nordstrom_listing -a category=women -a max_pages=1 -O nordstrom_listing.jsonl
Sample output:
{
"category": "women",
"product_id": 3865966,
"name": "Pure Luxe Underwire T-Shirt Bra",
"brand": "Natori",
"price": 29.6,
"url": "https://www.nordstrom.com/s/natori-pure-luxe-underwire-t-shirt-bra/3865966",
"image": "https://n.nordstrommedia.com/it/0777d4b6-d7ef-4809-84a5-36fe4da01aff.jpeg",
"rating": 4.5,
"reviews_count": 1715
}
Validation notes (2026-02-25):
- Browser check showed live product cards rendering on `https://www.nordstrom.com/browse/women`.
- Confirmed while connected to NordVPN US cities: Ashburn, Seattle, and Dallas.
- `nordstrom_listing` (`category=women`, `max_pages=1`) returned 81 items in this environment.
- One run hit an initial `502 Bad Gateway` but recovered via retry and completed successfully.
Playwright-rendered listing spider for Nordstrom Rack category pages.
Run example:
common-scrapy crawl nordstromrack_listing -a category=dresses -a max_pages=1 -O nordstromrack_listing.jsonl
Sample output:
{
"category": "dresses",
"product_id": "8210770",
"name": "One-Shoulder Front Twist A-Line Gown",
"url": "https://www.nordstromrack.com/s/marina-one-shoulder-front-twist-a-line-gown/8210770?origin=category-personalizedsort&breadcrumb=Home%2FWomen%2FClothing%2FDresses",
"image": "https://n.nordstrommedia.com/it/b91a2122-b716-4559-b990-4fc031eed7ca.jpeg?h=368&w=240&dpr=2",
"source_url": "https://www.nordstromrack.com/shop/women/clothing/dresses?page=1",
"page": 1,
"mode": "listing"
}
Best Buy pages currently use Apollo hydration (not `__NEXT_DATA__` on PLP/search). These spiders use Playwright to render the page, then extract normalized data from `ApolloClientSingleton.cache.extract()` (with inline bootstrap parsing fallback).
If Best Buy serves a challenge/error variant, output may still be empty; Playwright materially improves reliability versus plain HTTP fetch.
Run examples:
common-scrapy crawl bestbuy_search -a q='laptop' -a max_pages=1 -O bestbuy_search.jsonl
common-scrapy crawl bestbuy_listing -a category=laptops -a max_pages=1 -O bestbuy_listing.jsonl
Validation notes (2026-02-25):
- Browser check confirmed live product cards rendered on `searchpage.jsp?st=laptop`.
- `bestbuy_search` (`max_pages=1`) returned items with `source=bestbuy_apollo_bootstrap` while connected to NordVPN US Dallas (us9157).
- Also tested browser accessibility from NordVPN US Seattle (`us8242`) and US Ashburn (us9510); listing pages still rendered.
bestbuy_search sample output:
{
"item_id": "10460842",
"title": "HP - 14\" Laptop - Intel Processor N150 2025 - 4GB Memory - 128GB UFS - Willow Green",
"url": "https://www.bestbuy.com/product/hp-14-laptop-intel-processor-n150-2025-4gb-memory-128gb-ufs-willow-green/JJGQJQR8CP",
"brand": null,
"price": 189.98,
"currency": "USD",
"rating": 4.6,
"reviews_count": 1551,
"image_url": "https://pisces.bbystatic.com/image2/BestBuy_US/images/products/90a8a03b-c474-416d-bb79-579d46bf34d5.jpg",
"source": "bestbuy_apollo_bootstrap",
"mode": "keyword",
"query": "laptop",
"page": 1,
"source_url": "https://www.bestbuy.com/site/searchpage.jsp?st=laptop&intl=nosplash"
}
bestbuy_listing sample output:
{
"item_id": "6628354",
"title": "Dell - Plus - Copilot+ PC - 16\" 2K Touchscreen Laptop - AMD Ryzen AI 7 350 2025 - 32GB Memory - 1TB Storage - Ice Blue",
"url": "https://www.bestbuy.com/product/dell-plus-copilot-pc-16-2k-touchscreen-laptop-amd-ryzen-ai-7-350-2025-32gb-memory-1tb-storage-ice-blue/J3K4L63SVF/sku/6628354",
"brand": null,
"price": 799.99,
"currency": "USD",
"rating": 4.7,
"reviews_count": 439,
"image_url": "https://pisces.bbystatic.com/image2/BestBuy_US/images/products/7afd11ae-3eb7-46d2-ad3e-6690837b2fdd.jpg",
"source": "bestbuy_apollo_bootstrap",
"mode": "category",
"category_url": "https://www.bestbuy.com/site/all-laptops/laptops/abcat0502000.c?id=abcat0502000",
"page": 1,
"source_url": "https://www.bestbuy.com/site/all-laptops/laptops/abcat0502000.c?id=abcat0502000&cp=1&intl=nosplash"
}
These spiders try bootstrap state extraction first (`__NEXT_DATA__` / `__APOLLO_STATE__`), then fall back to JSON-LD and direct product-link HTML parsing.
Run examples:
common-scrapy crawl costco_search -a q='coffee' -a max_pages=1 -O costco_search.jsonl
common-scrapy crawl costco_listing -a category='coffee' -a max_pages=1 -O costco_listing.jsonl
costco_search sample output:
{
"item_id": "100617983",
"title": null,
"url": "https://www.costco.com/lavazza-espresso-gran-crema-whole-bean-coffee-medium-22-lbs.product.100617983.html",
"price": null,
"currency": null,
"brand": null,
"rating": null,
"reviews_count": null,
"image_url": null,
"source": "costco_html_links_fallback",
"raw": null,
"mode": "keyword",
"query": "coffee",
"page": 1,
"source_url": "https://www.costco.com/s?keyword=coffee"
}
costco_listing sample output:
{
"item_id": "100361434",
"title": null,
"url": "https://www.costco.com/kirkland-signature-colombian-coffee-dark-roast-3-lbs.product.100361434.html",
"price": null,
"currency": null,
"brand": null,
"rating": null,
"reviews_count": null,
"image_url": null,
"source": "costco_html_links_fallback",
"raw": null,
"mode": "category",
"category_url": "https://www.costco.com/coffee.html",
"page": 1,
"source_url": "https://www.costco.com/coffee.html"
}
Notes:
- Browser HTML inspection confirms Costco search results render product links for `keyword=coffee` in this runtime.
- NordVPN US city variance observed while testing `costco_search` (`max_pages=1`): Ashburn (us9512) → 24 items, Los Angeles (us5864) → 24 items, Dallas (us8104) → 0 items. HTML links fallback remains the most reliable extraction path.
These spiders try bootstrap state extraction first (`__NEXT_DATA__` / `__APOLLO_STATE__`), then fall back to JSON-LD and direct product-link HTML parsing.
kroger_search sample output:
{
"item_id": "kroger-2-reduced-fat-milk-gallon",
"title": null,
"url": "https://www.kroger.com/p/kroger-2-reduced-fat-milk-gallon/0001111041700",
"price": null,
"currency": null,
"brand": null,
"rating": null,
"reviews_count": null,
"image_url": null,
"source": "kroger_html_links_fallback",
"raw": null,
"mode": "keyword",
"query": "milk",
"page": 1,
"source_url": "https://www.kroger.com/search?query=milk&searchType=default_search&sort=bestMatch"
}
kroger_listing sample output:
{
"item_id": "kroger-vitamin-d-whole-milk-gallon",
"title": null,
"url": "https://www.kroger.com/p/kroger-vitamin-d-whole-milk-gallon/0001111040101",
"price": null,
"currency": null,
"brand": null,
"rating": null,
"reviews_count": null,
"image_url": null,
"source": "kroger_html_links_fallback",
"mode": "category",
"category_url": "https://www.kroger.com/pl/milk/02001",
"page": 1,
"source_url": "https://www.kroger.com/pl/milk/02001"
}
Run examples:
common-scrapy crawl kroger_search -a q='milk' -a max_pages=1 -O kroger_search.jsonl
common-scrapy crawl kroger_listing -a category='milk' -a max_pages=1 -O kroger_listing.jsonl
Notes:
- Added sort variant retries (`bestMatch`, `sale`) to mitigate zero-item responses from Akamai caches; `kroger_search` now emits ~27 items via HTML link fallback even when bootstrap state is stripped.
- Listing spider now captures escaped `/p/slug` references that Kroger injects inside serialized props, so categories like `milk` return ~31 URLs before the search fallback is considered.
- NordVPN US egress (New York, Chicago, Los Angeles, Dallas, Miami, Seattle) continued to return 403s/timeouts during curl checks; disconnecting NordVPN and routing through the configured BRD residential proxy remains the only reliable path in this environment.
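The two Kroger techniques in the notes above (sort variant retries and recovery of escaped `/p/slug` references) might look roughly like the sketch below. It is illustrative only: the helper names `search_urls` and `extract_product_urls` are hypothetical, the escape forms handled (`\/` and `\u002F`) are an assumption about how slashes appear in the serialized props, and the slug/ID pattern is inferred from the sample URLs rather than from the spider source.

```python
import re
from urllib.parse import urlencode

SORT_VARIANTS = ("bestMatch", "sale")  # variants named in the notes above

# Assumption: slashes are escaped either as "\/" or as "\u002F".
ESCAPED_SLASH_RE = re.compile(r"\\u002[Ff]|\\/")
# Assumption: product paths look like /p/<slug>/<numeric id>.
PRODUCT_PATH_RE = re.compile(r"/p/([a-z0-9-]+)/(\d+)")


def search_urls(query, base="https://www.kroger.com"):
    """Build one search URL per sort variant, to be tried in order."""
    return [
        f"{base}/search?"
        + urlencode({"query": query, "searchType": "default_search", "sort": sort})
        for sort in SORT_VARIANTS
    ]


def extract_product_urls(serialized, base="https://www.kroger.com"):
    """Unescape slashes, then collect unique product URLs in first-seen order."""
    unescaped = ESCAPED_SLASH_RE.sub("/", serialized)
    urls = []
    for slug, product_id in PRODUCT_PATH_RE.findall(unescaped):
        url = f"{base}/p/{slug}/{product_id}"
        if url not in urls:
            urls.append(url)
    return urls
```

A caller would request each URL from `search_urls` in turn and stop at the first response for which `extract_product_urls` returns a non-empty list.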
{
"item_id": "12345678",
"title": "Body Lotion ...",
"url": "https://www.bathandbodyworks.com/p/...",
"price": 16.95,
"currency": "USD",
"brand": "Bath & Body Works",
"source": "bathandbodyworks_internal_api|bathandbodyworks_html"
}
Run examples:
common-scrapy crawl bathandbodyworks_listing -a category='body-care' -a mode=api -a max_pages=1 -O bbw_api.jsonl
common-scrapy crawl bathandbodyworks_listing -a category='body-care' -a mode=bootstrap -a max_pages=1 -O bbw_bootstrap.jsonl
common-scrapy crawl bathandbodyworks_listing -a category='body-care' -a mode=html -a max_pages=1 -O bbw_html.jsonl
{
"item_id": null,
"title": "Gift Cards",
"url": "https://www.sallybeauty.com/giftCards.html",
"price": null,
"currency": null,
"brand": "Sally Beauty",
"source": "sallybeauty_html",
"mode": "category_html",
"category_url": "https://www.sallybeauty.com/hair-care/"
}
Run examples:
common-scrapy crawl sallybeauty_listing -a category='hair-care' -a mode=api -a max_pages=1 -O sally_api.jsonl
common-scrapy crawl sallybeauty_listing -a category='hair-care' -a mode=bootstrap -a max_pages=1 -O sally_bootstrap.jsonl
common-scrapy crawl sallybeauty_listing -a category='hair-care' -a mode=html -a max_pages=1 -O sally_html.jsonl
{
"item_id": "MAC-12345",
"title": "Foundation ...",
"url": "https://www.maccosmetics.com/...",
"price": 42.0,
"currency": "USD",
"brand": "MAC Cosmetics",
"source": "maccosmetics_internal_api_graphql|maccosmetics_html"
}
Run examples:
common-scrapy crawl maccosmetics_listing -a category='face' -a mode=api -a max_pages=1 -O mac_api.jsonl
common-scrapy crawl maccosmetics_listing -a category='face' -a mode=bootstrap -a max_pages=1 -O mac_bootstrap.jsonl
common-scrapy crawl maccosmetics_listing -a category='face' -a mode=html -a max_pages=1 -O mac_html.jsonl
{
"item_id": "ELF-12345",
"title": "Primer ...",
"url": "https://www.elfcosmetics.com/products/...",
"price": 10.0,
"currency": "USD",
"brand": "e.l.f. Cosmetics",
"source": "elfcosmetics_internal_api|elfcosmetics_preloaded_state|elfcosmetics_html"
}
Run examples:
common-scrapy crawl elfcosmetics_listing -a category='face' -a mode=api -a max_pages=1 -O elf_api.jsonl
common-scrapy crawl elfcosmetics_listing -a category='face' -a mode=bootstrap -a max_pages=1 -O elf_bootstrap.jsonl
common-scrapy crawl elfcosmetics_listing -a category='face' -a mode=html -a max_pages=1 -O elf_html.jsonl
{
"item_id": "1457_2980_808",
"title": "AE Big Hug V-Neck Sweatshirt",
"url": "https://www.ae.com/us/en/p/women/hoodies-sweatshirts/crew-neck-sweatshirts/ae-big-hug-v-neck-sweatshirt/1457_2980_808",
"price": 38.97,
"original_price": 64.95,
"currency": "USD",
"brand": "American Eagle",
"source": "ae_html",
"mode": "category_html"
}
Run example:
common-scrapy crawl ae_listing -a category='women-tops' -a max_pages=1 -O ae_listing.jsonl
Notes:
- Verified in browser and direct HTTP while connected to NordVPN US (Dallas + Seattle).
- In this environment, HTML category pages contain stable product cards/links (`/us/en/p/...`) suitable for listing extraction.
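As a rough illustration of listing extraction from those product links, the sketch below pulls `/us/en/p/...` hrefs out of category HTML and derives `item_id` from the last path segment, as in the sample output above. The function name `ae_items` and the regex are assumptions for the example; the real spider may parse product cards differently and also fills in title and price fields.

```python
import re
from urllib.parse import urljoin

# Capture the link path, stopping before any query string or fragment.
AE_LINK_RE = re.compile(r'href="(/us/en/p/[^"?#]+)')


def ae_items(html_text, base="https://www.ae.com"):
    """Build minimal listing items from product links, deduplicated by item_id."""
    items, seen = [], set()
    for path in AE_LINK_RE.findall(html_text):
        # Assumption: the product ID is the final path segment.
        item_id = path.rstrip("/").rsplit("/", 1)[-1]
        if item_id in seen:
            continue  # the same card is often linked from image and title
        seen.add(item_id)
        items.append(
            {
                "item_id": item_id,
                "url": urljoin(base, path),
                "source": "ae_html",
                "mode": "category_html",
            }
        )
    return items
```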
Issues and pull requests that add or improve retailer spiders, pagination logic, or extraction helpers are welcome.
- `common/spiders/` – retailer spiders (`*_listing_spider.py`, `*_search_spider.py`) and shared helpers.
- `common/settings/` – shared Scrapy configuration; reads environment variables via `.env`.
- `scrapy.cfg` – entry point for the `scrapy` CLI.
- Investigate real browser traffic and identify internal API/bootstrap/HTML patterns.
- Implement a purpose-built spider under `common/spiders/` with normalized output fields.
- Add category shortcuts (`categories`) where applicable.
- Validate with `max_pages=1` runs and update README examples/output snippets.
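For the "normalized output fields" step, one possible shape is a small helper that projects whatever a spider scraped onto a fixed field list, so missing values come out as `null` the way they do in the sample outputs. The field list below is inferred from those samples, and `normalize_item` is a hypothetical helper, not a function in this repository.

```python
# Field names inferred from the README sample outputs (an assumption, not
# the project's canonical schema).
NORMALIZED_FIELDS = (
    "item_id", "title", "url", "price", "currency", "brand",
    "rating", "reviews_count", "image_url", "source", "mode",
    "page", "source_url",
)


def normalize_item(raw):
    """Keep only the known fields; missing ones become None."""
    item = {field: raw.get(field) for field in NORMALIZED_FIELDS}

    # Coerce string prices such as "$1,299.00" to floats where possible.
    price = item["price"]
    if isinstance(price, str):
        cleaned = price.replace("$", "").replace(",", "").strip()
        try:
            item["price"] = float(cleaned)
        except ValueError:
            item["price"] = None
    return item
```

Emitting every field on every item, even when empty, keeps the `.jsonl` output uniform across spiders and makes downstream column mapping simpler.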