fix: Council Fix Pack - December 2025 #1705
base: master
Conversation
Walkthrough

This PR adds two new council scrapers (Blackpool, Harlow), migrates several councils from direct HTTP to Selenium or updated HTML parsing, normalises/pads UPRNs, adjusts headers/user-agent usage, fixes bin-type mapping for Lambeth, updates date logic for Thurrock, and updates tests and documentation.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant FifeOld as Fife (Old: API)
    participant FifeNew as Fife (New: Selenium)
    participant Server as Fife Server
    rect rgb(240, 240, 255)
        Note over FifeOld,Server: Previous Flow (Blocked by API)
        User->>FifeOld: parse_data(postcode)
        FifeOld->>Server: GET /api/citizen
        Server-->>FifeOld: 403 Forbidden
        FifeOld-->>User: Error
    end
    rect rgb(230, 255, 230)
        Note over FifeNew,Server: New Flow (Selenium-driven)
        User->>FifeNew: parse_data(postcode, paon)
        FifeNew->>Server: Browser navigate (Selenium)
        Server-->>FifeNew: Rendered HTML
        FifeNew->>FifeNew: Wait & bypass protections
        FifeNew->>FifeNew: Fill postcode, select address
        FifeNew->>Server: Trigger search / request data
        Server-->>FifeNew: Rendered collections table
        FifeNew->>FifeNew: Parse HTML -> {"bins":[...]}
        FifeNew-->>User: Result
    end
```

```mermaid
sequenceDiagram
    participant User
    participant RushmoorOld as Rushmoor (Old: HTML parse)
    participant RushmoorNew as Rushmoor (New: API)
    participant Server as Rushmoor Server
    rect rgb(240, 240, 255)
        Note over RushmoorOld: Old parsing relied on page structure and could hit NoneType
        User->>RushmoorOld: parse_data(page, uprn)
        RushmoorOld->>RushmoorOld: soup.find("p").contents[0] (fragile)
        RushmoorOld-->>User: AttributeError (NoneType)
    end
    rect rgb(230, 255, 230)
        Note over RushmoorNew,Server: New Flow (direct API)
        User->>RushmoorNew: parse_data(uprn)
        RushmoorNew->>Server: GET /api?uprn={uprn}
        Server-->>RushmoorNew: JSON response
        RushmoorNew->>RushmoorNew: Extract collections, format dates
        RushmoorNew-->>User: {"bins":[...]}
    end
```

```mermaid
sequenceDiagram
    participant User
    participant Blackpool as Blackpool (New)
    participant Server as Blackpool API
    rect rgb(230, 255, 230)
        Note over Blackpool,Server: Blackpool token + POST flow
        User->>Blackpool: parse_data(uprn, postcode)
        Blackpool->>Server: GET /token endpoint (with headers)
        Server-->>Blackpool: token
        Blackpool->>Server: POST /PremiseJobs with uprn, postcode, token
        Server-->>Blackpool: PremiseJobs JSON
        Blackpool->>Blackpool: Build {"bins":[...]} sorted
        Blackpool-->>User: Result
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##           master    #1705   +/-   ##
=======================================
  Coverage   86.79%   86.79%
=======================================
  Files           9        9
  Lines        1136     1136
=======================================
  Hits          986      986
  Misses        150      150
```

☔ View full report in Codecov by Sentry.
BREAKING CHANGE fix: robbrad#1693 - This scraper now uses postcode and house number instead of UPRN. It also requires a Selenium browser
fix: robbrad#1706 - London Borough of Lambeth
fix: robbrad#1721 Mid Sussex
fix: robbrad#1720 - Thurrock Council
fix: robbrad#1718 - Argyll and Bute Council - UPRN needs to be 12 digits long
fix: robbrad#1712 - South Lanarkshire Council - Could potentially return a date in the past for that week
fix: robbrad#1711 - Northumberland Council - Requires 12-digit UPRN
fix: robbrad#1209 Halton Borough Council
fix: robbrad#1689 - Wiltshire Council
feat: robbrad#1640 Adding Blackpool Council
feat: robbrad#1639 Adding Harlow Council
fix: robbrad#1724 - Rushmoor Council
Actionable comments posted: 7
🧹 Nitpick comments (9)
uk_bin_collection/uk_bin_collection/councils/ThurrockCouncil.py (2)
36-40: Date update fixes the immediate issue; consider dynamic date calculation for long-term maintainability.

The hardcoded date updates from 2024 to 2025 correctly restore bin collection data generation. However, these dates will require manual maintenance in the future, creating recurring technical debt.
Consider refactoring to calculate start dates dynamically based on the current date and known collection schedules, or move these dates to a configuration file with validation. This would eliminate the need for annual manual updates.
Example approach:
```python
# Calculate the most recent start date for the current year
def get_round_start_dates(year=None):
    if year is None:
        year = datetime.now().year
    # Define the initial dates for the current year
    round_a_blue_brown = datetime(year, 11, 17)
    round_a_green_grey = datetime(year, 11, 24)
    # Return dates based on round
    return {
        'A': {'blue_brown': round_a_blue_brown, 'green_grey': round_a_green_grey},
        'B': {'green_grey': round_a_green_grey, 'blue_brown': round_a_blue_brown},
    }
```
1-1: Note: Static analysis flagged star import (pre-existing).

Ruff flags that `datetime` may be undefined due to the star import on line 1. While the code works correctly (the `datetime` class is available from the common module), explicit imports are generally preferred for clarity and to avoid namespace pollution.

This is pre-existing code and not introduced by this PR, so it's low priority. If refactoring in the future, consider replacing the star import with explicit imports:
```python
from uk_bin_collection.uk_bin_collection.common import datetime, timedelta, get_dates_every_x_days
```

uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py (1)
112-112: Variable name `type` shadows Python built-in.

Using `type` as a loop variable shadows the built-in `type()` function. Consider renaming to `bin_type` or `collection_type_item` for clarity.

Apply this diff to use a clearer variable name:
```diff
-        for type in collection_types:
+        for bin_type in collection_types:
             dict_data = {
-                "type": type,
+                "type": bin_type,
                 "collectionDate": collectiondate,
             }
```

uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py (1)
64-82: Consider addressing `date_format` F405 and making cadence parsing more robust

Ruff's F405 for `date_format` is a style/linting issue but may fail CI depending on project settings. Two options:

- Import explicitly instead of relying on `import *`, e.g.:

```python
from uk_bin_collection.uk_bin_collection.common import date_format
```

  (and import other symbols you need explicitly), or

- Keep the wildcard import but add a local `# noqa: F405` where `date_format` is used if that's the established pattern in this repo.

Separately, since `schedule_cadence` is only used for a substring check, you can simplify and make it safer by operating on the full cell text and lowering it:

```diff
-                schedule_cadence = row.find("td").get_text().strip().split(" ")[1]
+                td_text = row.find("td").get_text().strip()
+                schedule_cadence = td_text.lower()
                 ...
-                    if "Fortnightly" in schedule_cadence:
+                    if "fortnightly" in schedule_cadence:
```

This avoids depending on a fixed token position and is more resilient to minor HTML/content changes.
uk_bin_collection/uk_bin_collection/councils/ArmaghBanbridgeCraigavonCouncil.py (1)
22-25: User-Agent header looks good; consider adding a request timeout

The added `headers` (User-Agent) on the `requests.get` call are a sensible fix for remote blocking. To avoid potential hangs on network issues, consider adding an explicit timeout:

```diff
-    response = requests.get(url, headers=headers)
+    response = requests.get(url, headers=headers, timeout=20)
```

Adjust the timeout value as appropriate for typical council response times.
Also applies to: 47-47
uk_bin_collection/uk_bin_collection/councils/LondonBoroughLambeth.py (1)
45-52: Commercial bin disambiguation looks good; consider case-insensitive match

The new logic properly splits ambiguous `"commercial bin"` containers into `"recycling"` vs `"refuse"` based on `Container["Name"]`, which should prevent entity collapse while keeping other types unchanged.

For a bit more robustness against API text changes, you could optionally make the match case-insensitive:
```python
name = Container["Name"]
if "recycling" in name.lower():
    Bin_Type = "recycling"
else:
    Bin_Type = "refuse"
```

uk_bin_collection/uk_bin_collection/councils/RushmoorCouncil.py (1)
18-33: API flow is much cleaner; add basic HTTP and JSON guards to avoid regressions

The switch to a direct Umbraco API with UPRN looks good and should resolve the earlier setup issues. To harden this against future content changes (and avoid reintroducing the `NoneType.contents` failure):
- Add a timeout and status check on the HTTP call:
```diff
-        response = requests.get(
-            "https://www.rushmoor.gov.uk/Umbraco/Api/BinLookUpWorkAround/Get?",
-            params=params,
-            headers=headers,
-        )
+        response = requests.get(
+            "https://www.rushmoor.gov.uk/Umbraco/Api/BinLookUpWorkAround/Get?",
+            params=params,
+            headers=headers,
+            timeout=20,
+        )
+        response.raise_for_status()
```
- Guard the `<p>` lookup and `NextCollection` extraction so a missing node produces a clear error instead of an AttributeError:

```diff
-        result = soup.find("p").contents[0]
-
-        json_data = json.loads(result)["NextCollection"]
+        p_tag = soup.find("p")
+        if p_tag is None or not p_tag.contents:
+            raise ValueError("Unexpected response format from Rushmoor API")
+        result = p_tag.contents[0]
+
+        json_data = (json.loads(result).get("NextCollection") or {})
```
if len(collections) < 1: raise ValueError("No collections found")will still correctly signal the “no data” case.Also applies to: 35-45
uk_bin_collection/uk_bin_collection/councils/MidSussexDistrictCouncil.py (1)
22-23: New HTTP-based scraper is a good replacement; tighten a few robustness edges

The move away from Selenium to a `requests` + BeautifulSoup pipeline is a solid simplification and aligns with the updated Mid Sussex site. A few low-friction tweaks would make this more resilient:
- Add timeouts to all HTTP calls
```diff
-        r = session.get(
-            URI,
-        )
+        r = session.get(URI, timeout=20)
 ...
-        r = session.post(nextpageurl, data)
+        r = session.post(nextpageurl, data=data, timeout=20)
 ...
-        r = session.get(
-            nextpageurl,
-        )
+        r = session.get(nextpageurl, timeout=20)
```
- Avoid `.find(...).find("a")` on a possible `None`

```diff
-        alink = soup.find("div", id="property_list").find("a")
-
-        if alink is None:
-            raise Exception("Address not found")
+        property_list = soup.find("div", id="property_list")
+        if property_list is None:
+            raise Exception("Address list not found")
+        alink = property_list.find("a")
+        if alink is None:
+            raise Exception("Address not found")
```
- Likewise for `section#scheduled-collections`

```diff
-        u1s = soup.find("section", id="scheduled-collections").find_all(
-            "ul", class_="displayinlineblock"
-        )
+        section = soup.find("section", id="scheduled-collections")
+        if section is None:
+            raise Exception("Scheduled collections section not found")
+        u1s = section.find_all("ul", class_="displayinlineblock")
```
- Minor readability

  `data` is used as the POST payload while `bindata` holds output. Renaming the POST dict to `payload` (or similar) would make the intent clearer without any behavioural change.

These changes preserve the current success path while turning markup/API drift into clear, controlled exceptions.
Also applies to: 27-37, 43-48, 49-56, 58-66, 68-76, 78-80, 82-99, 101-102
uk_bin_collection/tests/input.json (1)
232-240: Blackpool config looks consistent; optionally mark as `skip_get_url`.

The new `BlackpoolCouncil` entry correctly provides both `uprn` and `postcode` expected by the scraper. Because the scraper makes its own API calls and ignores the initial `page` content, consider adding `"skip_get_url": true` here to avoid an unnecessary preliminary GET of `url`.
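For illustration, an entry with that flag might look roughly like this (the uprn and postcode values below are placeholders, not the ones used in this PR):

```json
"BlackpoolCouncil": {
    "skip_get_url": true,
    "uprn": "000000000000",
    "postcode": "FY1 1AA",
    "url": "https://www.blackpool.gov.uk",
    "wiki_name": "Blackpool"
}
```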
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- uk_bin_collection/tests/input.json (5 hunks)
- uk_bin_collection/uk_bin_collection/councils/ArgyllandButeCouncil.py (2 hunks)
- uk_bin_collection/uk_bin_collection/councils/ArmaghBanbridgeCraigavonCouncil.py (2 hunks)
- uk_bin_collection/uk_bin_collection/councils/BlackpoolCouncil.py (1 hunks)
- uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (3 hunks)
- uk_bin_collection/uk_bin_collection/councils/FifeCouncil.py (2 hunks)
- uk_bin_collection/uk_bin_collection/councils/HaltonBoroughCouncil.py (1 hunks)
- uk_bin_collection/uk_bin_collection/councils/HarlowCouncil.py (1 hunks)
- uk_bin_collection/uk_bin_collection/councils/LondonBoroughLambeth.py (1 hunks)
- uk_bin_collection/uk_bin_collection/councils/MidSussexDistrictCouncil.py (1 hunks)
- uk_bin_collection/uk_bin_collection/councils/NorthumberlandCouncil.py (2 hunks)
- uk_bin_collection/uk_bin_collection/councils/RushmoorCouncil.py (1 hunks)
- uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py (2 hunks)
- uk_bin_collection/uk_bin_collection/councils/ThurrockCouncil.py (1 hunks)
- uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py (3 hunks)
- wiki/Councils.md (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
uk_bin_collection/uk_bin_collection/councils/HaltonBoroughCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
create_webdriver(321-360)
uk_bin_collection/uk_bin_collection/councils/RushmoorCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
check_uprn(67-78)
uk_bin_collection/uk_bin_collection/councils/BlackpoolCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (2)
check_uprn (67-78)
check_postcode (36-49)
uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py (1)
custom_components/uk_bin_collection/calendar.py (1)
event(54-63)
uk_bin_collection/uk_bin_collection/councils/HarlowCouncil.py (3)
uk_bin_collection/uk_bin_collection/councils/NuneatonBedworthBoroughCouncil.py (1)
get_bin_data (63-945)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (1)
check_uprn(67-78)
uk_bin_collection/uk_bin_collection/councils/MidSussexDistrictCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
check_postcode(36-49)
🪛 Ruff (0.14.5)
uk_bin_collection/uk_bin_collection/councils/ThurrockCouncil.py
36-36: datetime may be undefined, or defined from star imports
(F405)
37-37: datetime may be undefined, or defined from star imports
(F405)
39-39: datetime may be undefined, or defined from star imports
(F405)
40-40: datetime may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/HaltonBoroughCouncil.py
40-40: create_webdriver may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/RushmoorCouncil.py
19-19: check_uprn may be undefined, or defined from star imports
(F405)
29-29: requests may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/BlackpoolCouncil.py
5-5: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
17-17: Unused method argument: page
(ARG002)
21-21: check_uprn may be undefined, or defined from star imports
(F405)
22-22: check_postcode may be undefined, or defined from star imports
(F405)
41-41: Probable use of requests call without timeout
(S113)
59-59: Probable use of requests call without timeout
(S113)
77-77: datetime may be undefined, or defined from star imports
(F405)
80-80: date_format may be undefined, or defined from star imports
(F405)
85-85: datetime may be undefined, or defined from star imports
(F405)
85-85: date_format may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py
74-74: date_format may be undefined, or defined from star imports
(F405)
82-82: date_format may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/ArmaghBanbridgeCraigavonCouncil.py
47-47: Probable use of requests call without timeout
(S113)
uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py
102-102: datetime may be undefined, or defined from star imports
(F405)
105-105: date_format may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/HarlowCouncil.py
6-6: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
18-18: Unused method argument: page
(ARG002)
21-21: check_uprn may be undefined, or defined from star imports
(F405)
32-32: Probable use of requests call without timeout
(S113)
48-48: datetime may be undefined, or defined from star imports
(F405)
54-54: date_format may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/FifeCouncil.py
27-27: check_postcode may be undefined, or defined from star imports
(F405)
28-28: check_paon may be undefined, or defined from star imports
(F405)
38-38: create_webdriver may be undefined, or defined from star imports
(F405)
66-66: Unused lambda argument: d
(ARG005)
83-85: Abstract raise to an inner function
(TRY301)
83-85: Avoid specifying long messages outside the exception class
(TRY003)
102-104: Abstract raise to an inner function
(TRY301)
102-104: Avoid specifying long messages outside the exception class
(TRY003)
126-126: date_format may be undefined, or defined from star imports
(F405)
131-131: date_format may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/MidSussexDistrictCouncil.py
25-25: check_postcode may be undefined, or defined from star imports
(F405)
29-29: requests may be undefined, or defined from star imports
(F405)
44-44: Abstract raise to an inner function
(TRY301)
44-44: Create your own exception
(TRY002)
44-44: Avoid specifying long messages outside the exception class
(TRY003)
64-64: Abstract raise to an inner function
(TRY301)
64-64: Create your own exception
(TRY002)
64-64: Avoid specifying long messages outside the exception class
(TRY003)
76-76: Abstract raise to an inner function
(TRY301)
76-76: Create your own exception
(TRY002)
76-76: Avoid specifying long messages outside the exception class
(TRY003)
90-90: datetime may be undefined, or defined from star imports
(F405)
93-93: date_format may be undefined, or defined from star imports
(F405)
98-98: datetime may be undefined, or defined from star imports
(F405)
98-98: date_format may be undefined, or defined from star imports
(F405)
101-101: Consider moving this statement to an else block
(TRY300)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py
110-112: Abstract raise to an inner function
(TRY301)
110-112: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (14)
uk_bin_collection/uk_bin_collection/councils/ChelmsfordCityCouncil.py (2)
17-37: LGTM! The updated docstring clearly documents the new dynamic ICS lookup flow and parameter requirements.

119-145: LGTM! The ICS event parsing logic correctly handles events within the next 60 days, splits comma-separated collection types, and ensures proper driver cleanup in the finally block.
uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py (4)
30-30: LGTM: UPRN padding implemented correctly.

The 12-digit UPRN padding aligns with fixes for other councils in this PR and ensures consistent data format.

57-57: LGTM: Clean data structure initialization.

The `data_bins` initialization follows the unified aggregation approach and will accumulate results across all requested months.

82-82: LGTM: Endpoint updated to wastecollectioncalendar.

This endpoint change addresses issue #1689 to ensure all collection types are returned.

110-110: Verify the separator format used by Wiltshire Council's API.

The review raises a valid concern: while the code assumes " and " (with spaces) as the separator, no Wiltshire-specific test data or documentation was found in the codebase to confirm this. Other councils in the same project use different separators (comma, forward slash, newline). Without access to actual Wiltshire API responses, the assumption cannot be verified. Check Wiltshire Council's live data to confirm the collection type format is always "Type1 and Type2" with spaces, and not alternatives like "&", "+", or formats without consistent spacing.
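Purely as an illustration of a more tolerant approach (not code from this PR; the helper name and regex are hypothetical):

```python
import re

def split_collection_types(collection_type: str) -> list[str]:
    # Accept " and ", "&" or "+" as separators, with flexible surrounding whitespace
    parts = re.split(r"\s*(?:\band\b|&|\+)\s*", collection_type)
    return [part for part in (p.strip() for p in parts) if part]

print(split_collection_types("Household waste and Recycling"))  # ['Household waste', 'Recycling']
```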
uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py (1)
4-6: BeautifulSoup import is fine

The `BeautifulSoup` import and usage remain standard and appropriate for this parser; no issues here.

uk_bin_collection/uk_bin_collection/councils/NorthumberlandCouncil.py (1)
22-22: UPRN normalization is correct and aligns with other councils

Coercing `user_uprn` to `str` and `zfill(12)` before using it in `select_by_value` matches the 12‑digit UPRN requirement and should fix the lookup issues without regressions for already‑padded values.

Also applies to: 43-43
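As a quick standalone sketch of the padding behaviour described above (the UPRN value here is made up):

```python
user_uprn = 77106099  # hypothetical example value
padded = str(user_uprn).zfill(12)
print(padded)  # -> 000077106099; values that are already 12 digits pass through unchanged
```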
uk_bin_collection/uk_bin_collection/councils/ArgyllandButeCouncil.py (1)
1-1: Datetime import and 12‑digit UPRN padding are appropriate

Importing `datetime` matches its later usage for parsing and sorting, and zero‑padding `user_uprn` to 12 digits before `select_by_value` aligns with the council's UPRN format requirements and the pattern used in Northumberland.

Also applies to: 31-31
uk_bin_collection/uk_bin_collection/councils/HaltonBoroughCouncil.py (1)
38-41: Explicit user-agent for Selenium driver is a solid improvement

Passing a realistic `user_agent` into `create_webdriver` should help with remote/SaaS blocking while remaining compatible with the existing helper signature. No further changes needed here.
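For reference, the call pattern used elsewhere in this PR (e.g. the new Fife scraper further down) looks roughly like this; the exact user-agent string is illustrative:

```python
# web_driver and headless come from the scraper kwargs; create_webdriver is the project's common helper
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
driver = create_webdriver(web_driver, headless, user_agent, __name__)
```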
40-40: Documentation updates match the new scraper behaviours

The additions and edits for Blackpool, Harlow, Fife, Mid Sussex, and Rushmoor are consistent with the corresponding scraper implementations and PR objectives:

- Blackpool/Harlow: new entries with `-u` (and `-p` for Blackpool).
- Fife: switched to postcode + house number + Selenium, matching the new Selenium-based scraper.
- Mid Sussex: removed web driver parameter and updated note for the non-Selenium HTML flow.
- Rushmoor: now uses the base URL and a separate `-u` UPRN parameter as per the new API-based scraper.

Given this file is auto-generated from `uk_bin_collection/tests/input.json`, the changes look correct and in sync with the code.

Also applies to: 144-144, 675-685, 1728-1735, 1949-1958, 2557-2564, 3246-3251
uk_bin_collection/tests/input.json (3)
1144-1151: Harlow config matches new scraper expectations.

The `HarlowCouncil` entry provides an `uprn` and base `url` as required by the new scraper; wiki fields and LAD code look consistent with other councils.

1564-1570: Mid Sussex sample updated to match “street name only” requirement.

The `house_number` example ("OAKLANDS") now aligns with the wiki note that the street name should be passed via the `house_number` parameter; this looks correct.

2048-2051: Rushmoor config aligns with reworked scraper flow.

Switching `url` to the base Rushmoor site and adding an explicit `uprn` with an updated wiki note matches the described change to accept a separate UPRN field. No issues spotted.
| "FifeCouncil": { | ||
| "uprn": "320203521", | ||
| "postcode": "KY16 9NQ", | ||
| "house_number": "1", | ||
| "web_driver": "http://selenium:4444", | ||
| "url": "https://www.fife.gov.uk", | ||
| "wiki_command_url_override": "https://www.fife.gov.uk", | ||
| "wiki_name": "Fife", | ||
| "wiki_note": "You will need to use [FindMyAddress](https://www.findmyaddress.co.uk/search) to find the UPRN.", | ||
| "wiki_note": "Pass in the house number and postcode parameters. This parser requires a Selenium webdriver.", | ||
| "LAD24CD": "S12000047" |
Fife config uses house_number but parser reads paon – this will fail address matching.
FifeCouncil.CouncilClass.parse_data currently pulls the address from kwargs["paon"], but this JSON only defines house_number. In typical flows where kwargs are taken directly from this file, paon will be None, so the dropdown search will never match and a ValueError will be raised for every run.
Add a paon value (or align the parser to also read house_number) so the config and implementation agree. For example:
"FifeCouncil": {
"postcode": "KY16 9NQ",
- "house_number": "1",
+ "house_number": "1",
+ "paon": "1",
"web_driver": "http://selenium:4444",
"url": "https://www.fife.gov.uk",📝 Committable suggestion
| "FifeCouncil": { | |
| "uprn": "320203521", | |
| "postcode": "KY16 9NQ", | |
| "house_number": "1", | |
| "web_driver": "http://selenium:4444", | |
| "url": "https://www.fife.gov.uk", | |
| "wiki_command_url_override": "https://www.fife.gov.uk", | |
| "wiki_name": "Fife", | |
| "wiki_note": "You will need to use [FindMyAddress](https://www.findmyaddress.co.uk/search) to find the UPRN.", | |
| "wiki_note": "Pass in the house number and postcode parameters. This parser requires a Selenium webdriver.", | |
| "LAD24CD": "S12000047" | |
| "FifeCouncil": { | |
| "postcode": "KY16 9NQ", | |
| "house_number": "1", | |
| "paon": "1", | |
| "web_driver": "http://selenium:4444", | |
| "url": "https://www.fife.gov.uk", | |
| "wiki_command_url_override": "https://www.fife.gov.uk", | |
| "wiki_name": "Fife", | |
| "wiki_note": "Pass in the house number and postcode parameters. This parser requires a Selenium webdriver.", | |
| "LAD24CD": "S12000047" |
```python
    def parse_data(self, page: str, **kwargs) -> dict:

        user_uprn = kwargs.get("uprn")
        user_postcode = kwargs.get("postcode")
        check_uprn(user_uprn)
        check_postcode(user_postcode)
        bindata = {"bins": []}

        headers = {
            "Accept": "*/*",
            "Accept-Language": "en-GB,en;q=0.9",
            "Connection": "keep-alive",
            "DNT": "1",
            "Origin": "https://www.blackpool.gov.uk",
            "Referer": "https://www.blackpool.gov.uk/",
            "Sec-Fetch-Dest": "empty",
            "Sec-Fetch-Mode": "cors",
            "Sec-Fetch-Site": "same-site",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
            "sec-ch-ua": '"Google Chrome";v="141", "Not?A_Brand";v="8", "Chromium";v="141"',
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": '"Windows"',
        }

        response = requests.get(
            "https://api.blackpool.gov.uk/live//api/bartec/security/token",
            headers=headers,
        )

        token = response.text.strip().replace('"', "")

        json_data = {
            "UPRN": user_uprn,
            "USRN": "",
            "PostCode": user_postcode,
            "StreetNumber": "",
            "CurrentUser": {
                "UserId": "",
                "Token": token,
            },
        }

        response = requests.post(
            "https://api.blackpool.gov.uk/live//api/bartec/collection/PremiseJobs",
            headers=headers,
            json=json_data,
        )

        # Parse the JSON response
        bin_collection = response.json()

        # Loop through each collection in bin_collection
        for collection in bin_collection["jobsField"]:

            job = collection["jobField"]
            date = job.get("scheduledStartField")
            bin_type = job.get("nameField", "") or job.get("descriptionField", "")

            dict_data = {
                "type": bin_type,
                "collectionDate": datetime.strptime(
                    date,
                    "%Y-%m-%dT%H:%M:%S",
                ).strftime(date_format),
            }
            bindata["bins"].append(dict_data)

        bindata["bins"].sort(
            key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
        )

        return bindata
```
🧩 Analysis chain
Blackpool scraper should use timeouts, check HTTP status, and handle missing jobs defensively.
As written:
- Both the token `GET` and `PremiseJobs` `POST` have no `timeout` and don't validate status, so a stuck endpoint can hang forever and 4xx/5xx responses will cause obscure JSON/Key errors.
- Iteration over `bin_collection["jobsField"]` and `collection["jobField"]` assumes those keys always exist and that `scheduledStartField` and a name/description are present.
You can keep behaviour but make it much more robust like this:
- response = requests.get(
- "https://api.blackpool.gov.uk/live//api/bartec/security/token",
- headers=headers,
- )
-
- token = response.text.strip().replace('"', "")
+ response = requests.get(
+ "https://api.blackpool.gov.uk/live//api/bartec/security/token",
+ headers=headers,
+ timeout=120,
+ )
+ response.raise_for_status()
+
+ token = response.text.strip().replace('"', "")
@@
- response = requests.post(
- "https://api.blackpool.gov.uk/live//api/bartec/collection/PremiseJobs",
- headers=headers,
- json=json_data,
- )
-
- # Parse the JSON response
- bin_collection = response.json()
-
- # Loop through each collection in bin_collection
- for collection in bin_collection["jobsField"]:
-
- job = collection["jobField"]
- date = job.get("scheduledStartField")
- bin_type = job.get("nameField", "") or job.get("descriptionField", "")
-
- dict_data = {
- "type": bin_type,
- "collectionDate": datetime.strptime(
- date,
- "%Y-%m-%dT%H:%M:%S",
- ).strftime(date_format),
- }
- bindata["bins"].append(dict_data)
+ response = requests.post(
+ "https://api.blackpool.gov.uk/live//api/bartec/collection/PremiseJobs",
+ headers=headers,
+ json=json_data,
+ timeout=120,
+ )
+ response.raise_for_status()
+
+ bin_collection = response.json()
+
+ for collection in bin_collection.get("jobsField", []):
+ job = collection.get("jobField") or {}
+ date = job.get("scheduledStartField")
+ bin_type = job.get("nameField") or job.get("descriptionField")
+ if not (date and bin_type):
+ continue
+
+ dict_data = {
+ "type": bin_type,
+ "collectionDate": datetime.strptime(
+ date,
+ "%Y-%m-%dT%H:%M:%S",
+ ).strftime(date_format),
+ }
+ bindata["bins"].append(dict_data)This keeps the same output structure while avoiding hangs and hard crashes on minor API changes.
🏁 Script executed:
#!/bin/bash
# Verify BlackpoolCouncil.py exists and contains the code mentioned in review
find . -name "BlackpoolCouncil.py" -type f

Length of output: 66
🏁 Script executed:
#!/bin/bash
# Inspect the parse_data method in BlackpoolCouncil.py
cat uk_bin_collection/uk_bin_collection/councils/BlackpoolCouncil.py | head -100

Length of output: 2865
🏁 Script executed:
#!/bin/bash
# Check how other council scrapers handle requests (e.g., timeout patterns)
rg -n "requests\.(get|post)" --type py -C 2 uk_bin_collection/uk_bin_collection/councils/ | head -60Length of output: 5470
🏁 Script executed:
#!/bin/bash
# Search for timeout usage in the codebase to see if it's a standard pattern
rg -n "timeout\s*=" --type py uk_bin_collection/uk_bin_collection/councils/ | head -20Length of output: 2862
🏁 Script executed:
#!/bin/bash
# Look for response.raise_for_status() usage in other scrapers
rg -n "raise_for_status" --type py uk_bin_collection/Length of output: 11821
Add timeout parameters and HTTP status validation to both API requests; use defensive JSON parsing.
The BlackpoolCouncil scraper omits critical error handling that is standard across 50+ other council implementations in the codebase. Both the token request and PremiseJobs request lack timeouts and status validation:
- Without `timeout`, network issues will cause indefinite hangs
- Without `raise_for_status()`, error responses (4xx/5xx) bypass validation and cause JSON parse failures
- Direct dict access (`bin_collection["jobsField"]`) crashes on API schema changes instead of gracefully skipping incomplete records
Apply the suggested changes:
- response = requests.get(
- "https://api.blackpool.gov.uk/live//api/bartec/security/token",
- headers=headers,
- )
-
- token = response.text.strip().replace('"', "")
+ response = requests.get(
+ "https://api.blackpool.gov.uk/live//api/bartec/security/token",
+ headers=headers,
+ timeout=120,
+ )
+ response.raise_for_status()
+
+ token = response.text.strip().replace('"', "")
@@
- response = requests.post(
- "https://api.blackpool.gov.uk/live//api/bartec/collection/PremiseJobs",
- headers=headers,
- json=json_data,
- )
-
- # Parse the JSON response
- bin_collection = response.json()
-
- # Loop through each collection in bin_collection
- for collection in bin_collection["jobsField"]:
-
- job = collection["jobField"]
- date = job.get("scheduledStartField")
- bin_type = job.get("nameField", "") or job.get("descriptionField", "")
-
- dict_data = {
- "type": bin_type,
- "collectionDate": datetime.strptime(
- date,
- "%Y-%m-%dT%H:%M:%S",
- ).strftime(date_format),
- }
- bindata["bins"].append(dict_data)
+ response = requests.post(
+ "https://api.blackpool.gov.uk/live//api/bartec/collection/PremiseJobs",
+ headers=headers,
+ json=json_data,
+ timeout=120,
+ )
+ response.raise_for_status()
+
+ bin_collection = response.json()
+
+ for collection in bin_collection.get("jobsField", []):
+ job = collection.get("jobField") or {}
+ date = job.get("scheduledStartField")
+ bin_type = job.get("nameField") or job.get("descriptionField")
+ if not (date and bin_type):
+ continue
+
+ dict_data = {
+ "type": bin_type,
+ "collectionDate": datetime.strptime(
+ date,
+ "%Y-%m-%dT%H:%M:%S",
+ ).strftime(date_format),
+ }
+ bindata["bins"].append(dict_data)🧰 Tools
🪛 Ruff (0.14.5)
17-17: Unused method argument: page
(ARG002)
21-21: check_uprn may be undefined, or defined from star imports
(F405)
22-22: check_postcode may be undefined, or defined from star imports
(F405)
41-41: Probable use of requests call without timeout
(S113)
59-59: Probable use of requests call without timeout
(S113)
77-77: datetime may be undefined, or defined from star imports
(F405)
80-80: date_format may be undefined, or defined from star imports
(F405)
85-85: datetime may be undefined, or defined from star imports
(F405)
85-85: date_format may be undefined, or defined from star imports
(F405)
```python
                calendar_url = f"https://www.chelmsford.gov.uk/bins-and-recycling/check-your-collection-day/{day}-{letter}-collection-calendar/"
                driver.get(calendar_url)
                soup = BeautifulSoup(driver.page_source, features="html.parser")
                a = soup.find(
                    "a", href=lambda h: h and h.lower().endswith(".ics")
                )
                if a:
                    ics_url = a["href"]
                else:
                    raise ValueError(
                        f"Could not find collection ICS file for address: {user_paon}"
                    )
```
Handle relative URLs when extracting ICS href.
The a["href"] attribute extracted at line 108 may be a relative URL (e.g., "/bins-and-recycling/file.ics" or "../../file.ics"). The events() function at line 124 expects an absolute URL and will fail if given a relative path.
Apply this diff to convert potentially relative URLs to absolute:
+from urllib.parse import urljoin
+
# (at line 101-112)
calendar_url = f"https://www.chelmsford.gov.uk/bins-and-recycling/check-your-collection-day/{day}-{letter}-collection-calendar/"
driver.get(calendar_url)
soup = BeautifulSoup(driver.page_source, features="html.parser")
a = soup.find(
"a", href=lambda h: h and h.lower().endswith(".ics")
)
if a:
- ics_url = a["href"]
+ ics_url = urljoin(calendar_url, a["href"])
else:
raise ValueError(
f"Could not find collection ICS file for address: {user_paon}"
)Additionally, consider improving the error message to be more specific:
else:
raise ValueError(
- f"Could not find collection ICS file for address: {user_paon}"
+ f"Could not find ICS link on calendar page ({day}-{letter}) for address: {user_paon}"
)Committable suggestion skipped: line range outside the PR's diff.
```diff
     def parse_data(self, page: str, **kwargs) -> dict:
-        # Get and check UPRN
-        user_uprn = kwargs.get("uprn")
-        check_uprn(user_uprn)
-        bindata = {"bins": []}
-
-        API_URL = "https://www.fife.gov.uk/api/custom?action=powersuite_bin_calendar_collections&actionedby=bin_calendar&loadform=true&access=citizen&locale=en"
-        AUTH_URL = "https://www.fife.gov.uk/api/citizen?preview=false&locale=en"
-        AUTH_KEY = "Authorization"
-
-        r = requests.get(AUTH_URL)
-        r.raise_for_status()
-        auth_token = r.headers[AUTH_KEY]
-
-        post_data = {
-            "name": "bin_calendar",
-            "data": {
-                "uprn": user_uprn,
-            },
-            "email": "",
-            "caseid": "",
-            "xref": "",
-            "xref1": "",
-            "xref2": "",
-        }
-
-        headers = {
-            "referer": "https://www.fife.gov.uk/services/forms/bin-calendar",
-            "accept": "application/json",
-            "content-type": "application/json",
-            AUTH_KEY: auth_token,
-        }
-
-        r = requests.post(API_URL, data=json.dumps(post_data), headers=headers)
-        r.raise_for_status()
-
-        result = r.json()
-
-        for collection in result["data"]["tab_collections"]:
-            dict_data = {
-                "type": collection["colour"],
-                "collectionDate": datetime.strptime(
-                    collection["date"],
-                    "%A, %B %d, %Y",
-                ).strftime("%d/%m/%Y"),
-            }
-            bindata["bins"].append(dict_data)
-
-        bindata["bins"].sort(
-            key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
-        )
+        driver = None
+        try:
+            # Get and check UPRN
+            user_postcode = kwargs.get("postcode")
+            user_paon = kwargs.get("paon")
+            check_postcode(user_postcode)
+            check_paon(user_paon)
+
+            web_driver = kwargs.get("web_driver")
+            headless = kwargs.get("headless")
+
+            bindata = {"bins": []}
+
+            URL = "https://fife.portal.uk.empro.verintcloudservices.com/site/fife/request/bin_calendar"
+
+            user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
+            driver = create_webdriver(web_driver, headless, user_agent, __name__)
+            driver.get(URL)
+
+            wait = WebDriverWait(driver, 30)
+
+            ID_POSTCODE = "dform_widget_ps_45M3LET8_txt_postcode"
+            ID_SEARCH_BTN = "dform_widget_ps_3SHSN93_searchbutton"
+            ID_ADDRESS_SELECT = "dform_widget_ps_3SHSN93_id"
+            ID_COLLECTIONS = "dform_table_tab_collections"
+
+            # Wait for initial page load and Cloudflare bypass
+            wait.until(lambda d: "Just a moment" not in d.title and d.title != "")
+            time.sleep(3)
+
+            # Wait for the postcode field to appear then populate it
+            inputElement_postcode = wait.until(
+                EC.presence_of_element_located((By.ID, ID_POSTCODE))
+            )
+            inputElement_postcode.send_keys(user_postcode)
+
+            # Click search button
+            findAddress = wait.until(EC.element_to_be_clickable((By.ID, ID_SEARCH_BTN)))
+            findAddress.click()
+
+            # Wait for the 'Select address' dropdown to appear and select option matching the house name/number
+            select_el = wait.until(
+                EC.visibility_of_element_located((By.ID, ID_ADDRESS_SELECT))
+            )
+            wait.until(lambda d: len(Select(select_el).options) > 1)
+
+            paon_norm = str(user_paon).strip().casefold()
+            sel = Select(select_el)
+
+            time.sleep(10)
+
+            def _best_option():
+                # Prefer exact contains on visible text; fallback to casefold contains
+                for opt in sel.options:
+                    txt = (opt.text or "").strip()
+                    if paon_norm and paon_norm in txt.casefold():
+                        return opt
+                return None
+
+            opt = _best_option()
+            if not opt:
+                raise ValueError(
+                    f"Could not find an address containing '{user_paon}' in the dropdown."
+                )
+            sel.select_by_visible_text(opt.text)
+
+            # After selecting, the collections table should (re)render; wait for it
+            wait.until(EC.presence_of_element_located((By.ID, ID_COLLECTIONS)))
+            # Also wait until at least one data row is present (beyond headers)
+            wait.until(
+                lambda d: len(
+                    d.find_elements(By.CSS_SELECTOR, f"#{ID_COLLECTIONS} .dform_tr")
+                )
+                > 1
+            )
+
+            soup = BeautifulSoup(driver.page_source, features="html.parser")
+
+            table = soup.find("div", id=ID_COLLECTIONS)
+            if not table:
+                raise ValueError(
+                    f"Could not find collections table by id='{ID_COLLECTIONS}'"
+                )
+
+            rows = table.find_all("div", class_="dform_tr")
+
+            # Skip header row (first row with .dform_th entries)
+            for row in rows[1:]:
+                tds = row.find_all("div", class_="dform_td")
+                if len(tds) < 3:
+                    continue
+
+                # Colour comes from the <img alt="...">
+                colour_cell = tds[0]
+                img = colour_cell.find("img")
+                colour = img.get("alt").strip() if img and img.has_attr("alt") else None
+
+                # Date text
+                raw_date = tds[1].get_text(strip=True)
+                # Example: "Wednesday, November 12, 2025"
+                dt = datetime.strptime(raw_date, "%A, %B %d, %Y")
+
+                dict_data = {
+                    "type": colour,
+                    "collectionDate": dt.strftime(date_format),
+                }
+                bindata["bins"].append(dict_data)
+
+            bindata["bins"].sort(
+                key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
+            )
+
+        except Exception as e:
+            # Here you can log the exception if needed
+            print(f"An error occurred: {e}")
+            # Optionally, re-raise the exception if you want it to propagate
+            raise
+        finally:
+            # This block ensures that the driver is closed regardless of an exception
+            if driver:
+                driver.quit()
```
Fife parser expects paon but config/wikis only supply house_number.
Inside parse_data you do:
```python
user_postcode = kwargs.get("postcode")
user_paon = kwargs.get("paon")
check_postcode(user_postcode)
check_paon(user_paon)
...
paon_norm = str(user_paon).strip().casefold()
```

However, the FifeCouncil entry in tests/input.json now only defines house_number, and the wiki text tells users to pass a “house number” parameter, not paon. In flows that rely on those kwargs, user_paon will be None, so _best_option will end up searching for "none" and always raise the “Could not find an address” ValueError.
To make this work with both existing conventions and the new config, have the parser fall back to house_number when paon is absent:
- user_postcode = kwargs.get("postcode")
- user_paon = kwargs.get("paon")
- check_postcode(user_postcode)
- check_paon(user_paon)
+ user_postcode = kwargs.get("postcode")
+ user_paon = kwargs.get("paon") or kwargs.get("house_number")
+ check_postcode(user_postcode)
+ check_paon(user_paon)
@@
- paon_norm = str(user_paon).strip().casefold()
+ paon_norm = str(user_paon).strip().casefold()(Only the source of user_paon changes.)
This keeps support for any callers already using paon while making the documented house_number parameter work for Fife.
```python
    def parse_data(self, page: str, **kwargs) -> dict:

        user_uprn = kwargs.get("uprn")
        check_uprn(user_uprn)
        bindata = {"bins": []}

        headers = {
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
        }

        params = {
            "uprn": user_uprn,
        }

        response = requests.get(
            "https://selfserve.harlow.gov.uk/appshost/firmstep/self/apps/custompage/bincollectionsecho",
            params=params,
            headers=headers,
        )

        soup = BeautifulSoup(response.text, features="html.parser")

        summary = soup.find("div", {"class": "summary"})
        collectionrows = summary.find_all("div", {"class": "collectionsrow"})

        for collectionrow in collectionrows:
            bin_type = collectionrow.find("div", {"class": "col-xs-4"})
            collection_time = collectionrow.find("div", {"class": "col-sm-6"})

            if bin_type and collection_time:
                collectionDate = datetime.strptime(
                    collection_time.text.strip(), "%a - %d %b %Y"
                )

                dict_data = {
                    "type": bin_type.text.strip(),
                    "collectionDate": collectionDate.strftime(date_format),
                }
                bindata["bins"].append(dict_data)

        return bindata
```
🧩 Analysis chain
Harden Harlow HTTP/HTML handling: add timeouts/status checks and guard missing .summary.
Right now the scraper can hang indefinitely or crash on layout/HTTP changes:
- `requests.get` has no `timeout` and doesn't check `status_code`; a slow or broken endpoint will block indefinitely, and 4xx/5xx HTML will fall through to parsing.
- `summary = soup.find("div", {"class": "summary"})` is assumed to succeed; if it's missing, `summary.find_all(...)` will raise `AttributeError`.
Consider tightening this up along these lines:
- response = requests.get(
- "https://selfserve.harlow.gov.uk/appshost/firmstep/self/apps/custompage/bincollectionsecho",
- params=params,
- headers=headers,
- )
-
- soup = BeautifulSoup(response.text, features="html.parser")
-
- summary = soup.find("div", {"class": "summary"})
- collectionrows = summary.find_all("div", {"class": "collectionsrow"})
+ response = requests.get(
+ "https://selfserve.harlow.gov.uk/appshost/firmstep/self/apps/custompage/bincollectionsecho",
+ params=params,
+ headers=headers,
+ timeout=120,
+ )
+ response.raise_for_status()
+
+ soup = BeautifulSoup(response.text, features="html.parser")
+
+ summary = soup.find("div", {"class": "summary"})
+ if not summary:
+ raise ValueError("Could not find bin collection summary on Harlow page")
+ collectionrows = summary.find_all("div", {"class": "collectionsrow"})You can also safely drop the unused import time at the top.
---
🏁 Script executed:
```shell
#!/bin/bash
# Locate and examine HarlowCouncil.py file
fd -t f "HarlowCouncil.py" 2>/dev/null | head -20
```

Length of output: 61
🏁 Script executed:
#!/bin/bash
# Examine the full HarlowCouncil.py file to check imports and implementation
if [ -f "uk_bin_collection/uk_bin_collection/councils/HarlowCouncil.py" ]; then
cat "uk_bin_collection/uk_bin_collection/councils/HarlowCouncil.py"
fi

Length of output: 1911
🏁 Script executed:
#!/bin/bash
# Check for any try-except or error handling patterns in the file
rg -n "timeout|raise_for_status|try|except" uk_bin_collection/uk_bin_collection/councils/HarlowCouncil.py -C2 2>/dev/nullLength of output: 0
🏁 Script executed:
#!/bin/bash
# Check if requests module has timeout defaults or if other council scrapers use timeouts
rg -n "timeout=" uk_bin_collection/uk_bin_collection/councils/ -t py 2>/dev/null | head -20Length of output: 2826
🏁 Script executed:
#!/bin/bash
# Check import patterns in the file for 'time' module
rg -n "^import|^from" uk_bin_collection/uk_bin_collection/councils/HarlowCouncil.py 2>/dev/nullLength of output: 209
Add timeout, status check, and guard against missing .summary div.
The code lacks critical robustness safeguards:
- `requests.get()` (lines 33–38) has no `timeout` and doesn't check `response.status_code`; requests can hang indefinitely, and error pages will pass through to parsing. Other council scrapers in this codebase use `timeout=20` or higher as standard.
- `summary.find_all()` (line 43) assumes the `.summary` div exists; if the HTML changes, `None.find_all()` will raise `AttributeError`.
- `import time` (line 1) is unused and should be removed.
Apply the suggested changes:
response = requests.get(
"https://selfserve.harlow.gov.uk/appshost/firmstep/self/apps/custompage/bincollectionsecho",
params=params,
headers=headers,
+ timeout=120,
)
+ response.raise_for_status()
soup = BeautifulSoup(response.text, features="html.parser")
summary = soup.find("div", {"class": "summary"})
+ if not summary:
+ raise ValueError("Could not find bin collection summary on Harlow page")
collectionrows = summary.find_all("div", {"class": "collectionsrow"})Also remove the unused import time at line 1.
```python
        for day in week_days:
            for row in collection_schedule:
                schedule_type = row.find("th").get_text().strip()
                results2 = re.search("([^(]+)", row.find("td").get_text().strip())
                schedule_cadence = row.find("td").get_text().strip().split(" ")[1]
                if results2:
                    schedule_day = results2[1].strip()
                    for collection_type in week_collection_types:
                        collectionDate = None
                        if collection_type in schedule_type.lower():
                            if (
                                day.weekday()
                                == time.strptime(schedule_day, "%A").tm_wday
                            ):
                                dict_data = {
                                    "type": schedule_type,
                                    "collectionDate": day.strftime(date_format),
                                }
                                data["bins"].append(dict_data)
                                collectionDate = day.strftime(date_format)
                        else:
                            if "Fortnightly" in schedule_cadence:
                                if (
                                    day.weekday()
                                    == time.strptime(schedule_day, "%A").tm_wday
                                ):
                                    day = day + timedelta(days=7)
                                    collectionDate = day.strftime(date_format)

                        if schedule_type and collectionDate:
                            dict_data = {
                                "type": schedule_type,
                                "collectionDate": collectionDate,
                            }
                            data["bins"].append(dict_data)
```
Fortnightly cadence branch is mis-attached and can generate incorrect dates/types
The new cadence logic has a couple of problems that can lead to wrong or missing collections:
-
elseis attached to the wrong condition (logic bug)- Currently:
else:(Line 75) pairs withif collection_type in schedule_type.lower():(Line 69).- This means the “Fortnightly” branch runs only when the collection_type does not match the
schedule_type, and never when it does match.
- Consequences:
- Fortnightly adjustment is skipped for the actual matching bin type.
- For non-matching
collection_typevalues, the fortnightly branch can still set acollectionDate, thenif schedule_type and collectionDate:(Lines 84–89) will append entries using the wrongschedule_type, potentially duplicating or mis-dating bins.
- Currently:
-
Brittle
schedule_cadenceparsingschedule_cadence = row.find("td").get_text().strip().split(" ")[1](Line 64) assumes there are at least two space‑separated tokens in the cell.- If the HTML ever changes to a single word (e.g. just
"Thursday") or another format, this will raise anIndexErrorand break the scraper.
-
Repeated parsing of the
<td>textrow.find("td").get_text().strip()is called multiple times (Lines 63–64). It’s minor, but you can compute this once for clarity and slight efficiency.
A safer and clearer structure is to:
- Parse the
<td>text once. - Safely derive
schedule_cadence. - Compute
schedule_dayand its weekday once. - Apply the fortnightly adjustment inside the matching-
collection_typebranch.
For example:
- for row in collection_schedule:
- schedule_type = row.find("th").get_text().strip()
- results2 = re.search("([^(]+)", row.find("td").get_text().strip())
- schedule_cadence = row.find("td").get_text().strip().split(" ")[1]
- if results2:
- schedule_day = results2[1].strip()
- for collection_type in week_collection_types:
- collectionDate = None
- if collection_type in schedule_type.lower():
- if (
- day.weekday()
- == time.strptime(schedule_day, "%A").tm_wday
- ):
- collectionDate = day.strftime(date_format)
- else:
- if "Fortnightly" in schedule_cadence:
- if (
- day.weekday()
- == time.strptime(schedule_day, "%A").tm_wday
- ):
- day = day + timedelta(days=7)
- collectionDate = day.strftime(date_format)
-
- if schedule_type and collectionDate:
- dict_data = {
- "type": schedule_type,
- "collectionDate": collectionDate,
- }
- data["bins"].append(dict_data)
+ for row in collection_schedule:
+ schedule_type = row.find("th").get_text().strip()
+ td = row.find("td")
+ if not td:
+ continue
+
+ td_text = td.get_text().strip()
+ results2 = re.search(r"([^(]+)", td_text)
+
+ tokens = td_text.split()
+ schedule_cadence = tokens[1] if len(tokens) > 1 else ""
+
+ if results2:
+ schedule_day = results2.group(1).strip()
+ schedule_wday = time.strptime(schedule_day, "%A").tm_wday
+ is_fortnightly = "Fortnightly" in schedule_cadence
+
+ for collection_type in week_collection_types:
+ collectionDate = None
+ if collection_type in schedule_type.lower():
+ if day.weekday() == schedule_wday:
+ effective_day = (
+ day + timedelta(days=7)
+ if is_fortnightly
+ else day
+ )
+ collectionDate = effective_day.strftime(date_format)
+
+ if schedule_type and collectionDate:
+ dict_data = {
+ "type": schedule_type,
+ "collectionDate": collectionDate,
+ }
+ data["bins"].append(dict_data)This keeps the fortnightly handling tied to the correct schedule_type, avoids accidental extra entries, and guards against brittle .split()[1] usage.
Actionable comments posted: 0
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py (1)
102-116: Consider renaming loop variable and reducing reliance on `*` imports

Two small polish items:

- The loop uses `for type in collection_types:` which shadows the built-in `type`; consider renaming to `bin_type` (and updating the dict accordingly) for clarity.
- Ruff flags `datetime`/`date_format` as F405 because they come from a `from ... import *`; if you want cleaner linting and clearer dependencies, consider importing the specific names you need instead of using a star import.

Example for the loop rename:
```diff
-    collection_types = collection_type.split(" and ")
-
-    for type in collection_types:
-
-        dict_data = {
-            "type": type,
+    collection_types = collection_type.split(" and ")
+
+    for bin_type in collection_types:
+
+        dict_data = {
+            "type": bin_type,
             "collectionDate": collectiondate,
         }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py (1)
custom_components/uk_bin_collection/calendar.py (1)
event(54-63)
🪛 Ruff (0.14.5)
uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py
102-102: datetime may be undefined, or defined from star imports
(F405)
105-105: date_format may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Unit Tests (3.12, 1.8.4)
- GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/WiltshireCouncil.py (1)
57-120: Bin list construction and type/date mapping look correct
`data_bins` is initialised once and populated with per-event dicts using `"type": type` and `"collectionDate": collectiondate`, and `collection_type.split(" and ")` ensures multi-service entries are split into individual bin types. This resolves the earlier swapped-key issue and aligns with the expected `{ "bins": [...] }` structure.
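For context, the expected output shape is roughly the following (the values are illustrative placeholders; dates use the project's DD/MM/YYYY format):

```json
{
    "bins": [
        {"type": "Household waste", "collectionDate": "01/12/2025"},
        {"type": "Recycling", "collectionDate": "08/12/2025"}
    ]
}
```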
BREAKING CHANGES
fix: #1693 - Fife Council - This scraper now uses postcode and house number instead of UPRN. It also requires a Selenium browser
fix: #1724 - Rushmoor Council - This scraper requires you to reconfigure your council URL and input your UPRN in a separate field
FIXES
fix: #1622 - Armagh Banbridge Craigavon Council
fix: #1706 - London Borough of Lambeth
fix: #1707 - Chelmsford City Council
fix: #1721 - Mid Sussex Council - No longer requires Selenium
fix: #1720 - Thurrock Council
fix: #1718 - Argyll and Bute Council - Required padding of UPRN to 12 digits
fix: #1712 - South Lanarkshire Council - Scraper could return a date in the past leading to no data being displayed
fix: #1711 - Northumberland Council - Required padding of UPRN to 12 digits
fix: #1209 - Halton Borough Council
fix: #1689 - Wiltshire Council
New Councils
fix: #1640 - Adding Blackpool Council
fix: #1639 - Adding Harlow Council