Remove Selenium from Broadland by PaulBrack · Pull Request #1814 · robbrad/UKBinCollectionData

PaulBrack · 2026-01-15T23:31:46Z

Summary by CodeRabbit

Performance
- Accelerated bin collection data retrieval with optimized processing and faster response times.
Bug Fixes
- Strengthened input validation to ensure accurate address information handling and prevent processing errors.


coderabbitai · 2026-01-15T23:32:06Z

Walkthrough

A single council scraper module was refactored to replace Selenium WebDriver automation with direct HTTP requests, removing browser interactions and cookie banner handling. Input validation for UPRN and postcode was added, and the parse_data method signature was updated to accept kwargs for flexible parameter passing.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Broadland District Council Scraper Refactor: `uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py` | Replaced Selenium WebDriver with HTTP requests using a custom Cookie header containing a JSON payload; removed WebDriver initialization, form submission, and browser interactions; added UPRN/postcode validation and normalization; switched to BeautifulSoup for response parsing; updated the parse_data signature to `parse_data(self, page: str = None, **kwargs)` |
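
To make the "Cookie header containing a JSON payload" idea concrete, here is a minimal sketch of how such a header could be built. The cookie name `example_address`, the payload keys `Uprn` and `Postcode`, and the helper `build_cookie_header` are all assumptions for illustration; none of them is taken from the PR, and the real names would have to be read from the council site's own requests.

```python
import json
from urllib.parse import quote


def build_cookie_header(uprn: str, postcode: str, cookie_name: str = "example_address") -> dict:
    """Return request headers that carry a JSON payload in a single cookie.

    The cookie name and the payload keys are hypothetical placeholders.
    """
    # Compact JSON (no spaces after separators) keeps the cookie value short.
    payload = json.dumps({"Uprn": uprn, "Postcode": postcode}, separators=(",", ":"))
    # Percent-encode the JSON so quotes, commas and spaces are safe in a header value.
    return {"Cookie": f"{cookie_name}={quote(payload, safe='')}"}


headers = build_cookie_header("000012345678", "NR1 1AA")
print(headers["Cookie"])
```

A dictionary like this could then be passed as the `headers` argument of `requests.get(...)`, in place of driving a browser through the address form.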

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

feat: add direct mobile API support to North Hertfordshire District Council scraper #1715: Replaces Selenium scraping with direct HTTP/mobile-API UPRN-based retrieval approach, adding UPRN validation and updating parse_data method signature to accept kwargs.

Poem

🐰 No more browser clicks and wheel turns,
Just cookies and requests—how the scraper learns!
UPRN validated, one-two-three,
Bins parsed cleanly, efficient and free! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly describes the main change: removing Selenium (a browser automation tool) from the Broadland council implementation, which aligns with the refactoring from a Selenium-driven to an HTTP request-based approach. |


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py`:
- Around line 44-48: The requests.get call in BroadlandDistrictCouncil (where
you call requests.get(...)) is using a hardcoded
"https://area.southnorfolkandbroadland.gov.uk/" instead of the configured URL
stored in the local variable url (obtained from kwargs via url =
kwargs.get("url")); update that requests.get invocation to use the url variable
(keeping headers and timeout as-is) so the configured URL (e.g.,
".../FindAddress") from input.json is honored.

🧹 Nitpick comments (6)

uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py (6)
104-104: Remove duplicate import.

`parse` from `dateutil.parser` is already imported at line 8. This redundant import inside the function should be removed.
Proposed fix
-                                from dateutil.parser import parse
-
-                                parsed_date = parse(cleaned_date_text, fuzzy=True)
+                                parsed_date = parse(cleaned_date_text, fuzzy=True)
17-17: Fix implicit Optional type hint.

Per PEP 484, `page: str = None` should declare the Optional type explicitly. The unused `page` parameter is acceptable here because this council module makes its own request.
Proposed fix
-    def parse_data(self, page: str = None, **kwargs) -> dict:
+    def parse_data(self, page: str | None = None, **kwargs) -> dict:
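
For readers on older interpreters, here is a small sketch of the explicit spelling using `typing.Optional`. The standalone `parse_data` function below is a hypothetical stand-in, not the council class method; on Python 3.10 and later, `page: str | None = None` is an equivalent annotation.

```python
from typing import Optional, get_type_hints


def parse_data(page: Optional[str] = None, **kwargs) -> dict:
    """Hypothetical stand-in: ignores `page` and echoes the keyword arguments."""
    return dict(kwargs)


# The annotation now states outright that None is an accepted value.
hints = get_type_hints(parse_data)
print(hints["page"])
```
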
23-23: Remove unused variable.

The `url` variable is assigned but never used. This was likely left over from the Selenium refactor.
Proposed fix
             uprn = kwargs.get("uprn")
             postcode = kwargs.get("postcode")
-            url = kwargs.get("url")
57-57: Remove unused variable.

`current_date` is assigned but never used in the parsing logic.
Proposed fix
-            # Initialize current date
-            current_date = datetime.now()
-
-            # Process collection details
+            # Process collection details
60-60: Consider removing or replacing debug print statements.

Multiple `print()` calls remain in the code for debugging. For production code, consider removing them or replacing them with proper logging (e.g., `_LOGGER.debug()`), as used elsewhere in the codebase.

Also applies to: 78-78, 96-96, 116-116

117-119: Consider using logging instead of print for error reporting.

The error message uses `print()`, which is inconsistent with production practices. Consider using a logger for consistent error reporting.
Proposed fix
         except Exception as e:
-            print(f"An error occurred: {e}")
+            import logging
+            logging.getLogger(__name__).error(f"An error occurred: {e}")
             raise
Or import `_LOGGER` if available in the common module.
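
One possible shape for the logging suggestions above, as a sketch only: a module-level logger created once at import time, with `logger.exception` recording the traceback before the error is re-raised. The function `parse_day_number` and its failure mode are invented for illustration and are not part of the PR; whether the project exposes a shared `_LOGGER` is not confirmed here.

```python
import logging

# Module-level logger, created once rather than inside the except block.
_LOGGER = logging.getLogger(__name__)


def parse_day_number(text: str) -> int:
    """Hypothetical parsing step that may fail on bad input."""
    try:
        return int(text)
    except Exception:
        # Logs at ERROR level and appends the active traceback, then re-raises
        # so the caller still sees the original exception.
        _LOGGER.exception("An error occurred while parsing %r", text)
        raise
```

Passing the value as a `%r` argument instead of an f-string leaves the formatting to the logging module, so the message is only built when the record is actually emitted.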

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 274152f and 43557af.

📒 Files selected for processing (1)

uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py

🧰 Additional context used

🧬 Code graph analysis (1)

uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py (2)

uk_bin_collection/uk_bin_collection/get_bin_data.py (1)

AbstractGetBinDataClass (43-146)

uk_bin_collection/uk_bin_collection/common.py (2)

check_uprn (67-78)

check_postcode (36-49)

🪛 Ruff (0.14.11)

uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py

11-11: `from uk_bin_collection.uk_bin_collection.common import *` used; unable to detect undefined names (F403)

17-17: Unused method argument: `page` (ARG002)

17-17: PEP 484 prohibits implicit `Optional`; convert to `T | None` (RUF013)

23-23: Local variable `url` is assigned to but never used; remove assignment to unused variable `url` (F841)

25-25: `check_uprn` may be undefined, or defined from star imports (F405)

26-26: `check_postcode` may be undefined, or defined from star imports (F405)

🔇 Additional comments (1)

uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py (1)

25-27: Input validation and UPRN normalization look good.

The check_uprn function validates that the UPRN exists and is not empty, while check_postcode validates UK postcode formatting against the postcodes.io API. The zero-padding to 12 digits is correct for UK UPRNs.
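
The zero-padding step can be sketched as follows. `normalise_uprn` is a hypothetical helper written for this example; it does not reproduce the project's `check_uprn`, and its digit and length checks are assumptions that go further than the behaviour described above.

```python
def normalise_uprn(uprn) -> str:
    """Validate a UPRN and left-pad it with zeros to 12 digits (hypothetical helper)."""
    text = str(uprn).strip()
    # Reject empty, non-numeric, or over-long values before padding.
    if not text.isdigit() or len(text) > 12:
        raise ValueError(f"Invalid UPRN: {uprn!r}")
    return text.zfill(12)


print(normalise_uprn("12345"))
```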

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

coderabbitai · 2026-01-15T23:34:09Z

uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py

+            r = requests.get(
+                "https://area.southnorfolkandbroadland.gov.uk/",
+                headers=headers,
+                timeout=30
            )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find configuration files that might contain URL definitions
find . -name "*.json" -o -name "*.yaml" -o -name "*.yml" | head -20

Repository: robbrad/UKBinCollectionData

Length of output: 851

🏁 Script executed:

# Search for BroadlandDistrictCouncil in configuration files
rg "BroadlandDistrictCouncil" -t json -A 5 2>/dev/null || echo "No JSON matches"
rg "BroadlandDistrictCouncil" -A 5 --type-list | grep -E "\.(json|yaml|yml)" && rg "BroadlandDistrictCouncil" -A 5 || echo "Search complete"

Repository: robbrad/UKBinCollectionData

Length of output: 949

🏁 Script executed:

# Check the actual BroadlandDistrictCouncil implementation to see how it uses kwargs
cat -n uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py | head -60

Repository: robbrad/UKBinCollectionData

Length of output: 2215

🏁 Script executed:

# Look at how other council classes handle URLs - check if they use kwargs.get("url")
rg "kwargs\.get\(.*url" -A 2 -B 2

Repository: robbrad/UKBinCollectionData

Length of output: 17682

URL from kwargs is retrieved but never used; requests.get() uses a hardcoded URL instead.

At line 23, `url = kwargs.get("url")` retrieves the configured URL (`https://area.southnorfolkandbroadland.gov.uk/FindAddress` from input.json), but at line 45 the code uses a different, hardcoded URL (`https://area.southnorfolkandbroadland.gov.uk/`). This means configuration changes to the URL will be ignored. Use the `url` variable from kwargs in the request instead of the hardcoded value.
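
A minimal sketch of honouring the configured URL while keeping a fallback is shown below. `resolve_url` and `DEFAULT_URL` are hypothetical names introduced for this example, and whether the configured `/FindAddress` endpoint actually accepts the new cookie-based request is not confirmed by anything in this review.

```python
DEFAULT_URL = "https://area.southnorfolkandbroadland.gov.uk/"


def resolve_url(kwargs: dict, default: str = DEFAULT_URL) -> str:
    """Prefer the URL supplied in kwargs; fall back to the default if it is missing or empty."""
    url = kwargs.get("url")
    return url if url else default


# The configured value wins when present.
print(resolve_url({"url": "https://area.southnorfolkandbroadland.gov.uk/FindAddress"}))
# With no configured value, the default is used.
print(resolve_url({}))
```

The result would then be passed as the first argument to `requests.get(...)`, with `headers` and `timeout` left unchanged.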


codecov · 2026-01-17T07:32:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (274152f) to head (43557af).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1814   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152


Remove selenium from Broadland

43557af

coderabbitai bot reviewed Jan 15, 2026


PaulBrack marked this pull request as draft January 16, 2026 14:37

PaulBrack wants to merge 1 commit into robbrad:master from PaulBrack:remove-selenium-broadland
