Skip to content

Remove Selenium from Broadland#1814

Draft
PaulBrack wants to merge 1 commit intorobbrad:masterfrom
PaulBrack:remove-selenium-broadland
Draft

Remove Selenium from Broadland#1814
PaulBrack wants to merge 1 commit intorobbrad:masterfrom
PaulBrack:remove-selenium-broadland

Conversation

@PaulBrack
Copy link

@PaulBrack PaulBrack commented Jan 15, 2026

Summary by CodeRabbit

  • Performance

    • Accelerated bin collection data retrieval with optimized processing and faster response times.
  • Bug Fixes

    • Strengthened input validation to ensure accurate address information handling and prevent processing errors.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Walkthrough

A single council scraper module was refactored to replace Selenium WebDriver automation with direct HTTP requests, removing browser interactions and cookie banner handling. Input validation for UPRN and postcode was added, and the parse_data method signature was updated to accept kwargs for flexible parameter passing.

Changes

Cohort / File(s) Summary
Broadland District Council Scraper Refactor
uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py
Replaced Selenium WebDriver with HTTP requests using custom Cookie header containing JSON payload; removed WebDriver initialization, form submission, and browser interactions; added UPRN/postcode validation and normalization; switched to BeautifulSoup for response parsing; updated parse_data signature to parse_data(self, page: str = None, **kwargs)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Poem

🐰 No more browser clicks and wheel turns,
Just cookies and requests—how the scraper learns!
UPRN validated, one-two-three,
Bins parsed cleanly, efficient and free! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly describes the main change: removing Selenium (a browser automation tool) from the Broadland council implementation, which aligns with the refactoring from Selenium-driven to HTTP request-based approach.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py`:
- Around line 44-48: The requests.get call in BroadlandDistrictCouncil (where
you call requests.get(...)) is using a hardcoded
"https://area.southnorfolkandbroadland.gov.uk/" instead of the configured URL
stored in the local variable url (obtained from kwargs via url =
kwargs.get("url")); update that requests.get invocation to use the url variable
(keeping headers and timeout as-is) so the configured URL (e.g.,
".../FindAddress") from input.json is honored.
🧹 Nitpick comments (6)
uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py (6)

104-104: Remove duplicate import.

parse from dateutil.parser is already imported at line 8. This redundant import inside the function should be removed.

Proposed fix
-                                from dateutil.parser import parse
-
-                                parsed_date = parse(cleaned_date_text, fuzzy=True)
+                                parsed_date = parse(cleaned_date_text, fuzzy=True)

17-17: Fix implicit Optional type hint.

Per PEP 484, page: str = None should explicitly declare the Optional type. The unused page parameter is understood since this council handles its own request.

Proposed fix
-    def parse_data(self, page: str = None, **kwargs) -> dict:
+    def parse_data(self, page: str | None = None, **kwargs) -> dict:

23-23: Remove unused variable.

The url variable is assigned but never used. This was likely left over from the Selenium refactor.

Proposed fix
             uprn = kwargs.get("uprn")
             postcode = kwargs.get("postcode")
-            url = kwargs.get("url")

57-57: Remove unused variable.

current_date is assigned but never used in the parsing logic.

Proposed fix
-            # Initialize current date
-            current_date = datetime.now()
-
-            # Process collection details
+            # Process collection details

60-60: Consider removing or replacing debug print statements.

Multiple print() calls remain in the code for debugging. For production code, consider removing them or replacing with proper logging (e.g., _LOGGER.debug()) as used elsewhere in the codebase.

Also applies to: 78-78, 96-96, 116-116


117-119: Consider using logging instead of print for error reporting.

The error message uses print() which is inconsistent with production practices. Consider using a logger for consistent error reporting.

Proposed fix
         except Exception as e:
-            print(f"An error occurred: {e}")
+            import logging
+            logging.getLogger(__name__).error(f"An error occurred: {e}")
             raise

Or import _LOGGER if available in the common module.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 274152f and 43557af.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (2)
  • check_uprn (67-78)
  • check_postcode (36-49)
🪛 Ruff (0.14.11)
uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py

11-11: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


17-17: Unused method argument: page

(ARG002)


17-17: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


23-23: Local variable url is assigned to but never used

Remove assignment to unused variable url

(F841)


25-25: check_uprn may be undefined, or defined from star imports

(F405)


26-26: check_postcode may be undefined, or defined from star imports

(F405)

🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py (1)

25-27: Input validation and UPRN normalization look good.

The check_uprn function validates that the UPRN exists and is not empty, while check_postcode validates UK postcode formatting against the postcodes.io API. The zero-padding to 12 digits is correct for UK UPRNs.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +44 to 48
r = requests.get(
"https://area.southnorfolkandbroadland.gov.uk/",
headers=headers,
timeout=30
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find configuration files that might contain URL definitions
find . -name "*.json" -o -name "*.yaml" -o -name "*.yml" | head -20

Repository: robbrad/UKBinCollectionData

Length of output: 851


🏁 Script executed:

# Search for BroadlandDistrictCouncil in configuration files
rg "BroadlandDistrictCouncil" -t json -A 5 2>/dev/null || echo "No JSON matches"
rg "BroadlandDistrictCouncil" -A 5 --type-list | grep -E "\.(json|yaml|yml)" && rg "BroadlandDistrictCouncil" -A 5 || echo "Search complete"

Repository: robbrad/UKBinCollectionData

Length of output: 949


🏁 Script executed:

# Check the actual BroadlandDistrictCouncil implementation to see how it uses kwargs
cat -n uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py | head -60

Repository: robbrad/UKBinCollectionData

Length of output: 2215


🏁 Script executed:

# Look at how other council classes handle URLs - check if they use kwargs.get("url")
rg "kwargs\.get\(.*url" -A 2 -B 2

Repository: robbrad/UKBinCollectionData

Length of output: 17682


URL from kwargs is retrieved but never used; requests.get() uses a hardcoded URL instead.

At line 23, url = kwargs.get("url") retrieves the configured URL (https://area.southnorfolkandbroadland.gov.uk/FindAddress from input.json), but at line 45, the code uses a different hardcoded URL (https://area.southnorfolkandbroadland.gov.uk/). This means configuration changes to the URL will be ignored. Use the url variable from kwargs in the request instead of the hardcoded value.

🤖 Prompt for AI Agents
In `@uk_bin_collection/uk_bin_collection/councils/BroadlandDistrictCouncil.py`
around lines 44 - 48, The requests.get call in BroadlandDistrictCouncil (where
you call requests.get(...)) is using a hardcoded
"https://area.southnorfolkandbroadland.gov.uk/" instead of the configured URL
stored in the local variable url (obtained from kwargs via url =
kwargs.get("url")); update that requests.get invocation to use the url variable
(keeping headers and timeout as-is) so the configured URL (e.g.,
".../FindAddress") from input.json is honored.

@PaulBrack PaulBrack marked this pull request as draft January 16, 2026 14:37
@codecov
Copy link

codecov bot commented Jan 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (274152f) to head (43557af).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1814   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant