Remove Selenium from Arun by PaulBrack · Pull Request #1812 · robbrad/UKBinCollectionData

PaulBrack · 2026-01-15T20:15:55Z

Summary by CodeRabbit

Bug Fixes
- Improved error handling for Arun Council bin collection data retrieval with clearer messages when collection information is unavailable.
Refactor
- Enhanced Arun Council scraper for improved reliability and performance.


coderabbitai · 2026-01-15T20:16:08Z

Walkthrough

The ArunCouncil scraper is refactored to replace Selenium browser automation with direct HTTP requests using the requests library and BeautifulSoup parsing. The scraper now implements a UPRN-based lookup flow with postcode validation and updates the parse_data method signature to accept optional parameters.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Selenium to HTTP Migration `uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py` | Removed Selenium driver-based navigation; replaced with a requests session and a desktop User-Agent. Implemented UPRN validation via new check_uprn/check_postcode calls. Changed page parsing from driver methods to HTTP-fetched HTML with BeautifulSoup. Updated data extraction with new table selectors; it now appends dictionaries with "type" and "collectionDate" keys. Modified the parse_data signature to make the page parameter optional (page: str = None). Enhanced error handling with explicit checks for a missing bin collection table. Removed driver.quit() cleanup logic. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat: add direct mobile API support to North Hertfordshire District Council scraper #1715: Performs the same Selenium-to-HTTP migration pattern with UPRN-driven flow for a different council scraper
fix: Cumberland Council #1764: Converts another council scraper (CumberlandCouncil) from form-based Selenium to direct UPRN-driven HTTP requests with HTML parsing

Poem

🐰 Selenium's fade, requests now lead the way,
HTTP calls through UPRN's display,
No browser to drive, just parsing so fine,
BeautifulSoup's dance makes collection dates shine! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: removing Selenium dependency from the Arun Council scraper implementation. |


coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py`:
- Around lines 33-58: The HTTP requests in ArunCouncil.py (the session calls
using s.get and s.post, and the final response stored in r) aren't validated, so
add response checks: call raise_for_status() on each response returned by
s.get/s.post (the calls to f"{BASE}/when-are-my-bins-collected",
f"{BASE}/when-are-my-bins-collected/postcode",
f"{BASE}/when-are-my-bins-collected/select") and on the final r from
s.get(f"{BASE}/when-are-my-bins-collected/collections") before any parsing;
ensure you capture each request's return value (e.g., resp = s.post(...)) then
call resp.raise_for_status() so HTTP 4xx/5xx are surfaced immediately.
- Line 16: The call to check_uprn(user_uprn) in ArunCouncil.py currently ignores
validation failures because check_uprn in common.py catches all exceptions and
only prints an error. Either make the preferred fix in common.py by removing the
swallow (log the error, then re-raise the caught exception instead of using a
bare except) so callers like the one in ArunCouncil.py get an exception, or, if
you can't change common.py right now, change the call in ArunCouncil.py to
explicitly check the result (e.g., if not check_uprn(user_uprn): raise
ValueError("Invalid UPRN")) so invalid UPRNs stop execution. Reference
check_uprn and the caller in ArunCouncil.py when making the change.

🧹 Nitpick comments (3)

uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py (3)

8-8: Type hint should explicitly allow None.

Per PEP 484, use explicit str | None (or Optional[str]) when the default is None.

Suggested fix

-    def parse_data(self, page: str = None, **kwargs) -> dict:
+    def parse_data(self, page: str | None = None, **kwargs) -> dict:
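
As a rough illustration of the annotation change, the sketch below uses `Optional[str]`, which means the same thing as `str | None` but also runs on Python versions older than 3.10. The function is a free-standing stand-in, not the scraper method itself, and the keyword values passed to it are placeholders.

```python
from typing import Optional


def parse_data(page: Optional[str] = None, **kwargs) -> dict:
    """Stand-in for the scraper method; only the signature matters here."""
    # None is now an explicitly permitted value for `page`, so type checkers
    # no longer have to infer an implicit Optional from the default.
    return {"page_was_given": page is not None, "kwargs": sorted(kwargs)}


# Placeholder arguments, not real address data.
print(parse_data(uprn="123456789012", postcode="AB1 2CD"))
```

With the explicit annotation, callers that omit `page` and callers that pass a string are both valid under the declared type, which is what RUF013 asks for.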

64-64: soup.prettify() is a no-op here.

prettify() returns a formatted string but the return value isn't used. Either remove this line or assign the result if debugging output is needed.

70-80: Consider defensive null checks on element lookups.

If a table row lacks the expected th or td elements, find() returns None and .text.strip() will raise AttributeError. Consider adding guards or using a try/except within the loop to skip malformed rows gracefully.

Suggested defensive approach

             for row in table.find("tbody").find_all("tr"):
-                collection_type = (
-                    row.find("th", class_="govuk-table__header")
-                    .text.strip()
-                    .split(" ")[0]
-                )
-
-                collection_date = (
-                    row.find("td", class_="govuk-table__cell")
-                    .text.strip()
-                )
+                th = row.find("th", class_="govuk-table__header")
+                td = row.find("td", class_="govuk-table__cell")
+                if not th or not td:
+                    continue
+                collection_type = th.text.strip().split(" ")[0]
+                collection_date = td.text.strip()
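
To show the guard running end to end, here is a minimal offline sketch. It swaps in the standard library's `xml.etree.ElementTree` for BeautifulSoup so it has no third-party dependencies; the sample markup and the `extract_bins` name are assumptions made for illustration, not the council's real page or the PR's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real page structure is only assumed here.
SAMPLE = """
<table>
  <tbody>
    <tr><th class="govuk-table__header">Rubbish collection</th>
        <td class="govuk-table__cell">Friday 23 January 2026</td></tr>
    <tr><td class="govuk-table__cell">row with no header cell</td></tr>
    <tr><th class="govuk-table__header">Recycling collection</th>
        <td class="govuk-table__cell">Friday 30 January 2026</td></tr>
  </tbody>
</table>
"""


def extract_bins(table_xml: str) -> list[dict]:
    """Collect type/date pairs, skipping rows that lack a th or td."""
    bins = []
    tbody = ET.fromstring(table_xml).find("tbody")
    if tbody is None:
        return bins
    for row in tbody.findall("tr"):
        th = row.find("th")
        td = row.find("td")
        # find() returns None when the child is absent, so guard before use.
        if th is None or td is None:
            continue
        bins.append(
            {
                "type": (th.text or "").strip().split(" ")[0],
                "collectionDate": (td.text or "").strip(),
            }
        )
    return bins


print(extract_bins(SAMPLE))
```

The malformed middle row is skipped rather than raising AttributeError. Note that this sketch compares against None explicitly, because an ElementTree element without children is falsy; with BeautifulSoup the `if not th or not td` form in the suggestion above is the usual idiom.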

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 274152f and d54fdc3.

📒 Files selected for processing (1)

uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py

🧰 Additional context used

🧬 Code graph analysis (1)

uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py (2)

uk_bin_collection/uk_bin_collection/get_bin_data.py (1)

AbstractGetBinDataClass (43-146)

uk_bin_collection/uk_bin_collection/common.py (2)

check_postcode (36-49)

check_uprn (67-78)

🪛 Ruff (0.14.11)

uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py

3-3: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names (F403)

8-8: PEP 484 prohibits implicit Optional. Convert to T | None (RUF013)

15-15: check_postcode may be undefined, or defined from star imports (F405)

16-16: check_uprn may be undefined, or defined from star imports (F405)

68-68: Abstract raise to an inner function (TRY301)

68-68: Avoid specifying long messages outside the exception class (TRY003)

🔇 Additional comments (2)

uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py (2)

18-31: Session setup looks good.

The User-Agent and headers appropriately simulate a browser request, which is necessary for this council's website.

89-93: Exception handling and return structure are appropriate.

The catch-print-reraise pattern preserves error visibility while allowing callers to handle exceptions. The data structure matches the expected format.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

coderabbitai · 2026-01-15T20:18:13Z

uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py

+
            check_postcode(user_postcode)
-            driver.get(page)
+            check_uprn(user_uprn)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the common.py file
fd common.py

Repository: robbrad/UKBinCollectionData

Length of output: 116

🏁 Script executed:

# Search for check_uprn function definition
rg -A 20 "def check_uprn" --type py

Repository: robbrad/UKBinCollectionData

Length of output: 1542

🏁 Script executed:

# Check ArunCouncil.py to see how check_uprn is used
rg -B 5 -A 5 "check_uprn" uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py

Repository: robbrad/UKBinCollectionData

Length of output: 426

check_uprn silently swallows validation errors and execution continues.

The check_uprn function in common.py catches all exceptions, prints an error message, but does not re-raise the exception or return a failure indicator. On line 16, the return value is not checked, so invalid UPRNs result only in a printed message while execution continues with potentially bad data.

Either:

- Fix check_uprn in common.py to re-raise the exception (preferred)
- Check the return value here: if not check_uprn(user_uprn): raise ValueError("Invalid UPRN")
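
The difference between the two behaviours can be sketched as follows. Both functions are hypothetical stand-ins written for this illustration: the real check_uprn body in common.py is not reproduced here, and the digits-only rule is an assumption. One caveat on the second option: if the existing helper returns None on success as well as on failure, `if not check_uprn(...)` would reject valid input too, so its return contract is worth confirming first.

```python
def check_uprn_swallowing(uprn: str) -> None:
    """Hypothetical validator that prints the error and carries on."""
    try:
        if not uprn:
            raise ValueError("Invalid UPRN")
    except Exception as ex:
        # The error is swallowed here, so the caller never sees it.
        print(f"Exception encountered: {ex}")


def check_uprn_strict(uprn: str) -> bool:
    """Hypothetical variant: log the error, then re-raise it."""
    try:
        # Assumed rule for illustration: non-empty and digits only.
        if not uprn or not str(uprn).isdigit():
            raise ValueError(f"Invalid UPRN: {uprn!r}")
        return True
    except ValueError as ex:
        print(f"Exception encountered: {ex}")
        raise


check_uprn_swallowing("")  # prints a message, execution continues
try:
    check_uprn_strict("")
except ValueError:
    print("strict variant stopped execution")
```

With the strict variant the scraper stops before any HTTP request is made, instead of posting an empty or malformed address value to the council site.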

🧰 Tools

🪛 Ruff (0.14.11)

16-16: check_uprn may be undefined, or defined from star imports

(F405)


coderabbitai · 2026-01-15T20:18:13Z

uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py

+            s.get(f"{BASE}/when-are-my-bins-collected")
+
+            s.post(
+                f"{BASE}/when-are-my-bins-collected/postcode",
+                data={"postcode": user_postcode},
+                headers={
+                    "Referer": f"{BASE}/when-are-my-bins-collected",
+                    "Origin": BASE,
+                },
            )
-            continue_button.click()
-
-            address_selection_menu = Select(driver.find_element(By.ID, "address"))
-            for idx, addr_option in enumerate(address_selection_menu.options):
-                option_name = addr_option.text[0 : len(user_paon)]
-                if option_name == user_paon:
-                    selected_address = addr_option
-                    break
-            address_selection_menu.select_by_visible_text(selected_address.text)
-
-            continue_button = WebDriverWait(driver, timeout=15).until(
-                EC.presence_of_element_located((By.CLASS_NAME, "govuk-button"))
+
+            s.post(
+                f"{BASE}/when-are-my-bins-collected/select",
+                data={"address": user_uprn},
+                headers={
+                    "Referer": f"{BASE}/when-are-my-bins-collected/postcode",
+                    "Origin": BASE,
+                },
            )
-            continue_button.click()
-            # Check for text saying "Next collection dates"
-            WebDriverWait(driver, 30).until(
-                EC.presence_of_element_located(
-                    (By.XPATH, "//*[contains(text(), 'Next collection dates')]")
-                )
+
+            r = s.get(
+                f"{BASE}/when-are-my-bins-collected/collections",
+                headers={
+                    "Referer": f"{BASE}/when-are-my-bins-collected/select",
+                },
            )


⚠️ Potential issue | 🟠 Major

HTTP responses are not validated for success.

None of the requests check for HTTP errors. If the server returns 4xx/5xx (e.g., invalid postcode, rate limiting, server error), the code silently continues and fails later with a confusing error parsing the response.

Add raise_for_status() after critical requests, or at minimum check the final response before parsing.
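
One way to keep these checks from cluttering the request chain is a small wrapper, sketched below. `checked` and `FakeResponse` are hypothetical names introduced for this illustration; `FakeResponse` stands in for `requests.Response` so the snippet runs offline, and it raises RuntimeError where requests would raise its own HTTPError.

```python
class FakeResponse:
    """Offline stand-in for requests.Response (illustration only)."""

    def __init__(self, status_code: int):
        self.status_code = status_code

    def raise_for_status(self) -> None:
        # requests raises HTTPError for 4xx/5xx; RuntimeError is a stand-in.
        if 400 <= self.status_code < 600:
            raise RuntimeError(f"HTTP {self.status_code}")


def checked(resp):
    """Raise on 4xx/5xx, otherwise hand the response back for chaining."""
    resp.raise_for_status()
    return resp


ok = checked(FakeResponse(200))
print(ok.status_code)

try:
    checked(FakeResponse(429))
except RuntimeError as ex:
    print(f"surfaced immediately: {ex}")
```

In the scraper this would read along the lines of `r = checked(s.get(...))`. A status check only catches HTTP-level failures; if the site answers a bad postcode with a 200 page, the later check for a missing collection table is still needed.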

Suggested fix

             s.get(f"{BASE}/when-are-my-bins-collected")

-            s.post(
+            resp = s.post(
                 f"{BASE}/when-are-my-bins-collected/postcode",
                 data={"postcode": user_postcode},
                 headers={
                     "Referer": f"{BASE}/when-are-my-bins-collected",
                     "Origin": BASE,
                 },
             )
+            resp.raise_for_status()

-            s.post(
+            resp = s.post(
                 f"{BASE}/when-are-my-bins-collected/select",
                 data={"address": user_uprn},
                 headers={
                     "Referer": f"{BASE}/when-are-my-bins-collected/postcode",
                     "Origin": BASE,
                 },
             )
+            resp.raise_for_status()

             r = s.get(
                 f"{BASE}/when-are-my-bins-collected/collections",
                 headers={
                     "Referer": f"{BASE}/when-are-my-bins-collected/select",
                 },
             )
+            r.raise_for_status()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

            s.get(f"{BASE}/when-are-my-bins-collected")

            resp = s.post(
                f"{BASE}/when-are-my-bins-collected/postcode",
                data={"postcode": user_postcode},
                headers={
                    "Referer": f"{BASE}/when-are-my-bins-collected",
                    "Origin": BASE,
                },
            )
            resp.raise_for_status()

            resp = s.post(
                f"{BASE}/when-are-my-bins-collected/select",
                data={"address": user_uprn},
                headers={
                    "Referer": f"{BASE}/when-are-my-bins-collected/postcode",
                    "Origin": BASE,
                },
            )
            resp.raise_for_status()

            r = s.get(
                f"{BASE}/when-are-my-bins-collected/collections",
                headers={
                    "Referer": f"{BASE}/when-are-my-bins-collected/select",
                },
            )
            r.raise_for_status()


codecov · 2026-01-17T07:32:05Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (274152f) to head (d54fdc3).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1812   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152


Remove Selenium

24db77a

Match original formatting

d54fdc3

coderabbitai bot reviewed Jan 15, 2026

PaulBrack changed the title ~~Remove selenium from arun council~~ Remove selenium from Arun council Jan 15, 2026

PaulBrack changed the title ~~Remove selenium from Arun council~~ Remove Selenium from Arun council Jan 15, 2026

PaulBrack changed the title ~~Remove Selenium from Arun council~~ Remove Selenium from Arun Jan 15, 2026

PaulBrack marked this pull request as draft January 16, 2026 14:38

PaulBrack wants to merge 2 commits into robbrad:master from PaulBrack:remove_selenium_from_arun_council
