
Remove Selenium from Arun#1812

Draft
PaulBrack wants to merge 2 commits into robbrad:master from PaulBrack:remove_selenium_from_arun_council

Conversation


PaulBrack commented Jan 15, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling for Arun Council bin collection data retrieval with clearer messages when collection information is unavailable.
  • Refactor

    • Replaced Selenium browser automation in the Arun Council scraper with direct HTTP requests, for improved reliability and performance.



coderabbitai bot commented Jan 15, 2026

📝 Walkthrough


The ArunCouncil scraper is refactored to replace Selenium browser automation with direct HTTP requests (via the requests library) and BeautifulSoup parsing. The scraper now implements a UPRN-based lookup flow with postcode validation, and the parse_data signature now makes the page parameter optional.

Changes

Cohort: Selenium to HTTP Migration
File: uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py
Summary: Removed Selenium driver-based navigation and replaced it with a requests session that sends a desktop User-Agent. Implemented UPRN and postcode validation via check_uprn and check_postcode calls. Switched page parsing from driver methods to BeautifulSoup over HTML fetched via HTTP. Updated data extraction to use new table selectors, appending one dictionary per collection with "type" and "collectionDate" keys. Made the page parameter of parse_data optional (page: str = None). Added explicit error handling for a missing bin collection table. Removed the driver.quit() cleanup logic, which is no longer needed without a browser.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Poem

🐰 Selenium's fade, requests now lead the way,
HTTP calls through UPRN's display,
No browser to drive, just parsing so fine,
BeautifulSoup's dance makes collection dates shine! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title clearly and concisely summarizes the main change: removing the Selenium dependency from the Arun Council scraper implementation.



coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py`:
- Around lines 33-58: The HTTP requests made with the session (the s.get and s.post calls, plus the final response stored in r) are not validated. Capture each response (e.g., resp = s.post(...)) and call resp.raise_for_status() on it, covering the requests to f"{BASE}/when-are-my-bins-collected", f"{BASE}/when-are-my-bins-collected/postcode" and f"{BASE}/when-are-my-bins-collected/select", as well as the final r from s.get(f"{BASE}/when-are-my-bins-collected/collections"), before any parsing, so that HTTP 4xx/5xx errors surface immediately.
- Line 16: The call to check_uprn(user_uprn) in ArunCouncil.py ignores validation failures, because check_uprn in common.py catches all exceptions and only prints an error. Preferred fix: in common.py, stop swallowing the error; log it, then re-raise the caught exception (avoiding a bare except) so that callers such as ArunCouncil.py receive the exception. If common.py cannot be changed right now, check the result explicitly in ArunCouncil.py (e.g., if not check_uprn(user_uprn): raise ValueError("Invalid UPRN")) so that an invalid UPRN stops execution.
🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py (3)

8-8: Type hint should explicitly allow None.

Per PEP 484, use explicit str | None (or Optional[str]) when the default is None.

Suggested fix
-    def parse_data(self, page: str = None, **kwargs) -> dict:
+    def parse_data(self, page: str | None = None, **kwargs) -> dict:
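A minimal sketch of the same fix using `Optional[str]`, which is equivalent to `str | None`. On Python 3.9 and earlier, a `str | None` annotation raises `TypeError` when the function is defined unless the module starts with `from __future__ import annotations`, so `Optional[str]` is the safer spelling if older interpreters matter. The class name `ExampleScraper`, the empty return value and the docstring wording are illustrative only; the docstring also shows one way to address the docstring-coverage warning.

```python
from typing import Optional, get_type_hints


class ExampleScraper:
    """Hypothetical stand-in for the council class; only the signature matters here."""

    def parse_data(self, page: Optional[str] = None, **kwargs) -> dict:
        """Return bin collection data.

        Args:
            page: Not needed by the requests-based flow; defaulting it to None
                lets callers omit it (assumption about how callers use it).
            **kwargs: Expected to carry the user's postcode and UPRN.
        """
        # Body omitted in this sketch; the real method builds the bins list.
        return {"bins": []}


# Optional[str] is shorthand for Union[str, None], i.e. the same type as str | None.
print(get_type_hints(ExampleScraper.parse_data)["page"])
```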

64-64: soup.prettify() is a no-op here.

prettify() returns a formatted string but the return value isn't used. Either remove this line or assign the result if debugging output is needed.


70-80: Consider defensive null checks on element lookups.

If a table row lacks the expected th or td elements, find() returns None and .text.strip() will raise AttributeError. Consider adding guards or using a try/except within the loop to skip malformed rows gracefully.

Suggested defensive approach
             for row in table.find("tbody").find_all("tr"):
-                collection_type = (
-                    row.find("th", class_="govuk-table__header")
-                    .text.strip()
-                    .split(" ")[0]
-                )
-
-                collection_date = (
-                    row.find("td", class_="govuk-table__cell")
-                    .text.strip()
-                )
+                th = row.find("th", class_="govuk-table__header")
+                td = row.find("td", class_="govuk-table__cell")
+                if not th or not td:
+                    continue
+                collection_type = th.text.strip().split(" ")[0]
+                collection_date = td.text.strip()
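To see the guards in context, here is a self-contained sketch of the whole row loop run against a small made-up HTML fragment that reuses the selectors quoted above (`govuk-table__header`, `govuk-table__cell`). The sample markup and dates, the `{"bins": [...]}` wrapper, the error message and the helper name `parse_bin_table` are assumptions for illustration, not the council's real page or the PR's code. It uses `is None` rather than truthiness so the intent of the check is explicit.

```python
from bs4 import BeautifulSoup

# Made-up markup shaped like the GOV.UK table the review describes.
SAMPLE_HTML = """
<table class="govuk-table">
  <tbody>
    <tr><th class="govuk-table__header">Rubbish collection</th>
        <td class="govuk-table__cell">Monday 19 January 2026</td></tr>
    <tr><th class="govuk-table__header">Recycling collection</th></tr>
    <tr><th class="govuk-table__header">Garden waste</th>
        <td class="govuk-table__cell">Friday 23 January 2026</td></tr>
  </tbody>
</table>
"""


def parse_bin_table(html: str) -> dict:
    """Hypothetical helper: turn the collections table into a bins structure."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="govuk-table")
    if table is None or table.find("tbody") is None:
        raise ValueError("Bin collection table not found")

    bins = []
    for row in table.find("tbody").find_all("tr"):
        th = row.find("th", class_="govuk-table__header")
        td = row.find("td", class_="govuk-table__cell")
        if th is None or td is None:
            # Skip malformed rows instead of raising AttributeError on None.text
            continue
        bins.append(
            {
                # First word of the header, mirroring the PR's split(" ")[0]
                "type": th.text.strip().split(" ")[0],
                "collectionDate": td.text.strip(),
            }
        )
    return {"bins": bins}


print(parse_bin_table(SAMPLE_HTML))
```

The "Recycling collection" row has no date cell, so it is skipped rather than crashing the whole scrape.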
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 274152f and d54fdc3.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
  • AbstractGetBinDataClass (43-146)
uk_bin_collection/uk_bin_collection/common.py (2)
  • check_postcode (36-49)
  • check_uprn (67-78)
🪛 Ruff (0.14.11)
uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py

3-3: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


8-8: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


15-15: check_postcode may be undefined, or defined from star imports

(F405)


16-16: check_uprn may be undefined, or defined from star imports

(F405)


68-68: Abstract raise to an inner function

(TRY301)


68-68: Avoid specifying long messages outside the exception class

(TRY003)
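One way to address both findings at line 68, sketched under assumptions: the message lives inside a dedicated exception class (TRY003), and the `raise` moves into a small helper that the `try` block calls (TRY301). The names `BinTableNotFoundError` and `require_table` are hypothetical; the PR's actual exception type and message are not shown in this thread. Subclassing `ValueError` is a design choice that keeps any existing `except ValueError` handlers working.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


class BinTableNotFoundError(ValueError):
    """Raised when the collections page has no bin collection table."""

    def __init__(self) -> None:
        # The message is defined once, inside the class (what TRY003 asks for).
        super().__init__("Bin collection table not found on the collections page")


def require_table(table: Optional[T]) -> T:
    """Return the table element, or raise if the lookup came back empty.

    Keeping the raise in this helper means the caller's try block contains
    no raise statement of its own (what TRY301 asks for).
    """
    if table is None:
        raise BinTableNotFoundError()
    return table


# In the scraper this would wrap a lookup such as soup.find("table", ...).
try:
    require_table(None)
except BinTableNotFoundError as exc:
    print(f"caught: {exc}")
```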

🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py (2)

18-31: Session setup looks good.

The User-Agent and headers appropriately simulate a browser request, which is necessary for this council's website.


89-93: Exception handling and return structure are appropriate.

The catch-print-reraise pattern preserves error visibility while allowing callers to handle exceptions. The data structure matches the expected format.



check_postcode(user_postcode)
driver.get(page)
check_uprn(user_uprn)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the common.py file
fd common.py

Repository: robbrad/UKBinCollectionData

Length of output: 116


🏁 Script executed:

# Search for check_uprn function definition
rg -A 20 "def check_uprn" --type py

Repository: robbrad/UKBinCollectionData

Length of output: 1542


🏁 Script executed:

# Check ArunCouncil.py to see how check_uprn is used
rg -B 5 -A 5 "check_uprn" uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py

Repository: robbrad/UKBinCollectionData

Length of output: 426


check_uprn silently swallows validation errors and execution continues.

The check_uprn function in common.py catches all exceptions, prints an error message, but does not re-raise the exception or return a failure indicator. On line 16, the return value is not checked, so invalid UPRNs result only in a printed message while execution continues with potentially bad data.

Either:

  1. Fix check_uprn in common.py to re-raise the exception (preferred)
  2. Check the return value here: if not check_uprn(user_uprn): raise ValueError("Invalid UPRN")
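A sketch of option 1, assuming a check_uprn shaped like the one described above (catch, print, carry on). The body of the real function in common.py is not reproduced in this thread, so the validity rule below (the value must parse as an integer) is an illustrative assumption; the point is catching a specific exception, logging it, and re-raising instead of swallowing it.

```python
import logging

logger = logging.getLogger(__name__)


def check_uprn(val) -> bool:
    """Validate a UPRN; log and re-raise on failure instead of swallowing the error."""
    try:
        # Assumed rule for illustration: the UPRN must parse as an integer.
        # int() raises ValueError for None (as "None"), "" or non-numeric text.
        int(str(val).strip())
    except ValueError:
        logger.error("Exception encountered: please check UPRN %r", val)
        raise  # re-raise the caught exception so the caller stops here
    return True


# Caller side: an invalid UPRN now stops execution instead of printing and continuing.
try:
    check_uprn("not-a-uprn")
except ValueError as exc:
    print(f"stopped: {exc}")
```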

Comment on lines +33 to 58
s.get(f"{BASE}/when-are-my-bins-collected")

s.post(
f"{BASE}/when-are-my-bins-collected/postcode",
data={"postcode": user_postcode},
headers={
"Referer": f"{BASE}/when-are-my-bins-collected",
"Origin": BASE,
},
)
continue_button.click()

address_selection_menu = Select(driver.find_element(By.ID, "address"))
for idx, addr_option in enumerate(address_selection_menu.options):
option_name = addr_option.text[0 : len(user_paon)]
if option_name == user_paon:
selected_address = addr_option
break
address_selection_menu.select_by_visible_text(selected_address.text)

continue_button = WebDriverWait(driver, timeout=15).until(
EC.presence_of_element_located((By.CLASS_NAME, "govuk-button"))

s.post(
f"{BASE}/when-are-my-bins-collected/select",
data={"address": user_uprn},
headers={
"Referer": f"{BASE}/when-are-my-bins-collected/postcode",
"Origin": BASE,
},
)
continue_button.click()
# Check for text saying "Next collection dates"
WebDriverWait(driver, 30).until(
EC.presence_of_element_located(
(By.XPATH, "//*[contains(text(), 'Next collection dates')]")
)

r = s.get(
f"{BASE}/when-are-my-bins-collected/collections",
headers={
"Referer": f"{BASE}/when-are-my-bins-collected/select",
},
)

⚠️ Potential issue | 🟠 Major

HTTP responses are not validated for success.

None of the requests check for HTTP errors. If the server returns a 4xx/5xx status (e.g., for an invalid postcode, rate limiting, or a server error), the code continues silently and later fails with a confusing error while parsing the response.

Add raise_for_status() after critical requests, or at minimum check the final response before parsing.

Suggested fix
             s.get(f"{BASE}/when-are-my-bins-collected")

-            s.post(
+            resp = s.post(
                 f"{BASE}/when-are-my-bins-collected/postcode",
                 data={"postcode": user_postcode},
                 headers={
                     "Referer": f"{BASE}/when-are-my-bins-collected",
                     "Origin": BASE,
                 },
             )
+            resp.raise_for_status()

-            s.post(
+            resp = s.post(
                 f"{BASE}/when-are-my-bins-collected/select",
                 data={"address": user_uprn},
                 headers={
                     "Referer": f"{BASE}/when-are-my-bins-collected/postcode",
                     "Origin": BASE,
                 },
             )
+            resp.raise_for_status()

             r = s.get(
                 f"{BASE}/when-are-my-bins-collected/collections",
                 headers={
                     "Referer": f"{BASE}/when-are-my-bins-collected/select",
                 },
             )
+            r.raise_for_status()
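An alternative sketch that also covers the initial GET, which the suggested diff leaves unchecked: register a response hook on the session so that every request calls `raise_for_status()`. Response hooks are a documented requests feature; the function names, the `base` parameter (the real BASE URL is not shown in this thread) and the User-Agent string are placeholders, not the PR's code.

```python
import requests


def raise_on_http_error(response, *args, **kwargs):
    """requests response hook: turn any 4xx/5xx into requests.HTTPError."""
    response.raise_for_status()


def make_session() -> requests.Session:
    s = requests.Session()
    # Placeholder desktop User-Agent; the PR's actual string is not shown here.
    s.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"})
    # Every request made through this session now fails fast on HTTP errors.
    s.hooks["response"].append(raise_on_http_error)
    return s


def fetch_collections_html(base: str, postcode: str, uprn: str) -> str:
    """Hypothetical wrapper around the four-step flow from the diff above."""
    s = make_session()
    s.get(f"{base}/when-are-my-bins-collected")
    s.post(
        f"{base}/when-are-my-bins-collected/postcode",
        data={"postcode": postcode},
        headers={"Referer": f"{base}/when-are-my-bins-collected", "Origin": base},
    )
    s.post(
        f"{base}/when-are-my-bins-collected/select",
        data={"address": uprn},
        headers={
            "Referer": f"{base}/when-are-my-bins-collected/postcode",
            "Origin": base,
        },
    )
    r = s.get(
        f"{base}/when-are-my-bins-collected/collections",
        headers={"Referer": f"{base}/when-are-my-bins-collected/select"},
    )
    return r.text
```

Compared with calling raise_for_status() at each call site, a session-level hook cannot be forgotten when a new request is added; the trade-off is that a request expected to return a non-2xx status would need a separate session.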

PaulBrack changed the title from "Remove selenium from arun council" to "Remove selenium from Arun council" on Jan 15, 2026
PaulBrack changed the title from "Remove selenium from Arun council" to "Remove Selenium from Arun council" on Jan 15, 2026
PaulBrack changed the title from "Remove Selenium from Arun council" to "Remove Selenium from Arun" on Jan 15, 2026
PaulBrack marked this pull request as draft on January 16, 2026 14:38

codecov bot commented Jan 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (274152f) to head (d54fdc3).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1812   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

