Conversation
📝 WalkthroughWalkthroughThe ArunCouncil scraper is refactored to replace Selenium browser automation with direct HTTP requests using the requests library and BeautifulSoup parsing. The scraper now implements a UPRN-based lookup flow with postcode validation and updates the parse_data method signature to accept optional parameters. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py`:
- Around line 33-58: The HTTP requests in ArunCouncil.py (the session calls
using s.get and s.post and the final response stored in r) aren't validated, so
add response checks: call raise_for_status() on each response returned by s.get/
s.post (the calls to f"{BASE}/when-are-my-bins-collected",
f"{BASE}/when-are-my-bins-collected/postcode",
f"{BASE}/when-are-my-bins-collected/select") and on the final r from
s.get(f"{BASE}/when-are-my-bins-collected/collections") before any parsing;
ensure you capture each request's return value (e.g., resp = s.post(...)) then
call resp.raise_for_status() so HTTP 4xx/5xx are surfaced immediately.
- Line 16: The call to check_uprn(user_uprn) in ArunCouncil.py currently ignores
validation failures because check_uprn in common.py catches all exceptions and
only prints an error; either make the preferred fix in common.py by removing the
swallow—log the error then re-raise the exception (i.e., avoid a bare except and
raise the caught exception) so callers like the call in ArunCouncil.py will get
an exception, or if you can't change common.py right now, change the call in
ArunCouncil.py to explicitly check the result (e.g., if not
check_uprn(user_uprn): raise ValueError("Invalid UPRN")) so invalid UPRNs stop
execution; reference check_uprn and the caller in ArunCouncil.py when making the
change.
🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py (3)
8-8: Type hint should explicitly allowNone.Per PEP 484, use explicit
str | None(orOptional[str]) when the default isNone.Suggested fix
- def parse_data(self, page: str = None, **kwargs) -> dict: + def parse_data(self, page: str | None = None, **kwargs) -> dict:
64-64:soup.prettify()is a no-op here.
prettify()returns a formatted string but the return value isn't used. Either remove this line or assign the result if debugging output is needed.
70-80: Consider defensive null checks on element lookups.If a table row lacks the expected
thortdelements,find()returnsNoneand.text.strip()will raiseAttributeError. Consider adding guards or using atry/exceptwithin the loop to skip malformed rows gracefully.Suggested defensive approach
for row in table.find("tbody").find_all("tr"): - collection_type = ( - row.find("th", class_="govuk-table__header") - .text.strip() - .split(" ")[0] - ) - - collection_date = ( - row.find("td", class_="govuk-table__cell") - .text.strip() - ) + th = row.find("th", class_="govuk-table__header") + td = row.find("td", class_="govuk-table__cell") + if not th or not td: + continue + collection_type = th.text.strip().split(" ")[0] + collection_date = td.text.strip()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py (2)
uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)uk_bin_collection/uk_bin_collection/common.py (2)
check_postcode(36-49)check_uprn(67-78)
🪛 Ruff (0.14.11)
uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py
3-3: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
8-8: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
15-15: check_postcode may be undefined, or defined from star imports
(F405)
16-16: check_uprn may be undefined, or defined from star imports
(F405)
68-68: Abstract raise to an inner function
(TRY301)
68-68: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py (2)
18-31: Session setup looks good.The User-Agent and headers appropriately simulate a browser request, which is necessary for this council's website.
89-93: Exception handling and return structure are appropriate.The catch-print-reraise pattern preserves error visibility while allowing callers to handle exceptions. The data structure matches the expected format.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
|
||
| check_postcode(user_postcode) | ||
| driver.get(page) | ||
| check_uprn(user_uprn) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find the common.py file
fd common.pyRepository: robbrad/UKBinCollectionData
Length of output: 116
🏁 Script executed:
# Search for check_uprn function definition
rg -A 20 "def check_uprn" --type pyRepository: robbrad/UKBinCollectionData
Length of output: 1542
🏁 Script executed:
# Check ArunCouncil.py to see how check_uprn is used
rg -B 5 -A 5 "check_uprn" uk_bin_collection/uk_bin_collection/councils/ArunCouncil.pyRepository: robbrad/UKBinCollectionData
Length of output: 426
check_uprn silently swallows validation errors and execution continues.
The check_uprn function in common.py catches all exceptions, prints an error message, but does not re-raise the exception or return a failure indicator. On line 16, the return value is not checked, so invalid UPRNs result only in a printed message while execution continues with potentially bad data.
Either:
- Fix
check_uprnincommon.pyto re-raise the exception (preferred) - Check the return value here:
if not check_uprn(user_uprn): raise ValueError("Invalid UPRN")
🧰 Tools
🪛 Ruff (0.14.11)
16-16: check_uprn may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In `@uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py` at line 16, The
call to check_uprn(user_uprn) in ArunCouncil.py currently ignores validation
failures because check_uprn in common.py catches all exceptions and only prints
an error; either make the preferred fix in common.py by removing the swallow—log
the error then re-raise the exception (i.e., avoid a bare except and raise the
caught exception) so callers like the call in ArunCouncil.py will get an
exception, or if you can't change common.py right now, change the call in
ArunCouncil.py to explicitly check the result (e.g., if not
check_uprn(user_uprn): raise ValueError("Invalid UPRN")) so invalid UPRNs stop
execution; reference check_uprn and the caller in ArunCouncil.py when making the
change.
| s.get(f"{BASE}/when-are-my-bins-collected") | ||
|
|
||
| s.post( | ||
| f"{BASE}/when-are-my-bins-collected/postcode", | ||
| data={"postcode": user_postcode}, | ||
| headers={ | ||
| "Referer": f"{BASE}/when-are-my-bins-collected", | ||
| "Origin": BASE, | ||
| }, | ||
| ) | ||
| continue_button.click() | ||
|
|
||
| address_selection_menu = Select(driver.find_element(By.ID, "address")) | ||
| for idx, addr_option in enumerate(address_selection_menu.options): | ||
| option_name = addr_option.text[0 : len(user_paon)] | ||
| if option_name == user_paon: | ||
| selected_address = addr_option | ||
| break | ||
| address_selection_menu.select_by_visible_text(selected_address.text) | ||
|
|
||
| continue_button = WebDriverWait(driver, timeout=15).until( | ||
| EC.presence_of_element_located((By.CLASS_NAME, "govuk-button")) | ||
|
|
||
| s.post( | ||
| f"{BASE}/when-are-my-bins-collected/select", | ||
| data={"address": user_uprn}, | ||
| headers={ | ||
| "Referer": f"{BASE}/when-are-my-bins-collected/postcode", | ||
| "Origin": BASE, | ||
| }, | ||
| ) | ||
| continue_button.click() | ||
| # Check for text saying "Next collection dates" | ||
| WebDriverWait(driver, 30).until( | ||
| EC.presence_of_element_located( | ||
| (By.XPATH, "//*[contains(text(), 'Next collection dates')]") | ||
| ) | ||
|
|
||
| r = s.get( | ||
| f"{BASE}/when-are-my-bins-collected/collections", | ||
| headers={ | ||
| "Referer": f"{BASE}/when-are-my-bins-collected/select", | ||
| }, | ||
| ) |
There was a problem hiding this comment.
HTTP responses are not validated for success.
None of the requests check for HTTP errors. If the server returns 4xx/5xx (e.g., invalid postcode, rate limiting, server error), the code silently continues and fails later with a confusing error parsing the response.
Add raise_for_status() after critical requests, or at minimum check the final response before parsing.
Suggested fix
s.get(f"{BASE}/when-are-my-bins-collected")
- s.post(
+ resp = s.post(
f"{BASE}/when-are-my-bins-collected/postcode",
data={"postcode": user_postcode},
headers={
"Referer": f"{BASE}/when-are-my-bins-collected",
"Origin": BASE,
},
)
+ resp.raise_for_status()
- s.post(
+ resp = s.post(
f"{BASE}/when-are-my-bins-collected/select",
data={"address": user_uprn},
headers={
"Referer": f"{BASE}/when-are-my-bins-collected/postcode",
"Origin": BASE,
},
)
+ resp.raise_for_status()
r = s.get(
f"{BASE}/when-are-my-bins-collected/collections",
headers={
"Referer": f"{BASE}/when-are-my-bins-collected/select",
},
)
+ r.raise_for_status()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| s.get(f"{BASE}/when-are-my-bins-collected") | |
| s.post( | |
| f"{BASE}/when-are-my-bins-collected/postcode", | |
| data={"postcode": user_postcode}, | |
| headers={ | |
| "Referer": f"{BASE}/when-are-my-bins-collected", | |
| "Origin": BASE, | |
| }, | |
| ) | |
| continue_button.click() | |
| address_selection_menu = Select(driver.find_element(By.ID, "address")) | |
| for idx, addr_option in enumerate(address_selection_menu.options): | |
| option_name = addr_option.text[0 : len(user_paon)] | |
| if option_name == user_paon: | |
| selected_address = addr_option | |
| break | |
| address_selection_menu.select_by_visible_text(selected_address.text) | |
| continue_button = WebDriverWait(driver, timeout=15).until( | |
| EC.presence_of_element_located((By.CLASS_NAME, "govuk-button")) | |
| s.post( | |
| f"{BASE}/when-are-my-bins-collected/select", | |
| data={"address": user_uprn}, | |
| headers={ | |
| "Referer": f"{BASE}/when-are-my-bins-collected/postcode", | |
| "Origin": BASE, | |
| }, | |
| ) | |
| continue_button.click() | |
| # Check for text saying "Next collection dates" | |
| WebDriverWait(driver, 30).until( | |
| EC.presence_of_element_located( | |
| (By.XPATH, "//*[contains(text(), 'Next collection dates')]") | |
| ) | |
| r = s.get( | |
| f"{BASE}/when-are-my-bins-collected/collections", | |
| headers={ | |
| "Referer": f"{BASE}/when-are-my-bins-collected/select", | |
| }, | |
| ) | |
| s.get(f"{BASE}/when-are-my-bins-collected") | |
| resp = s.post( | |
| f"{BASE}/when-are-my-bins-collected/postcode", | |
| data={"postcode": user_postcode}, | |
| headers={ | |
| "Referer": f"{BASE}/when-are-my-bins-collected", | |
| "Origin": BASE, | |
| }, | |
| ) | |
| resp.raise_for_status() | |
| resp = s.post( | |
| f"{BASE}/when-are-my-bins-collected/select", | |
| data={"address": user_uprn}, | |
| headers={ | |
| "Referer": f"{BASE}/when-are-my-bins-collected/postcode", | |
| "Origin": BASE, | |
| }, | |
| ) | |
| resp.raise_for_status() | |
| r = s.get( | |
| f"{BASE}/when-are-my-bins-collected/collections", | |
| headers={ | |
| "Referer": f"{BASE}/when-are-my-bins-collected/select", | |
| }, | |
| ) | |
| r.raise_for_status() |
🤖 Prompt for AI Agents
In `@uk_bin_collection/uk_bin_collection/councils/ArunCouncil.py` around lines 33
- 58, The HTTP requests in ArunCouncil.py (the session calls using s.get and
s.post and the final response stored in r) aren't validated, so add response
checks: call raise_for_status() on each response returned by s.get/ s.post (the
calls to f"{BASE}/when-are-my-bins-collected",
f"{BASE}/when-are-my-bins-collected/postcode",
f"{BASE}/when-are-my-bins-collected/select") and on the final r from
s.get(f"{BASE}/when-are-my-bins-collected/collections") before any parsing;
ensure you capture each request's return value (e.g., resp = s.post(...)) then
call resp.raise_for_status() so HTTP 4xx/5xx are surfaced immediately.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1812 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
Summary by CodeRabbit
Bug Fixes
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.