13 changes: 10 additions & 3 deletions uk_bin_collection/uk_bin_collection/councils/NewhamCouncil.py
@@ -19,7 +19,7 @@ def parse_data(self, page: str, **kwargs) -> dict:
raise ValueError(f"Error getting identifier: {str(e)}")

# Make a BS4 object
-page = requests.get(url)
+page = requests.get(url, verify=False)
soup = BeautifulSoup(page.text, "html.parser")
soup.prettify

@@ -36,20 +36,27 @@ def parse_data(self, page: str, **kwargs) -> dict:
if len(sections_recycling) > 0:
sections.append(sections_recycling[0])

+# as well as one for food waste
+sections_food_waste = soup.find_all(
+"div", {"class": "card h-100 card-food"}
+)
+if len(sections_food_waste) > 0:
+sections.append(sections_food_waste[0])
+
# For each bin section, get the text and the list elements
for item in sections:
header = item.find("div", {"class": "card-header"})
bin_type_element = header.find_next("b")
if bin_type_element is not None:
bin_type = bin_type_element.text
-array_expected_types = ["Domestic", "Recycling"]
+array_expected_types = ["Domestic", "Recycling", "Food Waste"]
if bin_type in array_expected_types:
date = (
item.find_next("p", {"class": "card-text"})
.find_next("mark")
.next_sibling.strip()
)
Comment on lines 47 to 58

⚠️ Potential issue | 🟡 Minor

Chained attribute access can raise `AttributeError` if the food waste card has a different DOM structure.

`header`, `bin_type_element`, and the `find_next("p")...find_next("mark").next_sibling` chain all assume a specific HTML structure. If the food waste card layout differs even slightly (e.g., a missing `<mark>` tag), the chain raises an unhandled `AttributeError`. This risk already existed for the domestic and recycling cards, but the new food waste type increases the surface area.

Consider wrapping the inner parsing in a `try`/`except` or adding `None` guards.

🤖 Prompt for AI Agents
In `@uk_bin_collection/uk_bin_collection/councils/NewhamCouncil.py` around lines
47 - 58, The parsing loop for sections (variables: header, bin_type_element,
array_expected_types, date) assumes a rigid DOM and can raise AttributeError
when chaining item.find_next("p", {"class":
"card-text"}).find_next("mark").next_sibling; modify the loop to defensively
handle missing nodes by either (a) checking each intermediate value is not None
before accessing its children (verify header, bin_p = item.find_next("p",
{"class":"card-text"}), mark = bin_p.find_next("mark"), and mark.next_sibling)
and only then assign date, or (b) wrap the inner parsing in a try/except
AttributeError that logs/continues on failure; ensure the check for bin_type in
array_expected_types remains and that malformed cards are skipped without
raising.
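
Option (a) above could look roughly like the sketch below. It is only an illustration of the `None`-guard approach, not code from this PR: the helper name `extract_collection_date` is hypothetical, and it assumes `item` is the bs4 `Tag` for one bin card, as in the existing loop.

```python
from typing import Any, Optional


def extract_collection_date(item: Any) -> Optional[str]:
    """Return the date text that follows the card's <mark>, or None.

    Hypothetical helper, not part of NewhamCouncil.py. `item` is assumed
    to be the bs4 Tag for a single bin card.
    """
    card_text = item.find_next("p", {"class": "card-text"})
    if card_text is None:
        return None  # no <p class="card-text"> found after this card

    mark = card_text.find_next("mark")
    if mark is None:
        return None  # this card layout has no <mark> label

    sibling = mark.next_sibling
    # bs4's NavigableString subclasses str; a Tag or None does not,
    # so this check also makes the .strip() call below safe.
    if not isinstance(sibling, str):
        return None

    text = sibling.strip()
    return text or None  # treat whitespace-only text as missing
```

Inside the loop, the chained lookup would then become `date = extract_collection_date(item)`, followed by `if date is None: continue` so that a malformed card is skipped instead of raising. A similar `if header is None: continue` check would cover the `header.find_next("b")` call.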

-next_collection = datetime.strptime(date, "%d/%m/%Y")
+next_collection = datetime.strptime(date, "%m/%d/%Y")

dict_data = {
"type": bin_type,