Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
103 changes: 103 additions & 0 deletions docs/url-realism-audit.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# URL Realism Audit

## Scope

This audit covers user-visible URL surfaces in WebHarbor mirror sites: share
links, copy-link buttons, hidden post-action return URLs, and middleware that
rewrites external hosts into a local mirror. Benchmark task entry points are
out of scope because `sites/*/tasks.jsonl` intentionally points agents at
`http://localhost:40000` through `http://localhost:40014`.

## Findings

### Google Map Share And Website URLs

The Google Map place detail share box rendered a local mirror URL:

```text
http://localhost:40008/place/<slug>
```

The same site seeded many place website rows as `https://example.com/<slug>`.
Both values could be exposed in the user-facing place detail page.

Fix:

- The share box now uses a real Google Maps place URL:

```text
https://www.google.com/maps/place/<place+city>/
```

- Future Google Map seed data writes Google Maps place URLs instead of
`example.com` placeholders.
- Runtime rendering falls back to the Google Maps place URL when an existing
packaged seed database still contains an `example.com` placeholder.

### Booking And Allrecipes Return URLs

Several save buttons wrote `{{ request.url }}` into hidden `next` inputs. In a
local mirror this serializes an absolute local URL such as
`http://localhost:40005/...` into the DOM. That URL is not a real upstream URL
and also makes post-action redirects trust host-derived input.

Fix:

- Booking and Allrecipes now render relative return paths via
`current_relative_url()`.
- Their post-action redirects validate `next` values with
`safe_redirect_target()` and only allow root-relative paths.

### BBC News Share URL

The BBC News article share button copied `window.location.href`, which copies
the local mirror article URL. The Article rows already carry a `source_url`
field from the source dataset.

Fix:

- Article detail now passes `article_share_url` to the template.
- The copy button writes the real `source_url`, or a BBC article fallback URL
when no source URL exists.

### GitHub External-Host Recovery

The GitHub mirror has middleware for agents that accidentally request
`github.com` with this Flask app as the backend. It previously redirected to a
fixed local mirror URL:

```text
http://localhost:40006<path>
```

That is brittle outside the default port layout and leaks a local address when
the app is mounted under another host or port.

Fix:

- The middleware now redirects to the same relative path on the current host.

## Intentional Matches

These URL-looking values are intentional and should not be "fixed":

- `sites/*/tasks.jsonl`: local mirror entry URLs for benchmark agents.
- `README.md`, `AGENTS.md`, `CONTRIBUTING.md`, `Dockerfile`, and
`site_runner.py`: local runtime and container wiring.
- Form placeholders such as `you@example.com` and editable profile placeholders
such as `https://example.com`.
- GitHub Codespaces terminal copy that says a dev server listens on
`http://localhost:3000`.
- JavaScript that uses `window.location.href` only to update the current page's
query string or navigate to a local route.

## Regression Rule

Run this before merging URL-related changes:

```bash
python3 scripts/check_url_realism.py
```

The check guards the known user-visible leaks while leaving benchmark/runtime
localhost contracts intact.
115 changes: 115 additions & 0 deletions scripts/check_url_realism.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
#!/usr/bin/env python3
"""Regression checks for user-visible URL realism across WebHarbor sites."""

from pathlib import Path


ROOT = Path(__file__).resolve().parents[1]

CHECKS = [
(
"Google Map share UI must not expose the local mirror port",
ROOT / "sites/google_map/templates/place_detail.html",
["http://localhost:40008", "request.url"],
),
(
"Google Map seeds must not write example.com place websites",
ROOT / "sites/google_map/seed_data.py",
[
'website=f"https://example.com/{slug}"',
'website=f"https://example.com/{syn_slug}"',
'kwargs.pop("website", f"https://example.com/{slug}")',
],
),
(
"BBC article share should not copy the mirror location",
ROOT / "sites/bbc_news/templates/article_detail.html",
["navigator.clipboard.writeText(window.location.href)"],
),
(
"GitHub external-host recovery must not redirect to a fixed local port",
ROOT / "sites/github/app.py",
['redirect(f"http://localhost:40006{target}"'],
),
]

REQUIRED = [
(
"Google Map runtime URL helpers",
ROOT / "sites/google_map/app.py",
["def google_maps_place_url(place):", "def display_place_website(place):"],
),
(
"BBC real share URL helper",
ROOT / "sites/bbc_news/app.py",
["def bbc_article_share_url(article):", "article_share_url=bbc_article_share_url(art)"],
),
(
"Allrecipes relative next helper",
ROOT / "sites/allrecipes/app.py",
["def safe_redirect_target(target", "def current_relative_url():"],
),
(
"Booking relative next helper",
ROOT / "sites/booking/app.py",
["def safe_redirect_target(target", "def current_relative_url():"],
),
]

RELATIVE_NEXT_TEMPLATES = [
ROOT / "sites/allrecipes/templates/recipe_detail.html",
ROOT / "sites/booking/templates/index.html",
ROOT / "sites/booking/templates/deals.html",
ROOT / "sites/booking/templates/stays.html",
ROOT / "sites/booking/templates/property_detail.html",
]


def check_forbidden():
failed = False
for label, path, needles in CHECKS:
text = path.read_text()
for needle in needles:
if needle in text:
print(f"{label}: forbidden pattern in {path.relative_to(ROOT)}: {needle}")
failed = True
return failed


def check_required():
failed = False
for label, path, needles in REQUIRED:
text = path.read_text()
for needle in needles:
if needle not in text:
print(f"{label}: missing required pattern in {path.relative_to(ROOT)}: {needle}")
failed = True
return failed


def check_relative_next_templates():
failed = False
for path in RELATIVE_NEXT_TEMPLATES:
text = path.read_text()
if 'name="next" value="{{ request.url }}"' in text:
print(f"absolute request.url next value in {path.relative_to(ROOT)}")
failed = True
if 'name="next" value="{{ current_relative_url() }}"' not in text:
print(f"missing current_relative_url() next value in {path.relative_to(ROOT)}")
failed = True
return failed


def main():
failed = False
failed |= check_forbidden()
failed |= check_required()
failed |= check_relative_next_templates()

if failed:
raise SystemExit(1)
print("URL realism checks passed")


if __name__ == "__main__":
main()
22 changes: 18 additions & 4 deletions sites/allrecipes/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -230,7 +230,21 @@ def is_in_recipe_box(recipe_id):
user_id=current_user.id, recipe_id=recipe_id).first() is not None
return False

return dict(recipe_box_count=recipe_box_count, is_in_recipe_box=is_in_recipe_box)
def current_relative_url():
path = request.full_path.rstrip('?')
return path or url_for('index')

return dict(
recipe_box_count=recipe_box_count,
is_in_recipe_box=is_in_recipe_box,
current_relative_url=current_relative_url,
)


def safe_redirect_target(target, default_endpoint='index'):
if target and target.startswith('/') and not target.startswith('//'):
return target
return url_for(default_endpoint)


# ---------------------------------------------------------------------------
Expand Down Expand Up @@ -1113,7 +1127,7 @@ def login():
login_user(user, remember=request.form.get('remember'))
flash('Welcome back!', 'success')
next_page = request.args.get('next')
return redirect(next_page or url_for('index'))
return redirect(safe_redirect_target(next_page))
flash('Invalid email or password.', 'danger')
return render_template('login.html')

Expand Down Expand Up @@ -1275,8 +1289,8 @@ def save_to_recipe_box(recipe_id):
flash(f'"{recipe.title}" saved to your Recipe Box.', 'success')
else:
flash(f'"{recipe.title}" is already in your Recipe Box.', 'info')
next_page = request.form.get('next') or request.referrer or url_for('recipe_box')
return redirect(next_page)
next_page = request.form.get('next')
return redirect(safe_redirect_target(next_page, 'recipe_box'))


@app.route('/recipe-box/note/<int:item_id>', methods=['POST'])
Expand Down
2 changes: 1 addition & 1 deletion sites/allrecipes/templates/recipe_detail.html
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ <h1>{{ recipe.title }}</h1>
{% if not is_in_recipe_box(recipe.id) %}
<form method="POST" action="{{ url_for('save_to_recipe_box', recipe_id=recipe.id) }}" style="display:inline">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<input type="hidden" name="next" value="{{ request.url }}">
<input type="hidden" name="next" value="{{ current_relative_url() }}">
<button type="submit" class="recipe-action-btn" style="margin-left:4px">+ Save to Box</button>
</form>
{% endif %}
Expand Down
7 changes: 7 additions & 0 deletions sites/bbc_news/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,12 @@
csrf = CSRFProtect(app)


def bbc_article_share_url(article):
if article.source_url:
return article.source_url
return f"https://www.bbc.com/news/articles/{article.slug}"


# =======================================================================
# MODELS
# =======================================================================
Expand Down Expand Up @@ -1068,6 +1074,7 @@ def article_detail(slug):
return render_template(
"article_detail.html",
article=art,
article_share_url=bbc_article_share_url(art),
article_gallery=article_gallery,
related=related,
more_articles=more_articles,
Expand Down
2 changes: 1 addition & 1 deletion sites/bbc_news/templates/article_detail.html
Original file line number Diff line number Diff line change
Expand Up @@ -212,7 +212,7 @@ <h3 class="side-title">{{ article.category.name }}</h3>
{% block extra_js %}
<script>
function copyShareUrl() {
navigator.clipboard.writeText(window.location.href).then(() => {
navigator.clipboard.writeText({{ article_share_url|tojson }}).then(() => {
alert('Article link copied to clipboard');
}).catch(() => {});
}
Expand Down
24 changes: 18 additions & 6 deletions sites/booking/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -511,13 +511,25 @@ def inject_global():
'saved_count': saved_count,
'current_year': datetime.now().year,
'csrf_token_value': generate_csrf(),
'current_relative_url': current_relative_url,
'currency_code': currency_code,
'currency_rate': CURRENCY_RATES[currency_code],
'currency_symbol': CURRENCY_SYMBOLS[currency_code],
'currency_rates': CURRENCY_RATES,
}


def current_relative_url():
path = request.full_path.rstrip('?')
return path or url_for('index')


def safe_redirect_target(target, default_endpoint='index'):
if target and target.startswith('/') and not target.startswith('//'):
return target
return url_for(default_endpoint)


@app.route('/set-currency', methods=['GET', 'POST'])
def set_currency():
code = (request.values.get('currency') or 'USD').upper()
Expand Down Expand Up @@ -1274,7 +1286,7 @@ def login():
login_user(user, remember=True)
flash('Welcome back!', 'success')
next_url = request.args.get('next')
return redirect(next_url or url_for('index'))
return redirect(safe_redirect_target(next_url))
flash('Invalid email or password.', 'danger')
return render_template('login.html', form=form)

Expand Down Expand Up @@ -1511,8 +1523,8 @@ def cart_add_form(property_id):
db.session.add(item)
db.session.commit()
flash(f'{prop.name} added to your bag.', 'success')
redirect_to = request.form.get('next') or url_for('bag')
return redirect(redirect_to)
redirect_to = request.form.get('next')
return redirect(safe_redirect_target(redirect_to, 'bag'))


@app.route('/cart/remove/<int:item_id>', methods=['POST'])
Expand Down Expand Up @@ -1573,8 +1585,8 @@ def saved_add_form(property_id):
flash(f'{prop.name} saved to your wishlist.', 'success')
else:
flash(f'{prop.name} is already in your saved list.', 'info')
redirect_to = request.form.get('next') or url_for('saved')
return redirect(redirect_to)
redirect_to = request.form.get('next')
return redirect(safe_redirect_target(redirect_to, 'saved'))


@app.route('/saved/toggle/<int:property_id>', methods=['POST'])
Expand All @@ -1594,7 +1606,7 @@ def saved_toggle_form(property_id):
db.session.add(SavedProperty(user_id=current_user.id, property_id=property_id))
db.session.commit()
flash(f'{prop.name} saved.', 'success')
return redirect(request.form.get('next') or url_for('saved'))
return redirect(safe_redirect_target(request.form.get('next'), 'saved'))


# =====================================================================
Expand Down
2 changes: 1 addition & 1 deletion sites/booking/templates/deals.html
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ <h2>Top discounted stays</h2>
{% if current_user.is_authenticated %}
<form method="POST" action="{{ url_for('saved_toggle_form', property_id=prop.id) }}" style="position:absolute;top:8px;right:8px;">
<input type="hidden" name="csrf_token" value="{{ csrf_token_value }}">
<input type="hidden" name="next" value="{{ request.url }}">
<input type="hidden" name="next" value="{{ current_relative_url() }}">
<button type="submit" class="save-btn">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M20.84 4.61a5.5 5.5 0 00-7.78 0L12 5.67l-1.06-1.06a5.5 5.5 0 00-7.78 7.78l1.06 1.06L12 21.23l7.78-7.78 1.06-1.06a5.5 5.5 0 000-7.78z"/></svg>
</button>
Expand Down
2 changes: 1 addition & 1 deletion sites/booking/templates/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -139,7 +139,7 @@ <h2>Our top-rated stays</h2>
{% if current_user.is_authenticated %}
<form method="POST" action="{{ url_for('saved_toggle_form', property_id=prop.id) }}" style="position:absolute;top:8px;right:8px;">
<input type="hidden" name="csrf_token" value="{{ csrf_token_value }}">
<input type="hidden" name="next" value="{{ request.url }}">
<input type="hidden" name="next" value="{{ current_relative_url() }}">
<button type="submit" class="save-btn">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M20.84 4.61a5.5 5.5 0 00-7.78 0L12 5.67l-1.06-1.06a5.5 5.5 0 00-7.78 7.78l1.06 1.06L12 21.23l7.78-7.78 1.06-1.06a5.5 5.5 0 000-7.78z"/></svg>
</button>
Expand Down
2 changes: 1 addition & 1 deletion sites/booking/templates/property_detail.html
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@
{% if current_user.is_authenticated %}
<form method="POST" action="{{ url_for('saved_toggle_form', property_id=property.id) }}" style="display:inline;">
<input type="hidden" name="csrf_token" value="{{ csrf_token_value }}">
<input type="hidden" name="next" value="{{ request.url }}">
<input type="hidden" name="next" value="{{ current_relative_url() }}">
<button type="submit" class="btn-secondary {% if is_saved %}saved{% endif %}" id="saveBtn">
{% if is_saved %}&#9829; Saved{% else %}&#9825; Save{% endif %}
</button>
Expand Down
Loading