
Add Ohio State University mirror site (port 40015)#12

Open
richard-peng-xia wants to merge 1 commit into aiming-lab:main from richard-peng-xia:add-osu-mirror

@richard-peng-xia

TL;DR

Adds a fully functional osu.edu mirror site to WebHarbor at port 40015, with 20 benchmark tasks covering programs, news, events, faculty, research centers, athletics, and admissions.

Motivation

Ohio State University is one of the largest public universities in the US (60,000+ students), with a heavily visited portal spanning academics, research, health sciences, and Big Ten athletics. It covers a domain not represented in the existing 15 sites (browsing a large public flagship university) and supports unusually diverse multi-step tasks: professional degree programs (MD, JD, PharmD, DVM, OD), a full athletics section (Ohio State sponsors 36 varsity sports; 26 teams are seeded), and a multi-campus structure (Columbus, Lima, Marion, Mansfield, Newark, Wooster).

Design

Flask application (sites/osu/app.py)

Ten SQLAlchemy models: User, College, Department, Program, NewsArticle, Event, ResearchCenter, Faculty, AthleticTeam, Bookmark. Seeding is idempotent: the seed function is gated on College.query.first() and skips insertion when a college row already exists.
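The idempotency gate can be illustrated with the standard library's sqlite3 standing in for Flask-SQLAlchemy. This is a minimal sketch, not the code in sites/osu/app.py: the `college` table layout, the `seed` helper, and the two-row `COLLEGES` list are hypothetical, and the `SELECT 1 ... LIMIT 1` probe plays the role of `College.query.first()`.

```python
import sqlite3

# Hypothetical subset of the seed data; the PR's seed_data.py defines 16 colleges.
COLLEGES = [
    ("College of Arts and Sciences",),
    ("Max M. Fisher College of Business",),
]


def seed(conn: sqlite3.Connection) -> bool:
    """Insert seed rows once; return True only if rows were inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS college ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )
    # Stdlib analogue of `if College.query.first(): return`:
    # any existing row means the database was already seeded.
    if conn.execute("SELECT 1 FROM college LIMIT 1").fetchone() is not None:
        return False
    conn.executemany("INSERT INTO college (name) VALUES (?)", COLLEGES)
    conn.commit()
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(seed(conn), seed(conn))  # first call seeds, second call is a no-op
    print(conn.execute("SELECT COUNT(*) FROM college").fetchone()[0])
```

Gating on a single sentinel table keeps the check cheap, but it assumes seeding is all-or-nothing; a run that crashed after inserting colleges would leave the other tables empty and never retry.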

Route coverage mirrors the real site's navigation:

  • Homepage with featured news, upcoming events, and institutional stats
  • News listing + article detail with category/search filters
  • Academics overview listing all 16 colleges/schools
  • Programs listing + detail with degree type, college, and online filters
  • Events listing + detail with category, campus, and date filters
  • Research centers listing + detail pages
  • Departments listing (grouped by college) + detail with faculty roster
  • Faculty listing + profile pages with department filter
  • Athletics overview + team detail pages (unique to OSU among the WebHarbor sites)
  • Admissions overview with undergraduate/graduate/professional/online tabs
  • About page with real OSU statistics
  • Unified search across programs, news, events, and faculty
  • Auth: login, register, logout, account (bookmarks)
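The listing routes combine optional filters with fixed-size pagination. A minimal pure-Python sketch of the programs listing logic, assuming a `Program` record with hypothetical `degree_type`, `college`, and `online` fields (the real SQLAlchemy model may name its columns differently) and the 20-items-per-page size described under Templates:

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Optional, Tuple

PER_PAGE = 20  # page size used by the listing templates


@dataclass
class Program:
    # Hypothetical fields; the real model may differ.
    name: str
    degree_type: str  # e.g. "PharmD", "BS", "MS"
    college: str
    online: bool


def list_programs(
    programs: List[Program],
    degree_type: Optional[str] = None,
    college: Optional[str] = None,
    online: Optional[bool] = None,
    page: int = 1,
) -> Tuple[List[Program], int]:
    """Apply each filter only when it is given, then return (items, total_pages)."""
    rows = [
        p for p in programs
        if (degree_type is None or p.degree_type == degree_type)
        and (college is None or p.college == college)
        and (online is None or p.online == online)
    ]
    total_pages = max(1, ceil(len(rows) / PER_PAGE))
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * PER_PAGE
    return rows[start:start + PER_PAGE], total_pages


if __name__ == "__main__":
    demo = [Program(f"Program {i}", "BS", "Engineering", i % 2 == 0) for i in range(45)]
    items, pages = list_programs(demo, page=3)
    print(len(items), pages)
```

In a Flask view the filter values would presumably come from `request.args`, and Flask-SQLAlchemy's `paginate` helper could replace the manual slicing; clamping the page number avoids an empty page when a stale URL points past the last page.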

Seed database

16 real OSU colleges (Arts & Sciences, Fisher Business, Moritz Law, Medicine, Veterinary Medicine, ...), 20 degree programs (including the professional degrees MD, JD, PharmD, DVM, and OD), 20 news articles (2023–2025, 7 categories), 16 events (across multiple campuses), 15 research centers (TDAI, Byrd Alzheimer's Center, Ohio Supercomputer Center, ...), 15 faculty, 26 athletic teams (all real OSU varsity sports), and 4 benchmark users (alice/bob/carol/dave @osu.edu, password: test1234). The seed DB is generated at image build time by seed_data.py.

Templates

25 Jinja2 templates styled in OSU Scarlet (#BB0000) and Gray (#666666), modeled on the real osu.edu layout: main nav (About / Academics / Research / Impact / Athletics), card-grid listings, detail pages with sidebars, and paginated results (20 items/page). Includes an athletics section not present in other university mirrors.

Benchmark tasks (tasks.jsonl)

20 tasks (IDs Ohio State University--0 through Ohio State University--19) covering professional degree program lookup (PharmD, DVM, OD), athletics records and team info, multi-campus event filtering, research center browsing, faculty research lookup, admissions requirements, and at least three multi-step reasoning tasks.
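A quick consistency check for tasks.jsonl can catch missing fields and gaps in the ID sequence. This sketch assumes the WebVoyager field names `web_name`, `id`, `ques`, and `web` (taken from the WebVoyager dataset format, not from this PR) and IDs running contiguously from `Ohio State University--0`; `check_tasks` is a hypothetical helper.

```python
import json

EXPECTED_PREFIX = "Ohio State University"
REQUIRED_FIELDS = {"web_name", "id", "ques", "web"}  # assumed WebVoyager schema


def check_tasks(lines, expected_count=20):
    """Return a list of problems found in tasks.jsonl lines (empty if OK)."""
    problems = []
    seen = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        task = json.loads(line)
        missing = REQUIRED_FIELDS - set(task)
        if missing:
            problems.append(f"line {lineno}: missing {sorted(missing)}")
        # IDs look like "Ohio State University--7": split on the last "--".
        task_id = task.get("id", "")
        prefix, sep, index = task_id.rpartition("--")
        if prefix != EXPECTED_PREFIX or not sep or not index.isdigit():
            problems.append(f"line {lineno}: bad id {task_id!r}")
        else:
            seen.add(int(index))
    missing_ids = set(range(expected_count)) - seen
    if missing_ids:
        problems.append(f"missing ids: {sorted(missing_ids)}")
    return problems


if __name__ == "__main__":
    with open("tasks.jsonl", encoding="utf-8") as fh:  # path relative to sites/osu/ (assumed)
        for problem in check_tasks(fh):
            print(problem)
```

Splitting on the last `--` keeps the check working even though the site name itself contains spaces.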

Verification

  • py_compile sites/osu/app.py
  • All 12 main routes return HTTP 200 (werkzeug test client)
  • Seed: 16 colleges, 20 programs, 20 news, 16 events, 15 faculty, 15 centers, 26 teams, 4 users
  • Seed is idempotent (a second run produces no duplicate rows)
  • ./scripts/build.sh webharbor:dev: requires the Docker daemon, not run locally
  • POST /reset/osu + md5sum match: requires the Docker daemon, not run locally
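Once a Docker daemon is available, the md5sum comparison after POST /reset/osu can be scripted. A sketch using hashlib, assuming the live database lives at `instance/osu.db` next to the `instance_seed/osu.db` copy (only the latter path appears in this PR; `md5_of` and `reset_matches_seed` are hypothetical helpers):

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks (same value as `md5sum`)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reset_matches_seed(live_db, seed_db):
    """True if the live database is byte-identical to the seed copy."""
    return md5_of(Path(live_db)) == md5_of(Path(seed_db))


if __name__ == "__main__":
    # Run after POST /reset/osu; both paths are assumptions about the container layout.
    print(reset_matches_seed("instance/osu.db", "instance_seed/osu.db"))
```

A byte-level check only passes if the reset copies the seed file rather than re-running the seed logic, since SQLite files built by separate runs are not guaranteed to be byte-identical.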

HuggingFace assets

OSU has no scraped image assets — all data is code-generated from seed_data.py. The instance_seed/osu.db is built directly inside the Docker image via a RUN step in the Dockerfile, eliminating the need for a HuggingFace tarball. No .assets-revision bump is required.

Registration

Site registered in all three required locations:

  • websyn_start.sh SITES=(...) — index 15, port 40015
  • control_server.py SITES=[...]
  • Dockerfile EXPOSE 40000-40015
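The three registrations have to agree with each other. A sketch of a consistency check, assuming ports are assigned as 40000 plus the site's index in SITES (consistent with index 15 → port 40015 and EXPOSE 40000-40015) and that control_server.py lists sites in the same order as websyn_start.sh; the site names in the usage line are placeholders, not the real list:

```python
BASE_PORT = 40000  # assumed scheme: port = BASE_PORT + index in SITES


def port_for(sites, name):
    """Port assigned to `name` under the index-based scheme."""
    return BASE_PORT + sites.index(name)


def check_registration(start_sites, control_sites, expose_range):
    """Return problems if the SITES lists or the EXPOSE range disagree."""
    problems = []
    if list(start_sites) != list(control_sites):
        problems.append("websyn_start.sh and control_server.py SITES differ")
    lo, hi = expose_range
    needed = (BASE_PORT, BASE_PORT + len(start_sites) - 1)
    if (lo, hi) != needed:
        problems.append(f"EXPOSE {lo}-{hi}, expected {needed[0]}-{needed[1]}")
    return problems


if __name__ == "__main__":
    sites = [f"site{i}" for i in range(15)] + ["osu"]  # placeholder names
    print(port_for(sites, "osu"), check_registration(sites, sites, (40000, 40015)))
```

Deriving the EXPOSE range from the list length means adding a 17th site fails the check until the Dockerfile is updated as well.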

🤖 Generated with Claude Code

Adds a fully functional osu.edu mirror as the 16th WebHarbor site.

Models: User, College, Department, Program, NewsArticle, Event,
ResearchCenter, Faculty, AthleticTeam, Bookmark.
Routes: homepage, news, programs, events, research, departments,
faculty, athletics, admissions, about, unified search, auth.
Templates: 25 Jinja2 templates in OSU Scarlet (#BB0000) / Gray.
Tasks: 20 benchmark tasks (WebVoyager schema).

Seed: 16 real OSU colleges, 20 programs (incl. MD/JD/PharmD/DVM/OD),
20 news articles, 16 events, 15 faculty, 15 research centers,
26 athletic teams, 4 benchmark users (alice/bob/carol/dave @osu.edu,
password: test1234). Seed DB generated at Docker build time via
seed_data.py — no HuggingFace asset required.

Registers osu in websyn_start.sh, control_server.py, and Dockerfile.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>