Add Ohio State University mirror site (port 40015) #12
Open
richard-peng-xia wants to merge 1 commit into
Adds a fully functional osu.edu mirror as the 16th WebHarbor site.
Models: User, College, Department, Program, NewsArticle, Event, ResearchCenter, Faculty, AthleticTeam, Bookmark.
Routes: homepage, news, programs, events, research, departments, faculty, athletics, admissions, about, unified search, auth.
Templates: 25 Jinja2 templates in OSU Scarlet (#BB0000) / Gray.
Tasks: 20 benchmark tasks (WebVoyager schema).
Seed: 16 real OSU colleges, 20 programs (incl. MD/JD/PharmD/DVM/OD), 20 news articles, 16 events, 15 faculty, 15 research centers, 26 athletic teams, 4 benchmark users (alice/bob/carol/dave @osu.edu, password: test1234).
Seed DB generated at Docker build time via seed_data.py (no HuggingFace asset required).
Registers osu in websyn_start.sh, control_server.py, and Dockerfile.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TL;DR
Adds a fully functional osu.edu mirror site to WebHarbor at port 40015, with 20 benchmark tasks covering programs, news, events, faculty, research centers, athletics, and admissions.
Motivation
Ohio State University is one of the largest public universities in the US (60,000+ students), with a highly visited portal spanning academics, research, health sciences, and Big Ten athletics. It covers a domain — large public flagship university browsing — not represented in the existing 15 sites, and offers uniquely diverse multi-step tasks: professional degree programs (MD, JD, PharmD, DVM, OD), a full athletics section with 36 varsity sports, and a multi-campus structure (Columbus, Lima, Marion, Mansfield, Newark, Wooster).
Design
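Flask application (sites/osu/app.py)
Ten SQLAlchemy models: User, College, Department, Program, NewsArticle, Event, ResearchCenter, Faculty, AthleticTeam, Bookmark. All are seeded idempotently (the seed function is gated on College.query.first()).
Route coverage mirrors the real site's navigation: homepage, news, programs, events, research, departments, faculty, athletics, admissions, about, unified search, and auth.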
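The idempotent seeding gate mentioned above can be illustrated with a minimal sketch. This is not the code in sites/osu/app.py: it uses the standard-library sqlite3 module as a stand-in for the SQLAlchemy models, and the `college` table layout, the `seed` and `college_count` helpers, and the three sample names are assumptions made only for illustration.

```python
import sqlite3

# Hypothetical sample rows; the PR itself seeds 16 colleges from seed_data.py.
SAMPLE_COLLEGES = ["Arts & Sciences", "Fisher Business", "Moritz Law"]


def seed(conn: sqlite3.Connection) -> bool:
    """Insert seed rows only when the college table is empty.

    Returns True if rows were inserted, False if seeding was skipped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS college "
        "(id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    # Gate: plays the role that College.query.first() plays in the PR.
    if conn.execute("SELECT 1 FROM college LIMIT 1").fetchone() is not None:
        return False
    conn.executemany(
        "INSERT INTO college (name) VALUES (?)",
        [(name,) for name in SAMPLE_COLLEGES],
    )
    conn.commit()
    return True


def college_count(conn: sqlite3.Connection) -> int:
    """Number of rows currently in the college table."""
    return conn.execute("SELECT COUNT(*) FROM college").fetchone()[0]


conn = sqlite3.connect(":memory:")
first_run = seed(conn)   # inserts the sample rows
second_run = seed(conn)  # skipped, because a row already exists
```

Running the seed function twice leaves the row count unchanged, which is the property that makes it safe to call on every application start.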
Seed database
16 real OSU colleges (Arts & Sciences, Fisher Business, Moritz Law, Medicine, Veterinary Medicine, ...), 20 degree programs (including the professional degrees MD, JD, PharmD, DVM, and OD), 20 news articles (2023–2025, 7 categories), 16 events (multiple campuses), 15 research centers (TDAI, Byrd Alzheimer's Center, Ohio Supercomputer Center, ...), 15 faculty, 26 athletic teams (each a real OSU varsity sport), and 4 benchmark users (alice/bob/carol/dave @osu.edu, password: test1234). The seed DB is generated at image build time via seed_data.py.
Templates
25 Jinja2 templates styled with OSU Scarlet (#BB0000) and Gray (#666666), modeled on real osu.edu layout: main nav (About / Academics / Research / Impact / Athletics), card-grid listings, detail pages with sidebars, paginated results (20 items/page). Includes an athletics section not present in other university mirrors.
Benchmark tasks (tasks.jsonl)
20 tasks (IDs Ohio State University--0 through Ohio State University--19) covering: professional degree program lookup (PharmD, DVM, OD), athletics records and team info, multi-campus event filtering, research center browsing, faculty research lookup, admissions requirements, and 3+ multi-step reasoning tasks.
Verification
py_compile sites/osu/app.py
./scripts/build.sh webharbor:dev
POST /reset/osu + md5sum match
HuggingFace assets
OSU has no scraped image assets: all data is code-generated from seed_data.py. The instance_seed/osu.db is built directly inside the Docker image via a RUN step in the Dockerfile, eliminating the need for a HuggingFace tarball. No .assets-revision bump is required.
Registration
Site registered in all three required locations:
websyn_start.sh SITES=(...): index 15, port 40015
control_server.py SITES=[...]
Dockerfile EXPOSE 40000-40015
🤖 Generated with Claude Code
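The registration entries above imply a simple relationship between a site's position in the SITES list and its port. The sketch below checks that relationship; the rule `port = 40000 + index` is an assumption inferred from "index 15, port 40015" and the EXPOSE range, and `site_port` and `parse_expose_range` are hypothetical helpers, not functions from the repository.

```python
BASE_PORT = 40000  # assumption: first port of the EXPOSE 40000-40015 range


def site_port(index: int, base: int = BASE_PORT) -> int:
    """Port for the site at a zero-based position in the SITES list."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return base + index


def parse_expose_range(spec: str) -> range:
    """Turn a 'LOW-HIGH' port spec into a range that includes HIGH."""
    low, high = (int(part) for part in spec.split("-"))
    return range(low, high + 1)


osu_port = site_port(15)                      # index 15 from websyn_start.sh
exposed = parse_expose_range("40000-40015")   # range from the Dockerfile
print(osu_port, osu_port in exposed)
```

A check like this would catch the case where a new site is appended to SITES without widening the EXPOSE range.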