Skip to content

refactor: restructure project for production readiness#19

Merged
spideystreet merged 293 commits intodevelopfrom
refactor/project-structure
Mar 6, 2026
Merged

refactor: restructure project for production readiness#19
spideystreet merged 293 commits intodevelopfrom
refactor/project-structure

Conversation

@spideystreet
Copy link
Copy Markdown
Collaborator

Summary

Full restructure of the OST Linker codebase (278 commits, 190 files changed) covering every layer of the stack:

  • Pipeline & Dagster — complete asset graph with ingestion, classification, embedding, matching, and sync groups; jobs, schedules (5×/day), and sensors; custom PandasPostgresIOManager
  • dbt — staging → intermediate → marts model hierarchy with proper naming (stg_, int_, fct_, match_); SQLFluff-clean SQL; schema tests and docs
  • Go services — scraper and fetcher binaries invoked as subprocesses by Dagster assets
  • ML — SentenceTransformer embeddings (384-dim, MiniLM-L6-v2) + FastText language detection + LLM classification via OpenRouter
  • Database — Prisma schema across 4 PostgreSQL schemas (public, github, ml, match) with pgvector
  • Docker — 3-stage build (Go builder → Python builder → runtime), CPU-only torch (~2GB savings), non-root user, healthcheck
  • CI/CD — reusable quality-checks.yml with lint, format, type check, unit tests (80% coverage), Go vet+build, Docker build, Prisma validate, pip-audit, gitleaks, dbt parse, docs submodule validation
  • Docs — git submodule pointing to ost-docs, automated sync workflow to create PRs on ost-docs when docs change
  • Repo hygiene — cleaned .gitignore, untracked 131MB FastText binary, tracked utility scripts, concise README

Test plan

  • ruff check src/ — no errors
  • ruff format --check src/ — formatted
  • mypy src/ — passes strict mode
  • pytest -m unit — 20/20 tests, 100% coverage
  • go vet + go build — both scraper and fetcher compile
  • prisma validate — schema valid
  • pip-audit — no vulnerabilities (dbt-common upgraded to 1.37.3)
  • docker build — image builds successfully with CPU-only torch
  • CI pipeline runs on PR creation

🤖 Generated with Claude Code

spideystreet and others added 11 commits March 4, 2026 17:59
The database service is only needed for local development — staging uses
an external Postgres instance. Move it to docker-compose.override.yml
which is auto-loaded by `docker compose up` locally but skipped in
staging with `docker compose -f docker-compose.yml up`.

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
…kflow

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
…ck start

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Remove obsolete ignore rules (Django, Flask, Celery, etc.), untrack
models/lid.176.ftz (should be downloaded at build time, not stored in git),
and update models/README.md with current resource paths.

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
- go-check: vet + build for scraper and fetcher
- docker-build: build image without push to catch Dockerfile errors early
- prisma-validate: validate schema without a database
- security: pip-audit for dependency vulnerabilities + gitleaks for secret leaks
- quality: add --cov-fail-under=80 coverage threshold

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Installs torch from the CPU-only index before the main pip install,
then strips torch/nvidia/triton/cuda lines from requirements.txt
so pip doesn't re-download the CUDA variant.

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
@spideystreet spideystreet self-assigned this Mar 5, 2026
spideystreet and others added 15 commits March 5, 2026 16:38
Add known-third-party for dagster packages to prevent ruff from
misdetecting the local dagster/ runtime directory as a first-party
package, causing import order differences between local and CI.

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
- Add dummy DATABASE_URL for Prisma validate step
- Remove SQLFluff lint from CI (dbt templater needs DB; dbt parse suffices)
- Make gitleaks continue-on-error when license is missing
- Skip docs-sync PR creation when no new commits vs main

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Use gitleaks CLI directly instead of gitleaks-action which requires
a paid license. Scans the working tree (--no-git) to avoid false
positives from old commits.

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
- New user_recommendation_job: embed users + dbt match models + public sync
- New user_recommendation_schedule: every 2h (Europe/Paris)
- Reduce run_all_schedule from 5x/day to 1x/day at 3 AM
  (scraping new projects doesn't need to be frequent;
   user recommendations do)

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
…d match models

- Rename @@Map("verification_token") to @@Map("verification") to align with backend
- Remove unused ProjectEmbedding model and its relation on Project
- Add MatchGlobalRecommendation and MatchUserRecommendation (dbt-managed, read-only)
- Add migration for all three changes

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Move prisma schema, migrations and seeds to opensource-together/prisma
repo and reference it as a git submodule (same pattern as docs/).

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
- Add OST_PRISMA_TOKEN secret to quality-checks and caller workflows
- Update prisma-validate to checkout with submodule token
- Add prisma-submodule SHA check (mirrors docs-submodule pattern)
- Add sync-prisma-submodule.yml to auto-PR schema changes to prisma repo

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Prisma stays as a regular directory in ost-linker (source of truth).
Schema changes will be synced to ost-backend via CI workflow instead.

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
- Remove prisma-submodule check job and OST_PRISMA_TOKEN
- Revert prisma-validate to simple checkout (no submodule)
- Replace sync-prisma-submodule.yml with sync-prisma-backend.yml
  that copies prisma/ to ost-backend and creates a PR on changes

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Add claude.yml (PR/issue assistant via @claude mention) and
claude-code-review.yml (auto code review on PR events).

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
…flows

- pipeline-doctor: Dagster pipeline debugging (opus, memory)
- dbt-analyst: dbt model review and debugging (sonnet, memory)
- security-auditor: security audit before PRs (opus, stateless)
- go-service-reviewer: Go scraper/fetcher review (sonnet, memory)

Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
Co-Authored-By: spidecode-bot <263227865+spicode-bot@users.noreply.github.com>
@spideystreet spideystreet changed the base branch from staging to develop March 6, 2026 14:54
@spideystreet spideystreet merged commit facf2bf into develop Mar 6, 2026
8 of 11 checks passed
@spideystreet spideystreet deleted the refactor/project-structure branch March 6, 2026 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant