ModelHub Platform — SQL & JDBC Case Study

A relational database backend for a Hugging Face-style Model Hub platform, built with Java and SQL. The project models how machine learning models, datasets, users, organizations, runs, results, and publications interact — and implements 20 analytical queries against that schema using JDBC and an embedded H2 database.

What This Project Does

The platform tracks the full lifecycle of ML model experimentation:

Organizations release models across multiple versions
Models are designed for one or more tasks (e.g., Vulnerability Detection, Code Search)
Users upload datasets, run model versions against them, and record results (accuracy, F1, hyperparameter configs)
Results get published in publications
Users follow each other and build reputation scores

All of this is queried through 20 implemented Java methods backed by hand-written SQL.

Tech Stack

Layer	Technology
Language	Java 14
Database	H2 (embedded, in-memory)
DB Access	JDBC (`PreparedStatement`, `Statement`)
Build	Maven
Schema	Relational — 14 tables with FK constraints

Database Schema

Users ──────────────── Profiles
  │
  ├──── follows (self-join)
  ├──── uploads ──────── Datasets
  ├──── runs ─────────── ModelVersions ── Models ── Organizations
  │         └─────────── Datasets               └── designed_for ── Tasks
  └──── Results ──────── runs (FK)
              └── includes ──── Publications

14 tables: Users, Organizations, Tasks, Datasets, Publications, Profiles, follows, Models, uploads, ModelVersions, designed_for, runs, Results, includes

Full schema with all primary/foreign key constraints is defined in ModelHubPlatform.java → createTables().

Project Structure

ModelHubPlatform/
├── pom.xml
├── output/
│   └── Output.txt                  ← generated query results
└── src/main/
    ├── java/ceng/ceng351/ModelHubPlatform/
    │   ├── ModelHubPlatform.java   ← core implementation (all SQL logic)
    │   ├── IModelHubPlatform.java  ← interface defining all methods
    │   ├── Evaluation.java         ← test runner, writes Output.txt
    │   ├── FileOperations.java     ← reads seed data from .txt files
    │   ├── QueryResult.java        ← result container classes
    │   ├── User.java               ─┐
    │   ├── Organization.java        │
    │   ├── Profile.java             │
    │   ├── follow.java              │  entity model classes
    │   ├── Model.java               │
    │   ├── ModelVersion.java        │
    │   ├── Task.java                │
    │   ├── designed_for.java        │
    │   ├── Dataset.java             │
    │   ├── upload.java              │
    │   ├── run.java                 │
    │   ├── Result.java              │
    │   ├── Publication.java         │
    │   └── include.java            ─┘
    └── resources/data/
        ├── Users.txt
        ├── Organizations.txt
        ├── Profiles.txt
        ├── follows.txt
        ├── Models.txt
        ├── ModelVersions.txt
        ├── Tasks.txt
        ├── designed_for.txt
        ├── Datasets.txt
        ├── uploads.txt
        ├── runs.txt
        ├── Results.txt
        ├── Publications.txt
        └── includes.txt

How to Run

Prerequisites: Java 14+, Maven 3.x

# Clone the repo
git clone https://github.com/YOUR_USERNAME/ModelHubPlatform.git
cd ModelHubPlatform

# Build and run
mvn exec:java "-Dexec.mainClass=ceng.ceng351.ModelHubPlatform.Evaluation"

Results are written to output/Output.txt.

Optional — inspect the live database while the program runs:
Open http://localhost:8082 in your browser and connect with:

JDBC URL: jdbc:h2:tcp://localhost/mem:ModelHubPlatformdb
Username: sa | Password: (empty)

Implemented Queries (20 Tasks)

#	Method	Description
1	`createTables()`	Create all 14 tables with PK/FK constraints in dependency order
2	`insert*()`	Batch-insert seed data for all 14 tables via `PreparedStatement`
3	`getUsersWithoutProfiles()`	LEFT JOIN to find users with no profile entry
4	`decreaseReputationForMissingProfiles()`	Conditional UPDATE — deduct 10 pts if score ≥ 10
5	`getUsersByBioKeywords()`	Case-insensitive `LIKE` search for engineer/scientist/student
6	`getOrganizationsWithNoReleasedModelsAndLowRating()`	`NOT EXISTS` + rating filter
7	`deleteOrganizationsWithNoReleasedModelsAndLowRating()`	Cascading delete with transaction rollback safety
8	`getModelPrimaryTaskInfo()`	Aggregation to verify one primary task per model
9	`getUserPopularityScore()`	Derived metric: followers − followees, top 20
10	`getComprehensiveModelInfo()`	CTE to resolve latest model version + join across 4 tables
11	`getDatasetStatisticsByModality()`	`GROUP BY` with `AVG` and `ROUND`
12	`getLargeModelVersionsByDateRange()`	`LIKE '%B'` size filter + `BETWEEN` date range
13	`getDatasetsWithMaxUploadCount()`	Nested subquery to find max upload frequency
14	`getCompleteDatasets()`	Relational division — datasets covering every existing role
15	`getUsersCreatorOrContributorButNotValidator()`	`EXISTS` / `NOT EXISTS` role logic
16	`getUsersWhoRanAllVersionsOfModels()`	CTE-based relational division across model versions
17	`getRunTypeStatistics()`	`JOIN` + `GROUP BY` run type for result counts and avg F1
18	`getPublicationsUsingDataset()`	Multi-join publication lookup with accuracy ≥ 0.70 filter
19	`getTopTenHighlyReputedUsers()`	Composite score from uploads + publications + reputation
20	`getVulnerabilityDetectionPublications()`	Full 7-table join with task-specific and quality filters

Key SQL Patterns Used

Relational division (Task 14, 16) — finding rows that satisfy all conditions in a set, implemented with double NOT EXISTS:

-- Datasets that cover every role in the system
WHERE NOT EXISTS (
  SELECT 1 FROM (SELECT DISTINCT role FROM uploads) roles
  WHERE NOT EXISTS (
    SELECT 1 FROM uploads u
    WHERE u.DatasetID = d.DatasetID AND u.role = roles.role
  )
)

CTE for latest version (Task 10) — avoiding correlated subqueries in the SELECT clause:

WITH latest_ver AS (
  SELECT mv.ModelID, mv.version_no, mv.version_date
  FROM ModelVersions mv
  JOIN (SELECT ModelID, MAX(version_date) AS max_dt FROM ModelVersions GROUP BY ModelID) md
    ON mv.ModelID = md.ModelID AND mv.version_date = md.max_dt
  ...
)

Composite score aggregation (Task 19) — combining counts from different tables:

COALESCE(owner_uploads.cnt, 0) + COALESCE(pub_results.cnt, 0) + u.reputation_score AS user_score

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
output		output
src/main		src/main
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ModelHub Platform — SQL & JDBC Case Study

What This Project Does

Tech Stack

Database Schema

Project Structure

How to Run

Implemented Queries (20 Tasks)

Key SQL Patterns Used

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ModelHub Platform — SQL & JDBC Case Study

What This Project Does

Tech Stack

Database Schema

Project Structure

How to Run

Implemented Queries (20 Tasks)

Key SQL Patterns Used

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages