A relational database backend for a Hugging Face-style Model Hub platform, built with Java and SQL. The project models how machine learning models, datasets, users, organizations, runs, results, and publications interact — and implements 20 analytical queries against that schema using JDBC and an embedded H2 database.
The platform tracks the full lifecycle of ML model experimentation:
- Organizations release models across multiple versions
- Models are designed for one or more tasks (e.g., Vulnerability Detection, Code Search)
- Users upload datasets, run model versions against them, and record results (accuracy, F1, hyperparameter configs)
- Results get published in publications
- Users follow each other and build reputation scores
All of this is queried through 20 implemented Java methods backed by hand-written SQL.
| Layer | Technology |
|---|---|
| Language | Java 14 |
| Database | H2 (embedded, in-memory) |
| DB Access | JDBC (PreparedStatement, Statement) |
| Build | Maven |
| Schema | Relational — 14 tables with FK constraints |
Users ──────────────── Profiles
│
├──── follows (self-join)
├──── uploads ──────── Datasets
├──── runs ─────────── ModelVersions ── Models ── Organizations
│ └─────────── Datasets └── designed_for ── Tasks
└──── Results ──────── runs (FK)
└── includes ──── Publications
14 tables:
Users, Organizations, Tasks, Datasets, Publications, Profiles, follows, Models, uploads, ModelVersions, designed_for, runs, Results, includes
Full schema with all primary/foreign key constraints is defined in ModelHubPlatform.java → createTables().
ModelHubPlatform/
├── pom.xml
├── output/
│ └── Output.txt ← generated query results
└── src/main/
├── java/ceng/ceng351/ModelHubPlatform/
│ ├── ModelHubPlatform.java ← core implementation (all SQL logic)
│ ├── IModelHubPlatform.java ← interface defining all methods
│ ├── Evaluation.java ← test runner, writes Output.txt
│ ├── FileOperations.java ← reads seed data from .txt files
│ ├── QueryResult.java ← result container classes
│ ├── User.java ─┐
│ ├── Organization.java │
│ ├── Profile.java │
│ ├── follow.java │ entity model classes
│ ├── Model.java │
│ ├── ModelVersion.java │
│ ├── Task.java │
│ ├── designed_for.java │
│ ├── Dataset.java │
│ ├── upload.java │
│ ├── run.java │
│ ├── Result.java │
│ ├── Publication.java │
│ └── include.java ─┘
└── resources/data/
├── Users.txt
├── Organizations.txt
├── Profiles.txt
├── follows.txt
├── Models.txt
├── ModelVersions.txt
├── Tasks.txt
├── designed_for.txt
├── Datasets.txt
├── uploads.txt
├── runs.txt
├── Results.txt
├── Publications.txt
└── includes.txt
Prerequisites: Java 14+, Maven 3.x
# Clone the repo
git clone https://github.com/YOUR_USERNAME/ModelHubPlatform.git
cd ModelHubPlatform
# Build and run
mvn exec:java "-Dexec.mainClass=ceng.ceng351.ModelHubPlatform.Evaluation"Results are written to output/Output.txt.
Optional — inspect the live database while the program runs:
Open http://localhost:8082 in your browser and connect with:
- JDBC URL:
jdbc:h2:tcp://localhost/mem:ModelHubPlatformdb - Username:
sa| Password: (empty)
| # | Method | Description |
|---|---|---|
| 1 | createTables() |
Create all 14 tables with PK/FK constraints in dependency order |
| 2 | insert*() |
Batch-insert seed data for all 14 tables via PreparedStatement |
| 3 | getUsersWithoutProfiles() |
LEFT JOIN to find users with no profile entry |
| 4 | decreaseReputationForMissingProfiles() |
Conditional UPDATE — deduct 10 pts if score ≥ 10 |
| 5 | getUsersByBioKeywords() |
Case-insensitive LIKE search for engineer/scientist/student |
| 6 | getOrganizationsWithNoReleasedModelsAndLowRating() |
NOT EXISTS + rating filter |
| 7 | deleteOrganizationsWithNoReleasedModelsAndLowRating() |
Cascading delete with transaction rollback safety |
| 8 | getModelPrimaryTaskInfo() |
Aggregation to verify one primary task per model |
| 9 | getUserPopularityScore() |
Derived metric: followers − followees, top 20 |
| 10 | getComprehensiveModelInfo() |
CTE to resolve latest model version + join across 4 tables |
| 11 | getDatasetStatisticsByModality() |
GROUP BY with AVG and ROUND |
| 12 | getLargeModelVersionsByDateRange() |
LIKE '%B' size filter + BETWEEN date range |
| 13 | getDatasetsWithMaxUploadCount() |
Nested subquery to find max upload frequency |
| 14 | getCompleteDatasets() |
Relational division — datasets covering every existing role |
| 15 | getUsersCreatorOrContributorButNotValidator() |
EXISTS / NOT EXISTS role logic |
| 16 | getUsersWhoRanAllVersionsOfModels() |
CTE-based relational division across model versions |
| 17 | getRunTypeStatistics() |
JOIN + GROUP BY run type for result counts and avg F1 |
| 18 | getPublicationsUsingDataset() |
Multi-join publication lookup with accuracy ≥ 0.70 filter |
| 19 | getTopTenHighlyReputedUsers() |
Composite score from uploads + publications + reputation |
| 20 | getVulnerabilityDetectionPublications() |
Full 7-table join with task-specific and quality filters |
Relational division (Task 14, 16) — finding rows that satisfy all conditions in a set, implemented with double NOT EXISTS:
-- Datasets that cover every role in the system
WHERE NOT EXISTS (
SELECT 1 FROM (SELECT DISTINCT role FROM uploads) roles
WHERE NOT EXISTS (
SELECT 1 FROM uploads u
WHERE u.DatasetID = d.DatasetID AND u.role = roles.role
)
)CTE for latest version (Task 10) — avoiding correlated subqueries in the SELECT clause:
WITH latest_ver AS (
SELECT mv.ModelID, mv.version_no, mv.version_date
FROM ModelVersions mv
JOIN (SELECT ModelID, MAX(version_date) AS max_dt FROM ModelVersions GROUP BY ModelID) md
ON mv.ModelID = md.ModelID AND mv.version_date = md.max_dt
...
)Composite score aggregation (Task 19) — combining counts from different tables:
COALESCE(owner_uploads.cnt, 0) + COALESCE(pub_results.cnt, 0) + u.reputation_score AS user_score