Skip to content

combat269/ModelHubPlatform

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ModelHub Platform — SQL & JDBC Case Study

A relational database backend for a Hugging Face-style Model Hub platform, built with Java and SQL. The project models how machine learning models, datasets, users, organizations, runs, results, and publications interact — and implements 20 analytical queries against that schema using JDBC and an embedded H2 database.


What This Project Does

The platform tracks the full lifecycle of ML model experimentation:

  • Organizations release models across multiple versions
  • Models are designed for one or more tasks (e.g., Vulnerability Detection, Code Search)
  • Users upload datasets, run model versions against them, and record results (accuracy, F1, hyperparameter configs)
  • Results get published in publications
  • Users follow each other and build reputation scores

All of this is queried through 20 implemented Java methods backed by hand-written SQL.


Tech Stack

Layer Technology
Language Java 14
Database H2 (embedded, in-memory)
DB Access JDBC (PreparedStatement, Statement)
Build Maven
Schema Relational — 14 tables with FK constraints

Database Schema

Users ──────────────── Profiles
  │
  ├──── follows (self-join)
  ├──── uploads ──────── Datasets
  ├──── runs ─────────── ModelVersions ── Models ── Organizations
  │         └─────────── Datasets               └── designed_for ── Tasks
  └──── Results ──────── runs (FK)
              └── includes ──── Publications

14 tables: Users, Organizations, Tasks, Datasets, Publications, Profiles, follows, Models, uploads, ModelVersions, designed_for, runs, Results, includes

Full schema with all primary/foreign key constraints is defined in ModelHubPlatform.java → createTables().


Project Structure

ModelHubPlatform/
├── pom.xml
├── output/
│   └── Output.txt                  ← generated query results
└── src/main/
    ├── java/ceng/ceng351/ModelHubPlatform/
    │   ├── ModelHubPlatform.java   ← core implementation (all SQL logic)
    │   ├── IModelHubPlatform.java  ← interface defining all methods
    │   ├── Evaluation.java         ← test runner, writes Output.txt
    │   ├── FileOperations.java     ← reads seed data from .txt files
    │   ├── QueryResult.java        ← result container classes
    │   ├── User.java               ─┐
    │   ├── Organization.java        │
    │   ├── Profile.java             │
    │   ├── follow.java              │  entity model classes
    │   ├── Model.java               │
    │   ├── ModelVersion.java        │
    │   ├── Task.java                │
    │   ├── designed_for.java        │
    │   ├── Dataset.java             │
    │   ├── upload.java              │
    │   ├── run.java                 │
    │   ├── Result.java              │
    │   ├── Publication.java         │
    │   └── include.java            ─┘
    └── resources/data/
        ├── Users.txt
        ├── Organizations.txt
        ├── Profiles.txt
        ├── follows.txt
        ├── Models.txt
        ├── ModelVersions.txt
        ├── Tasks.txt
        ├── designed_for.txt
        ├── Datasets.txt
        ├── uploads.txt
        ├── runs.txt
        ├── Results.txt
        ├── Publications.txt
        └── includes.txt

How to Run

Prerequisites: Java 14+, Maven 3.x

# Clone the repo
git clone https://github.com/YOUR_USERNAME/ModelHubPlatform.git
cd ModelHubPlatform

# Build and run
mvn exec:java "-Dexec.mainClass=ceng.ceng351.ModelHubPlatform.Evaluation"

Results are written to output/Output.txt.

Optional — inspect the live database while the program runs:
Open http://localhost:8082 in your browser and connect with:

  • JDBC URL: jdbc:h2:tcp://localhost/mem:ModelHubPlatformdb
  • Username: sa | Password: (empty)

Implemented Queries (20 Tasks)

# Method Description
1 createTables() Create all 14 tables with PK/FK constraints in dependency order
2 insert*() Batch-insert seed data for all 14 tables via PreparedStatement
3 getUsersWithoutProfiles() LEFT JOIN to find users with no profile entry
4 decreaseReputationForMissingProfiles() Conditional UPDATE — deduct 10 pts if score ≥ 10
5 getUsersByBioKeywords() Case-insensitive LIKE search for engineer/scientist/student
6 getOrganizationsWithNoReleasedModelsAndLowRating() NOT EXISTS + rating filter
7 deleteOrganizationsWithNoReleasedModelsAndLowRating() Cascading delete with transaction rollback safety
8 getModelPrimaryTaskInfo() Aggregation to verify one primary task per model
9 getUserPopularityScore() Derived metric: followers − followees, top 20
10 getComprehensiveModelInfo() CTE to resolve latest model version + join across 4 tables
11 getDatasetStatisticsByModality() GROUP BY with AVG and ROUND
12 getLargeModelVersionsByDateRange() LIKE '%B' size filter + BETWEEN date range
13 getDatasetsWithMaxUploadCount() Nested subquery to find max upload frequency
14 getCompleteDatasets() Relational division — datasets covering every existing role
15 getUsersCreatorOrContributorButNotValidator() EXISTS / NOT EXISTS role logic
16 getUsersWhoRanAllVersionsOfModels() CTE-based relational division across model versions
17 getRunTypeStatistics() JOIN + GROUP BY run type for result counts and avg F1
18 getPublicationsUsingDataset() Multi-join publication lookup with accuracy ≥ 0.70 filter
19 getTopTenHighlyReputedUsers() Composite score from uploads + publications + reputation
20 getVulnerabilityDetectionPublications() Full 7-table join with task-specific and quality filters

Key SQL Patterns Used

Relational division (Task 14, 16) — finding rows that satisfy all conditions in a set, implemented with double NOT EXISTS:

-- Datasets that cover every role in the system
WHERE NOT EXISTS (
  SELECT 1 FROM (SELECT DISTINCT role FROM uploads) roles
  WHERE NOT EXISTS (
    SELECT 1 FROM uploads u
    WHERE u.DatasetID = d.DatasetID AND u.role = roles.role
  )
)

CTE for latest version (Task 10) — avoiding correlated subqueries in the SELECT clause:

WITH latest_ver AS (
  SELECT mv.ModelID, mv.version_no, mv.version_date
  FROM ModelVersions mv
  JOIN (SELECT ModelID, MAX(version_date) AS max_dt FROM ModelVersions GROUP BY ModelID) md
    ON mv.ModelID = md.ModelID AND mv.version_date = md.max_dt
  ...
)

Composite score aggregation (Task 19) — combining counts from different tables:

COALESCE(owner_uploads.cnt, 0) + COALESCE(pub_results.cnt, 0) + u.reputation_score AS user_score

About

Relational database backend for a Model Hub platform

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages