Dataset Canvas

TypeScript React Node.js Express.js MariaDB

Dataset Canvas is a web application for managing and visualizing image datasets, inspired by Hugging Face's Data Studio. It provides multilingual support (English and Russian), a personalized theme system, role-based access control with granular permissions, a discussion system, an administrative panel, CSV data imports, and image metadata processing.

🆕 Latest Updates (October 2025)

🎨 Personalized Theme System (NEW)

  • Database-persisted preferences: Each user's theme settings stored in database, eliminating cross-user conflicts
  • Three theme modes: Light, Dark, and System (follows OS preferences) with smooth transitions
  • Professional dark theme: Inspired by GitHub Dark, Discord, and VS Code with balanced contrast for reduced eye strain
  • Hybrid storage: Authenticated users get server-side persistence, anonymous users use localStorage fallback
  • Settings dialog: Modern interface for theme selection and preferences with extensible architecture
  • Real-time synchronization: Theme changes apply immediately with automatic API sync

🆕 Previous Major Updates (2025)

📊 Advanced Dataset Analytics (NEW)

  • Comprehensive Statistics API: New /api/datasets/:id/statistics endpoint providing detailed dataset analysis
  • Resolution Distribution Analysis: Real-time breakdown of image resolutions with percentage calculations
  • Interactive Statistics Display: Expandable resolution lists with progress bars and visual feedback
  • Training Compatibility Validation: Automatic check for neural network training requirements (64px divisibility)
  • Prompt Length Analytics: Average prompt length calculation for text-to-image datasets
  • Smart Visual Indicators: Color-coded compatibility warnings and success messages

⚡ Administrative Panel

  • New Admin Interface: Comprehensive /admin panel exclusively for administrators
  • Complete User Management: Change roles, delete users, view system statistics
  • Dataset Administration: Force delete any dataset with proper audit logging
  • Security Features: Built-in safeguards prevent admin account lockouts

🔐 Enhanced Authentication

  • Centralized Auth Context: Unified user state management across all components
  • Automatic Token Management: Axios interceptors handle JWT tokens seamlessly
  • Universal Dataset Creation: All authenticated users can now create datasets
  • Smart Error Handling: Automatic logout and redirect on authentication failures

🐛 Critical Bug Fixes

  • Fixed URL Duplication: Resolved /api/api/datasets 404 errors from conflicting configurations
  • React Navigation Error: Fixed logout errors by implementing proper React Router navigation
  • Authentication Sync: Resolved user state inconsistencies between components

✨ Features

🌍 Internationalization & Accessibility

  • Multi-language support: Complete interface translation for English (default) and Russian with 100% coverage
  • Smart language detection: Automatic browser language detection with persistent user preferences
  • Real-time language switching: Dynamic language changes without page refresh
  • TypeScript integration: Type-safe translations with full IDE support
  • Modern language selector: Intuitive language switcher with flag indicators
  • Quality assurance: Comprehensive localization testing with systematic bug fixes for missing translation keys

👥 User Management System

  • Comprehensive user directory (/users): Complete listing of all system users
  • Role-based filtering: Filter users by role through navigation menu (/users?role=ADMIN, /users?role=DEVELOPER, /users?role=USER)
  • Advanced sorting capabilities: Sort users by name, registration date, or public dataset count
  • User profile cards: Professional display with avatars, roles, and statistics
  • Direct profile navigation: Click-through to individual user profiles
  • Complete role coverage: Navigation menu includes options for all user roles (Administrators, Developers, Regular Users)

📊 Advanced Dataset Discovery

  • All datasets page (/datasets): Unified browsing interface for all available datasets
  • Enhanced three-tab system: Separate tabs for "Public" (all public datasets), "My Public" (user's own public datasets), and "My Private" (user's private datasets)
  • Intelligent URL state management: Tab selection preserved in URL parameters (?tab=public/my-public/my-private)
  • Multi-criteria sorting: Sort by name, creation date, image count, or author
  • Enhanced filtering: Advanced dataset organization with persistent URL parameters
  • Seamless navigation: Deep linking support with URL state preservation

🔐 Authentication & Authorization

  • Secure JWT-based authentication with advanced token management and axios interceptors
  • Centralized Authentication Context: Global user state management with React Context
  • Automatic Token Handling: JWT tokens automatically included in all API requests
  • Role-Based Access Control (RBAC):
    • Administrator: Full control over all datasets and users, access to admin panel
    • Developer: Can create datasets and manage their own (public/private)
    • User: Can create datasets and view public content, manage own private datasets
  • First user auto-promotion to Administrator role
  • Smart Session Management: Automatic logout and redirect on authentication failures
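
The role rules above can be sketched as a few pure functions. This is an illustrative sketch rather than the project's code: the `Role` values mirror the role names used in the `/users?role=` filter, while the `User` and `Dataset` shapes and the function names are hypothetical. It also assumes that later registrations start as regular users and that anonymous visitors may view public datasets.

```typescript
// Hypothetical shapes; the project's real entities may differ.
type Role = "ADMIN" | "DEVELOPER" | "USER";

interface User {
  id: string;
  role: Role;
}

interface Dataset {
  ownerId: string;
  isPublic: boolean;
}

// The first registered user is promoted to Administrator.
// Assumption: every later registration starts with the USER role.
function roleForNewUser(existingUserCount: number): Role {
  return existingUserCount === 0 ? "ADMIN" : "USER";
}

// Public datasets are visible to everyone (assumed to include guests);
// private datasets only to their owner and to administrators.
function canViewDataset(user: User | null, dataset: Dataset): boolean {
  if (dataset.isPublic) return true;
  if (user === null) return false;
  return user.role === "ADMIN" || user.id === dataset.ownerId;
}

// Owners manage their own datasets; administrators manage any dataset.
function canManageDataset(user: User | null, dataset: Dataset): boolean {
  if (user === null) return false;
  return user.role === "ADMIN" || user.id === dataset.ownerId;
}
```

Keeping these checks as pure functions makes them easy to reuse in both route middleware and UI code that decides which buttons to show.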

⚡ Administrative Panel

  • Comprehensive Admin Interface (/admin): Exclusive administrative control panel for system management
  • User Management System:
    • View all users with advanced filtering and sorting capabilities
    • Change user roles (Administrator, Developer, User) with confirmation dialogs
    • Delete users from the system with proper safety controls
    • Protection against self-modification to prevent admin lockouts
  • Dataset Administration:
    • Force delete any dataset regardless of ownership
    • Comprehensive dataset overview with owner information and statistics
    • Confirmation dialogs and audit logging for all administrative actions
  • Security & Compliance:
    • All administrative actions are logged with detailed audit trails
    • Built-in safeguards prevent system lockout scenarios
    • Professional UI with modern tabbed interface and data tables
  • Full Localization: Complete translation support for all admin features
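
The self-modification safeguard and audit trail described above might look roughly like the following. This is a sketch under stated assumptions: `AdminAction`, `AuditEntry`, and `performAdminAction` are hypothetical names, and an in-memory array stands in for the Winston-based audit log the project uses.

```typescript
type Role = "ADMIN" | "DEVELOPER" | "USER";

// Hypothetical description of an administrative request.
interface AdminAction {
  actorId: string;
  targetUserId: string;
  kind: "changeRole" | "deleteUser";
  newRole?: Role;
}

// Hypothetical audit record; the real log format is not documented here.
interface AuditEntry {
  timestamp: string;
  actorId: string;
  targetUserId: string;
  action: string;
}

// Returns false (and logs nothing) when an administrator targets their own account,
// which is what prevents an admin from locking themselves out.
function performAdminAction(
  action: AdminAction,
  auditLog: AuditEntry[],
  now: Date = new Date()
): boolean {
  if (action.actorId === action.targetUserId) {
    return false;
  }
  const description =
    action.kind === "changeRole"
      ? `changeRole:${action.newRole ?? "USER"}`
      : "deleteUser";
  auditLog.push({
    timestamp: now.toISOString(),
    actorId: action.actorId,
    targetUserId: action.targetUserId,
    action: description,
  });
  return true;
}
```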

📊 Dataset Management & Analytics

  • Universal Dataset Creation: All authenticated users can create and manage datasets
  • Complete CRUD operations for datasets with proper authorization controls
  • Public/Private dataset support with intelligent visibility controls
  • CSV data upload with intelligent parsing (filename, url, width, height, prompt columns)
  • Advanced Dataset Statistics: Comprehensive analytics including:
    • Resolution Distribution: Real-time analysis of image resolutions with percentage breakdowns and interactive expansion
    • Training Compatibility Check: Automatic validation for neural network training (64px divisibility check)
    • Prompt Analytics: Average prompt length calculation for text-to-image datasets
    • Interactive Statistics Display: Expandable resolution lists with progress bars and visual indicators
  • Advanced pagination system with URL parameter support (?p=22, customizable items per page: 10/25/50/100)
  • Smart dataset organization with separate sections for private and public datasets
  • Administrative Override: Admins can manage any dataset regardless of ownership
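
As a rough illustration of the statistics listed above, the sketch below computes a resolution distribution, an average prompt length, and a 64px divisibility check. The `ImageRow` and `DatasetStatistics` shapes are hypothetical, and it assumes the compatibility check requires both width and height to be divisible by 64; the actual response of `/api/datasets/:id/statistics` may be structured differently.

```typescript
// Hypothetical input row, using the CSV column names width, height, and prompt.
interface ImageRow {
  width: number;
  height: number;
  prompt: string;
}

interface ResolutionBucket {
  resolution: string; // e.g. "1024x1024"
  count: number;
  percentage: number; // share of all images, 0..100
}

interface DatasetStatistics {
  totalImages: number;
  resolutions: ResolutionBucket[]; // most common first
  averagePromptLength: number; // in characters
  incompatibleCount: number; // images failing the 64px check
  trainingCompatible: boolean;
}

function computeStatistics(images: ImageRow[]): DatasetStatistics {
  const counts = new Map<string, number>();
  let promptChars = 0;
  let incompatibleCount = 0;

  for (const img of images) {
    const key = `${img.width}x${img.height}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
    promptChars += img.prompt.length;
    // Assumption: both dimensions must be multiples of 64.
    if (img.width % 64 !== 0 || img.height % 64 !== 0) {
      incompatibleCount += 1;
    }
  }

  const total = images.length;
  const resolutions = Array.from(counts.entries())
    .map(([resolution, count]) => ({
      resolution,
      count,
      percentage: total === 0 ? 0 : (count / total) * 100,
    }))
    .sort((a, b) => b.count - a.count);

  return {
    totalImages: total,
    resolutions,
    averagePromptLength: total === 0 ? 0 : promptChars / total,
    incompatibleCount,
    trainingCompatible: incompatibleCount === 0,
  };
}
```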

🖼️ Advanced Image Data Display

  • Interactive image previews with click-to-expand modals
  • Comprehensive metadata presentation:
    • Smart aspect ratio calculation using GCD (Greatest Common Divisor)
    • Automatic detection of standard ratios (16:9, 4:3, etc.)
    • File extension detection from URLs
    • Clickable image URLs for direct access
  • Optimized table layout with responsive column sizing
  • Sticky header/footer interface - dataset info stays visible while scrolling through images
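
The GCD-based aspect ratio and the extension detection mentioned above can be sketched as follows. The function names are hypothetical, and the extension helper is deliberately naive: it only inspects the last path segment of the URL string.

```typescript
// Euclid's algorithm for the greatest common divisor.
function gcd(a: number, b: number): number {
  a = Math.abs(a);
  b = Math.abs(b);
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Reduces width:height by their GCD, e.g. 1920x1080 becomes "16:9".
function aspectRatio(width: number, height: number): string {
  if (!Number.isInteger(width) || !Number.isInteger(height) || width <= 0 || height <= 0) {
    return "unknown";
  }
  const divisor = gcd(width, height);
  return `${width / divisor}:${height / divisor}`;
}

// Returns the lowercase extension of the last path segment, or null if there is none.
function fileExtension(url: string): string | null {
  // Drop the query string and fragment before looking for a dot.
  const path = url.split(/[?#]/)[0] ?? "";
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  if (dot <= 0 || dot === lastSegment.length - 1) {
    return null;
  }
  return lastSegment.slice(dot + 1).toLowerCase();
}
```

For example, `aspectRatio(1024, 768)` reduces to `"4:3"`. Resolutions that are only close to a standard ratio (such as 1366x768) reduce to an unfamiliar pair, so matching them to a named ratio would need an additional tolerance step that this sketch does not attempt.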

🎨 Modern User Interface & Navigation

  • Reorganized navigation system: Menu structure with logical grouping (Main, Datasets, Community)
  • Enhanced visual design: Modern dropdown menus with icons, descriptions, and contextual help
  • Unified interface: Consistent navigation throughout the entire application
  • Responsive design with Tailwind CSS and shadcn/ui components
  • Sticky layout system: Header and pagination remain fixed while table scrolls
  • Breadcrumb navigation for moving between views
  • Full-screen utilization for optimal data visualization
  • Loading states and error handling throughout the application

⚡ Performance Optimization

  • Lazy loading implementation: All pages load on-demand using React.lazy() and Suspense
  • Significant bundle reduction: Main bundle size reduced from 504KB to 313KB (about 38% smaller)
  • Intelligent code splitting: Automatic chunk optimization for faster loading
  • Progressive loading: Improved initial load times with optimized resource distribution

🎨 Personalized Settings & Theming

  • Per-user theme preferences: Database-stored themes ensure consistent experience across devices and sessions
  • Three theme options: Light, Dark, and System (automatically follows OS preferences)
  • Professional dark theme: Carefully crafted color palette inspired by industry leaders (GitHub, Discord, VS Code)
  • Smooth transitions: Instant theme switching without page reload using CSS variables
  • Settings dialog: Modern modal interface with theme selection, language preferences, and future extensibility
  • Hybrid storage approach: Server-side persistence for authenticated users, localStorage for anonymous visitors
  • API-driven: Dedicated REST endpoints (GET/PATCH /api/users/me/settings) for preferences management
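
One way to picture the three theme modes and the hybrid storage approach is the sketch below. It is not the project's implementation: `resolveTheme`, `loadThemeMode`, the `KeyValueStore` interface, and the `"theme"` storage key are hypothetical, and the server value is assumed to come from `GET /api/users/me/settings`.

```typescript
type ThemeMode = "light" | "dark" | "system";
type ResolvedTheme = "light" | "dark";

function isThemeMode(value: unknown): value is ThemeMode {
  return value === "light" || value === "dark" || value === "system";
}

// "system" defers to the operating system preference; the other modes are used as-is.
function resolveTheme(mode: ThemeMode, osPrefersDark: boolean): ResolvedTheme {
  if (mode === "system") {
    return osPrefersDark ? "dark" : "light";
  }
  return mode;
}

// Minimal storage interface so the sketch runs outside a browser;
// window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const THEME_KEY = "theme"; // hypothetical storage key

// Hybrid lookup: prefer the server-side value (authenticated users),
// fall back to local storage (anonymous visitors), then to "system".
function loadThemeMode(serverTheme: string | null, local: KeyValueStore): ThemeMode {
  if (isThemeMode(serverTheme)) {
    return serverTheme;
  }
  const stored = local.getItem(THEME_KEY);
  return isThemeMode(stored) ? stored : "system";
}
```

In a browser, `osPrefersDark` would typically come from `window.matchMedia("(prefers-color-scheme: dark)").matches`.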

🔐 Granular Permissions System

  • MediaWiki-inspired architecture: Flexible permission system supporting unlimited permission types
  • Many-to-many relationships: User-permission associations with dedicated junction tables
  • Built-in permissions: Comprehensive set including discussion management, caption editing, and content moderation
  • Default permissions: Automatic assignment during registration (read/create/reply to discussions, edit own posts)
  • Administrator override: Admins automatically inherit all permissions without explicit grants
  • Admin panel integration: Full permission management interface with real-time status indicators and grant/revoke actions
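
The permission model above (default grants at registration plus an administrator override) could be sketched like this. Only `edit_caption` is a permission name taken from this README; the default permission names, the `PermissionUser` shape, and the helper functions are hypothetical, and a `Set` stands in for the junction table.

```typescript
type Role = "ADMIN" | "DEVELOPER" | "USER";

// Hypothetical user shape: explicit grants are modeled as a set of permission names.
interface PermissionUser {
  role: Role;
  permissions: Set<string>;
}

// Hypothetical names for the defaults described above
// (read/create/reply to discussions, edit own posts).
const DEFAULT_PERMISSIONS: readonly string[] = [
  "read_discussions",
  "create_discussions",
  "reply_to_discussions",
  "edit_own_posts",
];

function createUser(role: Role): PermissionUser {
  return { role, permissions: new Set(DEFAULT_PERMISSIONS) };
}

// Administrators implicitly hold every permission, without explicit grants.
function hasPermission(user: PermissionUser, permission: string): boolean {
  return user.role === "ADMIN" || user.permissions.has(permission);
}

function grantPermission(user: PermissionUser, permission: string): void {
  user.permissions.add(permission);
}

function revokePermission(user: PermissionUser, permission: string): void {
  user.permissions.delete(permission);
}
```

Because permissions are plain strings, new permission types can be added without changing the check itself, which is the main appeal of this style of design.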

💬 Discussion System

  • Complete discussion infrastructure: Three TypeORM entities with proper relationships and cascading deletes
  • Full CRUD operations: 10 RESTful API endpoints covering all discussion functionality
  • Nested reply support: Thread-style conversations with quotations and visual hierarchy
  • Edit history tracking: MediaWiki-style diff viewer with color-coded changes for all post edits
  • Moderation tools: Lock/unlock, pin/unpin discussions, soft delete posts with admin controls
  • Permission-based access: Six granular permissions controlling discussion participation
  • Real-time UI: URL-based navigation, expandable threads, clickable usernames, and visual indicators

✏️ Advanced Caption Editing

  • Inline caption editor: React component with textarea, character count, and keyboard shortcuts (Ctrl+Enter, Esc)
  • Complete revision history: All caption changes tracked in database with timestamps and user attribution
  • MediaWiki-style diff viewer: Visual comparison with color-coded additions (green) and deletions (red)
  • Permission-based access: Edit button displayed only for users with edit_caption permission
  • Audit trail: Full history of all modifications for compliance and accountability
  • Real-time updates: Immediate UI refresh after caption edits across dataset views
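
To illustrate what a diff viewer needs as input, here is a small word-level diff based on the longest common subsequence. It is a simplified stand-in, not the project's MediaWiki-style implementation; `DiffOp` and `diffWords` are hypothetical names. A viewer would render `insert` operations in green and `delete` operations in red.

```typescript
interface DiffOp {
  type: "equal" | "insert" | "delete";
  text: string;
}

function diffWords(oldText: string, newText: string): DiffOp[] {
  const a = oldText.split(/\s+/).filter((w) => w.length > 0);
  const b = newText.split(/\s+/).filter((w) => w.length > 0);

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table from the start, emitting one operation per word.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ type: "equal", text: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ type: "delete", text: a[i] });
      i++;
    } else {
      ops.push({ type: "insert", text: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ type: "delete", text: a[i++] });
  while (j < b.length) ops.push({ type: "insert", text: b[j++] });
  return ops;
}
```

The table costs O(n·m) memory for captions of n and m words, which is fine for short captions but would need a more economical algorithm for long documents.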

📝 User Activity Tracking

  • User edit history: Dedicated profile tab (/users/:username?tab=edits) showing all caption modifications
  • Recent changes page: Global activity monitor (/recent-changes) for site-wide edit tracking
  • Comprehensive metadata: Shows editor, dataset, image key, timestamps, and expandable diffs
  • Pagination support: Efficient browsing of large edit histories
  • Deep linking: Direct links from edits to specific datasets and images
  • Full localization: Relative time formatting ("2 hours ago") in English and Russian
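
Relative timestamps such as "2 hours ago" can be produced with the standard `Intl.RelativeTimeFormat` API. The sketch below is one possible approach and not necessarily how the project formats times; `formatRelativeTime` is a hypothetical helper, and months and years use fixed approximate lengths.

```typescript
// Picks the largest unit that fits the elapsed time and lets Intl handle localization.
function formatRelativeTime(from: Date, now: Date, locale: string): string {
  const diffSeconds = Math.round((from.getTime() - now.getTime()) / 1000);

  // Approximate unit lengths in seconds (30-day months, 365-day years).
  const units: Array<[Intl.RelativeTimeFormatUnit, number]> = [
    ["year", 31536000],
    ["month", 2592000],
    ["day", 86400],
    ["hour", 3600],
    ["minute", 60],
    ["second", 1],
  ];

  const formatter = new Intl.RelativeTimeFormat(locale, { numeric: "always" });
  for (const [unit, secondsPerUnit] of units) {
    if (Math.abs(diffSeconds) >= secondsPerUnit || unit === "second") {
      // Negative values are rendered as past ("... ago"), positive as future ("in ...").
      return formatter.format(Math.trunc(diffSeconds / secondsPerUnit), unit);
    }
  }
  return formatter.format(0, "second");
}
```

Passing `"ru"` as the locale yields the Russian form of the same phrase, provided the runtime ships with the corresponding locale data.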

❀️ Like System

  • Dataset likes: GitHub/Telegram-style UI with overlapping avatars and like counts
  • Post likes: Like system for discussion posts with compact display
  • Interactive feedback: Heart icon with smooth animations (bounce effect, fill transition)
  • User attribution: Modal dialogs showing all users who liked with timestamps
  • Backend validation: Prevents duplicate likes and self-liking on posts
  • Real-time updates: Optimistic UI with automatic counter refresh
  • Anonymous support: View-only mode with login prompts for guests
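
The backend validation rules for post likes (no duplicates, no self-liking) can be illustrated with an in-memory store. This is only a sketch of the rule: `PostLikeStore`, `Post`, and `LikeResult` are hypothetical names, and the real backend persists likes in its database rather than in a `Map`.

```typescript
interface Post {
  id: string;
  authorId: string;
}

type LikeResult = "liked" | "duplicate" | "self-like";

class PostLikeStore {
  // postId -> set of user ids who liked the post
  private likes = new Map<string, Set<string>>();

  like(post: Post, userId: string): LikeResult {
    if (post.authorId === userId) {
      return "self-like"; // authors cannot like their own posts
    }
    const users = this.likes.get(post.id) ?? new Set<string>();
    if (users.has(userId)) {
      return "duplicate"; // each user can like a post only once
    }
    users.add(userId);
    this.likes.set(post.id, users);
    return "liked";
  }

  // Returns true if a like was actually removed.
  unlike(post: Post, userId: string): boolean {
    return this.likes.get(post.id)?.delete(userId) ?? false;
  }

  count(postId: string): number {
    return this.likes.get(postId)?.size ?? 0;
  }
}
```

An optimistic UI would update the counter immediately and roll it back if the server answers with anything other than `"liked"`.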

📁 File Management System

  • CSV file storage: Robust backend system with dedicated DatasetFile entity
  • Secure handling: Protected upload directories with unique naming and metadata tracking
  • Download functionality: Secure file downloads with proper MIME types and authentication
  • Version tracking: Infrastructure for managing multiple file versions and upload history
  • API endpoints: Dedicated routes for file listing (/files) and downloading (/files/:id/download)

🛠️ Technical Excellence

  • Advanced Authentication Architecture: Centralized auth context with React Context API for consistent state management
  • Automated HTTP Management: Axios interceptors for automatic JWT token injection, refresh, and error handling
  • Database migrations for safe schema management with TypeORM (10+ migrations deployed)
  • Comprehensive testing with Vitest for both frontend and backend
  • Type-safe development with TypeScript throughout and complete i18n integration
  • Production-ready deployment with systemd service configuration
  • Centralized logging with Winston and detailed audit trails for admin actions
  • Security-first design: Built-in safeguards, input validation, and secure session management
  • Development tools: ESLint, testing utilities, development servers, and comprehensive error handling
  • Migration system: Robust database schema management with 10+ migrations for safe deployments

🚀 Quick Start

Prerequisites

  • Node.js (v18 or newer recommended)
  • Bun (recommended) or npm package manager
  • MariaDB or MySQL database server

Installation

  1. Clone the repository:

    git clone https://github.com/PeaceDeadTS/dataset-canvas.git
    cd dataset-canvas
  2. Install dependencies:

    # Frontend dependencies (including i18n support)
    # Using Bun (recommended)
    bun install
    
    # Or using npm
    npm install
    
    # Install internationalization dependencies if not already included
    npm install react-i18next i18next i18next-browser-languagedetector
    
    # Backend dependencies
    cd backend
    bun install  # or npm install
    cd ..
  3. Database setup: Create a .env file in the backend directory:

    # Database Configuration
    DB_HOST=localhost
    DB_PORT=3306
    DB_USER=your_username
    DB_PASSWORD=your_password
    DB_NAME=dataset_canvas
    
    # JWT Configuration
    JWT_SECRET=your_super_secret_jwt_key_here
    
    # Optional: Unix Socket (overrides host/port for production)
    # DB_SOCKET_PATH=/var/run/mysqld/mysqld.sock
  4. Run database migrations:

    cd backend
    npm run migration:run

Development

Run both servers at the same time, each in its own terminal:

  1. Backend server (in backend/ directory):

    npm run dev  # Development with nodemon
    # or
    npm start    # Production mode

    Backend runs on http://localhost:5000

  2. Frontend server (in root directory):

    npm run dev

    Frontend runs on http://localhost:5173

🏗️ Technology Stack

Backend

  • Runtime: Node.js with TypeScript
  • Framework: Express.js with custom middleware
  • Database: TypeORM with MariaDB/MySQL
  • Authentication: JWT with bcrypt password hashing
  • File Processing: Multer + CSV-parser for data uploads
  • Logging: Winston with file and console outputs
  • Testing: Vitest with Supertest for API testing

Frontend

  • Build Tool: Vite for fast development and optimized builds
  • Framework: React 18 with TypeScript and lazy loading optimization
  • Styling: Tailwind CSS with shadcn/ui component library
  • Routing: React Router DOM with URL parameter management and deep linking
  • Internationalization: react-i18next with browser detection and TypeScript integration
  • Performance: Intelligent code splitting and lazy loading for optimal bundle size
  • HTTP Client: Axios with JWT token management
  • State Management: Custom hooks with localStorage persistence
  • Testing: Vitest with React Testing Library and JSDOM

Development & Deployment

  • Package Manager: Bun (recommended) or npm
  • Linting: ESLint with TypeScript rules
  • Type Safety: Full TypeScript coverage with custom type definitions
  • Database Migrations: TypeORM migration system
  • Production: systemd service with environment configuration

📖 API Documentation

Authentication Endpoints

  • POST /api/auth/register - User registration
  • POST /api/auth/login - User authentication

Dataset Endpoints

  • GET /api/datasets - List datasets (with role-based filtering and sorting)
  • GET /api/datasets/:id - Get dataset details with paginated images
  • GET /api/datasets/:id/statistics - Get comprehensive dataset statistics (resolution distribution, prompt analytics, training compatibility)
  • POST /api/datasets - Create new dataset (any authenticated user)
  • PUT /api/datasets/:id - Update dataset (Owner/Admin)
  • DELETE /api/datasets/:id - Delete dataset (Owner/Admin)
  • POST /api/datasets/:id/upload - Upload CSV data (Owner/Admin)
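
For the upload endpoint, the CSV is expected to carry `filename`, `url`, `width`, `height`, and `prompt` columns. The sketch below shows what parsing such a file involves. It is a hand-rolled, simplified parser for illustration only (the backend uses Multer with csv-parser); `ImageRecord`, `parseCsvLine`, and `parseImageCsv` are hypothetical names, and quoted fields that span multiple lines are not supported.

```typescript
interface ImageRecord {
  filename: string;
  url: string;
  width: number;
  height: number;
  prompt: string;
}

// Splits one CSV line, honoring double-quoted fields and "" escapes.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"';
        i++; // skip the second quote of the escape
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

function parseImageCsv(text: string): ImageRecord[] {
  const lines = text.split(/\r?\n/).filter((l) => l.trim().length > 0);
  if (lines.length === 0) return [];

  const header = parseCsvLine(lines[0]).map((h) => h.trim().toLowerCase());
  const col = (name: string): number => {
    const index = header.indexOf(name);
    if (index === -1) throw new Error(`Missing column: ${name}`);
    return index;
  };
  const idx = {
    filename: col("filename"),
    url: col("url"),
    width: col("width"),
    height: col("height"),
    prompt: col("prompt"),
  };

  return lines.slice(1).map((line) => {
    const f = parseCsvLine(line);
    return {
      filename: f[idx.filename] ?? "",
      url: f[idx.url] ?? "",
      width: Number(f[idx.width]),
      height: Number(f[idx.height]),
      prompt: f[idx.prompt] ?? "",
    };
  });
}
```

Quoting matters here because prompts for text-to-image datasets frequently contain commas.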

User Management Endpoints

  • GET /api/users - List all users with sorting and role filtering options (sortBy: username/createdAt/publicDatasetCount, role: ADMIN/DEVELOPER/USER)
  • GET /api/users/:username - Get user profile and their datasets
  • GET /api/users/:id/edits - Get user's edit history (caption edits and discussion activity) with pagination
  • GET /api/users/me/settings - Get current user settings (theme preferences)
  • PATCH /api/users/me/settings - Update user settings (theme: light/dark/system)
  • PUT /api/users/:id/role - Update user role (Admin only)
  • DELETE /api/users/:id - Delete user (Admin only)

Discussion Endpoints

  • GET /api/datasets/:id/discussions - List all discussions for a dataset
  • POST /api/datasets/:id/discussions - Create new discussion with initial post
  • GET /api/discussions/:id - Get single discussion with all posts and replies
  • POST /api/discussions/:id/posts - Add reply to discussion
  • PATCH /api/posts/:id - Edit post with automatic history tracking
  • GET /api/posts/:id/history - Retrieve post edit history
  • DELETE /api/discussions/:id - Delete discussion (admin only)
  • DELETE /api/posts/:id - Soft delete post
  • PATCH /api/discussions/:id/lock - Lock/unlock discussion
  • PATCH /api/discussions/:id/pin - Pin/unpin discussion

Like Endpoints

  • GET /api/datasets/:id/likes - Get all likes for a dataset
  • POST /api/datasets/:id/likes - Like a dataset
  • DELETE /api/datasets/:id/likes - Unlike a dataset
  • GET /api/posts/:id/likes - Get all likes for a post
  • POST /api/posts/:id/likes - Like a post
  • DELETE /api/posts/:id/likes - Unlike a post

Permission Endpoints

  • GET /api/permissions - List all available permissions
  • GET /api/permissions/user/:userId - Get user's permissions
  • POST /api/permissions/grant - Grant permission to user
  • DELETE /api/permissions/revoke - Revoke permission from user

Recent Changes Endpoints

  • GET /api/recent-changes - Get site-wide recent changes (caption edits and discussion activity) with pagination

Admin Endpoints

  • GET /api/admin/datasets - List all datasets (Admin only)
  • DELETE /api/admin/datasets/:id - Force delete any dataset (Admin only)

Query Parameters

  • ?page=N - Pagination page number
  • ?limit=N - Items per page (10, 25, 50, 100)
  • ?sortBy=field - Sorting field (username/createdAt/publicDatasetCount for users; name/createdAt/imageCount/username for datasets)
  • ?order=ASC/DESC - Sort order (ascending or descending)
  • ?role=ADMIN/DEVELOPER/USER - User role filtering (for /api/users endpoint and /users page)
  • ?tab=public/my-public/my-private - Dataset tab selection (for /datasets page)
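
A handler consuming these parameters would normally validate them before use. The sketch below shows one way to do that with the standard `URLSearchParams` API. `parseListQuery` and `ListQuery` are hypothetical names, and the fallback values (page 1, limit 10, ascending order) are assumptions rather than documented defaults.

```typescript
const ALLOWED_LIMITS = [10, 25, 50, 100] as const;

interface ListQuery {
  page: number;
  limit: number;
  order: "ASC" | "DESC";
  sortBy: string | null;
}

// Parses a query string such as "?page=3&limit=25&order=desc&sortBy=createdAt".
function parseListQuery(query: string): ListQuery {
  const params = new URLSearchParams(query);
  const page = Number.parseInt(params.get("page") ?? "", 10);
  const limit = Number.parseInt(params.get("limit") ?? "", 10);
  const order = (params.get("order") ?? "").toUpperCase();

  return {
    // Assumed default: first page when the value is missing or invalid.
    page: Number.isInteger(page) && page >= 1 ? page : 1,
    // Only the documented page sizes are accepted; assumed default is 10.
    limit: (ALLOWED_LIMITS as readonly number[]).includes(limit) ? limit : 10,
    // Assumed default: ascending order.
    order: order === "DESC" ? "DESC" : "ASC",
    sortBy: params.get("sortBy"),
  };
}
```

The `sortBy` value is returned unchecked here; a real handler should compare it against the list of sortable fields for the endpoint before building a database query.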

🧪 Testing

The project includes comprehensive test coverage:

# Frontend tests (from root directory)
npm test

# Backend tests (from backend directory)
cd backend
npm test

# Interactive test UI
npm run test:ui

📁 Project Structure

dataset-canvas/
├── src/                    # Frontend source code
│   ├── components/         # React components
│   │   ├── ui/            # shadcn/ui components
│   │   ├── AppHeader.tsx   # Main navigation header
│   │   ├── DatasetBreadcrumb.tsx  # Dataset navigation
│   │   ├── LanguageSelector.tsx   # Language switcher
│   │   └── ...            # Other custom components
│   ├── pages/             # Route components
│   │   ├── Users.tsx      # Users directory page
│   │   ├── AllDatasets.tsx # All datasets page
│   │   └── ...            # Other page components
│   ├── locales/           # Internationalization files
│   │   ├── en/           # English translations
│   │   │   ├── common.json
│   │   │   ├── navigation.json
│   │   │   └── pages.json
│   │   └── ru/           # Russian translations
│   │       ├── common.json
│   │       ├── navigation.json
│   │       └── pages.json
│   ├── hooks/             # Custom React hooks
│   ├── lib/               # Utility functions
│   │   └── i18n.ts       # Internationalization config
│   └── types/             # TypeScript type definitions
│       └── i18next.d.ts   # i18n type definitions
├── backend/               # Backend source code
│   ├── src/
│   │   ├── entity/        # TypeORM entities
│   │   ├── routes/        # Express routes
│   │   │   ├── users.ts   # User management endpoints
│   │   │   └── datasets.ts # Dataset endpoints
│   │   ├── middleware/    # Custom middleware
│   │   └── types/         # Backend type definitions
│   └── ...
├── public/                # Static assets
└── ...

🚀 Deployment

Production Build

  1. Build the frontend:

    npm run build
  2. Build the backend:

    cd backend
    npm run build
  3. Configure systemd service (Linux):

    [Unit]
    Description=Dataset Canvas Backend
    
    [Service]
    Type=simple
    User=your-user
    WorkingDirectory=/path/to/dataset-canvas/backend
    ExecStart=/usr/bin/node dist/index.js
    EnvironmentFile=/path/to/.env
    Restart=always
    
    [Install]
    WantedBy=multi-user.target

Environment Variables (Production)

DB_HOST=your-production-db-host
DB_PORT=3306
DB_USER=your-production-db-user
DB_PASSWORD=your-secure-password
DB_NAME=dataset_canvas_production
JWT_SECRET=your-very-secure-jwt-secret
NODE_ENV=production

# Optional Unix Socket
DB_SOCKET_PATH=/var/run/mysqld/mysqld.sock
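
As a sketch of how these variables might be turned into connection settings, the function below builds a plain configuration object in which `DB_SOCKET_PATH`, when set, takes precedence over host and port. The `DbConfig` shape and `buildDbConfig` are hypothetical; how the project actually wires these values into TypeORM is not shown in this README.

```typescript
// Hypothetical connection settings; field names are illustrative.
interface DbConfig {
  database: string;
  username: string;
  password: string;
  host?: string;
  port?: number;
  socketPath?: string;
}

type Env = Record<string, string | undefined>;

function buildDbConfig(env: Env): DbConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) {
      throw new Error(`Missing environment variable: ${name}`);
    }
    return value;
  };

  const base = {
    database: required("DB_NAME"),
    username: required("DB_USER"),
    password: required("DB_PASSWORD"),
  };

  // The optional Unix socket overrides host/port.
  const socketPath = env["DB_SOCKET_PATH"];
  if (socketPath) {
    return { ...base, socketPath };
  }

  return {
    ...base,
    host: env["DB_HOST"] ?? "localhost",
    port: Number(env["DB_PORT"] ?? "3306"),
  };
}
```

In a Node.js process this would be called as `buildDbConfig(process.env)`. Failing fast on missing credentials tends to be easier to debug than a connection error later in startup.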

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments


Dataset Canvas - Professional dataset management made simple. 🎨✨
