Dataset Canvas is a web application for managing and visualizing image datasets, inspired by Hugging Face's Data Studio. It offers multilingual support, a per-user theme system, granular permissions, a discussion system, an administrative panel, JWT-based authentication, role-based access control, CSV data imports, and image metadata processing.
- Database-persisted preferences: Each user's theme settings stored in database, eliminating cross-user conflicts
- Three theme modes: Light, Dark, and System (follows OS preferences) with smooth transitions
- Professional dark theme: Inspired by GitHub Dark, Discord, and VS Code with balanced contrast for reduced eye strain
- Hybrid storage: Authenticated users get server-side persistence, anonymous users use localStorage fallback
- Settings dialog: Modern interface for theme selection and preferences with extensible architecture
- Real-time synchronization: Theme changes apply immediately with automatic API sync
- Comprehensive Statistics API: New `/api/datasets/:id/statistics` endpoint providing detailed dataset analysis
- Resolution Distribution Analysis: Real-time breakdown of image resolutions with percentage calculations
- Interactive Statistics Display: Expandable resolution lists with progress bars and visual feedback
- Training Compatibility Validation: Automatic check for neural network training requirements (64px divisibility)
- Prompt Length Analytics: Average prompt length calculation for text-to-image datasets
- Smart Visual Indicators: Color-coded compatibility warnings and success messages
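The statistics described above could be computed roughly as follows. This is a minimal sketch, not the project's actual code: `ImageRow`, `DatasetStatistics`, and `computeStatistics` are hypothetical names, and the row shape is assumed from the CSV columns (width, height, prompt).

```typescript
// Hypothetical row shape, assumed from the CSV columns named in this README.
interface ImageRow {
  width: number;
  height: number;
  prompt: string;
}

interface DatasetStatistics {
  total: number;
  resolutions: { resolution: string; count: number; percent: number }[];
  incompatibleCount: number; // images whose width or height is not divisible by 64
  trainingCompatible: boolean;
  averagePromptLength: number;
}

function computeStatistics(rows: ImageRow[]): DatasetStatistics {
  const counts = new Map<string, number>();
  let incompatibleCount = 0;
  let promptChars = 0;

  for (const row of rows) {
    const key = `${row.width}x${row.height}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
    // 64px divisibility check for training compatibility.
    if (row.width % 64 !== 0 || row.height % 64 !== 0) incompatibleCount++;
    promptChars += row.prompt.length;
  }

  const total = rows.length;
  // Most common resolutions first, each with its share of the dataset.
  const resolutions = Array.from(counts.entries())
    .map(([resolution, count]) => ({
      resolution,
      count,
      percent: total > 0 ? (count / total) * 100 : 0,
    }))
    .sort((a, b) => b.count - a.count);

  return {
    total,
    resolutions,
    incompatibleCount,
    trainingCompatible: incompatibleCount === 0,
    averagePromptLength: total > 0 ? promptChars / total : 0,
  };
}
```

A color-coded warning would then follow from `trainingCompatible` being `false`.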
- New Admin Interface: Comprehensive `/admin` panel exclusively for administrators
- Complete User Management: Change roles, delete users, view system statistics
- Dataset Administration: Force delete any dataset with proper audit logging
- Security Features: Built-in safeguards prevent admin account lockouts
- Centralized Auth Context: Unified user state management across all components
- Automatic Token Management: Axios interceptors handle JWT tokens seamlessly
- Universal Dataset Creation: All authenticated users can now create datasets
- Smart Error Handling: Automatic logout and redirect on authentication failures
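The token handling above can be illustrated with two small pure functions. This is a sketch under assumptions: `RequestConfig`, `withAuthHeader`, and `shouldLogout` are hypothetical names, and treating HTTP 401 as the "authentication failure" signal is an assumption rather than something this README states.

```typescript
// Simplified stand-in for an HTTP client's request config (hypothetical type).
interface RequestConfig {
  headers: Record<string, string>;
}

// Returns a copy of the config with the JWT attached as a Bearer token.
// Requests made without a token are passed through unchanged.
function withAuthHeader(config: RequestConfig, token: string | null): RequestConfig {
  if (!token) return config;
  return {
    ...config,
    headers: { ...config.headers, Authorization: `Bearer ${token}` },
  };
}

// Assumption: a 401 response means the session is no longer valid,
// so the app should clear user state and redirect to login.
function shouldLogout(status: number | undefined): boolean {
  return status === 401;
}
```

With Axios, functions like these could be registered through `axios.interceptors.request.use` and `axios.interceptors.response.use`, so individual components never touch the token directly.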
- Fixed URL Duplication: Resolved `/api/api/datasets` 404 errors from conflicting configurations
- React Navigation Error: Fixed logout errors by implementing proper React Router navigation
- Authentication Sync: Resolved user state inconsistencies between components
- Multi-language support: Complete interface translation for English (default) and Russian with 100% coverage
- Smart language detection: Automatic browser language detection with persistent user preferences
- Real-time language switching: Dynamic language changes without page refresh
- TypeScript integration: Type-safe translations with full IDE support
- Modern language selector: Intuitive language switcher with flag indicators
- Quality assurance: Comprehensive localization testing with systematic bug fixes for missing translation keys
- Comprehensive user directory (`/users`): Complete listing of all system users
- Role-based filtering: Filter users by role through navigation menu (`/users?role=ADMIN`, `/users?role=DEVELOPER`, `/users?role=USER`)
- Advanced sorting capabilities: Sort users by name, registration date, or public dataset count
- User profile cards: Professional display with avatars, roles, and statistics
- Direct profile navigation: Click-through to individual user profiles
- Complete role coverage: Navigation menu includes options for all user roles (Administrators, Developers, Regular Users)
- All datasets page (`/datasets`): Unified browsing interface for all available datasets
- Enhanced three-tab system: Separate tabs for "Public" (all public datasets), "My Public" (user's own public datasets), and "My Private" (user's private datasets)
- Intelligent URL state management: Tab selection preserved in URL parameters (`?tab=public/my-public/my-private`)
- Multi-criteria sorting: Sort by name, creation date, image count, or author
- Enhanced filtering: Advanced dataset organization with persistent URL parameters
- Seamless navigation: Deep linking support with URL state preservation
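Keeping the selected tab in the URL, as described above, might look like the sketch below. The function names are hypothetical, and falling back to the "public" tab for a missing or unknown value is an assumption.

```typescript
// Tab values taken from the ?tab= parameter documented above.
const TABS = ["public", "my-public", "my-private"] as const;
type DatasetTab = (typeof TABS)[number];

// Reads the tab from a query string such as "?tab=my-private".
// Assumption: unknown or missing values fall back to "public".
function parseTab(search: string): DatasetTab {
  const value = new URLSearchParams(search).get("tab");
  return TABS.find((tab) => tab === value) ?? "public";
}

// Returns a new query string with the tab set, preserving other parameters
// (for example sorting), which is what makes deep links round-trip.
function withTab(search: string, tab: DatasetTab): string {
  const params = new URLSearchParams(search);
  params.set("tab", tab);
  return `?${params.toString()}`;
}
```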
- Secure JWT-based authentication with advanced token management and axios interceptors
- Centralized Authentication Context: Global user state management with React Context
- Automatic Token Handling: JWT tokens automatically included in all API requests
- Role-Based Access Control (RBAC):
  - Administrator: Full control over all datasets and users, access to admin panel
  - Developer: Can create datasets and manage their own (public/private)
  - User: Can create datasets and view public content, manage own private datasets
  - First user auto-promotion to Administrator role
- Smart Session Management: Automatic logout and redirect on authentication failures
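The role rules above can be sketched as a few checks. This is an illustration only: the `User` and `Dataset` shapes and all function names are hypothetical, and assigning `USER` to every account after the first is an assumption.

```typescript
// Role names as used in the /users?role= filter.
type Role = "ADMIN" | "DEVELOPER" | "USER";

// Hypothetical minimal shapes for illustration.
interface User {
  id: string;
  role: Role;
}
interface Dataset {
  ownerId: string;
  isPublic: boolean;
}

// First registered user is auto-promoted to Administrator.
// Assumption: every later registration starts as a regular USER.
function roleForNewUser(existingUserCount: number): Role {
  return existingUserCount === 0 ? "ADMIN" : "USER";
}

// Public datasets are visible to everyone, including guests;
// private ones only to their owner and to administrators.
function canView(user: User | null, dataset: Dataset): boolean {
  if (dataset.isPublic) return true;
  return canManage(user, dataset);
}

// Owners manage their own datasets; administrators manage any dataset.
function canManage(user: User | null, dataset: Dataset): boolean {
  if (!user) return false;
  return user.role === "ADMIN" || user.id === dataset.ownerId;
}
```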
- Comprehensive Admin Interface (`/admin`): Exclusive administrative control panel for system management
- User Management System:
  - View all users with advanced filtering and sorting capabilities
  - Change user roles (Administrator, Developer, User) with confirmation dialogs
  - Delete users from the system with proper safety controls
  - Protection against self-modification to prevent admin lockouts
- Dataset Administration:
  - Force delete any dataset regardless of ownership
  - Comprehensive dataset overview with owner information and statistics
  - Confirmation dialogs and audit logging for all administrative actions
- Security & Compliance:
  - All administrative actions are logged with detailed audit trails
  - Built-in safeguards prevent system lockout scenarios
  - Professional UI with modern tabbed interface and data tables
- Full Localization: Complete translation support for all admin features
- Universal Dataset Creation: All authenticated users can create and manage datasets
- Complete CRUD operations for datasets with proper authorization controls
- Public/Private dataset support with intelligent visibility controls
- CSV data upload with intelligent parsing (filename, url, width, height, prompt columns)
- Advanced Dataset Statistics: Comprehensive analytics including:
  - Resolution Distribution: Real-time analysis of image resolutions with percentage breakdowns and interactive expansion
  - Training Compatibility Check: Automatic validation for neural network training (64px divisibility check)
  - Prompt Analytics: Average prompt length calculation for text-to-image datasets
  - Interactive Statistics Display: Expandable resolution lists with progress bars and visual indicators
- Advanced pagination system with URL parameter support (`?p=22`, customizable items per page: 10/25/50/100)
- Smart dataset organization with separate sections for private and public datasets
- Administrative Override: Admins can manage any dataset regardless of ownership
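As an illustration of the CSV import mentioned above, the sketch below validates one already-parsed CSV record (for example, one object emitted by a CSV parsing library) against the documented columns. `toImageRow` is a hypothetical helper, and the choice to reject rows without a URL or with non-positive dimensions is an assumption.

```typescript
// Columns documented for CSV uploads: filename, url, width, height, prompt.
interface ImageRow {
  filename: string;
  url: string;
  width: number;
  height: number;
  prompt: string;
}

// Converts one parsed CSV record into a typed row.
// Assumption: rows without a URL or with invalid dimensions are skipped (null).
function toImageRow(record: Record<string, string | undefined>): ImageRow | null {
  const url = (record.url ?? "").trim();
  const width = Number.parseInt(record.width ?? "", 10);
  const height = Number.parseInt(record.height ?? "", 10);

  const dimensionsValid =
    Number.isFinite(width) && Number.isFinite(height) && width > 0 && height > 0;
  if (!url || !dimensionsValid) return null;

  return {
    filename: (record.filename ?? "").trim(),
    url,
    width,
    height,
    prompt: (record.prompt ?? "").trim(),
  };
}
```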
- Interactive image previews with click-to-expand modals
- Comprehensive metadata presentation:
  - Smart aspect ratio calculation using GCD (Greatest Common Divisor)
  - Automatic detection of standard ratios (16:9, 4:3, etc.)
  - File extension detection from URLs
  - Clickable image URLs for direct access
- Optimized table layout with responsive column sizing
- Sticky header/footer interface - dataset info stays visible while scrolling through images
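The GCD-based aspect ratio and URL extension detection listed above can be sketched as follows. Function names are hypothetical; the sketch reduces the ratio exactly and does not attempt to snap near-standard sizes to a standard ratio.

```typescript
// Euclid's algorithm for the greatest common divisor.
function gcd(a: number, b: number): number {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Reduces width:height by their GCD, e.g. 1920x1080 becomes "16:9".
function aspectRatio(width: number, height: number): string {
  const valid =
    Number.isInteger(width) && Number.isInteger(height) && width > 0 && height > 0;
  if (!valid) return "unknown";
  const divisor = gcd(width, height);
  return `${width / divisor}:${height / divisor}`;
}

// Extracts a lowercase file extension from the URL path, ignoring the query string.
function extensionFromUrl(url: string): string | null {
  try {
    const match = new URL(url).pathname.match(/\.([a-z0-9]+)$/i);
    return match ? match[1].toLowerCase() : null;
  } catch {
    return null; // not a parseable absolute URL
  }
}
```

For example, `aspectRatio(1024, 768)` reduces by a GCD of 256 to `"4:3"`.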
- Reorganized navigation system: Menu structure with logical grouping (Main, Datasets, Community)
- Enhanced visual design: Modern dropdown menus with icons, descriptions, and contextual help
- Unified interface: Consistent navigation throughout the entire application
- Responsive design with Tailwind CSS and shadcn/ui components
- Sticky layout system: Header and pagination remain fixed while table scrolls
- Breadcrumb trail for moving between views
- Full-screen utilization for optimal data visualization
- Loading states and error handling throughout the application
- Lazy loading implementation: All pages load on-demand using React.lazy() and Suspense
- Significant bundle reduction: Main bundle size reduced from 504KB to 313KB (about 38% smaller)
- Intelligent code splitting: Automatic chunk optimization for faster loading
- Progressive loading: Improved initial load times with optimized resource distribution
- Per-user theme preferences: Database-stored themes ensure consistent experience across devices and sessions
- Three theme options: Light, Dark, and System (automatically follows OS preferences)
- Professional dark theme: Carefully crafted color palette inspired by industry leaders (GitHub, Discord, VS Code)
- Smooth transitions: Instant theme switching without page reload using CSS variables
- Settings dialog: Modern modal interface with theme selection, language preferences, and future extensibility
- Hybrid storage approach: Server-side persistence for authenticated users, localStorage for anonymous visitors
- API-driven: Dedicated REST endpoints (`GET/PATCH /api/users/me/settings`) for preferences management
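Resolving the three theme options to an actual light or dark theme could look like the sketch below. The function names are hypothetical, and defaulting to "system" when no valid preference is stored is an assumption.

```typescript
// Theme values accepted by the settings endpoint: light, dark, system.
type ThemePreference = "light" | "dark" | "system";
type AppliedTheme = "light" | "dark";

// "system" defers to the OS preference; explicit choices win.
function resolveTheme(preference: ThemePreference, osPrefersDark: boolean): AppliedTheme {
  if (preference === "system") return osPrefersDark ? "dark" : "light";
  return preference;
}

// Validates a stored value (from the server response or the localStorage fallback).
// Assumption: anything missing or unrecognized is treated as "system".
function parseStoredTheme(raw: string | null | undefined): ThemePreference {
  return raw === "light" || raw === "dark" || raw === "system" ? raw : "system";
}
```

In a browser, `osPrefersDark` could be read from `window.matchMedia("(prefers-color-scheme: dark)").matches`; authenticated users would take `raw` from the settings endpoint, anonymous visitors from localStorage.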
- MediaWiki-inspired architecture: Flexible permission system supporting unlimited permission types
- Many-to-many relationships: User-permission associations with dedicated junction tables
- Built-in permissions: Comprehensive set including discussion management, caption editing, and content moderation
- Default permissions: Automatic assignment during registration (read/create/reply to discussions, edit own posts)
- Administrator override: Admins automatically inherit all permissions without explicit grants
- Admin panel integration: Full permission management interface with real-time status indicators and grant/revoke actions
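The permission check with administrator override might be sketched as below. Only `edit_caption` is a permission name that appears in this README; the names in `DEFAULT_PERMISSIONS` and the `Account` shape are hypothetical placeholders.

```typescript
type Role = "ADMIN" | "DEVELOPER" | "USER";

// Hypothetical in-memory view of a user with their granted permissions.
interface Account {
  role: Role;
  permissions: Set<string>;
}

// Placeholder names for the defaults assigned at registration
// (read/create/reply to discussions, edit own posts); real names may differ.
const DEFAULT_PERMISSIONS: readonly string[] = [
  "read_discussions",
  "create_discussions",
  "reply_to_discussions",
  "edit_own_posts",
];

function newAccount(role: Role): Account {
  return { role, permissions: new Set(DEFAULT_PERMISSIONS) };
}

// Administrators implicitly hold every permission; everyone else needs a grant.
function hasPermission(account: Account, permission: string): boolean {
  return account.role === "ADMIN" || account.permissions.has(permission);
}
```

Granting `edit_caption` to a user then amounts to adding a row to the junction table, mirrored here by `account.permissions.add("edit_caption")`.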
- Complete discussion infrastructure: Three TypeORM entities with proper relationships and cascading deletes
- Full CRUD operations: 10 RESTful API endpoints covering all discussion functionality
- Nested reply support: Thread-style conversations with quotations and visual hierarchy
- Edit history tracking: MediaWiki-style diff viewer with color-coded changes for all post edits
- Moderation tools: Lock/unlock, pin/unpin discussions, soft delete posts with admin controls
- Permission-based access: Six granular permissions controlling discussion participation
- Real-time UI: URL-based navigation, expandable threads, clickable usernames, and visual indicators
- Inline caption editor: React component with textarea, character count, and keyboard shortcuts (Ctrl+Enter, Esc)
- Complete revision history: All caption changes tracked in database with timestamps and user attribution
- MediaWiki-style diff viewer: Visual comparison with color-coded additions (green) and deletions (red)
- Permission-based access: Edit button displayed only for users with the `edit_caption` permission
- Audit trail: Full history of all modifications for compliance and accountability
- Real-time updates: Immediate UI refresh after caption edits across dataset views
- User edit history: Dedicated profile tab (`/users/:username?tab=edits`) showing all caption modifications
- Recent changes page: Global activity monitor (`/recent-changes`) for site-wide edit tracking
- Comprehensive metadata: Shows editor, dataset, image key, timestamps, and expandable diffs
- Pagination support: Efficient browsing of large edit histories
- Deep linking: Direct links from edits to specific datasets and images
- Full localization: Relative time formatting ("2 hours ago") in English and Russian
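Localized relative times such as "2 hours ago" can be produced with the standard `Intl.RelativeTimeFormat` API. The sketch below is one possible approach, not necessarily what the project uses; `relativeTime` and its unit thresholds are assumptions.

```typescript
// Formats the distance between two dates in the given locale ("en" or "ru").
function relativeTime(from: Date, now: Date, locale: string): string {
  // Negative values mean the event lies in the past.
  const seconds = Math.round((from.getTime() - now.getTime()) / 1000);
  const units: [Intl.RelativeTimeFormatUnit, number][] = [
    ["day", 86400],
    ["hour", 3600],
    ["minute", 60],
  ];
  const formatter = new Intl.RelativeTimeFormat(locale);

  // Pick the largest unit that fits, truncating toward zero.
  for (const [unit, size] of units) {
    if (Math.abs(seconds) >= size) {
      return formatter.format(Math.trunc(seconds / size), unit);
    }
  }
  return formatter.format(seconds, "second");
}

const now = new Date("2024-01-01T12:00:00Z");
const edited = new Date("2024-01-01T10:00:00Z");
console.log(relativeTime(edited, now, "en")); // "2 hours ago"
console.log(relativeTime(edited, now, "ru")); // Russian equivalent, if the runtime ships that locale
```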
- Dataset likes: GitHub/Telegram-style UI with overlapping avatars and like counts
- Post likes: Like system for discussion posts with compact display
- Interactive feedback: Heart icon with smooth animations (bounce effect, fill transition)
- User attribution: Modal dialogs showing all users who liked with timestamps
- Backend validation: Prevents duplicate likes and self-liking on posts
- Real-time updates: Optimistic UI with automatic counter refresh
- Anonymous support: View-only mode with login prompts for guests
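The backend validation for post likes (no duplicates, no self-liking) could be sketched as a pure function over the existing likes. The types and the `likePost` name are hypothetical; a real implementation would likely enforce the duplicate rule with a database constraint as well.

```typescript
// Hypothetical minimal shapes for illustration.
interface Like {
  userId: string;
  postId: string;
}
interface Post {
  id: string;
  authorId: string;
}

type LikeResult =
  | { ok: true; likes: Like[] }
  | { ok: false; reason: "self_like" | "duplicate" };

// Returns the updated like list, or the reason the like was rejected.
function likePost(likes: Like[], userId: string, post: Post): LikeResult {
  if (post.authorId === userId) {
    return { ok: false, reason: "self_like" };
  }
  const alreadyLiked = likes.some(
    (like) => like.userId === userId && like.postId === post.id
  );
  if (alreadyLiked) {
    return { ok: false, reason: "duplicate" };
  }
  return { ok: true, likes: [...likes, { userId, postId: post.id }] };
}
```

An optimistic UI would increment the counter immediately and roll it back if the server answers with one of the rejection reasons.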
- CSV file storage: Robust backend system with dedicated `DatasetFile` entity
- Secure handling: Protected upload directories with unique naming and metadata tracking
- Download functionality: Secure file downloads with proper MIME types and authentication
- Version tracking: Infrastructure for managing multiple file versions and upload history
- API endpoints: Dedicated routes for file listing (`/files`) and downloading (`/files/:id/download`)
- Advanced Authentication Architecture: Centralized auth context with React Context API for consistent state management
- Automated HTTP Management: Axios interceptors for automatic JWT token injection, refresh, and error handling
- Database migrations for safe schema management with TypeORM (10+ migrations deployed)
- Comprehensive testing with Vitest for both frontend and backend
- Type-safe development with TypeScript throughout and complete i18n integration
- Production-ready deployment with systemd service configuration
- Centralized logging with Winston and detailed audit trails for admin actions
- Security-first design: Built-in safeguards, input validation, and secure session management
- Development tools: ESLint, testing utilities, development servers, and comprehensive error handling
- Migration system: Robust database schema management with 10+ migrations for safe deployments
- Node.js (v18 or newer recommended)
- Bun (recommended) or npm package manager
- MariaDB or MySQL database server
- Clone the repository:

      git clone https://github.com/PeaceDeadTS/dataset-canvas.git
      cd dataset-canvas

- Install dependencies:

      # Frontend dependencies (including i18n support)
      # Using Bun (recommended)
      bun install
      # Or using npm
      npm install

      # Install internationalization dependencies if not already included
      npm install react-i18next i18next i18next-browser-languagedetector

      # Backend dependencies
      cd backend
      bun install   # or npm install
      cd ..

- Database setup: Create a `.env` file in the `backend` directory:

      # Database Configuration
      DB_HOST=localhost
      DB_PORT=3306
      DB_USER=your_username
      DB_PASSWORD=your_password
      DB_NAME=dataset_canvas

      # JWT Configuration
      JWT_SECRET=your_super_secret_jwt_key_here

      # Optional: Unix Socket (overrides host/port for production)
      # DB_SOCKET_PATH=/var/run/mysqld/mysqld.sock

- Run database migrations:

      cd backend
      npm run migration:run
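How the `.env` values above might be turned into connection options is sketched below, including the documented rule that `DB_SOCKET_PATH` overrides host and port. `dbOptionsFromEnv` and the `DbOptions` shape are hypothetical, and the `localhost`/`3306` defaults are assumptions taken from the sample file.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical connection options; a real data source config has more fields.
interface DbOptions {
  username: string;
  password: string;
  database: string;
  host?: string;
  port?: number;
  socketPath?: string;
}

function dbOptionsFromEnv(env: Env): DbOptions {
  // Fail fast when a required variable is missing.
  const missing = ["DB_USER", "DB_PASSWORD", "DB_NAME"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const base = {
    username: env.DB_USER as string,
    password: env.DB_PASSWORD as string,
    database: env.DB_NAME as string,
  };

  // A Unix socket, when configured, takes precedence over host/port.
  if (env.DB_SOCKET_PATH) {
    return { ...base, socketPath: env.DB_SOCKET_PATH };
  }
  return {
    ...base,
    host: env.DB_HOST ?? "localhost",
    port: Number.parseInt(env.DB_PORT ?? "3306", 10),
  };
}
```

In the backend this would typically be called with `process.env` after the `.env` file has been loaded.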
Start both servers simultaneously:
- Backend server (in the `backend/` directory):

      npm run dev   # Development with nodemon
      # or
      npm start     # Production mode

  Backend runs on `http://localhost:5000`

- Frontend server (in the root directory):

      npm run dev

  Frontend runs on `http://localhost:5173`
- Runtime: Node.js with TypeScript
- Framework: Express.js with custom middleware
- Database: TypeORM with MariaDB/MySQL
- Authentication: JWT with bcrypt password hashing
- File Processing: Multer + CSV-parser for data uploads
- Logging: Winston with file and console outputs
- Testing: Vitest with Supertest for API testing
- Build Tool: Vite for fast development and optimized builds
- Framework: React 18 with TypeScript and lazy loading optimization
- Styling: Tailwind CSS with shadcn/ui component library
- Routing: React Router DOM with URL parameter management and deep linking
- Internationalization: react-i18next with browser detection and TypeScript integration
- Performance: Intelligent code splitting and lazy loading for optimal bundle size
- HTTP Client: Axios with JWT token management
- State Management: Custom hooks with localStorage persistence
- Testing: Vitest with React Testing Library and JSDOM
- Package Manager: Bun (recommended) or npm
- Linting: ESLint with TypeScript rules
- Type Safety: Full TypeScript coverage with custom type definitions
- Database Migrations: TypeORM migration system
- Production: systemd service with environment configuration
- `POST /api/auth/register` - User registration
- `POST /api/auth/login` - User authentication

- `GET /api/datasets` - List datasets (with role-based filtering and sorting)
- `GET /api/datasets/:id` - Get dataset details with paginated images
- `GET /api/datasets/:id/statistics` - Get comprehensive dataset statistics (resolution distribution, prompt analytics, training compatibility)
- `POST /api/datasets` - Create new dataset (any authenticated user)
- `PUT /api/datasets/:id` - Update dataset (Owner/Admin)
- `DELETE /api/datasets/:id` - Delete dataset (Owner/Admin)
- `POST /api/datasets/:id/upload` - Upload CSV data (Owner/Admin)

- `GET /api/users` - List all users with sorting and role filtering options (sortBy: username/createdAt/publicDatasetCount, role: ADMIN/DEVELOPER/USER)
- `GET /api/users/:username` - Get user profile and their datasets
- `GET /api/users/:id/edits` - Get user's edit history (caption edits and discussion activity) with pagination
- `GET /api/users/me/settings` - Get current user settings (theme preferences)
- `PATCH /api/users/me/settings` - Update user settings (theme: light/dark/system)
- `PUT /api/users/:id/role` - Update user role (Admin only)
- `DELETE /api/users/:id` - Delete user (Admin only)

- `GET /api/datasets/:id/discussions` - List all discussions for a dataset
- `POST /api/datasets/:id/discussions` - Create new discussion with initial post
- `GET /api/discussions/:id` - Get single discussion with all posts and replies
- `POST /api/discussions/:id/posts` - Add reply to discussion
- `PATCH /api/posts/:id` - Edit post with automatic history tracking
- `GET /api/posts/:id/history` - Retrieve post edit history
- `DELETE /api/discussions/:id` - Delete discussion (Admin only)
- `DELETE /api/posts/:id` - Soft delete post
- `PATCH /api/discussions/:id/lock` - Lock/unlock discussion
- `PATCH /api/discussions/:id/pin` - Pin/unpin discussion

- `GET /api/datasets/:id/likes` - Get all likes for a dataset
- `POST /api/datasets/:id/likes` - Like a dataset
- `DELETE /api/datasets/:id/likes` - Unlike a dataset
- `GET /api/posts/:id/likes` - Get all likes for a post
- `POST /api/posts/:id/likes` - Like a post
- `DELETE /api/posts/:id/likes` - Unlike a post

- `GET /api/permissions` - List all available permissions
- `GET /api/permissions/user/:userId` - Get user's permissions
- `POST /api/permissions/grant` - Grant permission to user
- `DELETE /api/permissions/revoke` - Revoke permission from user

- `GET /api/recent-changes` - Get site-wide recent changes (caption edits and discussion activity) with pagination

- `GET /api/admin/datasets` - List all datasets (Admin only)
- `DELETE /api/admin/datasets/:id` - Force delete any dataset (Admin only)

- `?page=N` - Pagination page number
- `?limit=N` - Items per page (10, 25, 50, 100)
- `?sortBy=field` - Sorting field (username/createdAt/publicDatasetCount for users; name/createdAt/imageCount/username for datasets)
- `?order=ASC/DESC` - Sort order (ascending or descending)
- `?role=ADMIN/DEVELOPER/USER` - User role filtering (for the `/api/users` endpoint and `/users` page)
- `?tab=public/my-public/my-private` - Dataset tab selection (for the `/datasets` page)
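The pagination and ordering parameters above could be parsed and validated as in the sketch below. `parseListQuery` is a hypothetical helper; the defaults (page 1, 10 items per page, ascending order) are assumptions, not documented behavior.

```typescript
// Page sizes documented for ?limit=N.
const ALLOWED_LIMITS = [10, 25, 50, 100];

interface ListQuery {
  page: number;
  limit: number;
  order: "ASC" | "DESC";
  offset: number; // number of rows to skip for the requested page
}

// Parses a query string such as "?page=3&limit=25&order=DESC".
// Assumption: invalid or missing values fall back to page 1, limit 10, ASC.
function parseListQuery(search: string): ListQuery {
  const params = new URLSearchParams(search);

  const rawPage = Number.parseInt(params.get("page") ?? "", 10);
  const page = Number.isFinite(rawPage) && rawPage >= 1 ? rawPage : 1;

  const rawLimit = Number.parseInt(params.get("limit") ?? "", 10);
  const limit = ALLOWED_LIMITS.indexOf(rawLimit) >= 0 ? rawLimit : 10;

  const order = (params.get("order") ?? "").toUpperCase() === "DESC" ? "DESC" : "ASC";

  return { page, limit, order, offset: (page - 1) * limit };
}
```

Restricting `limit` to the documented values keeps a client from requesting arbitrarily large pages.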
The project includes comprehensive test coverage:
# Frontend tests (from root directory)
npm test
# Backend tests (from backend directory)
cd backend
npm test
# Interactive test UI
npm run test:ui

dataset-canvas/
├── src/                          # Frontend source code
│   ├── components/               # React components
│   │   ├── ui/                   # shadcn/ui components
│   │   ├── AppHeader.tsx         # Main navigation header
│   │   ├── DatasetBreadcrumb.tsx # Dataset navigation
│   │   ├── LanguageSelector.tsx  # Language switcher
│   │   └── ...                   # Other custom components
│   ├── pages/                    # Route components
│   │   ├── Users.tsx             # Users directory page
│   │   ├── AllDatasets.tsx       # All datasets page
│   │   └── ...                   # Other page components
│   ├── locales/                  # Internationalization files
│   │   ├── en/                   # English translations
│   │   │   ├── common.json
│   │   │   ├── navigation.json
│   │   │   └── pages.json
│   │   └── ru/                   # Russian translations
│   │       ├── common.json
│   │       ├── navigation.json
│   │       └── pages.json
│   ├── hooks/                    # Custom React hooks
│   ├── lib/                      # Utility functions
│   │   └── i18n.ts               # Internationalization config
│   └── types/                    # TypeScript type definitions
│       └── i18next.d.ts          # i18n type definitions
├── backend/                      # Backend source code
│   ├── src/
│   │   ├── entity/               # TypeORM entities
│   │   ├── routes/               # Express routes
│   │   │   ├── users.ts          # User management endpoints
│   │   │   └── datasets.ts       # Dataset endpoints
│   │   ├── middleware/           # Custom middleware
│   │   └── types/                # Backend type definitions
│   └── ...
├── public/                       # Static assets
└── ...
- Build the frontend:

      npm run build

- Build the backend:

      cd backend
      npm run build

- Configure systemd service (Linux):

      [Unit]
      Description=Dataset Canvas Backend

      [Service]
      Type=simple
      User=your-user
      WorkingDirectory=/path/to/dataset-canvas/backend
      ExecStart=/usr/bin/node dist/index.js
      EnvironmentFile=/path/to/.env
      Restart=always

      [Install]
      WantedBy=multi-user.target

DB_HOST=your-production-db-host
DB_PORT=3306
DB_USER=your-production-db-user
DB_PASSWORD=your-secure-password
DB_NAME=dataset_canvas_production
JWT_SECRET=your-very-secure-jwt-secret
NODE_ENV=production
# Optional Unix Socket
DB_SOCKET_PATH=/var/run/mysqld/mysqld.sock

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request

This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by Hugging Face's Data Studio
- Built with shadcn/ui component library
- Powered by modern web technologies and best practices
Dataset Canvas - Professional dataset management made simple. 🎨✨