nandanarnandu/Synthetic_Data_Generator-GAN-

🧠📊 Synthetic Data Generator using GANs

Python TensorFlow scikit-learn pandas License: MIT

An end-to-end synthetic data generator that uses Generative Adversarial Networks (GANs) to create privacy-safe synthetic app usage data. Built with Python, TensorFlow, and scikit-learn, it is intended for data augmentation and for training models without exposing real user records.


✨ Features

  • 📤 Upload & Manage Data
    Upload your app usage dataset in CSV format using the Google Colab file uploader.

  • 🧹 Data Preprocessing
    Drop unneeded columns and normalize the numeric data with MinMaxScaler.

  • ⚙️ Generator & Discriminator Models
    Fully configured Keras-based Generator and Discriminator architectures for synthetic data generation.

  • 🔄 Adversarial Training
    Train the Generator and Discriminator against each other so that the Generator learns to produce realistic synthetic records.

  • 💾 Model Persistence
    Save the generator model (generator.h5), the scaler (scaler.pkl), and the column mapping (columns.json) for reuse.

  • 📊 Generate Synthetic Samples
    Generate and export new synthetic records after training.
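The Generator and Discriminator mentioned in the feature list could look roughly like the following. This is a minimal sketch, not the notebook's actual architecture: the layer sizes, `LATENT_DIM`, and the helper names `build_generator` and `build_discriminator` are assumptions. Only the use of Keras and the three numeric features come from this README. The sigmoid output is chosen here because MinMaxScaler maps data to [0, 1] by default.

```python
import numpy as np
import tensorflow as tf

LATENT_DIM = 16  # assumed size of the random noise vector
N_FEATURES = 3   # Usage, Notifications, Times Opened


def build_generator(latent_dim=LATENT_DIM, n_features=N_FEATURES):
    """Map a noise vector to one synthetic record in the scaled [0, 1] range."""
    return tf.keras.Sequential(
        [
            tf.keras.Input(shape=(latent_dim,)),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(64, activation="relu"),
            # Sigmoid keeps every output inside [0, 1], like the scaled data.
            tf.keras.layers.Dense(n_features, activation="sigmoid"),
        ],
        name="generator",
    )


def build_discriminator(n_features=N_FEATURES):
    """Score a record; the single output is a logit (higher means 'more real')."""
    return tf.keras.Sequential(
        [
            tf.keras.Input(shape=(n_features,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(1),
        ],
        name="discriminator",
    )


generator = build_generator()
discriminator = build_discriminator()

# Push a small batch of noise through both networks to check the shapes.
noise = np.random.normal(size=(4, LATENT_DIM)).astype("float32")
fake_records = generator(noise).numpy()
scores = discriminator(fake_records).numpy()
print(fake_records.shape, scores.shape)
```

Before training, the generated values are essentially random; they only start to resemble the real data once the two networks have been trained against each other.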


🚀 Quick Start

# Clone this repo
git clone https://github.com/nandanarnandu/Synthetic_Data_Generator-GAN-.git
cd Synthetic_Data_Generator-GAN-

# Open in Google Colab
# Upload your 'screentime_analysis.csv'

# Install required dependencies
pip install -r requirements.txt

# (Optional) Install huggingface tools
pip install huggingface_hub
apt-get install git-lfs -y
git lfs install

# Run the notebook in Google Colab
# Run the cells step by step to preprocess the data, build the models, train the GAN, and generate synthetic data

# After training:
# generator.h5, scaler.pkl, columns.json will be saved automatically

📂 Dataset

The dataset: screentime_analysis.csv

Example columns:

⦁ Date: Date (dropped during preprocessing)

⦁ App: App name (dropped during preprocessing)

⦁ Usage: Time spent using the app

⦁ Notifications: Number of notifications received

⦁ Times Opened: Number of times the app was opened

Upload your CSV (e.g., screentime_analysis.csv) with the Google Colab file uploader before running the preprocessing cells.
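The preprocessing described above (dropping Date and App, then normalizing the remaining columns) can be sketched as follows. The rows in this DataFrame are made-up placeholder values, not data from screentime_analysis.csv, and the variable names are illustrative. In the notebook the DataFrame would presumably come from `pd.read_csv` on the uploaded file instead.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows that only mirror the column layout of the dataset.
df = pd.DataFrame(
    {
        "Date": ["2024-08-01", "2024-08-01", "2024-08-02"],
        "App": ["AppA", "AppB", "AppA"],
        "Usage": [81, 90, 112],
        "Notifications": [24, 30, 33],
        "Times Opened": [57, 53, 17],
    }
)

# Drop the non-numeric columns that the GAN does not model.
features = df.drop(columns=["Date", "App"])

# Scale every remaining column to the [0, 1] range.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(features)

# Keep the column order so generated rows can be labeled later.
columns = list(features.columns)
print(columns)
print(scaled)
```

Keeping the fitted `scaler` and the `columns` list around matters because both are needed later to turn generated values back into readable records.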

📊 Features Used

⦁ App Usage Time

⦁ Notifications Received

⦁ App Open Count

🔍 Techniques Applied

⦁ Data Normalization (MinMaxScaler)

⦁ Generator & Discriminator Deep Neural Networks (DNNs)

⦁ Adversarial Training Loop (GAN)

⦁ Synthetic Data Generation
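To make the adversarial training loop concrete, here is one common way to write it with `tf.GradientTape`. It is a sketch under assumptions, not the notebook's code: the network sizes, learning rates, epoch count, and the random stand-in for the scaled real data are all illustrative, and the notebook may use a different training style.

```python
import numpy as np
import tensorflow as tf

LATENT_DIM, N_FEATURES = 16, 3  # assumed noise size; three features per record

generator = tf.keras.Sequential([
    tf.keras.Input(shape=(LATENT_DIM,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(N_FEATURES, activation="sigmoid"),
])
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(N_FEATURES,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # logit output
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)


def train_step(real_batch):
    """Run one discriminator update and one generator update."""
    noise = tf.random.normal((real_batch.shape[0], LATENT_DIM))
    with tf.GradientTape() as d_tape, tf.GradientTape() as g_tape:
        fake_batch = generator(noise, training=True)
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake_batch, training=True)
        # Discriminator: label real records 1 and generated records 0.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + bce(
            tf.zeros_like(fake_logits), fake_logits
        )
        # Generator: try to make the discriminator output 1 for fakes.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return float(d_loss), float(g_loss)


# Random stand-in for the MinMax-scaled real data (values in [0, 1]).
rng = np.random.default_rng(0)
real_data = tf.constant(rng.random((64, N_FEATURES)).astype("float32"))

for epoch in range(5):  # far fewer epochs than a real run would need
    d_loss, g_loss = train_step(real_data)
    print(f"epoch {epoch}: d_loss={d_loss:.3f} g_loss={g_loss:.3f}")
```

The two losses pull in opposite directions: the discriminator is rewarded for telling real from generated records, while the generator is rewarded for fooling it.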

📌 Sample Output

Example synthetic data output (after training):

[[482.3, 18.5, 12.1], [397.8, 10.3, 7.6], [510.2, 15.8, 11.0]]

The samples are generated to mimic the real-world data distribution rather than reproduce individual records.
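Output in original units like the sample above is typically obtained by mapping the generator's [0, 1] values back through the fitted scaler. The sketch below assumes that workflow. The `real` ranges and the `generated` array are made-up stand-ins for the training data and for a trained generator's predictions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical min and max rows standing in for the real training data.
real = np.array([[300.0, 5.0, 4.0], [600.0, 25.0, 16.0]])
scaler = MinMaxScaler().fit(real)

# Stand-in for generator output: values in the scaled [0, 1] range.
generated = np.array([[0.5, 0.5, 0.5], [0.0, 1.0, 0.25]])

# Undo the normalization to get records in the original units.
synthetic = scaler.inverse_transform(generated)
print(synthetic)
```

A scaled value of 0 maps back to the column minimum seen during fitting and 1 maps back to the maximum, so generated records stay within the range of the training data.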

📈 Output

generator.h5: Trained Generator model

scaler.pkl: Saved MinMaxScaler for consistent normalization

columns.json: Original column names used in the dataset
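Saving and reloading the scaler and column names could look like this. It is a sketch that assumes scaler.pkl is written with joblib and that columns.json holds a plain JSON list; the README does not spell out either detail. The generator itself is not saved here; in Keras that is usually a call such as `generator.save("generator.h5")`.

```python
import json
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical fitted scaler and column list standing in for the real ones.
columns = ["Usage", "Notifications", "Times Opened"]
scaler = MinMaxScaler().fit(np.array([[0.0, 0.0, 0.0], [100.0, 50.0, 20.0]]))

# Write the artifacts to a temporary directory for this demonstration.
out_dir = Path(tempfile.mkdtemp())
joblib.dump(scaler, out_dir / "scaler.pkl")
(out_dir / "columns.json").write_text(json.dumps(columns))

# Later (for example in a new session): load both artifacts back.
restored_scaler = joblib.load(out_dir / "scaler.pkl")
restored_columns = json.loads((out_dir / "columns.json").read_text())
print(restored_columns, restored_scaler.data_max_)
```

Reusing the saved scaler rather than fitting a new one keeps the normalization identical between training and later generation.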

🛠️ Tech Stack

Backend: Python, Google Colab

ML/DS: TensorFlow / Keras, Pandas, NumPy, Scikit-learn

Utilities: joblib (model persistence), huggingface_hub (optional model hosting)

💡 Contributions, issues, and feature requests are welcome!


About

Generate realistic synthetic data using Generative Adversarial Networks (GANs) trained on app usage statistics. Ideal for privacy-safe data analysis and machine learning applications.
