ExploreMyData

Automated Exploratory Data Analysis (EDA) for your datasets. Upload a CSV file, select your ML task, and get comprehensive insights in seconds.

https://www.exploremydata.xyz/

Features

Data Processing

  • Drag and drop CSV file upload
  • Automatic data cleaning and preprocessing
  • Missing value detection and imputation
  • Duplicate row detection
  • Data type inference
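
The checks listed above can be sketched in a few lines of TypeScript. This is a minimal illustration only, assuming the CSV has already been parsed into rows of string cells. The names `Row`, `isMissing`, `countMissing`, `findDuplicateRows`, and `inferType`, as well as the list of tokens treated as missing, are hypothetical and are not taken from `lib/dataUtils.ts`.

```typescript
// Hypothetical helpers for illustration; not the project's actual code.
type Row = Record<string, string>;

// Assumed set of tokens treated as missing (compared case-insensitively).
const MISSING_TOKENS = new Set(["", "na", "n/a", "nan", "null"]);

function isMissing(value: string | undefined): boolean {
  return value === undefined || MISSING_TOKENS.has(value.trim().toLowerCase());
}

// Count missing cells per column.
function countMissing(rows: Row[], columns: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const col of columns) {
    counts[col] = rows.filter((row) => isMissing(row[col])).length;
  }
  return counts;
}

// Return the indices of rows that exactly repeat an earlier row.
function findDuplicateRows(rows: Row[], columns: string[]): number[] {
  const seen = new Set<string>();
  const duplicates: number[] = [];
  rows.forEach((row, index) => {
    const key = JSON.stringify(columns.map((col) => row[col] ?? null));
    if (seen.has(key)) {
      duplicates.push(index);
    } else {
      seen.add(key);
    }
  });
  return duplicates;
}

// Infer a coarse column type from its non-missing values.
function inferType(values: string[]): "number" | "boolean" | "string" {
  const present = values.filter((v) => !isMissing(v));
  if (present.length === 0) return "string";
  if (present.every((v) => Number.isFinite(Number(v)))) return "number";
  if (present.every((v) => /^(true|false)$/i.test(v.trim()))) return "boolean";
  return "string";
}
```

Columns are passed explicitly so that rows with absent keys are still compared on the same set of fields.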

ML Task Support

  • Classification - Class distribution, imbalance detection, feature-target correlations
  • Regression - Target distribution, linearity analysis, residual patterns
  • NLP / Text Analysis - Text statistics, vocabulary analysis, label distribution
  • Computer Vision - Image path detection, class balance, augmentation suggestions
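
As an illustration of what imbalance detection for a classification target might involve, the sketch below counts labels and compares the largest class to the smallest. The `classBalance` function, the `ClassBalance` shape, and the default threshold of 3 are assumptions made for this example; the README does not state how the tool defines imbalance.

```typescript
// Hypothetical summary of class balance for a label column.
interface ClassBalance {
  counts: Record<string, number>;
  imbalanceRatio: number; // majority class count divided by minority class count
  isImbalanced: boolean;
}

// `threshold` is an assumed cut-off, not a value documented by the project.
function classBalance(labels: string[], threshold = 3): ClassBalance {
  const counts: Record<string, number> = {};
  for (const label of labels) {
    counts[label] = (counts[label] ?? 0) + 1;
  }
  const sizes = Object.values(counts);
  if (sizes.length === 0) {
    // No labels: report a neutral result rather than dividing by zero.
    return { counts, imbalanceRatio: 1, isImbalanced: false };
  }
  const imbalanceRatio = Math.max(...sizes) / Math.min(...sizes);
  return { counts, imbalanceRatio, isImbalanced: imbalanceRatio > threshold };
}
```

A ratio-based check is only one possible heuristic; entropy or minority-class share would work equally well as the reported metric.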

Comprehensive EDA

  • Data quality reports with actionable insights
  • Statistical summaries (describe, dtypes, info)
  • Missing value patterns and heatmaps
  • Outlier detection using the IQR method
  • Feature correlations and multicollinearity (VIF)
  • Target leakage detection
  • Baseline model performance estimates
  • Text feature analysis for ML engineers
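
For reference, the IQR method flags values that fall more than a multiple of the interquartile range below the first quartile or above the third. The sketch below is a generic version of that rule, not the code in `lib/edaUtils.ts`; the function names, the linear-interpolation quantile, and the conventional multiplier of 1.5 are assumptions.

```typescript
// Quantile of an ascending-sorted array using linear interpolation.
// Other quantile conventions exist and give slightly different fences.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

interface IqrResult {
  lower: number; // lower fence: Q1 - k * IQR
  upper: number; // upper fence: Q3 + k * IQR
  outliers: number[];
}

// Hypothetical helper; k = 1.5 is the conventional multiplier.
function iqrOutliers(values: number[], k = 1.5): IqrResult {
  if (values.length === 0) {
    return { lower: NaN, upper: NaN, outliers: [] };
  }
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const lower = q1 - k * iqr;
  const upper = q3 + k * iqr;
  return { lower, upper, outliers: values.filter((v) => v < lower || v > upper) };
}
```

For the values 1, 2, 3, 4, 100 the quartiles are 2 and 4, so the fences are -1 and 7 and only 100 is flagged.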

Export Options

  • Save individual charts as PNG images
  • Export full EDA report as PDF
  • Copy Python code snippets to reproduce analysis

Getting Started

Installation

# Clone the repository
git clone https://github.com/kavyachouhan/ExploreMyData.git
cd ExploreMyData

# Install dependencies
pnpm install

# Run development server
pnpm dev

Open http://localhost:3000 to view the application.

Environment Variables

Create a .env.local file in the root directory:

# Appwrite Configuration (Optional - for upload logging)
NEXT_PUBLIC_APPWRITE_ENDPOINT=https://cloud.appwrite.io/v1
NEXT_PUBLIC_APPWRITE_PROJECT_ID=your_project_id
NEXT_PUBLIC_APPWRITE_DATABASE_ID=your_database_id
NEXT_PUBLIC_APPWRITE_UPLOAD_LOGS_COLLECTION_ID=upload_logs
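
Because these variables are optional, code that uses them needs to cope with their absence. The sketch below shows one way to do that; `AppwriteConfig` and `readAppwriteConfig` are hypothetical names, and the project may handle this differently.

```typescript
// Hypothetical config shape built from the variables listed above.
interface AppwriteConfig {
  endpoint: string;
  projectId: string;
  databaseId: string;
  uploadLogsCollectionId: string;
}

type Env = Record<string, string | undefined>;

// Returns null when any variable is unset, so upload logging can be skipped.
function readAppwriteConfig(env: Env): AppwriteConfig | null {
  const endpoint = env.NEXT_PUBLIC_APPWRITE_ENDPOINT;
  const projectId = env.NEXT_PUBLIC_APPWRITE_PROJECT_ID;
  const databaseId = env.NEXT_PUBLIC_APPWRITE_DATABASE_ID;
  const uploadLogsCollectionId = env.NEXT_PUBLIC_APPWRITE_UPLOAD_LOGS_COLLECTION_ID;
  if (!endpoint || !projectId || !databaseId || !uploadLogsCollectionId) {
    return null;
  }
  return { endpoint, projectId, databaseId, uploadLogsCollectionId };
}
```

The function takes a plain object rather than reading `process.env` itself. Next.js inlines `NEXT_PUBLIC_` variables into browser bundles only where they are referenced by their full literal name, so client-side callers would build the object from explicit `process.env.NEXT_PUBLIC_...` references.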

Project Structure

├── app/                    # Next.js App Router pages
│   ├── explore/           # Main explore flow
│   │   ├── insight/       # EDA results page
│   │   └── questions/     # Task selection page
│   ├── privacy/           # Privacy policy
│   └── terms/             # Terms of service
├── components/
│   ├── charts/            # Chart components
│   ├── eda/               # EDA analysis components
│   ├── explore/           # Upload and navigation
│   ├── layout/            # Header, Footer
│   ├── report/            # PDF export
│   └── ui/                # Reusable UI components
├── context/               # React Context providers
├── lib/                   # Utility functions
│   ├── dataUtils.ts       # Data cleaning utilities
│   └── edaUtils.ts        # EDA analysis functions
└── public/                # Static assets

Scripts

pnpm dev          # Start development server
pnpm build        # Build for production
pnpm start        # Start production server
pnpm lint         # Run ESLint

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License.

Contact

For questions or feedback, please open an issue on GitHub.
