InsectID is an open-source project combining deep learning and mobile technology to identify insect species from images. With a focus on Indian biodiversity, the system is currently trained to recognize thousands of species including butterflies, moths, dragonflies, and cicadas.
⚠️ Note: This project is currently in an early prototype stage. Mechanics, visuals, and features are subject to change and improvement.

🛑 Disclaimer: This project is developed for educational purposes and personal use only.
Identify insects on the go with the Android application.
- Google Play Store (Latest Beta)
- APK Archives: Google Drive | Dropbox
- Launch the App: Open InsectID on your Android device.
- Manage Models:
  - Before identifying, ensure you download the relevant model for your insect type.
  - Available Models: Butterfly & Moth, Dragonfly & Damselfly, Cicada.
  - Note: Models require internet to download once, but identification works completely offline afterwards.
- Choose Input Method:
  - 📷 Camera: Point your camera at an insect to capture a photo.
  - 🖼️ Gallery: Select an existing image from your photo library.
- Crop & Focus: For best results, crop the image so the insect fills most of the frame.
- Select Model & Identify: Choose the correct model group from the dropdown/menu and tap the checkmark to analyze.
- Results: The AI will list the most likely species matches with confidence scores.
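
To illustrate how a ranked list with confidence scores can be produced, here is a minimal sketch. It assumes the model emits one raw score (logit) per class and that confidences are softmax probabilities; the app's actual post-processing is not documented here, and `top_matches` and the scores are hypothetical.

```python
import math

def top_matches(logits, class_names, k=3):
    """Return the k most likely (class_name, confidence) pairs.

    Assumes `logits` holds one raw score per class; confidences are
    softmax probabilities, which sum to 1 across all classes.
    """
    if len(logits) != len(class_names):
        raise ValueError("logits and class_names must have the same length")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for three butterfly species
names = ["Papilio demoleus", "Danaus chrysippus", "Junonia almana"]
matches = top_matches([2.0, 0.5, 1.0], names, k=2)
for name, conf in matches:
    print(f"{name}: {conf:.1%}")
```

Because softmax scores are relative to the classes a model knows, a high confidence for an insect outside the selected model group is not meaningful, which is one reason choosing the right model matters.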
The current model is trained on 378k+ images covering 4,767 unique species/classes.
| Species Group | Classes Covered | Estimated Species in India |
|---|---|---|
| Moths (Lepidoptera) | 2,899 | 12,000+ |
| Butterflies (Lepidoptera) | 1,501 | 1,300+ |
| Dragonflies/Damselflies (Odonata) | 510 | 760+ |
| Cicadas (Hemiptera) | 300 | 250+ |
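
The table can be read as a coverage ratio per group. The short sketch below computes it from the figures above; since the estimates are lower bounds (note the "+"), the percentages are upper bounds on true coverage. `coverage_percent` is an illustrative helper, not part of the project.

```python
# Counts copied from the coverage table; estimates are lower bounds ("+").
coverage = {
    "Moths": (2899, 12000),
    "Butterflies": (1501, 1300),
    "Dragonflies/Damselflies": (510, 760),
    "Cicadas": (300, 250),
}

def coverage_percent(covered, estimated):
    """Covered classes as a percentage of the estimated species count."""
    return 100.0 * covered / estimated

for group, (covered, estimated) in coverage.items():
    print(f"{group}: {coverage_percent(covered, estimated):.1f}%")
```

A ratio above 100% may mean the class count includes entries the estimate does not (for example synonyms or subspecies), so those rows are worth a closer look.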
| Source | Images | Classes | Region | Notes |
|---|---|---|---|---|
| Moths of India | 44k | 3,364 | India | Primary source for moths |
| Butterflies of India | 66k | 1,554 | India | High quality verified images |
| Indian Odonata | 13k | 737 | India | Includes empty classes |
| Indian Cicadas | 1k | 308 | India | Sparse data |
| iNaturalist | 232k | 4,221 | India | Large volume, mixed quality |
| India Biodiversity | 12k | 1,444 | India | Legacy names, some typos |
| Insecta.pro | 25k | 5,068 | Global | Low res images |
| Wikipedia | 2k | 1,825 | India | Reference only |
Model Checkpoints: Google Drive

Raw Datasets: Google Drive
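
The notes in the source table mention legacy names and typos. A minimal sketch of how class names from different sources could be reconciled is shown below, assuming a list of canonical names is available and that small spelling differences can be caught with fuzzy matching from the standard library. `normalize` and `match_class` are hypothetical helpers; this is not the project's actual aggregation logic.

```python
import difflib

def normalize(name):
    """Lowercase, trim, and join words with single hyphens."""
    return "-".join(name.strip().lower().replace("_", " ").split())

def match_class(raw_name, canonical_names, cutoff=0.9):
    """Map a source class name to a canonical one, tolerating small typos.

    Returns None when nothing is close enough, so unmatched names can be
    reviewed by hand instead of being silently merged.
    """
    key = normalize(raw_name)
    lookup = {normalize(c): c for c in canonical_names}
    if key in lookup:
        return lookup[key]
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None

canonical = ["Papilio demoleus", "Danaus chrysippus"]
print(match_class("papilio_demoleus", canonical))  # formatting difference only
print(match_class("Danaus chrysipus", canonical))  # one-letter typo
print(match_class("Vanessa cardui", canonical))    # not in the canonical list
```

A high cutoff keeps distinct species with similar names from being merged; true taxonomic synonyms still need a curated mapping, since they are not spelling variants.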
The project provides four main scripts for the end-to-end ML workflow: scraping, training, testing, and publishing.
Collects images from various online sources (e.g., MothsofIndia, iNaturalist) based on a species list (species.json).
# Scrape specific types (default: moth)
python scrape.py --types moth butterfly
# Scrape only new/missing species (skip existing directories)
python scrape.py --new-species

Logs: logs/scrape.{type}.log
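
As an illustration of the "skip existing directories" idea behind `--new-species`, here is a minimal sketch. It assumes `species.json` is a JSON array of names and that each scraped species gets a directory named after it; both are assumptions, and `missing_species` is a hypothetical helper rather than code from `scrape.py`.

```python
import json
import tempfile
from pathlib import Path

def missing_species(species_file, data_root):
    """Return species from the list that have no directory under data_root.

    Assumes species_file is a JSON array of names and that each scraped
    species gets a directory named after it (both are assumptions).
    """
    species = json.loads(Path(species_file).read_text(encoding="utf-8"))
    existing = {p.name for p in Path(data_root).iterdir() if p.is_dir()}
    return [s for s in species if s not in existing]

# Self-contained demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "papilio-demoleus").mkdir()
    (root / "species.json").write_text(
        json.dumps(["papilio-demoleus", "danaus-chrysippus"]), encoding="utf-8"
    )
    todo = missing_species(root / "species.json", root / "data")
    print(todo)
```

Checking only for the directory is cheap, but it treats a partially scraped species as complete; counting files per directory would be a stricter variant.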
Handles data aggregation, validation, and model training (from scratch or incremental).
# Default training (Lepidoptera, Version 1)
python train.py
# Train a specific version (e.g., experimental run)
python train.py -v v2 --max-epochs 50
# Resume training automatically (detects latest checkpoint)
python train.py -v v2
# Advanced: Skip data processing steps for faster startup
python train.py -v v2 --skip-aggregate --skip-validate
# Train on a different model type
python train.py -m odonata

Logs: logs/train.{model_name}.{version}.log
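
Automatic resume depends on finding the most recent checkpoint. The sketch below shows one way this could work, assuming checkpoint files end in `.e<epoch>.pth`; that naming scheme and the `latest_checkpoint` helper are hypothetical, and `train.py` may detect checkpoints differently.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming scheme: <prefix>.e<epoch>.pth, e.g. model.v2.e0012.pth
EPOCH_PATTERN = re.compile(r"\.e(\d+)\.pth$")

def latest_checkpoint(checkpoint_dir):
    """Return (path, epoch) for the highest-epoch checkpoint, or None."""
    best = None
    for path in Path(checkpoint_dir).glob("*.pth"):
        match = EPOCH_PATTERN.search(path.name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        epoch = int(match.group(1))
        if best is None or epoch > best[1]:
            best = (path, epoch)
    return best

# Self-contained demo with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("model.v2.e0003.pth", "model.v2.e0012.pth", "notes.pth"):
        (Path(tmp) / name).touch()
    found = latest_checkpoint(tmp)
    latest_name, latest_epoch = found[0].name, found[1]
    print(latest_name, latest_epoch)
```

Comparing parsed epoch numbers rather than sorting file names avoids picking `e9` over `e10` when epochs are not zero-padded.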
Evaluates trained models on test datasets. Supports wildcards for batch testing.
# Test the latest checkpoint (auto-detected)
python test.py
# Test specific version and epoch
python test.py -v v2 -e 20
# Test on multiple directories (supports wildcards)
python test.py --test-dirs insect-dataset/src/test*/lepidoptera
# Detailed output (show predictions per image)
python test.py --print-preds --top-k 1 3 5

Logs: logs/test.{model_name}.{version}.log
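
For readers unfamiliar with the `--top-k 1 3 5` metric, here is a minimal sketch of top-k accuracy: a prediction counts as correct at k if the true label appears among the first k ranked guesses. The function and the toy data are illustrative and not taken from `test.py`.

```python
def top_k_accuracy(ranked_predictions, true_labels, ks=(1, 3, 5)):
    """Fraction of samples whose true label is within the first k predictions.

    ranked_predictions: one list of class names per image, best guess first.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one label per prediction list")
    if not true_labels:
        raise ValueError("no samples to evaluate")
    results = {}
    for k in ks:
        hits = sum(
            1
            for ranked, truth in zip(ranked_predictions, true_labels)
            if truth in ranked[:k]
        )
        results[k] = hits / len(true_labels)
    return results

preds = [
    ["a", "b", "c"],  # correct at rank 1
    ["b", "a", "c"],  # correct at rank 2
    ["c", "b", "d"],  # true label missing
    ["d", "c", "a"],  # correct at rank 3
]
labels = ["a", "a", "a", "a"]
scores = top_k_accuracy(preds, labels, ks=(1, 3))
print(scores)
```

Top-3 and top-5 figures are useful for visually similar species, where the correct answer often appears just below the first guess.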
Generates production assets (TorchScript models, image archives, metadata) from trained checkpoints.
# Publish all assets for a model
python publish.py -m lepidoptera -v v2
# Publish specific assets
python publish.py -m odonata --task images
python publish.py -m cicada --task model
# Publish ExecuTorch model
python publish.py -m lepidoptera -v v2 --task model --executorch

Logs: logs/publish.{model_name}.{version}.log
The project includes custom wrapper libraries mynnlib and mynnlibv2 for easy training and inference.
Ensure you have torch (with CUDA support if available) and other dependencies installed.
pip install -r requirements.txt

Note: The requirements.txt includes an extra index URL for PyTorch with CUDA 11.8 support. Adjust if necessary for your system.
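
After installation, it can help to confirm whether the installed PyTorch build actually sees a GPU. The snippet below is a small sketch using `torch.cuda.is_available()`; `pick_device` is a hypothetical helper, and falling back to `"cpu"` when torch is missing is a choice made for this example only.

```python
def pick_device():
    """Return "cuda" when a CUDA-enabled torch build sees a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed; nothing to check
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Using device: {device}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed wheel may not match the local CUDA setup, which is the case the note above about adjusting the index URL addresses.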
import torch
import mynnlib
from mynnlib import predict, predict_top_k
# Load Model (weights_only=False unpickles arbitrary objects; only load checkpoints you trust)
model_data = torch.load("path/to/checkpoint.pth", weights_only=False)
# Single Prediction
result = predict("image.jpg", model_data)
print(f"Species: {result}")
# Top-5 Predictions
top5 = predict_top_k("image.jpg", model_data, 5)
print(top5)

Use mynnlibv2 for incremental learning capabilities.
import mynnlibv2
from mynnlibv2 import init_model, run_epoch
# Initialize New Model
model_data = init_model(
    train_dir="dataset/data",
    val_dir="dataset/val",
    batch_size=32,
    image_size=224,
    lr=1e-4
)
# Train Loop
for epoch in range(15):
    run_epoch(model_data, output_path="checkpoints/model", robustness_lambda=0.1)

- Species Expansion: Cover Beetles, Hymenoptera (Bees/Wasps), and Orthoptera.
- Lifecycle Handling: Better classification for Larvae, Pupae, and Eggs.
- App Improvements:
- Better screen capture handling.
- User-controlled model downloading (move away from Google Drive).
- Data Cleanup: Resolve taxonomic synonyms and typos in source data.
- Root Classifier: Hierarchical model to first identify Order/Family before Species.
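
The root classifier idea can be sketched as a two-stage routing step: a first model picks the order, then the matching species model runs. The example below uses stub callables in place of trained networks; the function name, the confidence threshold, and the way the two confidences are combined are all assumptions, not a description of planned code.

```python
def classify_hierarchical(image, root_classifier, species_models, min_confidence=0.5):
    """Two-stage prediction: pick an order first, then a species within it.

    root_classifier(image) -> (order_name, confidence)
    species_models[order_name](image) -> (species_name, confidence)
    Both callables are placeholders for real models.
    """
    order, order_conf = root_classifier(image)
    if order_conf < min_confidence or order not in species_models:
        # Unsure about the order, or no species model for it: stop early.
        return {"order": order, "species": None, "confidence": order_conf}
    species, species_conf = species_models[order](image)
    # Multiply the stage confidences as a rough joint score.
    return {"order": order, "species": species, "confidence": order_conf * species_conf}

# Stub models standing in for trained networks
root = lambda image: ("odonata", 0.9)
models = {"odonata": lambda image: ("Pantala flavescens", 0.8)}
result = classify_hierarchical("photo.jpg", root, models)
print(result)
```

Routing this way would let the app choose the model group automatically instead of asking the user to select it, at the cost of errors in the first stage propagating to the second.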





