Insect Species Identification

AI-Powered Identification for Indian Insects

InsectID is an open-source project combining deep learning and mobile technology to identify insect species from images. With a focus on Indian biodiversity, the system is currently trained to recognize thousands of species including butterflies, moths, dragonflies, and cicadas.

⚠️ Note: This project is currently in an early prototype stage. Mechanics, visuals, and features are subject to change and improvement.

🛑 Disclaimer: This project is developed for educational purposes and personal use only.

📱 Mobile App (Android)

Identify insects on the go with the Android application.

Google Play Store (Latest Beta)
APK Archives: Google Drive | Dropbox

🚀 How to Use

1. Launch the app: Open InsectID on your Android device.
2. Manage models:
   - Before identifying, download the model for your insect group.
   - Available models: Butterfly & Moth, Dragonfly & Damselfly, Cicada.
   - Note: Each model needs an internet connection for its one-time download. After that, identification works fully offline.
3. Choose an input method:
   - 📷 Camera: Point your camera at an insect and capture a photo.
   - 🖼️ Gallery: Select an existing image from your photo library.
4. Crop & focus: For best results, crop the image so the insect fills most of the frame.
5. Select model & identify: Choose the matching model group from the menu and tap the checkmark to analyze the image.
6. Results: The app lists the most likely species matches with confidence scores.

🧠 Model & Data Statistics

The current model is trained on 378k+ images covering 4,767 unique species/classes.

| Species Group | Species Covered | Estimated Species in India |
|---|---|---|
| Moths (Lepidoptera) | 2,899 | 12,000+ |
| Butterflies (Lepidoptera) | 1,501 | 1,300+ |
| Dragonflies/Damselflies (Odonata) | 510 | 760+ |
| Cicadas (Hemiptera) | 300 | 250+ |

Dataset Sources

| Source | Images | Classes | Region | Notes |
|---|---|---|---|---|
| Moths of India | 44k | 3,364 | India | Primary source for moths |
| Butterflies of India | 66k | 1,554 | India | High-quality verified images |
| Indian Odonata | 13k | 737 | India | Includes empty classes |
| Indian Cicadas | 1k | 308 | India | Sparse data |
| iNaturalist | 232k | 4,221 | India | Large volume, mixed quality |
| India Biodiversity | 12k | 1,444 | India | Legacy names, some typos |
| Insecta.pro | 25k | 5,068 | Global | Low-resolution images |
| Wikipedia | 2k | 1,825 | India | Reference only |

Model Checkpoints: Google Drive
Raw Datasets: Google Drive
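Because the sources above overlap, and some use legacy names or contain typos, their class lists must be merged before training. The sketch below shows one way to do this. It assumes each source is given as a mapping from species name to image count. The `normalize_name` rule and the `SYNONYMS` table are hypothetical illustrations, not the project's actual cleanup logic.

```python
from collections import Counter

# Hypothetical synonym/typo table: misspelled or legacy name -> accepted name.
SYNONYMS = {
    "papilio polites": "papilio polytes",
}


def normalize_name(name: str) -> str:
    """Lower-case, turn underscores into spaces, collapse whitespace, apply synonyms."""
    cleaned = " ".join(name.replace("_", " ").split()).lower()
    return SYNONYMS.get(cleaned, cleaned)


def merge_sources(sources: dict[str, dict[str, int]]) -> Counter:
    """Sum per-species image counts across all sources after name normalization."""
    totals: Counter = Counter()
    for per_species in sources.values():
        for species, count in per_species.items():
            totals[normalize_name(species)] += count
    return totals


merged = merge_sources({
    "Butterflies of India": {"Papilio_polytes": 40},
    "iNaturalist": {"papilio polytes": 25, "Papilio polites": 5},
})
print(merged.most_common(1))
```

All three spellings collapse into one class, so the merged count is 70 images rather than three separate, under-populated classes.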

🖥️ Command Line Tools

The project provides four main scripts for the end-to-end ML workflow: scraping, training, testing, and publishing.

1. Scraping Data (`scrape.py`)

Collects images from online sources (e.g., Moths of India, iNaturalist) for the species listed in `species.json`.

```bash
# Scrape specific types (default: moth)
python scrape.py --types moth butterfly

# Scrape only new/missing species (skip existing directories)
python scrape.py --new-species
```

Logs: logs/scrape.{type}.log
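The `--new-species` behaviour described above (skip species whose directory already exists) can be sketched as follows. This README does not document the layout of `species.json`, so the sketch assumes a hypothetical mapping from insect type to a list of species names. It also assumes a dataset directory with one sub-folder per species.

```python
import json
from pathlib import Path


def species_to_scrape(
    species_file: Path, dataset_dir: Path, insect_type: str, new_only: bool
) -> list[str]:
    """Return the species names to scrape for one insect type.

    Assumes (hypothetically) that species.json looks like
    {"moth": ["Name one", ...], "butterfly": [...]} and that every
    already-scraped species has a folder named after it in dataset_dir.
    """
    species = json.loads(species_file.read_text(encoding="utf-8"))[insect_type]
    if not new_only:
        return list(species)
    # Mirror --new-species: keep only species without an existing folder.
    return [name for name in species if not (dataset_dir / name).is_dir()]
```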

2. Training (`train.py`)

Handles data aggregation, validation, and model training (from scratch or incremental).

```bash
# Default training (Lepidoptera, Version 1)
python train.py

# Train a specific version (e.g., experimental run)
python train.py -v v2 --max-epochs 50

# Resume training automatically (detects latest checkpoint)
python train.py -v v2

# Advanced: Skip data processing steps for faster startup
python train.py -v v2 --skip-aggregate --skip-validate

# Train on a different model type
python train.py -m odonata
```

Logs: logs/train.{model_name}.{version}.log
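Running `train.py -v v2` again resumes from the latest checkpoint. This README does not document the checkpoint naming scheme, so the sketch below assumes hypothetical file names such as `lepidoptera.v2.e12.pth`. It picks the highest epoch number for a given model and version.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoint_dir: Path, model_name: str, version: str) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if none exist.

    Assumes the hypothetical naming pattern '{model_name}.{version}.e{epoch}.pth'.
    """
    pattern = re.compile(rf"^{re.escape(model_name)}\.{re.escape(version)}\.e(\d+)\.pth$")
    best: Optional[tuple[int, Path]] = None
    for path in checkpoint_dir.glob("*.pth"):
        match = pattern.match(path.name)
        if match:
            # Compare epochs as integers, not strings.
            epoch = int(match.group(1))
            if best is None or epoch > best[0]:
                best = (epoch, path)
    return best[1] if best else None
```

Parsing the epoch as an integer matters. Sorted lexicographically, `e9` would come after `e12` and the wrong checkpoint would be resumed.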

3. Evaluation (`test.py`)

Evaluates trained models on test datasets. Supports wildcards for batch testing.

```bash
# Test the latest checkpoint (auto-detected)
python test.py

# Test specific version and epoch
python test.py -v v2 -e 20

# Test on multiple directories (supports wildcards)
python test.py --test-dirs insect-dataset/src/test*/lepidoptera

# Detailed output (show predictions per image)
python test.py --print-preds --top-k 1 3 5
```

Logs: logs/test.{model_name}.{version}.log
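Top-k accuracy, as requested with `--top-k 1 3 5`, is conventionally the fraction of images whose true species appears among the model's k highest-ranked predictions. A minimal, framework-free sketch of that metric follows. It assumes predictions are already ranked best-first. The function name is illustrative and not part of `test.py`.

```python
def top_k_accuracy(
    ranked_preds: list[list[str]], labels: list[str], ks: tuple[int, ...] = (1, 3, 5)
) -> dict[int, float]:
    """Fraction of samples whose true label is among the first k predictions."""
    if len(ranked_preds) != len(labels):
        raise ValueError("need one ranked prediction list per label")
    if not labels:
        return {k: 0.0 for k in ks}
    return {
        k: sum(label in preds[:k] for preds, label in zip(ranked_preds, labels)) / len(labels)
        for k in ks
    }


preds = [
    ["Danaus genutia", "Danaus chrysippus", "Tirumala limniace"],
    ["Papilio polytes", "Papilio demoleus", "Pachliopta aristolochiae"],
]
labels = ["Danaus chrysippus", "Papilio polytes"]
print(top_k_accuracy(preds, labels, ks=(1, 3)))  # {1: 0.5, 3: 1.0}
```

The first image is wrong at top-1 but correct within the top 3. That gap between top-1 and top-3 is typical for visually similar species.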

4. Asset Publishing (`publish.py`)

Generates production assets (TorchScript models, image archives, metadata) from trained checkpoints.

```bash
# Publish all assets for a model
python publish.py -m lepidoptera -v v2

# Publish specific assets
python publish.py -m odonata --task images
python publish.py -m cicada --task model

# Publish ExecuTorch model
python publish.py -m lepidoptera -v v2 --task model --executorch
```

Logs: logs/publish.{model_name}.{version}.log
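One of the published assets is model metadata. This README does not show its schema, so the following sketch assumes a hypothetical JSON file. The file holds the model name, the version, and the class list in index order, which is the order that classifier output indices map to.

```python
import json
from pathlib import Path


def write_metadata(train_dir: Path, model_name: str, version: str, out_file: Path) -> dict:
    """Write a hypothetical metadata JSON derived from the class sub-folders.

    Sorting folder names mirrors torchvision's ImageFolder, which assigns
    class indices in sorted order; whether publish.py does the same is an
    assumption.
    """
    classes = sorted(p.name for p in train_dir.iterdir() if p.is_dir())
    metadata = {
        "model": model_name,
        "version": version,
        "num_classes": len(classes),
        "classes": classes,
    }
    out_file.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata
```

Keeping the class order identical to the one used at training time is what lets the app translate an output index back into a species name.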

🛠️ Python Library Usage (`mynnlib`)

The project includes custom wrapper libraries mynnlib and mynnlibv2 for easy training and inference.

Installation

Ensure you have torch (with CUDA support if available) and other dependencies installed.

```bash
pip install -r requirements.txt
```

Note: The requirements.txt includes an extra index URL for PyTorch with CUDA 11.8 support. Adjust if necessary for your system.
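To confirm which PyTorch build was installed (CPU-only or CUDA), a quick check like the one below can help. It uses only standard PyTorch attributes (`torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.get_device_name`) and reports gracefully if `torch` is missing.

```python
import importlib.util


def describe_torch_setup() -> str:
    """Summarize the installed PyTorch build without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so this check works without torch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this build was compiled against.
        return (
            f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"device: {torch.cuda.get_device_name(0)}"
        )
    return f"torch {torch.__version__} (CPU only or no GPU detected)"


print(describe_torch_setup())
```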

Inference Example

```python
import torch
from mynnlib import predict, predict_top_k

# Load model checkpoint.
# weights_only=False unpickles arbitrary objects, so only load checkpoints you trust.
model_data = torch.load("path/to/checkpoint.pth", weights_only=False)

# Single prediction
result = predict("image.jpg", model_data)
print(f"Species: {result}")

# Top-5 predictions
top5 = predict_top_k("image.jpg", model_data, 5)
print(top5)
```
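`predict_top_k` returns the k most likely species, but its internals are not documented here. A common way to turn classifier logits into ranked confidence scores is a softmax followed by sorting, sketched below in plain Python. The logits are made up for illustration, and this is not necessarily how `mynnlib` computes its scores.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: list[float], class_names: list[str], k: int) -> list[tuple[str, float]]:
    """Return the k (class name, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


names = ["Danaus chrysippus", "Danaus genutia", "Tirumala limniace"]
for name, prob in top_k([2.0, 1.0, 0.1], names, k=2):
    print(f"{name}: {prob:.1%}")
```

Subtracting the largest logit leaves the probabilities unchanged but prevents `math.exp` from overflowing on large logits.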

Training Example (Incremental)

Use mynnlibv2 for incremental learning capabilities.

```python
import mynnlibv2
from mynnlibv2 import init_model, run_epoch

# Initialize New Model
model_data = init_model(
    train_dir="dataset/data",
    val_dir="dataset/val",
    batch_size=32,
    image_size=224,
    lr=1e-4
)

# Train Loop
for epoch in range(15):
    run_epoch(model_data, output_path="checkpoints/model", robustness_lambda=0.1)
```

🗓️ Backlog & Roadmap

Species Expansion: Cover Beetles (Coleoptera), Hymenoptera (Bees/Wasps), and Orthoptera.
Lifecycle Handling: Better classification for Larvae, Pupae, and Eggs.
App Improvements:
- Better screen capture handling.
- User-controlled model downloading (move away from Google Drive).
Data Cleanup: Resolve taxonomic synonyms and typos in source data.
Root Classifier: Hierarchical model to first identify Order/Family before Species.
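The planned root classifier would route each image through two stages. Below is a minimal sketch of that control flow. Hypothetical stub classifiers stand in for real models, and the confidence threshold and fallback label are illustrative assumptions. None of this reflects an existing implementation.

```python
from typing import Callable

# A classifier maps an image (any object here) to (label, confidence).
Classifier = Callable[[object], tuple[str, float]]


def classify_hierarchically(
    image: object,
    root: Classifier,
    species_models: dict[str, Classifier],
    min_root_confidence: float = 0.5,
) -> tuple[str, str]:
    """Predict a group (e.g. an order) first, then run that group's species model.

    Returns ("unknown", "unknown") when the root prediction is below the
    hypothetical confidence threshold or no species model exists for the group.
    """
    group, confidence = root(image)
    if confidence < min_root_confidence or group not in species_models:
        return ("unknown", "unknown")
    species, _ = species_models[group](image)
    return (group, species)


# Hypothetical stubs standing in for trained models.
root_stub: Classifier = lambda img: ("odonata", 0.92)
models: dict[str, Classifier] = {
    "lepidoptera": lambda img: ("Papilio polytes", 0.8),
    "odonata": lambda img: ("Crocothemis servilia", 0.7),
}
print(classify_hierarchically("photo.jpg", root_stub, models))
```

With this design the user would no longer pick a model group manually, and insects from unsupported orders can be rejected instead of being forced into the nearest butterfly or dragonfly class.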

📚 Related Resources

Blog: Fixing libc++_shared.so conflicts on Android with PyTorch/OpenCV
Issue: KB pagination support for PyTorch Lite
