Welcome to the Crop GT Correction and Classification Repository! This repository contains code for generating crop type predictions and making corrections to ground truth crop types and their locations using satellite imagery.
- `crop_type_inference.py`: Performs inference on test or unlabeled data using trained models and satellite data.
- `gt_correction.py`: Corrects ground truth crop types and their locations using satellite imagery.
- `src/`: Source code for the `crop_type_inference.py` and `gt_correction.py` scripts. Modify the `src` scripts as needed for customizations.
- `training_notebooks/`: Jupyter notebooks for demonstrating and exploring the training and inference pipeline. Use these notebooks to understand and visualize how the pipeline works step by step.
You have two options to work with this repository:
**Option 1: Clone and set up locally**

1. Clone the repository to your local machine:

   ```
   git clone https://github.com/WadhwaniAI/AI-Enhanced-Crop-Field-Data-Curation.git
   ```

2. Create a conda environment and install the dependencies needed to run the Python scripts/notebooks in this repository. If you do not have conda installed, follow the instructions in the Conda User Guide.

   - Create an environment with Python 3.9 and activate it:

     ```
     conda create --name 'env_name' python=3.9
     conda activate 'env_name'
     ```

   - Install all the dependencies:

     ```
     cd crop_classification_and_gt_correction
     pip install -r requirements.txt
     ```
**Option 2: Install via pip**

1. Create a conda environment with Python 3.9 and activate it:

   ```
   conda create --name 'env_name' python=3.9
   conda activate 'env_name'
   ```

2. Install the repository directly via pip:

   ```
   pip install git+https://github.com/WadhwaniAI/AI-Enhanced-Crop-Field-Data-Curation.git
   ```
1. If you don't have an Earth Engine account, create one by following the instructions provided in this guide.
2. Download the `INDIA_DISTRICTS.geojson` file to obtain Indian district boundaries from here: https://github.com/datta07/INDIAN-SHAPEFILES/tree/master/INDIA
3. Open Earth Engine's code editor and upload `INDIA_DISTRICTS.geojson` under Assets.
4. Use the provided scripts (`crop_type_inference.py` and `gt_correction.py`) to infer crop types and make corrections to ground truth crop types and their locations.
5. (Optional) To train your own model on your own data from scratch, follow the instructions in the `training_notebooks/` directory. You can also simply explore the notebooks to understand the training and inference pipeline.
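The downloaded district file is a standard GeoJSON `FeatureCollection`. As a quick sanity check after downloading, you can inspect it with the standard library. The feature below is a hand-written stand-in; the property name `district` and the coordinates are illustrative assumptions, so check the real file for its actual property keys:

```python
import json

# A hand-written stand-in for one feature of INDIA_DISTRICTS.geojson.
# Property names and coordinates here are illustrative assumptions.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"district": "Example District"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[77.0, 28.0], [77.5, 28.0], [77.5, 28.5], [77.0, 28.0]]],
            },
        }
    ],
}

# The same check works on the real file:
# with open("INDIA_DISTRICTS.geojson") as f: districts = json.load(f)
districts = json.loads(json.dumps(sample))
print(districts["type"], len(districts["features"]))  # FeatureCollection 1
```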
The `crop_type_inference.py` script performs inference on test or unlabeled data using trained models and satellite data. To use it, execute the following steps in order:
1. Instantiate the `CropTypeInferencePipeline` class from the script. This class requires the path to a CSV file containing a single column named 'geometry' with location geometries, along with the cropping season, the year sown, and, optionally, the end fortnight for early prediction.

2. Call the `prepare_data` method to prepare the data for inference by adding the necessary columns and converting the geometries to a format compatible with Google Earth Engine (GEE). This method requires the path to the `INDIA_DISTRICTS.geojson` file on your local machine.

3. Use the `initialize_ee` method to initialize Earth Engine and authenticate with Google Cloud for a given project.

4. Execute the `harvest_raster_data` method to harvest satellite data for the given cropping season and year sown, generating raster files. For the kharif season, both NDVI (Normalized Difference Vegetation Index) data from Sentinel-2 and SAR (VH) data from Sentinel-1 are required for inference with the trained models; the rabi season requires only NDVI. The script automatically exports the satellite data from GEE to Google Drive or a data bucket, based on your preference.

5. After the satellite data has been exported, manually download the raster data from the Google Drive directory or the data bucket and place it in a local directory. Then run the `extract_raster_data` method to extract spectral data from the downloaded raster files.

6. Use the `clean_and_filter_data` method to clean the extracted spectral data and filter out out-of-distribution (OOD) data, i.e., non-crop areas or crop classes the model has not been trained on. Currently, OOD filtering techniques exist only for the rabi season; for the kharif season, this method only cleans the data.

7. Call the `crop_type_classification` method. Set `conformal=True` to generate conformal predictions (set predictions) or `conformal=False` for point predictions (single predictions based on maximum probability).

Steps for rabi season:
```python
from crop_classification_and_gt_correction.crop_type_inference import CropTypeInferencePipeline

infer = CropTypeInferencePipeline(data_path='path/to/csv', season='rabi', year_sown=2022, end_fn='jan_1f')
infer.prepare_data(ind_dist_path='path/to/india/districts/geojson')
infer.initialize_ee('ee-project-name')
infer.harvest_raster_data(data_type='ndvi', dir_path='raster_data', storage_type='drive')
infer.extract_raster_data(raster_dir_path='path/to/raster/file/directory', data_type='ndvi')
infer.clean_and_filter_data()
infer.crop_type_classification(alpha=0.15, conformal=True)  # or conformal=False for point predictions
```
Steps for kharif season:
```python
from crop_classification_and_gt_correction.crop_type_inference import CropTypeInferencePipeline

infer = CropTypeInferencePipeline(data_path='path/to/csv', season='kharif', year_sown=2023, end_fn='aug_1f')
infer.prepare_data(ind_dist_path='path/to/india/districts/geojson')
infer.initialize_ee('ee-project-name')
infer.harvest_raster_data(data_type='ndvi', exp_dir_path='raster_data', storage_type='drive')
infer.harvest_raster_data(data_type='vh', exp_dir_path='raster_data', storage_type='drive')
infer.extract_raster_data(raster_dir_path='path/to/raster/file/directory', data_type='ndvi')
infer.extract_raster_data(raster_dir_path='path/to/raster/file/directory', data_type='vh')
infer.clean_and_filter_data()
infer.crop_type_classification(alpha=0.15, conformal=True)  # or conformal=False for point predictions
```
Note: For the kharif season, you must run the `harvest_raster_data` and `extract_raster_data` methods twice: once for NDVI and once for VH data.
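To see the difference between the two output modes of `crop_type_classification`, here is a generic, self-contained illustration of a point prediction versus a conformal set prediction over class probabilities. This is not the repository's implementation: the cumulative-probability cutoff below is a simplified stand-in for the calibrated quantile a real conformal procedure derives from `alpha` and a calibration set, and the crop names and probabilities are made up:

```python
# Generic illustration: point prediction vs. conformal set prediction.
# NOT the repository's implementation; the 1 - alpha cutoff stands in for
# the calibrated threshold a real conformal method would compute.
probs = {"wheat": 0.55, "mustard": 0.30, "potato": 0.10, "gram": 0.05}

# Point prediction: the single most probable class.
point = max(probs, key=probs.get)

# Set prediction: keep classes until cumulative probability covers 1 - alpha.
alpha = 0.15
ordered = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
pred_set, cumulative = [], 0.0
for crop, p in ordered:
    pred_set.append(crop)
    cumulative += p
    if cumulative >= 1 - alpha:
        break

print(point)     # wheat
print(pred_set)  # ['wheat', 'mustard']
```

A set prediction like `['wheat', 'mustard']` signals that the model is not confident enough to commit to a single class, which is exactly the signal the ground truth correction step exploits.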
The `gt_correction.py` script performs ground truth correction on already labeled data using trained models and satellite data. The script follows these steps in order:
1. Create an instance of the `GTCorrectionPipeline` class from the script. This class requires the path to a CSV file containing two columns: 'geometry' with location geometries and 'crop_type' with capitalized crop type labels for those geometries. It also requires the cropping season, the year sown, optionally the end fortnight for early prediction, and the layer specifying how far to look for neighbors.

2. Follow all the steps in the crop type inference section, ending with `crop_type_classification` and `conformal=True`, as conformal predictions are needed to judge the confidence of a prediction. The method names are the same because this class inherits from `CropTypeInferencePipeline`.

3. Call the `execute_curation` function to perform the ground truth correction.

For both kharif and rabi seasons:
```python
from crop_classification_and_gt_correction.gt_correction import GTCorrectionPipeline

gt_correction = GTCorrectionPipeline(data_path='path/to/csv', season='kharif', year_sown=2023, end_fn='aug_1f', layer=1)

# Execute steps 2-7 from the crop type inference section
...

gt_correction.execute_curation()
```
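The expected input is a two-column CSV as described in step 1. Here is a minimal sketch of writing one with the standard library; the WKT point geometries and crop names are illustrative values, not real data:

```python
import csv

# Illustrative rows: 'geometry' as WKT points and capitalized 'crop_type'
# labels, matching the two-column input GTCorrectionPipeline expects.
# The coordinates and crop names are made up.
rows = [
    ("POINT (75.80 26.90)", "Wheat"),
    ("POINT (75.82 26.91)", "Mustard"),
]

with open("gt_input.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["geometry", "crop_type"])
    writer.writerows(rows)
```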
Crop classification models trained by Wadhwani AI can be found on Hugging Face. These can be directly used for inference on new data samples.
The notebooks in the `training_notebooks/` directory demonstrate how to:
- Load, explore and preprocess raw datasets
- Transform and split datasets for modeling
- Train, tune and evaluate machine learning models
In short, they explain how we arrived at the final pipeline. A data sample (a subset of the master pickle with a similar data distribution) is included for demonstration purposes. You can run all the cells sequentially to understand how the models are built and evaluated on the sample data.
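For a flavor of the train/evaluate flow the notebooks walk through, here is a toy nearest-centroid classifier on made-up NDVI sequences. This is not the repository's model or data, just a self-contained illustration of fitting per-class summaries on training curves and predicting on a held-out sample:

```python
# Toy illustration of a train/evaluate flow -- NOT the repository's models.
# Made-up 4-step "NDVI" sequences per crop class.
train = {
    "wheat":   [[0.20, 0.50, 0.70, 0.40], [0.25, 0.55, 0.65, 0.35]],
    "mustard": [[0.30, 0.40, 0.45, 0.30], [0.35, 0.45, 0.50, 0.25]],
}

# "Training": compute one centroid (mean curve) per class.
centroids = {
    crop: [sum(step) / len(step) for step in zip(*series)]
    for crop, series in train.items()
}

def predict(sequence):
    # Classify by nearest centroid (squared Euclidean distance).
    def dist(crop):
        return sum((a - b) ** 2 for a, b in zip(sequence, centroids[crop]))
    return min(centroids, key=dist)

# "Evaluation" on held-out samples.
print(predict([0.22, 0.52, 0.68, 0.38]))  # wheat
print(predict([0.33, 0.43, 0.47, 0.28]))  # mustard
```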
We welcome contributions to this repository! Please feel free to open issues for any bugs you encounter or feature requests. For major changes, we recommend opening a pull request with proposed code changes for review.
We would like to acknowledge the support and guidance provided by experts at Mahalanobis National Crop Forecast Center (MNCFC). Their domain expertise and insights have been invaluable in building models that can accurately predict crop types using satellite imagery. We would also like to thank the open source community for developing many of the Python libraries and tools that were crucial in building the models.
For any query, please feel free to reach out to us at this email: agri-testers@wadhwaniai.org