Welcome to the Crop GT Correction and Classification Repository! This repository contains code for generating crop type predictions and making corrections to ground truth crop types and their locations using satellite imagery.
- `crop_type_inference.py`: Performs inference on test or unlabeled data using trained models and satellite data.
- `gt_correction.py`: Corrects ground truth crop types and their locations using satellite imagery.
- `src/`: Source code for the `crop_type_inference.py` and `gt_correction.py` scripts. Modify the `src` scripts as needed for customizations.
- `training_notebooks/`: Jupyter notebooks for demonstrating and exploring the training and inference pipeline. Use these notebooks to understand and visualize how the pipeline works step by step.
You have two options to work with this repository:
**Option 1: Clone and set up locally**

1. Clone the repository to your local machine:

   ```
   git clone https://github.com/WadhwaniAI/AI-Enhanced-Crop-Field-Data-Curation.git
   ```

2. Create a conda environment and install the dependencies needed to run the Python scripts/notebooks in this repository. If you do not have conda installed, follow the instructions in the Conda User Guide.

   - Create an environment with Python 3.9 and activate it:

     ```
     conda create --name 'env_name' python=3.9
     conda activate 'env_name'
     ```

   - Install all the dependencies:

     ```
     cd crop_classification_and_gt_correction
     pip install -r requirements.txt
     ```
**Option 2: Install via pip**

1. Create a conda environment with Python 3.9 and activate it:

   ```
   conda create --name 'env_name' python=3.9
   conda activate 'env_name'
   ```

2. Install the repository directly via pip:

   ```
   pip install git+https://github.com/WadhwaniAI/AI-Enhanced-Crop-Field-Data-Curation.git
   ```
1. If you don't have an Earth Engine account, create one by following the instructions provided in this guide.
2. Download the `INDIA_DISTRICTS.geojson` file to obtain Indian district boundaries from here: https://github.com/datta07/INDIAN-SHAPEFILES/tree/master/INDIA
3. Open Earth Engine's code editor and upload `INDIA_DISTRICTS.geojson` under Assets.
4. Use the provided scripts (`crop_type_inference.py` and `gt_correction.py`) to infer crop types and make corrections to ground truth crop types and their locations.
5. (Optional) To train your own model on your own data from scratch, follow the instructions in the `training_notebooks/` directory. You can also simply explore the notebooks to understand the training and inference pipeline.
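The downloaded district file is a standard GeoJSON `FeatureCollection`. As a quick sanity check after downloading, you can inspect it with the standard library. The feature below is a hand-written stand-in; the property name `district` and the coordinates are illustrative assumptions, so check the real file for its actual property keys:

```python
import json

# A hand-written stand-in for one feature of INDIA_DISTRICTS.geojson.
# Property names and coordinates here are illustrative assumptions.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"district": "Example District"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[77.0, 28.0], [77.5, 28.0], [77.5, 28.5], [77.0, 28.0]]],
            },
        }
    ],
}

# The same check works on the real file:
# with open("INDIA_DISTRICTS.geojson") as f: districts = json.load(f)
districts = json.loads(json.dumps(sample))
print(districts["type"], len(districts["features"]))  # FeatureCollection 1
```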
The `crop_type_inference.py` script performs inference on test or unlabeled data using trained models and satellite data. To use it, execute the following steps in order:
1. Instantiate the `CropTypeInferencePipeline` class from the script. This class requires the path to a CSV file containing a single column named 'geometry' with location geometries, along with the cropping season, the year sown, and, optionally, the end fortnight for early prediction.

2. Call the `prepare_data` method to prepare the data for inference by adding the necessary columns and converting the geometries to a format compatible with Google Earth Engine (GEE). This method requires the path to the `INDIA_DISTRICTS.geojson` file on your local machine.

3. Use the `initialize_ee` method to initialize Earth Engine and authenticate with Google Cloud for a given project.

4. Execute the `harvest_raster_data` method to harvest satellite data for the given cropping season and year sown, generating raster files. For the kharif season, both NDVI (Normalized Difference Vegetation Index) data from Sentinel-2 and SAR (VH) data from Sentinel-1 are required for inference with the trained models; the rabi season requires only NDVI. The script automatically exports the satellite data from GEE to Google Drive or a data bucket, based on your preference.

5. After the satellite data has been exported, manually download the raster data from the Google Drive directory or the data bucket and place it in a local directory. Then run the `extract_raster_data` method to extract spectral data from the downloaded raster files.

6. Use the `clean_and_filter_data` method to clean the extracted spectral data and filter out out-of-distribution (OOD) data, i.e., non-crop areas or crop classes the model has not been trained on. Currently, OOD filtering techniques exist only for the rabi season; for the kharif season, this method only cleans the data.

7. Call the `crop_type_classification` method. Set `conformal=True` to generate conformal predictions (set predictions) or `conformal=False` for point predictions (single predictions based on maximum probability).

Steps for rabi season:
```python
from crop_classification_and_gt_correction.crop_type_inference import CropTypeInferencePipeline

infer = CropTypeInferencePipeline(data_path='path/to/csv', season='rabi', year_sown=2022, end_fn='jan_1f')
infer.prepare_data(ind_dist_path='path/to/india/districts/geojson')
infer.initialize_ee('ee-project-name')
infer.harvest_raster_data(data_type='ndvi', dir_path='raster_data', storage_type='drive')
infer.extract_raster_data(raster_dir_path='path/to/raster/file/directory', data_type='ndvi')
infer.clean_and_filter_data()
infer.crop_type_classification(alpha=0.15, conformal=True)  # or conformal=False for point predictions
```
Steps for kharif season:
```python
from crop_classification_and_gt_correction.crop_type_inference import CropTypeInferencePipeline

infer = CropTypeInferencePipeline(data_path='path/to/csv', season='kharif', year_sown=2023, end_fn='aug_1f')
infer.prepare_data(ind_dist_path='path/to/india/districts/geojson')
infer.initialize_ee('ee-project-name')
infer.harvest_raster_data(data_type='ndvi', exp_dir_path='raster_data', storage_type='drive')
infer.harvest_raster_data(data_type='vh', exp_dir_path='raster_data', storage_type='drive')
infer.extract_raster_data(raster_dir_path='path/to/raster/file/directory', data_type='ndvi')
infer.extract_raster_data(raster_dir_path='path/to/raster/file/directory', data_type='vh')
infer.clean_and_filter_data()
infer.crop_type_classification(alpha=0.15, conformal=True)  # or conformal=False for point predictions
```
Note: For the kharif season, you must run the `harvest_raster_data` and `extract_raster_data` methods twice: once for NDVI and once for VH data.
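To see the difference between the two output modes of `crop_type_classification`, here is a generic, self-contained illustration of a point prediction versus a conformal set prediction over class probabilities. This is not the repository's implementation: the cumulative-probability cutoff below is a simplified stand-in for the calibrated quantile a real conformal procedure derives from `alpha` and a calibration set, and the crop names and probabilities are made up:

```python
# Generic illustration: point prediction vs. conformal set prediction.
# NOT the repository's implementation; the 1 - alpha cutoff stands in for
# the calibrated threshold a real conformal method would compute.
probs = {"wheat": 0.55, "mustard": 0.30, "potato": 0.10, "gram": 0.05}

# Point prediction: the single most probable class.
point = max(probs, key=probs.get)

# Set prediction: keep classes until cumulative probability covers 1 - alpha.
alpha = 0.15
ordered = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
pred_set, cumulative = [], 0.0
for crop, p in ordered:
    pred_set.append(crop)
    cumulative += p
    if cumulative >= 1 - alpha:
        break

print(point)     # wheat
print(pred_set)  # ['wheat', 'mustard']
```

A set prediction like `['wheat', 'mustard']` signals that the model is not confident enough to commit to a single class, which is exactly the signal the ground truth correction step exploits.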
The `gt_correction.py` script performs ground truth correction on already labeled data using trained models and satellite data. The script follows these steps in order:
1. Create an instance of the `GTCorrectionPipeline` class from the script. This class requires the path to a CSV file containing two columns: 'geometry' with location geometries and 'crop_type' with capitalized crop type labels for those geometries. It also requires the cropping season, the year sown, optionally the end fortnight for early prediction, and the layer specifying how far to look for neighbors.

2. Follow all the steps in the crop type inference section, ending with `crop_type_classification` and `conformal=True`, as conformal predictions are needed to judge the confidence of a prediction. The method names are the same because this class inherits from `CropTypeInferencePipeline`.

3. Call the `execute_curation` function to perform the ground truth correction.

For both kharif and rabi seasons:
```python
from crop_classification_and_gt_correction.gt_correction import GTCorrectionPipeline

gt_correction = GTCorrectionPipeline(data_path='path/to/csv', season='kharif', year_sown=2023, end_fn='aug_1f', layer=1)

# Execute steps 2-7 from the crop type inference section
...

gt_correction.execute_curation()
```
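The expected input is a two-column CSV as described in step 1. Here is a minimal sketch of writing one with the standard library; the WKT point geometries and crop names are illustrative values, not real data:

```python
import csv

# Illustrative rows: 'geometry' as WKT points and capitalized 'crop_type'
# labels, matching the two-column input GTCorrectionPipeline expects.
# The coordinates and crop names are made up.
rows = [
    ("POINT (75.80 26.90)", "Wheat"),
    ("POINT (75.82 26.91)", "Mustard"),
]

with open("gt_input.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["geometry", "crop_type"])
    writer.writerows(rows)
```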
Crop classification models trained by Wadhwani AI can be found on Hugging Face. These can be directly used for inference on new data samples.
The notebooks in the `training_notebooks/` directory demonstrate how to:
- Load, explore and preprocess raw datasets
- Transform and split datasets for modeling
- Train, tune and evaluate machine learning models
In short, they explain how we arrived at the final pipeline. A data sample (a subset of the master pickle with a similar data distribution) is included for demonstration purposes. You can run all the cells sequentially to understand how the models are built and evaluated on the sample data.
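For a flavor of the train/evaluate flow the notebooks walk through, here is a toy nearest-centroid classifier on made-up NDVI sequences. This is not the repository's model or data, just a self-contained illustration of fitting per-class summaries on training curves and predicting on a held-out sample:

```python
# Toy illustration of a train/evaluate flow -- NOT the repository's models.
# Made-up 4-step "NDVI" sequences per crop class.
train = {
    "wheat":   [[0.20, 0.50, 0.70, 0.40], [0.25, 0.55, 0.65, 0.35]],
    "mustard": [[0.30, 0.40, 0.45, 0.30], [0.35, 0.45, 0.50, 0.25]],
}

# "Training": compute one centroid (mean curve) per class.
centroids = {
    crop: [sum(step) / len(step) for step in zip(*series)]
    for crop, series in train.items()
}

def predict(sequence):
    # Classify by nearest centroid (squared Euclidean distance).
    def dist(crop):
        return sum((a - b) ** 2 for a, b in zip(sequence, centroids[crop]))
    return min(centroids, key=dist)

# "Evaluation" on held-out samples.
print(predict([0.22, 0.52, 0.68, 0.38]))  # wheat
print(predict([0.33, 0.43, 0.47, 0.28]))  # mustard
```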
We welcome contributions to this repository! Please feel free to open issues for any bugs you encounter or feature requests. For major changes, we recommend opening a pull request with proposed code changes for review.
We would like to acknowledge the support and guidance provided by experts at Mahalanobis National Crop Forecast Center (MNCFC). Their domain expertise and insights have been invaluable in building models that can accurately predict crop types using satellite imagery. We would also like to thank the open source community for developing many of the Python libraries and tools that were crucial in building the models.
For any query, please feel free to reach out to us at this email: agri-testers@wadhwaniai.org