This repository was archived by the owner on Dec 23, 2025. It is now read-only.


Crop GT Correction and Classification

Welcome to the Crop GT Correction and Classification Repository! This repository contains code for generating crop type predictions and making corrections to ground truth crop types and their locations using satellite imagery.

Usage

  • crop_type_inference.py: This script performs inference on test or unlabeled data using trained models and satellite data.
  • gt_correction.py: This script allows for the correction of ground truth crop types and their locations using satellite imagery.
  • src/: Source code used by the crop_type_inference.py and gt_correction.py scripts. Modify the src scripts as needed for customization.
  • training_notebooks/: Jupyter notebooks demonstrating the training and inference pipelines. Use these notebooks to understand and visualize how the pipeline works step by step.

Getting started

You have two options to work with this repository:

Option 1: Clone the Repository

  1. Clone the repository to your local machine:

    git clone https://github.com/WadhwaniAI/AI-Enhanced-Crop-Field-Data-Curation.git
  2. Create a conda environment and install the dependencies needed to run the Python scripts/notebooks in this repository. If you do not have conda installed, please follow the instructions in the Conda User Guide.

    • Create an environment with Python 3.9:
    conda create --name 'env_name' python=3.9
    conda activate 'env_name'
    • Install all the dependencies:
    cd AI-Enhanced-Crop-Field-Data-Curation
    pip install -r requirements.txt

Option 2: Install directly via pip

  1. Create a conda environment with python 3.9:

    conda create --name 'env_name' python=3.9
    conda activate 'env_name'
  2. Install the repository directly via pip:

    pip install git+https://github.com/WadhwaniAI/AI-Enhanced-Crop-Field-Data-Curation.git

Additional steps

  1. If you don't have an Earth Engine account, create one by following the instructions provided in this guide

  2. Download the INDIA_DISTRICTS.geojson file to obtain Indian district boundaries from here: https://github.com/datta07/INDIAN-SHAPEFILES/tree/master/INDIA

  3. Open Earth Engine's Code Editor and upload the INDIA_DISTRICTS.geojson file under Assets.

  4. Use the provided scripts (crop_type_inference.py and gt_correction.py) to infer crop types and make corrections to ground truth crop types and their locations.

  5. (Optional) To train your own model on your own data from scratch, follow the instructions in the training_notebooks/ directory. You can also simply explore the notebooks to understand the training and inference pipelines.
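After downloading INDIA_DISTRICTS.geojson (steps 2 and 3), it can help to sanity-check the file before uploading it as an Earth Engine asset. The sketch below uses only the standard library; the exact feature properties in the file are not assumed, only that it is a valid GeoJSON FeatureCollection:

```python
import json

def check_districts_geojson(path):
    """Sanity-check that a downloaded file is a GeoJSON FeatureCollection
    with at least one feature, before uploading it as an EE asset."""
    with open(path) as f:
        data = json.load(f)
    if data.get("type") != "FeatureCollection":
        raise ValueError("not a GeoJSON FeatureCollection")
    features = data.get("features", [])
    if not features:
        raise ValueError("no district features found")
    return len(features)

# e.g. check_districts_geojson("INDIA_DISTRICTS.geojson")
```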

Main scripts/modules

Crop type inference

The crop_type_inference.py script performs inference on test or unlabeled data using trained models and satellite data. To use it, execute the following steps in order:

  1. Instantiate the CropTypeInferencePipeline class from the script. It takes the path to a CSV file containing a single column named 'geometry' with location geometries, along with the cropping season, the year sown, and optionally the end fortnight for early prediction.

  2. Call the prepare_data method to prepare the data for inference by adding necessary columns and converting the geometries to a format compatible with Google Earth Engine (GEE). This method requires the path to the INDIA_DISTRICTS.geojson file on your local machine.

  3. Use the initialize_ee method to initialize Earth Engine and authenticate with Google Cloud for a given project.

  4. Execute the harvest_raster_data method to harvest satellite data for the given cropping season and year sown, generating raster files. For the kharif season, both NDVI (Normalized Difference Vegetation Index) from Sentinel-2 and SAR (VH) data from Sentinel-1 are required for inference from trained models, while the rabi season only requires NDVI. The script will automatically export the satellite data from GEE to Google Drive or a data bucket based on your preference.

  5. After the satellite data has been exported, manually download the raster data from the Google Drive directory or the data bucket and place it in a local directory. Then, run the extract_raster_data method to extract spectral data from the downloaded raster files in the local directory.

  6. Use the clean_and_filter_data method to clean the extracted spectral data and filter out out-of-distribution (OOD) data, i.e. non-crop samples or crop classes the model has not been trained on. Currently, OOD filtering is available only for the rabi season, and this function cleans data only for the kharif season.

  7. You can generate conformal predictions (set predictions) by setting conformal=True, or point predictions (single-label predictions based on the maximum probability) by setting conformal=False, in the crop_type_classification method.

    • Steps for rabi season
    from crop_classification_and_gt_correction.crop_type_inference import CropTypeInferencePipeline
    
    infer = CropTypeInferencePipeline(data_path='path/to/csv', season='rabi', year_sown=2022, end_fn='jan_1f')
    infer.prepare_data(ind_dist_path='path/to/india/districts/geojson')
    infer.initialize_ee('ee-project-name')
    infer.harvest_raster_data(data_type='ndvi', dir_path='raster_data', storage_type='drive')
    infer.extract_raster_data(raster_dir_path='path/to/raster/file/directory', data_type='ndvi')
    infer.clean_and_filter_data()
    infer.crop_type_classification(alpha=0.15, conformal=True)  # or conformal=False for point predictions
    • Steps for kharif season
    from crop_classification_and_gt_correction.crop_type_inference import CropTypeInferencePipeline
    
    infer = CropTypeInferencePipeline(data_path='path/to/csv', season='kharif', year_sown=2023, end_fn='aug_1f')
    infer.prepare_data(ind_dist_path='path/to/india/districts/geojson')
    infer.initialize_ee('ee-project-name')
    infer.harvest_raster_data(data_type='ndvi', exp_dir_path='raster_data', storage_type='drive')
    infer.harvest_raster_data(data_type='vh', exp_dir_path='raster_data', storage_type='drive')
    infer.extract_raster_data(raster_dir_path='path/to/raster/file/directory', data_type='ndvi')
    infer.extract_raster_data(raster_dir_path='path/to/raster/file/directory', data_type='vh')
    infer.clean_and_filter_data()
    infer.crop_type_classification(alpha=0.15, conformal=True)  # or conformal=False for point predictions

Note: For the kharif season, you must run the harvest_raster_data and extract_raster_data methods twice, once for NDVI and once for VH data.
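The difference between conformal (set) and point predictions in step 7 can be illustrated with a toy sketch. This is not the repository's calibration procedure, and the crop labels and probabilities below are made up; it only shows the idea of returning a set of plausible classes at a target coverage of 1 - alpha:

```python
def conformal_set(probs, alpha=0.15):
    """Illustrative set prediction: include classes by descending
    probability until cumulative mass reaches 1 - alpha."""
    ordered = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    picked, total = [], 0.0
    for label, p in ordered:
        picked.append(label)
        total += p
        if total >= 1 - alpha:
            break
    return picked

def point_prediction(probs):
    """Single-label prediction: the class with maximum probability."""
    return max(probs, key=probs.get)

# Hypothetical model output for one field
probs = {"wheat": 0.55, "mustard": 0.35, "potato": 0.10}
print(conformal_set(probs, alpha=0.15))  # ['wheat', 'mustard']
print(point_prediction(probs))           # wheat
```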

GT correction

The gt_correction.py script is designed to perform ground truth correction on already labeled data using trained models and satellite data. The script follows these steps in the specified order:

  1. Create an instance of the GTCorrectionPipeline class from the script. This class requires the path to a CSV file as input, containing two columns: 'geometry' with location geometries and 'crop_type' with capitalized crop type labels for those geometries. It also requires the cropping season, year sown, optionally the end fortnight for early prediction, and the layer to specify how far to look for neighbors.

  2. Follow all the steps from the crop type inference section, ending with crop_type_classification with conformal=True, since conformal predictions are needed to judge the confidence of a prediction. The method names are the same, as this class inherits from the CropTypeInferencePipeline class.

  3. Call the execute_curation function to perform the ground truth correction.

    • For both the kharif and rabi seasons
    from crop_classification_and_gt_correction.gt_correction import GTCorrectionPipeline
    
    gt_correction = GTCorrectionPipeline(data_path='path/to/csv', season='kharif', year_sown=2023, end_fn='aug_1f', layer=1)
    # Execute steps 2-7 from crop type inference section
    ...
    ...
    ...
    gt_correction.execute_curation()
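The input CSV for GTCorrectionPipeline can be assembled with the standard library. The required columns 'geometry' and 'crop_type' come from the section above; the WKT point encoding and the sample crop values are assumptions for illustration:

```python
import csv

# Hypothetical input rows: 'geometry' holds location geometries (WKT points
# here, an assumption) and 'crop_type' holds capitalized crop labels.
rows = [
    {"geometry": "POINT (75.80 26.90)", "crop_type": "WHEAT"},
    {"geometry": "POINT (75.85 26.95)", "crop_type": "MUSTARD"},
]

with open("gt_input.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["geometry", "crop_type"])
    writer.writeheader()
    writer.writerows(rows)
```

The resulting file path can then be passed as data_path when instantiating the pipeline.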

Trained Models

Crop classification models trained by Wadhwani AI can be found on Hugging Face. These can be directly used for inference on new data samples.

Training notebooks

The notebooks in the training_notebooks/ directory demonstrate how to:

  1. Load, explore and preprocess raw datasets
  2. Transform and split datasets for modeling
  3. Train, tune and evaluate machine learning models

In short, they explain how we arrived at the final pipeline. A data sample (a subset of the master pickle with a similar data distribution) is included for demonstration purposes. You can run all the cells sequentially to see how the models are built and evaluated on the sample data.

Contributions

We welcome contributions to this repository! Please feel free to open issues for any bugs you encounter or feature requests. For major changes, we recommend opening a pull request with proposed code changes for review.

Acknowledgments

We would like to acknowledge the support and guidance provided by experts at Mahalanobis National Crop Forecast Center (MNCFC). Their domain expertise and insights have been invaluable in building models that can accurately predict crop types using satellite imagery. We would also like to thank the open source community for developing many of the Python libraries and tools that were crucial in building the models.

For any query, please feel free to reach out to us at this email: agri-testers@wadhwaniai.org
