
AAFC-Bioinfo-AAC/faba-bean-image-classification


French version of this document

installation and usage instructions

Faba bean feature extraction pipeline from WGRF-faba bean images

Overview

This work provides a workflow for running the faba bean feature extraction pipeline, which extracts dimensional, shape, and color features of faba bean seeds from faba bean images into a .csv file. It presents a methodology for seed image segmentation and feature extraction using deep learning and image processing techniques. The Segment Anything Model 2.1 (SAM2.1) is used for precise segmentation, while OpenCV, Scikit-Image, and Matplotlib colors are employed to analyze the dimensional, spatial, shape, and color properties of the segmented seeds. The pipeline also reports the seed count in each image and produces annotated binary images. The pipeline was developed specifically around the spatial layout of faba bean seeds, colorcard, label, ruler, and coin in the images.

Faba bean Images

The images of faba beans were captured according to the Standard Operating Protocol (Figure 1).

Figure 1

Figure 1. Example of a faba bean image, Vf1-1-2 (image shape = 6000 × 4000 × 3), with faba bean seeds, colorcard, coin, label, and ruler


Segment Anything 2.1 (Meta AI) model used for image segmentation

Segment Anything Model 2.1 (SAM 2.1) is an advanced segmentation model designed to work seamlessly with both images and videos, treating a single image as a one-frame video. That work introduced a new task, model, and dataset aimed at improving segmentation performance. SAM 2, trained on the SA-V dataset, provides strong performance across a wide range of tasks. In image segmentation, the SAM 2 model is reported to be more accurate and six times faster than the original Segment Anything Model (SAM).

💡 Uniqueness/Novelty

The novelty of this work lies in the use of Segment Anything 2.1 for image segmentation. While researchers have traditionally relied on the OpenCV and scikit-image libraries for segmentation tasks, this study leverages Segment Anything 2.1 to generate binary masks and a metadata file, which are then used for feature extraction from faba bean images.

🔥 A Quick Overview

Figure 2

Figure 2: Flowchart for Faba bean feature extraction pipeline

📝 Details of Steps (Figure 2):

  1. Step 1: One or more images are used as input, and the SAM2.1 model generates the binary masks (.png) and a metadata file (.csv) for each image in the output dir SAM.

  2. Step 2: The output dir SAM (from Step 1) is used as input for this step; data analysis, feature extraction using the scikit-image library, and feature engineering produce a .csv file with dimensional and shape features in another output dir, FE.

  3. Step 3: Both the output dir FE (from Step 2) and the original images (the input to Step 1) are used as input for this step; the color labels and RGB values are extracted using the colormath library, producing a .csv file in the same final output dir FE (from Step 2).
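As a rough illustration of the Step 2 idea (labelling each seed in a binary mask and measuring it), the sketch below implements a minimal 4-connected component pass over a toy mask in pure Python. The real pipeline works from SAM2.1 masks with scikit-image (`skimage.measure.label` / `regionprops`); the function names here are hypothetical stand-ins.

```python
from collections import deque

def label_regions(mask):
    """Label 4-connected foreground regions in a binary grid (a toy
    stand-in for skimage.measure.label). Returns {label: [(row, col), ...]}."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    regions, next_label = {}, 1
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                queue, pixels = deque([(r, c)]), []
                labels[r][c] = next_label
                while queue:                      # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                regions[next_label] = pixels
                next_label += 1
    return regions

def measure(pixels):
    """Per-region area (pixel count) and bounding box (min_row, min_col, max_row, max_col)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return {"area_pix": len(pixels), "bbox": (min(ys), min(xs), max(ys), max(xs))}

# Toy binary mask containing two "seeds"
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
features = {lab: measure(px) for lab, px in label_regions(mask).items()}
print(len(features))            # seed count: 2
print(features[1]["area_pix"])  # 4
```

The per-region dictionaries mirror the kind of per-seed rows (area, bbox, centroid, ...) that end up in the pipeline's feature .csv.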

📚 Final Output Files

After running the faba bean feature extraction pipeline, there will be 2 output directories:

  1. Output dir SAM will contain subfolders (Faba-Seed-CC_Vf_N-N_N) with masks (N.png) and a metadata file (metadata.csv) for each image.
  2. Output dir FE will contain:
     a. The .csv file of dimensional and shape features (Fava_bean_Features_extraction.csv)
     b. The .csv file of dimensional and shape features, RGB values, color names, and TGW(g) (FE_Color.csv)
     c. Seed count (.xlsx) (Seed Count.xlsx)
     d. Annotated binary image (.png) with contours around beans (Faba-Seed-CC_Vf_N-N_N_combined_mask.png)
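Deriving the seed count from the SAM metadata file amounts to counting the mask rows whose pixel area falls in a plausible seed range, which also filters out large non-seed objects (colorcard, label, ruler). The sketch below assumes hypothetical column names and threshold values; the real metadata.csv columns come from the SAM2.1 mask generator and may differ.

```python
import csv
import io

# Made-up stand-in for a SAM metadata.csv (columns and values are illustrative only)
metadata_csv = """id,area,stability_score
0,5200,0.97
1,4900,0.95
2,310000,0.99
"""

def count_seeds(text, min_area=1000, max_area=50000):
    """Count masks whose pixel area lies in a seed-sized range,
    excluding very small noise masks and very large non-seed objects."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(1 for row in reader if min_area <= float(row["area"]) <= max_area)

print(count_seeds(metadata_csv))  # 2 (the 310000-px mask is too large to be a seed)
```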

The features that have been extracted through this pipeline are:

  1. Dimensional features (31): Area_mm2_SAM, Length_mm_SAM, Width_mm_SAM, perimeter_mm_SAM, Area-SAM_taubin(mm2), Length-SAM_taubin(mm), Width-SAM_taubin(mm), Perimeter-SAM_taubin(mm), Area-SAM_minEnc(mm2), Length-SAM_minEnc(mm), Width-SAM_minEnc(mm), Perimeter-SAM_minEnc(mm), centroid-0, centroid-1, bbox-0, bbox-1, bbox-2, bbox-3, Area_pix_SAM, Eccentricity, equivalent_diameter_area, perimeter, solidity, area_convex, extent, Axis Major Length(pix)_SAM, Axis Minor Length(pix)_SAM, Aspect_Ratio, Roundness, Compactness, Circularity_SAM
  2. Shape features (5): Shape, Shapefactor1, Shapefactor2, Shapefactor3, Shapefactor4
  3. Color (2): RGB value, color_seeds
  4. Mass prediction (1): TGW(g)
  5. Seed count (1): Number of seeds in the image
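Scalar descriptors such as Aspect_Ratio, Roundness, Compactness, and Circularity are simple functions of a region's area, perimeter, and fitted-ellipse axes. The sketch below uses the commonly used definitions of these metrics; the pipeline's exact formulas are not shown here, so treat these as illustrative, not authoritative.

```python
import math

def aspect_ratio(major_axis, minor_axis):
    # Elongation of the fitted ellipse (>= 1.0)
    return major_axis / minor_axis

def circularity(area, perimeter):
    # 4*pi*A / P^2; equals 1.0 for a perfect circle
    return 4 * math.pi * area / perimeter ** 2

def roundness(area, major_axis):
    # 4*A / (pi * L^2); also 1.0 for a perfect circle
    return 4 * area / (math.pi * major_axis ** 2)

def compactness(area, major_axis):
    # Diameter of the equal-area circle divided by the major axis
    return math.sqrt(4 * area / math.pi) / major_axis

# Sanity check on a circle of radius 10 (area = pi*r^2, perimeter = 2*pi*r)
r = 10.0
print(round(circularity(math.pi * r**2, 2 * math.pi * r), 6))  # 1.0
print(round(roundness(math.pi * r**2, 2 * r), 6))              # 1.0
```

For real seeds these metrics drop below 1.0 as the outline becomes elongated or irregular, which is what makes them useful shape features.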

SMP-UNet (trained using SAM2.1-generated masks) pipeline for Faba bean feature extraction from images

We present a deep learning–based workflow for automated segmentation, enumeration, and phenotypic feature extraction of Faba bean seeds from RGB images. The pipeline utilizes SAM2.1 to generate initial instance-level masks, which serve as supervisory labels for training SMP-UNet encoder–decoder models dedicated to seed and coin segmentation. Segmentation of the reference coin enables pixel-to-millimeter scale calibration, while seed masks are post-processed for automated counting and extraction of morphological traits.

A Quick Overview

Figure 3

Figure 3: Deep learning–based workflow for automated faba bean seed segmentation and phenotypic feature extraction. (A.) Training workflow illustrating the generation of binary segmentation masks using SAM2.1 for seeds and reference coins, followed by dataset preparation, data augmentation, and supervised training of two independent SMP-UNet models with MIT-B0 encoders for seed and coin segmentation. (B.) Inference and analysis workflow showing application of pretrained models to unseen images, coin-based pixel-to-millimeter scale calibration, seed segmentation and post-processing, connected-component analysis for seed separation and counting, and extraction of morphological features including area, length, width, and perimeter.

Outputs

  • Binary masks (PNG)
  • Seed counts (CSV)
  • Per-seed morphological measurements (CSV)

Methods Summary

The training process begins with RGB images containing Faba bean seeds alongside a reference coin. SAM2.1 generates preliminary instance-level masks for both seeds and coins, which are subsequently filtered and merged to produce binary ground truth masks. These masks are used to train two independent SMP-UNet models with MIT-B0 encoders, enabling precise segmentation of seeds and coins.

Workflow

After inference, pretrained models are applied to previously unseen images to generate binary segmentation masks. Coin segmentation outputs provide robust pixel-to-millimeter calibration, ensuring quantitative trait measurements are physically meaningful. Seed masks are further post-processed to isolate individual seeds, followed by automated seed counting and extraction of morphological features, including area, length, width, and perimeter.
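The coin-based calibration reduces to a single ratio: the coin's known physical diameter divided by its diameter in pixels, the latter recoverable from the segmented coin's area by treating the mask as a disc. A minimal sketch follows; all numbers are placeholders (26.5 mm is roughly a Canadian $1 coin, but use the diameter of whatever reference coin actually appears in the images).

```python
import math

def mm_per_pixel(coin_area_pix, coin_diameter_mm):
    """Scale factor from the segmented coin: recover the coin's pixel
    diameter from its mask area (area = pi * (d/2)^2), then divide the
    known physical diameter by it."""
    coin_diameter_pix = 2.0 * math.sqrt(coin_area_pix / math.pi)
    return coin_diameter_mm / coin_diameter_pix

# Example with placeholder values: a ~70,686 px coin mask, 26.5 mm coin
scale = mm_per_pixel(70686, 26.5)          # ~0.0883 mm per pixel
seed_area_mm2 = 5200 * scale ** 2          # convert a seed's pixel area to mm^2
seed_length_mm = 300 * scale               # convert a pixel length to mm
print(round(scale, 4))  # 0.0883
```

Linear measurements (length, width, perimeter) scale by `scale`, while areas scale by `scale ** 2`, which is why the calibration must be done before, not after, aggregating features.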

This pipeline produces fast, scalable, reproducible image- and object-level outputs, facilitating downstream phenotypic analyses and enabling high-throughput assessment of Faba bean seed traits.

🙏 Acknowledgements

🤝 We extend our sincere appreciation to our mentors, collaborators, and colleagues at Agriculture and Agri-Food Canada (AAFC) for their continued guidance, support, and valuable contributions throughout this project:

• Rodrigo Ortega Polo – Project Lead and Biology Study Leader (Bioinformatics), Lethbridge Research and Development Centre, AAFC

• Nathaniel Lim – Acting Project Manager, AAFC

• Xiaohui Yang – Project Co-Lead, Lethbridge Research and Development Centre, AAFC

• Nicholas Larkan – Research Scientist (Pulse Crop Genomics), Saskatoon Research and Development Centre, AAFC

• Etienne Low-Decarie – Manager, Biological Informatics Centre of Excellence (BICoE), AAFC

• Jackson Eyres – Bioinformatics Team Lead (BICoE) and Supervisor, AAFC

• Mathew Richards – Bioinformatics Programmer, Lethbridge Research and Development Centre, AAFC

• Harpreet Kaur Bargota – Bioinformatics Programmer Analyst/Biologist, Lethbridge Research and Development Centre, AAFC

• Hao Nan Tobey Wang – Research Biologist, Lethbridge Research and Development Centre, AAFC

• Parisa Daeijvad – Ph.D. Research Student, Lethbridge Research and Development Centre, AAFC

🌾We also gratefully acknowledge the Western Grains Research Foundation (WGRF), Canada, for their funding and support, which made this work possible.

About

Pipeline for segmenting faba bean seed images and extracting shape, size, and color features to CSV using SAM2.1, OpenCV, and Scikit-Image.
