A couple of questions occurred to me . . . as of now, we have lumped ALL our training imagery together into a single dataset. From the perspective of trying to increase UIE efficacy in terms of color, etc., @keenanjohnson, do you think there'd be any value in removing extraneous instances from our imagery, and/or partitioning our data based on broad substrate condition?
For example, a quick review of the images revealed photos that may be a bit too bright, or where the ROV is too close to the seafloor, or where the ROV is too far above the seafloor (approximate examples of these are below . . . the "too bright" images aren't really that bad, but the "too close" or "too far from the seafloor" examples could potentially be removed).
What do you think? Could the trained UIE model potentially benefit from removing these extraneous instances?
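Just to make the first idea concrete, here's a rough sketch of the kind of automated filtering I have in mind. The brightness threshold and directory name below are placeholders, not values from our actual pipeline, and the "too close"/"too far" cases would obviously need a different heuristic (e.g., altitude from the ROV logs, if we have it):

```python
# Minimal sketch (assumptions: a "training_images" folder of JPEGs and an
# arbitrary brightness cutoff of 200 on the 0-255 grayscale mean).
from pathlib import Path

import numpy as np
from PIL import Image

BRIGHTNESS_THRESHOLD = 200  # assumed cutoff, would need tuning on our imagery


def is_too_bright(image_path: Path) -> bool:
    """Return True if the frame's mean grayscale intensity exceeds the cutoff."""
    gray = np.asarray(Image.open(image_path).convert("L"), dtype=np.float32)
    return gray.mean() > BRIGHTNESS_THRESHOLD


keep = [p for p in Path("training_images").glob("*.jpg") if not is_too_bright(p)]
print(f"Keeping {len(keep)} frames after dropping overly bright ones")
```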
Second question - could we benefit at all from partitioning the dataset? For example, about 3/5 of our imagery is from relatively flat areas, and ~2/5 is from the Elliott Bay Marina breakwater, where the breakwater riprap causes extensive depth variation to appear in these images (e.g., instances where the ROV is high up above the seafloor). Would it be worth testing a "flat, relatively soft sediment" model and a "breakwater riprap with lots of depth variation" model? The thinking here is that a model trained specifically on, e.g., breakwater riprap structure may provide more consistent results when applied to unseen imagery of that same substrate type.
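If we wanted to try that, one minimal way to carve out the two training subsets might look something like this. The filename convention and folder names here are purely assumptions for illustration; in practice we'd key the split off however the surveys are actually labeled:

```python
# Minimal sketch (assumption: breakwater frames carry the site name in the
# filename, so we can route each image to one of two training folders).
import shutil
from pathlib import Path

SRC = Path("training_images")
FLAT_DIR = Path("training_flat_sediment")
RIPRAP_DIR = Path("training_breakwater_riprap")

FLAT_DIR.mkdir(exist_ok=True)
RIPRAP_DIR.mkdir(exist_ok=True)

for img in SRC.glob("*.jpg"):
    # Assumed convention: Elliott Bay breakwater frames are tagged in the name.
    dest = RIPRAP_DIR if "elliott_bay_breakwater" in img.name else FLAT_DIR
    shutil.copy(img, dest / img.name)
```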
All of these points broadly fall under the category of "things we can start to play with now that we can train our own models," per #13!