When successive images of a window or control are sent to the model within a chat session, we should investigate which changes the model can detect reliably and find a meaningful way to communicate those changes to the user. I plan to work on this.