Currently, simulated clicks are tied to the screen resolution, window size, and window position, so they stop hitting the right spot when any of these change. It would be much more robust to locate targets within the game by image (matching a reference screenshot) or by specific on-screen text, and then click or perform another action at the location that was found. What do you think?
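To illustrate the image-based idea, here is a minimal, dependency-free sketch of template matching. It assumes the captured screen and the reference image are already available as grayscale pixel grids (`screen[y][x]`). It slides the template over the capture, scores each position by sum of squared differences, and returns the centre of the best match. It then adds the window's origin, so the match does not depend on where the window sits. The function names and the `max_mean_sq_diff` threshold are hypothetical. A real implementation would more likely use OpenCV's `cv2.matchTemplate` on an actual screenshot. Text-based lookup would need OCR and is not covered here.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # hypothetical format: grayscale pixels, image[y][x]


def locate_template(screen: Image, template: Image,
                    max_mean_sq_diff: float = 0.0) -> Optional[Tuple[int, int]]:
    """Return the (x, y) centre of the best match of `template` in `screen`,
    or None if even the best position differs too much."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    if th > sh or tw > sw:
        return None
    best_score, best_pos = None, None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            # Sum of squared differences between the template and this window.
            ssd = 0
            for ty in range(th):
                row_s, row_t = screen[y + ty], template[ty]
                for tx in range(tw):
                    d = row_s[x + tx] - row_t[tx]
                    ssd += d * d
            if best_score is None or ssd < best_score:
                best_score, best_pos = ssd, (x, y)
    # Reject weak matches so we never click on something that merely looks similar.
    if best_score / (th * tw) > max_mean_sq_diff:
        return None
    x, y = best_pos
    return x + tw // 2, y + th // 2


def to_screen_coords(match_xy: Tuple[int, int],
                     window_origin: Tuple[int, int]) -> Tuple[int, int]:
    """Translate a match found in a window capture into absolute screen coordinates."""
    return match_xy[0] + window_origin[0], match_xy[1] + window_origin[1]


# Usage: a 6x5 capture with a 2x2 "button" placed at x=3, y=1.
screen = [[0] * 6 for _ in range(5)]
screen[1][3:5] = [9, 8]
screen[2][3:5] = [7, 6]
button = [[9, 8], [7, 6]]
hit = locate_template(screen, button)
if hit is not None:
    target = to_screen_coords(hit, window_origin=(100, 200))
    print(target)  # a click helper (e.g. pyautogui.click(*target)) would go here
```

Because the click target is derived from what is actually visible, the same script keeps working if the window is moved or resized. Resizing also changes the on-screen size of the image, though, so a real version would need scale-tolerant matching or per-resolution reference images.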