Add dynamic OpenAI model fetching with vision-only filtering #173
base: main
Conversation
- Implement dynamic model discovery from OpenAI API with 1-hour caching
- Filter models to only include vision-capable models (GPT-4o, GPT-4 variants)
- Exclude O1/O3 models that don't support image inputs
- Add OpenAIModule import to TasksModule for dependency injection
- Make model selector scrollable in UI (max-height: 300px)

This fixes task execution failures when using non-vision models like O3-mini with computer-use agents that send screenshots.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
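The fetch-filter-cache flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's exact service code: the `fetchModels` callback and `CachedModels` shape are hypothetical names standing in for the real OpenAI client call.

```typescript
// Minimal sketch of 1-hour model caching with vision-only filtering.
interface CachedModels {
  models: string[];
  fetchedAt: number;
}

const CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour
let cache: CachedModels | null = null;

async function getAvailableModels(
  fetchModels: () => Promise<string[]>, // e.g. wraps openai.models.list()
  fallback: string[],
): Promise<string[]> {
  const now = Date.now();
  if (cache && now - cache.fetchedAt < CACHE_TTL_MS) {
    return cache.models; // serve the cached result within the TTL
  }
  try {
    const all = await fetchModels();
    // Keep only vision-capable chat models; O1/O3 ids don't start with "gpt-",
    // so they fall out of this filter implicitly.
    const models = all.filter(
      (id) => id.startsWith('gpt-') && !id.startsWith('gpt-3.5'),
    );
    cache = { models, fetchedAt: now };
    return models;
  } catch {
    return fallback; // fall back to the hardcoded list on API failure
  }
}
```

On a cache hit the API is not contacted at all, which is why the fallback list only matters on the first fetch of each hour.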
- Create cursor-overlay.ts utility with SVG-based cursor generation - Modify screendump() to capture cursor position and overlay cursor - Cursor is rendered as black arrow with white outline for visibility - Fallback to screenshot without cursor if overlay fails This enables users to see the mouse position in screenshots sent to the API. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
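The SVG cursor generation described above can be sketched like this. The path data and function name are illustrative assumptions, not the PR's actual implementation; the key idea is a black arrow with a white outline so it stays visible on any background.

```typescript
// Sketch of an SVG arrow cursor like the one cursor-overlay.ts generates.
// The arrow path is illustrative; the PR's exact shape may differ.
function createCursorSvg(size: number = 24): string {
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${size}" height="${size}" viewBox="0 0 24 24">`,
    // Black arrow with a white outline for contrast on any background.
    `<path d="M4 2 L4 20 L9 15 L12 22 L15 21 L12 14 L19 14 Z" fill="black" stroke="white" stroke-width="1.5"/>`,
    `</svg>`,
  ].join('');
}
```

The resulting SVG string can then be composited onto the screenshot buffer at the captured cursor coordinates with an image library (the PR's overlay function appears to do exactly this step).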
Pull request overview
This PR enhances the Bytebot agent system by implementing dynamic OpenAI model discovery with vision-capability filtering, addressing runtime failures when non-vision models are used with computer-use agents that send screenshots. The changes include a new model fetching service with 1-hour caching, improved UI scrollability for model selection, and cursor overlay functionality for screenshots.
Key Changes:
- Dynamic OpenAI model fetching with intelligent filtering for vision-capable models (GPT-4 variants) and 1-hour result caching
- Hardcoded fallback model list updated to include only vision-capable models (gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4)
- Screenshot cursor overlay feature to draw mouse cursor position on captured screenshots
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| packages/bytebot-agent/src/openai/openai.service.ts | Adds getAvailableModels() method to fetch and cache OpenAI models from API, plus helper methods for model title formatting and context window estimation |
| packages/bytebot-agent/src/openai/openai.constants.ts | Updates hardcoded model list to include only vision-capable GPT-4 variants as fallback |
| packages/bytebot-agent/src/tasks/tasks.controller.ts | Integrates dynamic model fetching with error handling and fallback to hardcoded models |
| packages/bytebot-agent/src/tasks/tasks.module.ts | Imports OpenAIModule to enable OpenAIService dependency injection |
| packages/bytebot-ui/src/components/ui/select.tsx | Adds max-height and scrolling to select dropdown for better UX with many models |
| packages/bytebotd/src/nut/cursor-overlay.ts | New file implementing cursor image creation and overlay functionality |
| packages/bytebotd/src/nut/nut.service.ts | Updates screendump method to optionally overlay cursor position on screenshots |
```typescript
 * @param cursorSize The size of the cursor (default 24)
 * @returns A Buffer containing the screenshot with cursor overlay
 */
export async function overlayeCursorOnScreenshot(
```
Copilot AI commented on Dec 3, 2025:
Typo in function name: 'overlaye' should be 'overlay'. The function should be named overlayCursorOnScreenshot instead of overlayeCursorOnScreenshot.
Suggested change:

```diff
-export async function overlayeCursorOnScreenshot(
+export async function overlayCursorOnScreenshot(
```
```typescript
      const models = modelsList.data;

      // Filter for relevant chat models that support vision (images/screenshots)
      // Exclude O1 and O3 models as they don't support image inputs
```
Copilot AI commented on Dec 3, 2025:
The comment states "Exclude O1 and O3 models as they don't support image inputs", but the filtering logic below only checks for models starting with 'gpt-'. O1 and O3 models (which would have IDs like 'o1-...' or 'o3-...') are already implicitly excluded by the first filter condition model.id.startsWith('gpt-'). The comment should be clarified to explain that O1/O3 models are excluded because they don't start with 'gpt-', or the comment should be removed if it's redundant.
Suggested change:

```diff
-      // Exclude O1 and O3 models as they don't support image inputs
+      // Only include models whose IDs start with 'gpt-' (O1 and O3 models are excluded by this filter)
```
```typescript
    if (modelId.includes('o1')) return 128000;
    if (modelId.includes('o3')) return 200000;
```
Copilot AI commented on Dec 3, 2025:
The getContextWindow method includes logic for 'o1' and 'o3' models (lines 140-141), but these models are filtered out in getAvailableModels (line 76) because they don't start with 'gpt-'. Since these models are intentionally excluded from the available models list, this dead code should be removed to avoid confusion.
Suggested change (remove the dead branches):

```diff
-    if (modelId.includes('o1')) return 128000;
-    if (modelId.includes('o3')) return 200000;
```
```typescript
    // Convert model IDs like "gpt-4o-mini" to "GPT-4o Mini"
    return modelId
      .split('-')
      .map((part) => {
        if (part === 'gpt') return 'GPT';
        if (part.match(/^\d/)) return part; // Keep numbers as-is
        return part.charAt(0).toUpperCase() + part.slice(1);
      })
      .join('-')
      .replace(/-/g, ' ');
```
Copilot AI commented on Dec 3, 2025:
[nitpick] The formatModelTitle method transforms model IDs like "gpt-4o-mini" to "GPT 4o Mini". However, this transformation logic may not handle all edge cases correctly. For example, "gpt-4o-2024-05-13" would become "GPT 4o 2024 05 13" with spaces between date components. Consider adding test cases or documentation for expected behavior with dated model IDs.
Suggested change:

```diff
-    // Convert model IDs like "gpt-4o-mini" to "GPT-4o Mini"
-    return modelId
-      .split('-')
-      .map((part) => {
-        if (part === 'gpt') return 'GPT';
-        if (part.match(/^\d/)) return part; // Keep numbers as-is
-        return part.charAt(0).toUpperCase() + part.slice(1);
-      })
-      .join('-')
-      .replace(/-/g, ' ');
+    // Convert model IDs like "gpt-4o-mini" to "GPT 4o Mini"
+    // If the model ID ends with a date (e.g., "2024-05-13"), keep the date together as a single part
+    const parts = modelId.split('-');
+    // Check if the last three parts form a date (YYYY-MM-DD)
+    const len = parts.length;
+    let formattedParts: string[];
+    if (
+      len >= 3 &&
+      /^\d{4}$/.test(parts[len - 3]) &&
+      /^\d{2}$/.test(parts[len - 2]) &&
+      /^\d{2}$/.test(parts[len - 1])
+    ) {
+      // Group the last three parts as a date
+      const datePart = `${parts[len - 3]}-${parts[len - 2]}-${parts[len - 1]}`;
+      formattedParts = parts.slice(0, len - 3).concat([datePart]);
+    } else {
+      formattedParts = parts;
+    }
+    return formattedParts
+      .map((part) => {
+        if (part === 'gpt') return 'GPT';
+        if (part.match(/^\d/)) return part; // Keep numbers as-is
+        // If part is a date (YYYY-MM-DD), keep as-is
+        if (/^\d{4}-\d{2}-\d{2}$/.test(part)) return part;
+        return part.charAt(0).toUpperCase() + part.slice(1);
+      })
+      .join(' ');
```
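To make the reviewed edge case concrete, here is a standalone sketch of the date-aware title formatting, extracted from the suggestion above so it can be run on its own (the function name matches the PR; the standalone packaging is ours):

```typescript
// Standalone version of the suggested formatModelTitle: trailing
// YYYY-MM-DD segments are grouped so dated snapshots stay readable.
function formatModelTitle(modelId: string): string {
  const parts = modelId.split('-');
  const len = parts.length;
  let formattedParts: string[];
  if (
    len >= 3 &&
    /^\d{4}$/.test(parts[len - 3]) &&
    /^\d{2}$/.test(parts[len - 2]) &&
    /^\d{2}$/.test(parts[len - 1])
  ) {
    // Re-join the last three parts into a single date token
    const datePart = `${parts[len - 3]}-${parts[len - 2]}-${parts[len - 1]}`;
    formattedParts = parts.slice(0, len - 3).concat([datePart]);
  } else {
    formattedParts = parts;
  }
  return formattedParts
    .map((part) => {
      if (part === 'gpt') return 'GPT';
      if (/^\d/.test(part)) return part; // numbers and dates kept as-is
      return part.charAt(0).toUpperCase() + part.slice(1);
    })
    .join(' ');
}

formatModelTitle('gpt-4o-mini');       // → "GPT 4o Mini"
formatModelTitle('gpt-4o-2024-05-13'); // → "GPT 4o 2024-05-13"
```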
```typescript
      (model) =>
        model.id.startsWith('gpt-') &&
        !model.id.startsWith('gpt-3.5') && // Exclude GPT-3.5 (no vision support)
        !model.id.includes('instruct'), // Exclude instruct models
```
Copilot AI commented on Dec 3, 2025:
The filter assumes all models starting with 'gpt-' (except gpt-3.5) support vision. However, not all GPT-4 models may support vision (e.g., older base 'gpt-4' vs 'gpt-4-vision-preview' or 'gpt-4-turbo'). Consider using a more explicit allowlist of known vision-capable model patterns (e.g., contains 'gpt-4o', 'gpt-4-turbo', 'gpt-4-vision', 'gpt-4v') or checking model capabilities via the OpenAI API if available. This would prevent non-vision models from being included and causing runtime errors when screenshots are sent.
Suggested change:

```diff
-      (model) =>
-        model.id.startsWith('gpt-') &&
-        !model.id.startsWith('gpt-3.5') && // Exclude GPT-3.5 (no vision support)
-        !model.id.includes('instruct'), // Exclude instruct models
+      (model) => {
+        // Only include known vision-capable models
+        const id = model.id;
+        // Add to this list as new vision-capable models are released
+        return (
+          (id.includes('gpt-4o') ||
+            id.includes('gpt-4-turbo') ||
+            id.includes('gpt-4-vision') ||
+            id.includes('gpt-4v')) &&
+          !id.includes('instruct')
+        );
+      },
```
```typescript
} from '@nut-tree-fork/nut-js';
import { spawn } from 'child_process';
import * as path from 'path';
import { overlayeCursorOnScreenshot } from './cursor-overlay';
```
Copilot AI commented on Dec 3, 2025:
Typo in imported function name: 'overlaye' should be 'overlay'. The import should be overlayCursorOnScreenshot instead of overlayeCursorOnScreenshot.
Suggested change:

```diff
-import { overlayeCursorOnScreenshot } from './cursor-overlay';
+import { overlayCursorOnScreenshot } from './cursor-overlay';
```
```typescript
      // Overlay cursor if position was captured
      if (includeCursor && cursorPosition) {
        try {
          const withCursor = await overlayeCursorOnScreenshot(
```
Copilot AI commented on Dec 3, 2025:
Typo in function call: 'overlaye' should be 'overlay'. The function call should be overlayCursorOnScreenshot instead of overlayeCursorOnScreenshot.
```typescript
    if (name.includes('gpt-4.1')) return 1;
    if (name.includes('gpt-4')) return 2;
    if (name.includes('gpt-5')) return 3;
```
Copilot AI commented on Dec 3, 2025:
The sort priority includes 'gpt-4.1' at priority 1 and 'gpt-5' at priority 3, but according to the filter logic (line 76), only models starting with 'gpt-' and excluding 'gpt-3.5' are included. Since GPT-4.1 and GPT-5 are hypothetical future models that may not exist yet, consider whether these priority cases are necessary. If they are intended for future-proofing, a comment explaining this would be helpful.
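The check order in the excerpt matters: since `'gpt-4.1'` also contains `'gpt-4'`, the more specific family must be tested first. A small self-contained sketch of the priority idea (the `sortPriority` wrapper name is ours; lower numbers sort first):

```typescript
// Sketch of the sort-priority logic under review. Order of checks is
// significant: 'gpt-4.1' ids would otherwise match the 'gpt-4' branch.
function sortPriority(name: string): number {
  if (name.includes('gpt-4.1')) return 1;
  if (name.includes('gpt-4')) return 2;
  if (name.includes('gpt-5')) return 3;
  return 4; // anything else sorts last
}

const ordered = ['gpt-5', 'gpt-4o', 'gpt-4.1-mini'].sort(
  (a, b) => sortPriority(a) - sortPriority(b),
);
// → ['gpt-4.1-mini', 'gpt-4o', 'gpt-5']
```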
```typescript
  // Ensure cursor position is within screenshot bounds
  const safeX = Math.max(0, Math.min(cursorX, width - 1));
  const safeY = Math.max(0, Math.min(cursorY, height - 1));
```
Copilot AI commented on Dec 3, 2025:
The cursor position bounds check should account for the cursor size to prevent the cursor from being clipped at the edges. Currently, Math.min(cursorX, width - 1) allows the cursor to be placed at width - 1, but since the cursor has a size (default 24px), part of it will extend beyond the image boundary. Consider using Math.min(cursorX, width - cursorSize) and Math.min(cursorY, height - cursorSize) instead.
Suggested change:

```diff
-  // Ensure cursor position is within screenshot bounds
-  const safeX = Math.max(0, Math.min(cursorX, width - 1));
-  const safeY = Math.max(0, Math.min(cursorY, height - 1));
+  // Ensure cursor position is within screenshot bounds (account for cursor size)
+  const safeX = Math.max(0, Math.min(cursorX, width - cursorSize));
+  const safeY = Math.max(0, Math.min(cursorY, height - cursorSize));
```
The Microsoft APT repository was unreliable and caused build failures. Switched to downloading the .deb package directly from code.visualstudio.com for both the amd64 and arm64 architectures.
- Add -cursor arrow -cursorpos flags to x11vnc configuration
- Enable showDotCursor in react-vnc VncViewer component
- Ensure cursor is visible in live desktop preview

Fixes issue where the cursor was not visible to the AI agent during task execution, causing it to get stuck on positioning.
- Add logic to parse the model name and determine the provider (openai/anthropic/google)
- Handle model names stored as strings in the database
- Fall back to OpenAI's available models list for unknown models

Fixes the "No service found for model provider: undefined" error that prevented task execution.
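The provider detection described in this commit could look roughly like the following. This is a hedged sketch: the function name, the exact prefixes checked, and the `Provider` union are assumptions, not the PR's actual code.

```typescript
// Hypothetical sketch of mapping a stored model-name string to a provider.
type Provider = 'openai' | 'anthropic' | 'google';

function detectProvider(modelName: string): Provider | undefined {
  if (
    modelName.startsWith('gpt-') ||
    modelName.startsWith('o1') ||
    modelName.startsWith('o3')
  ) {
    return 'openai';
  }
  if (modelName.startsWith('claude')) return 'anthropic';
  if (modelName.startsWith('gemini')) return 'google';
  // Unknown: caller can fall back to OpenAI's available-models list
  return undefined;
}
```

Returning `undefined` for unknown names lets the caller apply the fallback path instead of dispatching to a non-existent provider service.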
- Add instructions about cursor visibility in screenshots
- Remind the agent to use computer_cursor_position when having trouble
- Discourage repeatedly clicking the same coordinates when it is not working

Helps the agent handle positioning issues more intelligently.
- Handle both string and object formats for task.model
- Check the type before attempting to parse the model name
- Use proper TypeScript casting through unknown

This fixes the "modelName.startsWith is not a function" error that was causing immediate task failures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
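The string-versus-object narrowing this commit describes can be sketched as below. The `{ name: string }` shape is an assumption about how the model object is stored; the real field may differ.

```typescript
// Sketch of safely extracting a model name from a value that may be a
// plain string or a stored object, narrowing through `unknown`.
function resolveModelName(model: unknown): string | undefined {
  if (typeof model === 'string') return model;
  if (
    typeof model === 'object' &&
    model !== null &&
    typeof (model as { name?: unknown }).name === 'string'
  ) {
    return (model as { name: string }).name;
  }
  // Anything else would blow up on .startsWith(), so refuse it here
  return undefined;
}
```

Guarding before calling string methods is what prevents the "modelName.startsWith is not a function" crash when the database hands back an object instead of a string.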
This pull request enhances how available OpenAI models are managed and surfaced in the Bytebot agent. The most significant changes include dynamically fetching and caching OpenAI models that support vision (image inputs), improving fallback logic, and updating the models list to prioritize relevant options. Additionally, there are minor UI improvements to the select dropdown component.
Dynamic OpenAI Model Management:
- Added `getAvailableModels` in `OpenAIService` to fetch available models from the OpenAI API, filter for those supporting vision, cache them for one hour, and provide a fallback to a hardcoded list if needed. This ensures the agent always offers up-to-date and relevant model options.
- Updated the `OPENAI_MODELS` list to include only models that support vision (image input), with revised names, titles, and context windows.

Integration with Task Controller:
- Updated `TasksController` to fetch OpenAI models dynamically using the new `getAvailableModels` method, with a fallback to the hardcoded list if fetching fails. Models from other providers are still included based on API key presence.
- Updated `TasksModule` to import `OpenAIModule` so that `OpenAIService` can be injected into `TasksController`.

UI Improvement:
- Improved `SelectContent` by limiting its maximum height and enabling vertical scrolling, enhancing usability when many models are available.