As a student who has tried Perplexity Comet (absolute BUNS), I regularly use BrowserOS for extremely academic purposes. On a conceptual level, it works amazingly well. It gets the job done, period. However, the LLM (in my case, Gemini 3 Flash) still regularly crashes out early. On longer-running tasks of over 6-7 minutes, the model will sometimes terminate and start talking nonsense to itself (see below). It sometimes even starts talking in completely random languages that aren't even visually present on the webpage.
> Your last action was browser_click_element(nodeId:12,tabId:1484001459) which successfully clicked the "Next" button. You're on Q2/10. Complete the assignment as requested.
Overall, performance on longer-running tasks is poor. As a result, I still have to keep BrowserOS near the top of my mental stack rather than adopting a "set once and forget" mentality, which degrades the UX. Implementing some form of context compression/summarization would help; however, summarizing events that happened on a webpage without losing information vital to interpreting the user's relative instructions might be difficult. Even with Gemini 3 Flash's 1M-token context window, the problem persists.
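To make the idea concrete, here's a rough sketch of what I mean by compression that preserves the critical bits. Everything here is hypothetical (`AgentEvent`, `compressContext` are made-up names, not BrowserOS APIs): keep the original user instruction pinned verbatim, keep the last few events verbatim, and only summarize the older tail.

```typescript
// Hypothetical sketch of context compression for a browser agent.
// AgentEvent / compressContext are made-up names, not BrowserOS APIs.

interface AgentEvent {
  step: number;
  description: string; // e.g. "clicked 'Next' on Q2/10"
}

interface CompressedContext {
  originalInstruction: string; // always kept verbatim, never summarized
  summary: string;             // lossy digest of older events
  recentEvents: AgentEvent[];  // last few events kept verbatim
}

function compressContext(
  instruction: string,
  events: AgentEvent[],
  keepRecent = 5
): CompressedContext {
  const cutoff = Math.max(0, events.length - keepRecent);
  const older = events.slice(0, cutoff);
  const recent = events.slice(cutoff);
  // In practice the summary would come from an LLM call; here we just
  // collapse the older events into a one-line digest as a placeholder.
  const summary =
    older.length > 0
      ? `Completed steps 1-${older[older.length - 1].step}: ` +
        older.map((e) => e.description).join("; ")
      : "";
  return { originalInstruction: instruction, summary, recentEvents: recent };
}
```

The key design point is that the user's instruction never passes through the lossy summary, so relative references ("the next one", "question 2") can still be resolved against the verbatim recent events.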
Currently, I simply repaste my original user prompt over and over again. A temporary solution might be to improve the UX of prompting the LLM.
For instance,
- providing an effective way of setting system prompts
- switching between multiple unique system prompt personas, or
- adding a way to queue future agent prompts to be followed upon completion/failure
would all be ways to let prompts be better utilized and improve UX. Expanding on #348, individual prompts themselves could be saved, collected, and run as "scripts", a step below workflows, directly in the browser.
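The queueing idea in particular could be pretty simple. Here's a hypothetical sketch (the `runAgent` callback and all type names are assumptions, not existing BrowserOS code) of prompts gated on the outcome of the previous run:

```typescript
// Hypothetical prompt-queue sketch; runAgent and these types are assumptions.

type AgentResult = "completed" | "failed";

interface QueuedPrompt {
  text: string;
  runOn: AgentResult[]; // run this prompt only if the prior run ended this way
}

async function runQueue(
  prompts: QueuedPrompt[],
  runAgent: (prompt: string) => Promise<AgentResult>
): Promise<AgentResult[]> {
  const results: AgentResult[] = [];
  let last: AgentResult = "completed"; // the queue itself starts "clean"
  for (const p of prompts) {
    if (!p.runOn.includes(last)) continue; // skip prompts gated on other outcomes
    last = await runAgent(p.text);
    results.push(last);
  }
  return results;
}
```

This would let a user queue "do the assignment" followed by a `runOn: ["failed"]` recovery prompt, instead of babysitting the agent and repasting the original prompt by hand.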
Key takeaways:
- try to better manage context
- make it easier to prompt the agent
- overall, reduce the necessary amount of human interaction/thought
BrowserOS is absolute peak because it's not locked down to any specific brand of models. If you could make longer-running tasks execute more reliably, that would be amazing!