The AI Browser Automation Framework
If you're looking for other languages, you can find them here
Stagehand is a browser automation framework used to control web browsers with natural language and code. By combining the power of AI with the precision of code, Stagehand makes web automation flexible, maintainable, and actually reliable.
Most existing browser automation tools either require you to write low-level code in a framework like Selenium, Playwright, or Puppeteer, or use high-level agents that can be unpredictable in production. By letting developers choose what to write in code vs. natural language (and bridging the gap between the two) Stagehand is the natural choice for browser automations in production.
- Choose when to write code vs. natural language: use AI when you want to navigate unfamiliar pages, and use code when you know exactly what you want to do.
- Go from AI-driven to repeatable workflows: Stagehand lets you preview AI actions before running them, and also helps you easily cache repeatable actions to save time and tokens.
- Write once, run forever: Stagehand's auto-caching combined with self-healing remembers previous actions, runs without LLM inference, and knows when to involve AI whenever the website changes and your automation breaks.
implementation("com.browserbase.api:stagehand-kotlin:3.1.1")
<dependency>
  <groupId>com.browserbase.api</groupId>
  <artifactId>stagehand-kotlin</artifactId>
  <version>3.1.1</version>
</dependency>
This library requires Java 8 or later.
A complete working example is available at examples/basic.kt.
To run it, first export the required environment variables, then use Gradle:
export BROWSERBASE_API_KEY="your-bb-api-key"
export BROWSERBASE_PROJECT_ID="your-bb-project-uuid"
export MODEL_API_KEY="sk-proj-your-llm-api-key"
./gradlew run
This example demonstrates the complete workflow of using Stagehand:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
import com.browserbase.api.models.sessions.SessionStartParams
import com.browserbase.api.models.sessions.SessionNavigateParams
import com.browserbase.api.models.sessions.SessionObserveParams
import com.browserbase.api.models.sessions.SessionActParams
import com.browserbase.api.models.sessions.SessionExtractParams
import com.browserbase.api.models.sessions.SessionExecuteParams
import com.browserbase.api.models.sessions.SessionEndParams
import com.browserbase.api.models.sessions.ActionParam
fun main() {
// Create a new Stagehand client using environment variables
// Configures using BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID, and MODEL_API_KEY
val client: StagehandClient = StagehandOkHttpClient.fromEnv()
// Start a new browser session
val startParams = SessionStartParams.builder()
.modelName("openai/gpt-4o")
.build()
val startResponse = client.sessions().start(startParams)
println("Session started: ${startResponse.data.sessionId}")
val sessionId = startResponse.data.sessionId
// Navigate to Hacker News
val navigateParams = SessionNavigateParams.builder()
.id(sessionId)
.url("https://news.ycombinator.com")
.build()
client.sessions().navigate(navigateParams)
println("Navigated to Hacker News")
// Use Observe to find possible actions on the page
val observeParams = SessionObserveParams.builder()
.id(sessionId)
.instruction("find the link to view comments for the top post")
.build()
val observeResponse = client.sessions().observe(observeParams)
val actions = observeResponse.data.result
println("Found ${actions.size} possible actions")
if (actions.isEmpty()) {
println("No actions found")
return
}
// Take the first action returned by Observe
val action = actions[0]
println("Acting on: ${action.description}")
// Pass the structured action to Act
val actParams = SessionActParams.builder()
.id(sessionId)
.input(
SessionActParams.Input.ofAction(
ActionParam.builder()
.description(action.description)
.selector(action.selector)
.method(action.method ?: "click")
.arguments(action.arguments)
.build()
)
)
.build()
val actResponse = client.sessions().act(actParams)
println("Act completed: ${actResponse.data.result.message}")
// Extract data from the page
// We're now on the comments page, so extract the top comment text
val extractParams = SessionExtractParams.builder()
.id(sessionId)
.instruction("extract the text of the top comment on this page")
.schema(
mapOf(
"type" to "object",
"properties" to mapOf(
"commentText" to mapOf(
"type" to "string",
"description" to "The text content of the top comment"
),
"author" to mapOf(
"type" to "string",
"description" to "The username of the comment author"
)
),
"required" to listOf("commentText")
)
)
.build()
val extractResponse = client.sessions().extract(extractParams)
println("Extracted data: ${extractResponse.data.result}")
// Get the author from the extracted data
@Suppress("UNCHECKED_CAST")
val extractedData = extractResponse.data.result as Map<String, Any>
val author = extractedData["author"] as String
println("Looking up profile for author: $author")
// Use the Agent to find the author's profile
// Execute runs an autonomous agent that can navigate and interact with pages
val executeParams = SessionExecuteParams.builder()
.id(sessionId)
.executeOptions(
SessionExecuteParams.ExecuteOptions.builder()
.instruction(
"Find any personal website, GitHub, LinkedIn, or other best profile URL for the Hacker News user '$author'. " +
"Click on their username to go to their profile page and look for any links they have shared. " +
"Use Google Search with their username or other details from their profile if you don't find any direct links."
)
.maxSteps(15.0)
.build()
)
.agentConfig(
SessionExecuteParams.AgentConfig.builder()
.model(
SessionExecuteParams.ModelConfig.ofModelConfigObject(
SessionExecuteParams.ModelConfig.ModelConfigObject.builder()
.modelName("openai/gpt-4.1-mini")
.apiKey(System.getenv("MODEL_API_KEY"))
.build()
)
)
.cua(false)
.build()
)
.build()
val executeResponse = client.sessions().execute(executeParams)
println("Agent completed: ${executeResponse.data.result.message}")
println("Agent success: ${executeResponse.data.result.success}")
println("Agent actions taken: ${executeResponse.data.result.actions?.size ?: 0}")
// End the session to cleanup browser resources
val endParams = SessionEndParams.builder()
.id(sessionId)
.build()
client.sessions().end(endParams)
println("Session ended")
}
Configure the client using system properties or environment variables:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
// Configures using the `stagehand.browserbaseApiKey`, `stagehand.browserbaseProjectId`, `stagehand.modelApiKey` and `stagehand.baseUrl` system properties
// Or configures using the `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, `MODEL_API_KEY` and `STAGEHAND_BASE_URL` environment variables
val client: StagehandClient = StagehandOkHttpClient.fromEnv()
Or manually:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
val client: StagehandClient = StagehandOkHttpClient.builder()
.browserbaseApiKey("My Browserbase API Key")
.browserbaseProjectId("My Browserbase Project ID")
.modelApiKey("My Model API Key")
.build()
Or using a combination of the two approaches:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
val client: StagehandClient = StagehandOkHttpClient.builder()
// Configures using the `stagehand.browserbaseApiKey`, `stagehand.browserbaseProjectId`, `stagehand.modelApiKey` and `stagehand.baseUrl` system properties
// Or configures using the `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, `MODEL_API_KEY` and `STAGEHAND_BASE_URL` environment variables
.fromEnv()
.browserbaseApiKey("My Browserbase API Key")
.build()
See this table for the available options:
| Setter | System property | Environment variable | Required | Default value |
|---|---|---|---|---|
| browserbaseApiKey | stagehand.browserbaseApiKey | BROWSERBASE_API_KEY | true | - |
| browserbaseProjectId | stagehand.browserbaseProjectId | BROWSERBASE_PROJECT_ID | true | - |
| modelApiKey | stagehand.modelApiKey | MODEL_API_KEY | true | - |
| baseUrl | stagehand.baseUrl | STAGEHAND_BASE_URL | true | "https://api.stagehand.browserbase.com" |
System properties take precedence over environment variables.
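The lookup order described above can be pictured with a small sketch. The helper `resolveSetting` is hypothetical and is not part of the SDK; it only mirrors the documented order (system property first, then environment variable, then the default). The environment is passed in as a map so the sketch behaves the same on any machine.

```kotlin
// Hypothetical helper, not part of the Stagehand SDK: mirrors the documented
// lookup order of system property, then environment variable, then default.
fun resolveSetting(
    propertyName: String,
    envName: String,
    default: String? = null,
    env: Map<String, String> = System.getenv()
): String? =
    System.getProperty(propertyName) ?: env[envName] ?: default

fun main() {
    // Simulated environment, used instead of the real process environment.
    val fakeEnv = mapOf("STAGEHAND_BASE_URL" to "https://env.example.com")

    // Only the environment variable is set, so it is used.
    System.clearProperty("stagehand.baseUrl")
    println(resolveSetting("stagehand.baseUrl", "STAGEHAND_BASE_URL", env = fakeEnv))

    // Once the system property is set, it wins over the environment variable.
    System.setProperty("stagehand.baseUrl", "https://prop.example.com")
    println(resolveSetting("stagehand.baseUrl", "STAGEHAND_BASE_URL", env = fakeEnv))
}
```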
Tip
Don't create more than one client in the same application. Each client has a connection pool and thread pools, which are more efficient to share between requests.
To temporarily use a modified client configuration, while reusing the same connection and thread pools, call withOptions() on any client or service:
import com.browserbase.api.client.StagehandClient
val clientWithOptions: StagehandClient = client.withOptions {
it.baseUrl("https://example.com")
it.maxRetries(42)
}
The withOptions() method does not affect the original client or service.
To send a request to the Stagehand API, build an instance of some Params class and pass it to the corresponding client method. When the response is received, it will be deserialized into an instance of a Kotlin class.
For example, client.sessions().act(...) should be called with an instance of SessionActParams, and it will return an instance of SessionActResponse.
Each class in the SDK has an associated builder or factory method for constructing it.
Each class is immutable once constructed. If the class has an associated builder, then it has a toBuilder() method, which can be used to convert it back to a builder for making a modified copy.
Because each class is immutable, builder modification will never affect already built class instances.
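As a rough illustration of the builder and toBuilder() pattern described above, here is a minimal sketch. `ExampleParams` is a hypothetical stand-in, not an SDK class; the SDK's generated classes have more to them, but the copy-and-modify behavior is the same idea.

```kotlin
// Hypothetical stand-in for an SDK class; only the builder/toBuilder pattern matters here.
class ExampleParams private constructor(
    val id: String,
    val instruction: String
) {
    // Converts the immutable instance back into a pre-filled builder.
    fun toBuilder(): Builder = Builder().id(id).instruction(instruction)

    class Builder {
        private var id: String? = null
        private var instruction: String? = null

        fun id(id: String) = apply { this.id = id }
        fun instruction(instruction: String) = apply { this.instruction = instruction }

        // Throws IllegalStateException when a required property is unset.
        fun build(): ExampleParams = ExampleParams(
            checkNotNull(id) { "`id` is required" },
            checkNotNull(instruction) { "`instruction` is required" }
        )
    }

    companion object {
        fun builder() = Builder()
    }
}

fun main() {
    val original = ExampleParams.builder()
        .id("session-1")
        .instruction("click the first link")
        .build()

    // The modified copy is a new object; the original is left untouched.
    val modified = original.toBuilder().instruction("click the second link").build()

    println(original.instruction) // click the first link
    println(modified.instruction) // click the second link
}
```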
The default client is synchronous. To switch to asynchronous execution, call the async() method:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
import com.browserbase.api.models.sessions.SessionActParams
import com.browserbase.api.models.sessions.SessionActResponse
// Configures using the `stagehand.browserbaseApiKey`, `stagehand.browserbaseProjectId`, `stagehand.modelApiKey` and `stagehand.baseUrl` system properties
// Or configures using the `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, `MODEL_API_KEY` and `STAGEHAND_BASE_URL` environment variables
val client: StagehandClient = StagehandOkHttpClient.fromEnv()
val params: SessionActParams = SessionActParams.builder()
.id("00000000-your-session-id-000000000000")
.input("click the first link on the page")
.build()
val response: SessionActResponse = client.async().sessions().act(params)
Or create an asynchronous client from the beginning:
import com.browserbase.api.client.StagehandClientAsync
import com.browserbase.api.client.okhttp.StagehandOkHttpClientAsync
import com.browserbase.api.models.sessions.SessionActParams
import com.browserbase.api.models.sessions.SessionActResponse
// Configures using the `stagehand.browserbaseApiKey`, `stagehand.browserbaseProjectId`, `stagehand.modelApiKey` and `stagehand.baseUrl` system properties
// Or configures using the `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, `MODEL_API_KEY` and `STAGEHAND_BASE_URL` environment variables
val client: StagehandClientAsync = StagehandOkHttpClientAsync.fromEnv()
val params: SessionActParams = SessionActParams.builder()
.id("00000000-your-session-id-000000000000")
.input("click the first link on the page")
.build()
val response: SessionActResponse = client.sessions().act(params)
The asynchronous client supports the same options as the synchronous one, except most methods are suspending.
The SDK defines methods that return response "chunk" streams, where each chunk can be individually processed as soon as it arrives instead of waiting on the full response. Streaming methods generally correspond to SSE or JSONL responses.
Some of these methods may have streaming and non-streaming variants, but a streaming method will always have a Streaming suffix in its name, even if it doesn't have a non-streaming variant.
These streaming methods return StreamResponse for synchronous clients:
client.sessions().actStreaming(params).use { response ->
response.asSequence().forEach { println(it) }
println("No more chunks!")
}
The SDK defines methods that deserialize responses into instances of Kotlin classes. However, these methods don't provide access to the response headers, status code, or the raw response body.
To access this data, prefix any HTTP method call on a client or service with withRawResponse():
import com.browserbase.api.core.http.Headers
import com.browserbase.api.core.http.HttpResponseFor
import com.browserbase.api.models.sessions.SessionStartParams
import com.browserbase.api.models.sessions.SessionStartResponse
val params: SessionStartParams = SessionStartParams.builder()
.modelName("openai/gpt-5-nano")
.build()
val response: HttpResponseFor<SessionStartResponse> = client.sessions().withRawResponse().start(params)
val statusCode: Int = response.statusCode()
val headers: Headers = response.headers()
You can still deserialize the response into an instance of a Kotlin class if needed:
import com.browserbase.api.models.sessions.SessionStartResponse
val parsedResponse: SessionStartResponse = response.parse()
The SDK throws custom unchecked exception types:
- StagehandServiceException: Base class for HTTP errors. See this table for which exception subclass is thrown for each HTTP status code:

  | Status | Exception |
  |---|---|
  | 400 | BadRequestException |
  | 401 | UnauthorizedException |
  | 403 | PermissionDeniedException |
  | 404 | NotFoundException |
  | 422 | UnprocessableEntityException |
  | 429 | RateLimitException |
  | 5xx | InternalServerException |
  | others | UnexpectedStatusCodeException |

  SseException is thrown for errors encountered during SSE streaming after a successful initial HTTP response.
- StagehandIoException: I/O networking errors.
- StagehandRetryableException: Generic error indicating a failure that could be retried by the client.
- StagehandInvalidDataException: Failure to interpret successfully parsed data. For example, when accessing a property that's supposed to be required, but the API unexpectedly omitted it from the response.
- StagehandException: Base class for all exceptions. Most errors will result in one of the previously mentioned ones, but completely generic errors may be thrown using the base class.
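The status-to-exception mapping above can be read as a simple lookup. The sketch below restates it as a plain function; `exceptionNameFor` is a hypothetical helper written for illustration and returns the exception class names as strings rather than using the SDK's types.

```kotlin
// Hypothetical helper: restates the documented status-to-exception table.
// It returns class names as strings; it does not reference the SDK itself.
fun exceptionNameFor(statusCode: Int): String = when (statusCode) {
    400 -> "BadRequestException"
    401 -> "UnauthorizedException"
    403 -> "PermissionDeniedException"
    404 -> "NotFoundException"
    422 -> "UnprocessableEntityException"
    429 -> "RateLimitException"
    in 500..599 -> "InternalServerException"
    else -> "UnexpectedStatusCodeException"
}

fun main() {
    println(exceptionNameFor(404)) // NotFoundException
    println(exceptionNameFor(503)) // InternalServerException
    println(exceptionNameFor(418)) // UnexpectedStatusCodeException
}
```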
The SDK uses the standard OkHttp logging interceptor.
Enable logging by setting the STAGEHAND_LOG environment variable to info:
export STAGEHAND_LOG=info
Or to debug for more verbose logging:
export STAGEHAND_LOG=debug
Although the SDK uses reflection, it is still usable with ProGuard and R8 because stagehand-kotlin-core is published with a configuration file containing keep rules.
ProGuard and R8 should automatically detect and use the published rules, but you can also manually copy the keep rules if necessary.
The SDK depends on Jackson for JSON serialization/deserialization. It is compatible with version 2.13.4 or higher, but depends on version 2.18.2 by default.
The SDK throws an exception if it detects an incompatible Jackson version at runtime (e.g. if the default version was overridden in your Maven or Gradle config).
If the SDK threw an exception, but you're certain the version is compatible, then disable the version check using the checkJacksonVersionCompatibility option on StagehandOkHttpClient or StagehandOkHttpClientAsync.
Caution
We make no guarantee that the SDK works correctly when the Jackson version check is disabled.
Also note that there are bugs in older Jackson versions that can affect the SDK. We don't work around all Jackson bugs and expect users to upgrade Jackson for those instead.
The SDK automatically retries 2 times by default, with a short exponential backoff between requests.
Only the following error types are retried:
- Connection errors (for example, due to a network connectivity problem)
- 408 Request Timeout
- 409 Conflict
- 429 Rate Limit
- 5xx Internal
The API may also explicitly instruct the SDK to retry or not retry a request.
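To make the retry rules above concrete, here is a minimal sketch of a retry loop with exponential backoff. It is not the SDK's implementation: the function names, the 500 ms base delay, the doubling factor, and the cap are all assumptions chosen for illustration, and connection errors and server-sent retry hints are left out. Only the list of retryable status codes and the default of 2 retries come from the text.

```kotlin
// Status codes the text lists as retryable: 408, 409, 429, and any 5xx.
fun isRetryableStatus(statusCode: Int): Boolean =
    statusCode == 408 || statusCode == 409 || statusCode == 429 || statusCode in 500..599

// Assumed schedule (not documented by the SDK): the delay doubles per retry, up to a cap.
fun backoffMillis(retry: Int, baseMillis: Long = 500, maxDelayMillis: Long = 8_000): Long =
    minOf(maxDelayMillis, baseMillis shl retry)

// Calls `request` until it returns a non-retryable status or the retries run out.
// `sleep` is injectable so the sketch can run without actually waiting.
fun sendWithRetries(
    maxRetries: Int = 2,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    request: () -> Int
): Int {
    var status = request()
    var retry = 0
    while (isRetryableStatus(status) && retry < maxRetries) {
        sleep(backoffMillis(retry))
        retry++
        status = request()
    }
    return status
}

fun main() {
    // Simulated responses: two retryable failures followed by a success.
    val statuses = listOf(503, 429, 200).iterator()
    val delays = mutableListOf<Long>()
    val finalStatus = sendWithRetries(sleep = { delays += it }) { statuses.next() }
    println("final=$finalStatus delays=$delays") // final=200 delays=[500, 1000]
}
```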
To set a custom number of retries, configure the client using the maxRetries method:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
val client: StagehandClient = StagehandOkHttpClient.builder()
.fromEnv()
.maxRetries(4)
.build()
Requests time out after 1 minute by default.
To set a custom timeout, configure the method call using the timeout method:
import com.browserbase.api.models.sessions.SessionStartResponse
import java.time.Duration
val response: SessionStartResponse = client.sessions().start(
  params, RequestOptions.builder().timeout(Duration.ofSeconds(30)).build()
)
Or configure the default for all method calls at the client level:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
import java.time.Duration
val client: StagehandClient = StagehandOkHttpClient.builder()
.fromEnv()
.timeout(Duration.ofSeconds(30))
.build()
To route requests through a proxy, configure the client using the proxy method:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
import java.net.InetSocketAddress
import java.net.Proxy
val client: StagehandClient = StagehandOkHttpClient.builder()
  .fromEnv()
  // InetSocketAddress expects a bare hostname, not a URL with a scheme
  .proxy(Proxy(
    Proxy.Type.HTTP, InetSocketAddress(
      "example.com", 8080
    )
  ))
  .build()
Note
Most applications should not call these methods, and instead use the system defaults. The defaults include special optimizations that can be lost if the implementations are modified.
To configure how HTTPS connections are secured, configure the client using the sslSocketFactory, trustManager, and hostnameVerifier methods:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
val client: StagehandClient = StagehandOkHttpClient.builder()
.fromEnv()
// If `sslSocketFactory` is set, then `trustManager` must be set, and vice versa.
.sslSocketFactory(yourSSLSocketFactory)
.trustManager(yourTrustManager)
.hostnameVerifier(yourHostnameVerifier)
.build()
The SDK consists of three artifacts:
- stagehand-kotlin-core
  - Contains core SDK logic
  - Does not depend on OkHttp
  - Exposes StagehandClient, StagehandClientAsync, StagehandClientImpl, and StagehandClientAsyncImpl, all of which can work with any HTTP client
- stagehand-kotlin-client-okhttp
  - Depends on OkHttp
  - Exposes StagehandOkHttpClient and StagehandOkHttpClientAsync, which provide a way to construct StagehandClientImpl and StagehandClientAsyncImpl, respectively, using OkHttp
- stagehand-kotlin
  - Depends on and exposes the APIs of both stagehand-kotlin-core and stagehand-kotlin-client-okhttp
  - Does not have its own logic
This structure allows replacing the SDK's default HTTP client without pulling in unnecessary dependencies.
Customized OkHttpClient
Tip
Try the available network options before replacing the default client.
To use a customized OkHttpClient:
- Replace your stagehand-kotlin dependency with stagehand-kotlin-core
- Copy stagehand-kotlin-client-okhttp's OkHttpClient class into your code and customize it
- Construct StagehandClientImpl or StagehandClientAsyncImpl, similarly to StagehandOkHttpClient or StagehandOkHttpClientAsync, using your customized client
To use a completely custom HTTP client:
- Replace your stagehand-kotlin dependency with stagehand-kotlin-core
- Write a class that implements the HttpClient interface
- Construct StagehandClientImpl or StagehandClientAsyncImpl, similarly to StagehandOkHttpClient or StagehandOkHttpClientAsync, using your new client class
The SDK is typed for convenient usage of the documented API. However, it also supports working with undocumented or not yet supported parts of the API.
To set undocumented parameters, call the putAdditionalHeader, putAdditionalQueryParam, or putAdditionalBodyProperty methods on any Params class:
import com.browserbase.api.core.JsonValue
import com.browserbase.api.models.sessions.SessionActParams
val params: SessionActParams = SessionActParams.builder()
.putAdditionalHeader("Secret-Header", "42")
.putAdditionalQueryParam("secret_query_param", "42")
.putAdditionalBodyProperty("secretProperty", JsonValue.from("42"))
.build()
These can be accessed on the built object later using the _additionalHeaders(), _additionalQueryParams(), and _additionalBodyProperties() methods.
To set undocumented parameters on nested headers, query params, or body classes, call the putAdditionalProperty method on the nested class:
import com.browserbase.api.core.JsonValue
import com.browserbase.api.models.sessions.SessionActParams
val params: SessionActParams = SessionActParams.builder()
.options(SessionActParams.Options.builder()
.putAdditionalProperty("secretProperty", JsonValue.from("42"))
.build())
.build()
These properties can be accessed on the nested built object later using the _additionalProperties() method.
To set a documented parameter or property to an undocumented or not yet supported value, pass a JsonValue object to its setter:
import com.browserbase.api.core.JsonValue
import com.browserbase.api.models.sessions.SessionActParams
val params: SessionActParams = SessionActParams.builder()
.input(JsonValue.from(42))
.build()
The most straightforward way to create a JsonValue is using its from(...) method:
import com.browserbase.api.core.JsonValue
// Create primitive JSON values
val nullValue: JsonValue = JsonValue.from(null)
val booleanValue: JsonValue = JsonValue.from(true)
val numberValue: JsonValue = JsonValue.from(42)
val stringValue: JsonValue = JsonValue.from("Hello World!")
// Create a JSON array value equivalent to `["Hello", "World"]`
val arrayValue: JsonValue = JsonValue.from(listOf(
"Hello", "World"
))
// Create a JSON object value equivalent to `{ "a": 1, "b": 2 }`
val objectValue: JsonValue = JsonValue.from(mapOf(
"a" to 1, "b" to 2
))
// Create an arbitrarily nested JSON equivalent to:
// {
// "a": [1, 2],
// "b": [3, 4]
// }
val complexValue: JsonValue = JsonValue.from(mapOf(
"a" to listOf(
1, 2
), "b" to listOf(
3, 4
)
))
Normally a Builder class's build method will throw IllegalStateException if any required parameter or property is unset.
To forcibly omit a required parameter or property, pass JsonMissing:
import com.browserbase.api.core.JsonMissing
import com.browserbase.api.models.sessions.SessionActParams
val params: SessionActParams = SessionActParams.builder()
.input("Click the login button")
.id(JsonMissing.of())
.build()
To access undocumented response properties, call the _additionalProperties() method:
import com.browserbase.api.core.JsonBoolean
import com.browserbase.api.core.JsonNull
import com.browserbase.api.core.JsonNumber
import com.browserbase.api.core.JsonValue
val additionalProperties: Map<String, JsonValue> = client.sessions().act(params)._additionalProperties()
// A map lookup returns null when the key is absent, so the type is nullable
val secretPropertyValue: JsonValue? = additionalProperties["secretProperty"]
val result = when (secretPropertyValue) {
  is JsonNull -> "It's null!"
  is JsonBoolean -> "It's a boolean!"
  is JsonNumber -> "It's a number!"
  // Other types include `JsonMissing`, `JsonString`, `JsonArray`, and `JsonObject`
  else -> "It's something else!"
}
To access a property's raw JSON value, which may be undocumented, call its _ prefixed method:
import com.browserbase.api.core.JsonField
import com.browserbase.api.models.sessions.SessionActParams
val input: JsonField<SessionActParams.Input> = client.sessions().act(params)._input()
if (input.isMissing()) {
// The property is absent from the JSON response
} else if (input.isNull()) {
// The property was set to literal null
} else {
// Check if value was provided as a string
// Other methods include `asNumber()`, `asBoolean()`, etc.
val jsonString: String? = input.asString()
// Try to deserialize into a custom type
val myObject: MyClass = input.asUnknown()!!.convert(MyClass::class.java)
}
In rare cases, the API may return a response that doesn't match the expected type. For example, the SDK may expect a property to contain a String, but the API could return something else.
By default, the SDK will not throw an exception in this case. It will throw StagehandInvalidDataException only if you directly access the property.
If you would prefer to check that the response is completely well-typed upfront, then either call validate():
import com.browserbase.api.models.sessions.SessionActResponse
val response: SessionActResponse = client.sessions().act(params).validate()
Or configure the method call to validate the response using the responseValidation method:
import com.browserbase.api.models.sessions.SessionActResponse
val response: SessionActResponse = client.sessions().act(
params, RequestOptions.builder().responseValidation(true).build()
)
Or configure the default for all method calls at the client level:
import com.browserbase.api.client.StagehandClient
import com.browserbase.api.client.okhttp.StagehandOkHttpClient
val client: StagehandClient = StagehandOkHttpClient.builder()
.fromEnv()
.responseValidation(true)
.build()
Kotlin enum classes are not trivially forwards compatible. Using them in the SDK could cause runtime exceptions if the API is updated to respond with a new enum value.
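The forwards-compatibility problem with plain enums can be seen in a few lines. In this sketch, `StrictStatus` and `LenientStatus` are hypothetical types invented for illustration; the lenient wrapper shows one possible way to tolerate unknown values and is not a description of the SDK's generated classes.

```kotlin
// A plain enum only knows the values that existed when it was compiled.
enum class StrictStatus { RUNNING, COMPLETED }

// Hypothetical forwards-compatible alternative: keep the raw string and
// map unrecognized values to UNKNOWN instead of throwing.
class LenientStatus(val raw: String) {
    enum class Value { RUNNING, COMPLETED, UNKNOWN }

    fun value(): Value = when (raw) {
        "RUNNING" -> Value.RUNNING
        "COMPLETED" -> Value.COMPLETED
        else -> Value.UNKNOWN
    }
}

fun main() {
    // Suppose the API starts returning "PAUSED" after this code was compiled.
    val strict = runCatching { StrictStatus.valueOf("PAUSED") }
    println(strict.isFailure) // true: valueOf throws IllegalArgumentException

    val lenient = LenientStatus("PAUSED")
    println(lenient.value()) // UNKNOWN
    println(lenient.raw)     // PAUSED
}
```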
Using JsonField<T> enables a few features:
- Allowing usage of undocumented API functionality
- Lazily validating the API response against the expected shape
- Representing absent vs explicitly null values
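The three features listed above can be sketched with a small sealed class. `Field` and `readString` are hypothetical simplifications written for this example; they are not the SDK's JsonField type, but they show how a wrapper can tell a missing key from a literal null and defer type errors until the value is read.

```kotlin
// Hypothetical simplification of the idea behind JsonField<T>; not the SDK's actual type.
sealed class Field<out T> {
    object Missing : Field<Nothing>()                    // key absent from the JSON
    object Null : Field<Nothing>()                       // key present with literal null
    data class Known<T>(val value: T) : Field<T>()       // value has the expected type
    data class Unknown(val raw: Any) : Field<Nothing>()  // value has an unexpected shape

    // Validation is lazy: nothing throws until the typed value is requested.
    fun getOrThrow(): T? = when (this) {
        is Known -> value
        is Null, is Missing -> null
        is Unknown -> throw IllegalStateException("Unexpected value: $raw")
    }
}

// Reads a key that is expected to hold a String from already-parsed JSON.
fun readString(json: Map<String, Any?>, key: String): Field<String> {
    if (key !in json) return Field.Missing
    val raw = json[key] ?: return Field.Null
    return if (raw is String) Field.Known(raw) else Field.Unknown(raw)
}

fun main() {
    val json: Map<String, Any?> = mapOf("text" to "hi", "author" to null, "score" to 42)

    println(readString(json, "text").getOrThrow())        // hi
    println(readString(json, "author") is Field.Null)     // true
    println(readString(json, "missing") is Field.Missing) // true

    // The mismatched value is kept as-is and only fails when read as a String.
    val score = readString(json, "score")
    println(runCatching { score.getOrThrow() }.isFailure) // true
}
```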
Why don't you use data classes?
It is not backwards compatible to add new fields to a data class and we don't want to introduce a breaking change every time we add a field to a class.
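One way to see the problem is through destructuring, which depends on the order of a data class's constructor properties. In this sketch, `UserV1` and `UserV2` are hypothetical classes standing in for two releases of the same type. Adding a property also changes the generated constructor and copy signatures, which breaks callers compiled against the older version.

```kotlin
// Two hypothetical releases of the same class; V2 adds `email` in the middle.
data class UserV1(val name: String, val age: Int)
data class UserV2(val name: String, val email: String, val age: Int)

fun main() {
    // Written against V1: the second component is the age.
    val (name1, age1) = UserV1("ada", 36)
    println("$name1 $age1") // ada 36

    // The same destructuring against V2 silently binds the email instead.
    val (name2, second) = UserV2("ada", "ada@example.com", 36)
    println("$name2 $second") // ada ada@example.com
}
```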
Checked exceptions are widely considered a mistake in the Java programming language. In fact, they were omitted from Kotlin for this reason.
Checked exceptions:
- Are verbose to handle
- Encourage error handling at the wrong level of abstraction, where nothing can be done about the error
- Are tedious to propagate due to the function coloring problem
- Don't play well with lambdas (also due to the function coloring problem)
This package generally follows SemVer conventions, though certain backwards-incompatible changes may be released as minor versions:
- Changes to library internals which are technically public but not intended or documented for external use. (Please open a GitHub issue to let us know if you are relying on such internals.)
- Changes that we do not expect to impact the vast majority of users in practice.
We take backwards-compatibility seriously and work hard to ensure you can rely on a smooth upgrade experience.
We are keen for your feedback; please open an issue with questions, bugs, or suggestions.