@jay-sahnan
Contributor

Summary

Adds a new Extend + Browserbase template that automates downloading expense receipts from a web portal and parsing them with Extend AI for structured data extraction (vendor, date, totals, line items, payment method).

Includes both TypeScript (typescript/extend-browserbase/) and Python (python/extend-browserbase/) implementations with identical functionality.

Uses the observe -> act pattern to find and click all individual download buttons, polls Browserbase's Downloads API for the session ZIP, extracts files, then uploads to Extend for schema-based receipt extraction. Results are saved as JSON and CSV.

Contributor

@shrey150 left a comment

Overall the template works, but there's a significant amount of unnecessary complexity that should be cleaned up before merging — especially since this is a template people will clone and learn from.

The biggest finding: both extend-ai and @browserbasehq/sdk already have built-in retry with exponential backoff (default 2 retries for 408/429/5xx). The hand-rolled retry logic throughout this PR is redundant and should be removed in favor of the SDK defaults (or bumping maxRetries if needed). Both SDKs also expose typed error classes (ExtendError and Browserbase.APIError) — use those instead of string-matching on error messages.

print("Session not found, returning empty result")
return 0
# HTML error response - session may not be ready yet, keep retrying
if "Unexpected token '<'" in error_message or "<html" in error_message:

Same as the TS version — @browserbasehq/sdk likely exposes typed errors. The string matching on "Session with given id not found", "-32001", "Unexpected token '<'", and "<html" is brittle. Check if the Python SDK has an equivalent of Browserbase.APIError with a status code property.
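
As a rough illustration of the point above, the branch could key off an error's type and status code instead of its message text. This is only a sketch: `StatusError` is a hypothetical stand-in defined here so the example runs on its own, and the real Python SDK's exception class and attribute names would need to be checked. Mapping "session not found" to 404 is also an assumption; the 408/429/5xx set is the one the review says the SDKs already retry.

```python
class StatusError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


def classify_download_error(err: Exception) -> str:
    """Decide how to react from the error's type and status code, not its message."""
    if isinstance(err, StatusError):
        if err.status_code == 404:
            # Assumed mapping: the session does not exist, so stop and return empty.
            return "not_found"
        if err.status_code in (408, 429) or err.status_code >= 500:
            # Transient failures; per the review, the SDK retries these itself.
            return "retryable"
    # Anything else (including non-SDK exceptions) should surface to the caller.
    return "fatal"
```

With a check like this, the four string comparisons collapse into one `isinstance` test plus a status lookup, and a change in the server's error wording no longer changes behavior.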

}

// Process in batches of 9 to balance speed and reliability
for (let i = 0; i < filePaths.length; i += 9) {

Why 9? If this is an Extend rate limit, document it. If arbitrary, extract to a named constant with a comment explaining the choice.

[shrey] +1 on avoiding a magic number here
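
One way the named-constant suggestion might look, sketched in Python to mirror the PR's Python implementation. The names `EXTRACTION_BATCH_SIZE` and `batched` are hypothetical, and the value 9 is simply carried over from the diff; the thread does not say whether it reflects an Extend rate limit.

```python
from typing import Iterator, Sequence

# Number of receipts sent for extraction per batch.
# Assumption: 9 is taken from the diff as-is; replace this comment with the
# actual reason (for example, a documented rate limit) once it is known.
EXTRACTION_BATCH_SIZE = 9


def batched(
    items: Sequence[str], size: int = EXTRACTION_BATCH_SIZE
) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A caller would then write `for batch in batched(file_paths): ...`, so the batch size is defined and explained in exactly one place.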

@jay-sahnan requested a review from @shrey150 on February 10, 2026, 00:09

@shrey150 left a comment

Pre-approving. Consider pinning version numbers to avoid extra engineering work later.

@@ -12,7 +12,8 @@

Are we sure we don't want to pin a version number here? For example, if we set "stagehand": "^3.0.0", this script remains functional even if we release v4 with breaking changes.
