Commit bf36dd9

Author: Sentience Dev
Merge pull request #44 from SentienceAPI/agent_abstraction
Phase 1/2: Add agent abstraction layer
2 parents 62194b0 + 3572f0a commit bf36dd9

14 files changed: +3350 -9 lines changed

README.md

Lines changed: 194 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -12,7 +12,194 @@ npm run build
12 12
npx playwright install chromium
13 13
```
14 14

15-
## Quick Start
15+
## Quick Start: Choose Your Abstraction Level
16+
17+
The Sentience SDK offers **4 levels of abstraction**. Choose based on your needs:
18+
19+
### 💬 Level 4: Conversational Agent (Highest Abstraction) - **NEW in v0.3.0**
20+
21+
Complete automation with natural conversation. Just describe what you want, and the agent plans and executes everything:
22+
23+
```typescript
24+
import { SentienceBrowser, ConversationalAgent, OpenAIProvider } from 'sentience-ts';
25+
26+
const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
27+
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o');
28+
const agent = new ConversationalAgent({ llmProvider: llm, browser });
29+
30+
// Navigate to starting page
31+
await browser.getPage().goto('https://amazon.com');
32+
33+
// ONE command does it all - automatic planning and execution!
34+
const response = await agent.execute(
35+
"Search for 'wireless mouse' and tell me the price of the top result"
36+
);
37+
console.log(response); // "I found the top result for wireless mouse on Amazon. It's priced at $24.99..."
38+
39+
// Follow-up questions maintain context
40+
const followUp = await agent.chat("Add it to cart");
41+
console.log(followUp);
42+
43+
await browser.close();
44+
```
45+
46+
**When to use:** Complex multi-step tasks, conversational interfaces, maximum convenience
47+
**Code reduction:** 99% less code - describe goals in natural language
48+
**Requirements:** OpenAI or Anthropic API key
49+
50+
### 🤖 Level 3: Agent (Natural Language Commands) - **Recommended for Most Users**
51+
52+
Minimal coding needed. Write each step as a plain-English command:
53+
54+
```typescript
55+
import { SentienceBrowser, SentienceAgent, OpenAIProvider } from 'sentience-ts';
56+
57+
const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
58+
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o-mini');
59+
const agent = new SentienceAgent(browser, llm);
60+
61+
await browser.getPage().goto('https://www.amazon.com');
62+
63+
// Just natural language commands - agent handles everything!
64+
await agent.act('Click the search box');
65+
await agent.act("Type 'wireless mouse' into the search field");
66+
await agent.act('Press Enter key');
67+
await agent.act('Click the first product result');
68+
69+
// Automatic token tracking
70+
console.log(`Tokens used: ${agent.getTokenStats().totalTokens}`);
71+
await browser.close();
72+
```
73+
74+
**When to use:** Quick automation, non-technical users, rapid prototyping
75+
**Code reduction:** 95-98% less code vs manual approach
76+
**Requirements:** OpenAI API key (or Anthropic for Claude)
77+
78+
### 🔧 Level 2: Direct SDK (Technical Control)
79+
80+
Full control with semantic selectors. For technical users who want precision:
81+
82+
```typescript
83+
import { SentienceBrowser, snapshot, find, click, typeText, press } from 'sentience-ts';
84+
85+
const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
86+
await browser.getPage().goto('https://www.amazon.com');
87+
88+
// Get semantic snapshot
89+
const snap = await snapshot(browser);
90+
91+
// Find elements using query DSL
92+
const searchBox = find(snap, 'role=textbox text~"search"');
93+
await click(browser, searchBox!.id);
94+
95+
// Type and submit
96+
await typeText(browser, searchBox!.id, 'wireless mouse');
97+
await press(browser, 'Enter');
98+
99+
await browser.close();
100+
```
101+
102+
**When to use:** Need precise control, debugging, custom workflows
103+
**Code reduction:** Still 80% less code vs raw Playwright
104+
**Requirements:** Only Sentience API key
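
The Level 2 snippet relies on non-null assertions (`searchBox!.id`), which surface as an opaque `TypeError` if the query matches nothing. A minimal sketch of a guard is shown here; `requireElement` is a hypothetical helper, not part of `sentience-ts`, and the `{ id: number }` shape is assumed only from how `searchBox.id` is used above.

```typescript
// Hypothetical helper (not part of sentience-ts): narrows a nullable lookup
// result and fails with a descriptive message instead of a bare TypeError.
function requireElement<T>(element: T | null | undefined, description: string): T {
  if (element === null || element === undefined) {
    throw new Error(`Element not found: ${description}`);
  }
  return element;
}

// Stand-in for a lookup result; in real code this would come from find(snap, ...).
const found = requireElement<{ id: number }>({ id: 42 }, 'search box');
console.log(found.id); // 42
```

With a guard like this, `click(browser, requireElement(find(snap, query), 'search box').id)` would fail fast with the element description in the error message.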
105+
106+
### ⚙️ Level 1: Raw Playwright (Maximum Control)
107+
108+
For when you need complete low-level control (rare):
109+
110+
```typescript
111+
import { chromium } from 'playwright';
112+
113+
const browser = await chromium.launch();
114+
const page = await browser.newPage();
115+
await page.goto('https://www.amazon.com');
116+
await page.fill('#twotabsearchtextbox', 'wireless mouse');
117+
await page.press('#twotabsearchtextbox', 'Enter');
118+
await browser.close();
119+
```
120+
121+
**When to use:** Very specific edge cases, custom browser configs
122+
**Tradeoffs:** No semantic intelligence, brittle selectors, more code
123+
124+
---
125+
126+
## Agent Layer Examples
127+
128+
### Google Search (6 lines of code)
129+
130+
```typescript
131+
import { SentienceBrowser, SentienceAgent, OpenAIProvider } from 'sentience-ts';
132+
133+
const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
133+
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o-mini');
135+
const agent = new SentienceAgent(browser, llm);
136+
137+
await browser.getPage().goto('https://www.google.com');
138+
await agent.act('Click the search box');
139+
await agent.act("Type 'mechanical keyboards' into the search field");
140+
await agent.act('Press Enter key');
141+
await agent.act('Click the first non-ad search result');
142+
143+
await browser.close();
144+
```
145+
146+
**See full example:** [examples/agent-google-search.ts](examples/agent-google-search.ts)
147+
148+
### Using Anthropic Claude Instead of GPT
149+
150+
```typescript
151+
import { SentienceAgent, AnthropicProvider } from 'sentience-ts';
152+
153+
// Swap OpenAI for Anthropic - same API!
154+
const llm = new AnthropicProvider(
155+
process.env.ANTHROPIC_API_KEY!,
156+
'claude-3-5-sonnet-20241022'
157+
);
158+
159+
const agent = new SentienceAgent(browser, llm);
160+
await agent.act('Click the search button'); // Works exactly the same
161+
```
162+
163+
**BYOB (Bring Your Own Brain):** OpenAI, Anthropic, or implement `LLMProvider` for any model.
164+
165+
**See full example:** [examples/agent-with-anthropic.ts](examples/agent-with-anthropic.ts)
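
The README does not show what implementing `LLMProvider` involves, so the following is only a sketch of the general idea. The `SketchLLMProvider` interface, its `generate` method, and the response shape are all assumptions made for illustration; the real `LLMProvider` in `sentience-ts` may look different. The canned provider returns fixed replies, which can be handy for testing agent code without network calls.

```typescript
// HYPOTHETICAL shapes for illustration only; the real LLMProvider
// interface in sentience-ts may use different names and types.
interface SketchLLMResponse {
  content: string;
  totalTokens: number;
}

interface SketchLLMProvider {
  generate(systemPrompt: string, userPrompt: string): Promise<SketchLLMResponse>;
}

// Returns pre-recorded replies in order, cycling when they run out.
class CannedProvider implements SketchLLMProvider {
  private readonly replies: string[];
  private calls = 0;

  constructor(replies: string[]) {
    if (replies.length === 0) {
      throw new Error('CannedProvider needs at least one reply');
    }
    this.replies = replies;
  }

  async generate(systemPrompt: string, userPrompt: string): Promise<SketchLLMResponse> {
    const content = this.replies[this.calls % this.replies.length] ?? '';
    this.calls += 1;
    // Rough token estimate: whitespace-separated words in prompts plus reply.
    const totalTokens = `${systemPrompt} ${userPrompt} ${content}`.trim().split(/\s+/).length;
    return { content, totalTokens };
  }
}
```

A provider backed by a local or self-hosted model would follow the same pattern, replacing the canned lookup with a call to that model.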
166+
167+
### Amazon Shopping (98% code reduction)
168+
169+
**Before (manual approach):** 350 lines
170+
**After (agent layer):** 6 lines
171+
172+
```typescript
173+
await agent.act('Click the search box');
174+
await agent.act("Type 'wireless mouse' into the search field");
175+
await agent.act('Press Enter key');
176+
await agent.act('Click the first visible product in the search results');
177+
await agent.act("Click the 'Add to Cart' button");
178+
```
179+
180+
**See full example:** [examples/agent-amazon-shopping.ts](examples/agent-amazon-shopping.ts)
181+
182+
---
183+
184+
## Installation for Agent Layer
185+
186+
```bash
187+
# Install core SDK
188+
npm install sentience-ts
189+
190+
# Install LLM provider (choose one or both)
191+
npm install openai # For GPT-4, GPT-4o, GPT-4o-mini
192+
npm install @anthropic-ai/sdk # For Claude 3.5 Sonnet
193+
194+
# Set API keys
195+
export SENTIENCE_API_KEY="your-sentience-key"
196+
export OPENAI_API_KEY="your-openai-key" # OR
197+
export ANTHROPIC_API_KEY="your-anthropic-key"
198+
```
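
Since the agent layer needs either an OpenAI or an Anthropic key, it can help to validate the environment once at startup. This is a minimal sketch under that assumption; `requireEnv` and `pickLlmKey` are hypothetical helpers, not part of `sentience-ts`, and only the variable names come from the commands above.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the variable or throws a clear error.
function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Hypothetical helper: picks whichever LLM key is set, preferring OpenAI.
function pickLlmKey(env: Env): { provider: 'openai' | 'anthropic'; key: string } {
  const openaiKey = env['OPENAI_API_KEY'];
  if (openaiKey) return { provider: 'openai', key: openaiKey };
  const anthropicKey = env['ANTHROPIC_API_KEY'];
  if (anthropicKey) return { provider: 'anthropic', key: anthropicKey };
  throw new Error('Set OPENAI_API_KEY or ANTHROPIC_API_KEY');
}
```

In a Node script, both functions would be called with `process.env` as the `env` argument.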
199+
200+
---
201+
202+
## Direct SDK Quick Start
16 203

17 204
```typescript
18 205
import { SentienceBrowser, snapshot, find, click } from './src';
@@ -349,6 +536,12 @@ element.z_index // CSS stacking order
349 536

350 537
See the `examples/` directory for complete working examples:
351 538

539+
### Agent Layer (Level 3 - Natural Language)
540+
- **`agent-google-search.ts`** - Google search automation with natural language commands
541+
- **`agent-amazon-shopping.ts`** - Amazon shopping bot (6 lines vs 350 lines manual code)
542+
- **`agent-with-anthropic.ts`** - Using Anthropic Claude instead of OpenAI GPT
543+
544+
### Direct SDK (Level 2 - Technical Control)
352 545
- **`hello.ts`** - Extension bridge verification
353 546
- **`basic-agent.ts`** - Basic snapshot and element inspection
354 547
- **`query-demo.ts`** - Query engine demonstrations

examples/agent-amazon-shopping.ts

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
1+
/**
2+
* Example: Amazon Shopping using SentienceAgent
3+
*
4+
* Demonstrates complex multi-step automation with the agent layer.
5+
* Reduces ~350 lines of manual code to ~6 lines of natural language commands.
6+
*
7+
* Run with:
8+
* npx ts-node examples/agent-amazon-shopping.ts
9+
*/
10+
11+
import { SentienceBrowser, SentienceAgent, OpenAIProvider } from '../src';
12+
13+
async function main() {
14+
// Set up environment
15+
const sentienceKey = process.env.SENTIENCE_API_KEY;
16+
const openaiKey = process.env.OPENAI_API_KEY;
17+
18+
if (!openaiKey) {
19+
console.error('❌ Error: OPENAI_API_KEY environment variable not set');
20+
console.log('Set it with: export OPENAI_API_KEY="your-key-here"');
21+
process.exit(1);
22+
}
23+
24+
// Initialize browser and agent
25+
const browser = await SentienceBrowser.create({
26+
apiKey: sentienceKey,
27+
headless: false
28+
});
29+
30+
const llm = new OpenAIProvider(openaiKey, 'gpt-4o-mini');
31+
const agent = new SentienceAgent(browser, llm, 50, true);
32+
33+
try {
34+
console.log('🛒 Amazon Shopping Demo with SentienceAgent\n');
35+
36+
// Navigate to Amazon
37+
await browser.getPage().goto('https://www.amazon.com');
38+
await browser.getPage().waitForLoadState('networkidle');
39+
await new Promise(resolve => setTimeout(resolve, 2000));
40+
41+
// Search for product
42+
console.log('Step 1: Searching for wireless mouse...\n');
43+
await agent.act('Click the search box');
44+
await agent.act("Type 'wireless mouse' into the search field");
45+
await agent.act('Press Enter key');
46+
47+
// Wait for search results
48+
await new Promise(resolve => setTimeout(resolve, 4000));
49+
50+
// Select a product
51+
console.log('Step 2: Selecting a product...\n');
52+
await agent.act('Click the first visible product in the search results');
53+
54+
// Wait for product page to load
55+
await new Promise(resolve => setTimeout(resolve, 5000));
56+
57+
// Add to cart
58+
console.log('Step 3: Adding to cart...\n');
59+
await agent.act("Click the 'Add to Cart' button");
60+
61+
// Wait for cart confirmation
62+
await new Promise(resolve => setTimeout(resolve, 3000));
63+
64+
console.log('\n✅ Shopping automation completed!\n');
65+
66+
// Print execution summary
67+
const stats = agent.getTokenStats();
68+
const history = agent.getHistory();
69+
70+
console.log('📊 Execution Summary:');
71+
console.log(` Actions executed: ${history.length}`);
72+
console.log(` Total tokens: ${stats.totalTokens}`);
73+
console.log(` Avg tokens per action: ${history.length > 0 ? Math.round(stats.totalTokens / history.length) : 0}`);
74+
75+
console.log('\n📜 Action History:');
76+
history.forEach((entry, i) => {
77+
const status = entry.success ? '✅' : '❌';
78+
console.log(` ${i + 1}. ${status} ${entry.goal} (${entry.durationMs}ms)`);
79+
});
80+
81+
console.log('\n💡 Code Comparison:');
82+
console.log(' Old approach: ~350 lines (manual snapshots, prompts, filtering)');
83+
console.log(' Agent approach: ~6 lines (natural language commands)');
84+
console.log(' Reduction: 98%');
85+
86+
} catch (error: unknown) {
87+
console.error('❌ Error:', error instanceof Error ? error.message : error);
88+
} finally {
89+
await browser.close();
90+
}
91+
}
92+
93+
// Run if executed directly
94+
if (require.main === module) {
95+
main().catch(console.error);
96+
}
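
The example above waits with fixed `setTimeout` delays (2 to 5 seconds), which can be too short on a slow page and wasteful on a fast one. One alternative is to poll for a condition with a timeout. The sketch below is a generic illustration; `waitUntil` is a hypothetical helper, not part of `sentience-ts` or Playwright.

```typescript
// Hypothetical helper: polls `condition` until it returns true or the
// timeout elapses, instead of sleeping for a fixed duration.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    // Sleep briefly before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

When the condition is simply "an element is present", Playwright's own `page.waitForSelector` is usually the more direct choice; a helper like this is mainly useful for conditions that are not expressible as a selector.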

examples/agent-google-search.ts

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,71 @@
1+
/**
2+
* Example: Google Search using SentienceAgent
3+
*
4+
* Demonstrates high-level agent abstraction with natural language commands.
5+
* No manual snapshot filtering or prompt engineering required.
6+
*
7+
* Run with:
8+
* npx ts-node examples/agent-google-search.ts
9+
*/
10+
11+
import { SentienceBrowser, SentienceAgent, OpenAIProvider } from '../src';
12+
13+
async function main() {
14+
// Initialize browser
15+
const browser = await SentienceBrowser.create({
16+
apiKey: process.env.SENTIENCE_API_KEY,
17+
headless: false
18+
});
19+
20+
// Initialize LLM provider (OpenAI GPT-4o-mini for cost efficiency)
21+
const llm = new OpenAIProvider(
22+
process.env.OPENAI_API_KEY!,
23+
'gpt-4o-mini'
24+
);
25+
26+
// Create agent
27+
const agent = new SentienceAgent(browser, llm, 50, true);
28+
29+
try {
30+
console.log('🔍 Google Search Demo with SentienceAgent\n');
31+
32+
// Navigate to Google
33+
await browser.getPage().goto('https://www.google.com');
34+
await browser.getPage().waitForLoadState('networkidle');
35+
36+
// Use agent to perform search - just natural language commands!
37+
await agent.act('Click the search box');
38+
await agent.act("Type 'best mechanical keyboards 2024' into the search field");
39+
await agent.act('Press Enter key');
40+
41+
// Wait for results
42+
await new Promise(resolve => setTimeout(resolve, 3000));
43+
44+
// Click first result
45+
await agent.act('Click the first non-ad search result');
46+
47+
// Wait for page load
48+
await new Promise(resolve => setTimeout(resolve, 2000));
49+
50+
console.log('\n✅ Search completed successfully!\n');
51+
52+
// Print token usage stats
53+
const stats = agent.getTokenStats();
54+
console.log('📊 Token Usage:');
55+
console.log(` Total tokens: ${stats.totalTokens}`);
56+
console.log(` Prompt tokens: ${stats.totalPromptTokens}`);
57+
console.log(` Completion tokens: ${stats.totalCompletionTokens}`);
58+
console.log('\n📜 Action Breakdown:');
59+
stats.byAction.forEach((action, i) => {
60+
console.log(` ${i + 1}. ${action.goal}: ${action.totalTokens} tokens`);
61+
});
62+
63+
} finally {
64+
await browser.close();
65+
}
66+
}
67+
68+
// Run if executed directly
69+
if (require.main === module) {
70+
main().catch(console.error);
71+
}
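
The per-action breakdown printed above can also be built as a pure function, which makes it easier to test and to show each action's share of the total. This is a sketch only: the `ActionTokens` shape assumes just the `goal` and `totalTokens` fields that the example reads from `stats.byAction`, and `formatTokenBreakdown` is a hypothetical helper.

```typescript
// Assumed shape: only the two fields the example above reads from stats.byAction.
interface ActionTokens {
  goal: string;
  totalTokens: number;
}

// Hypothetical helper: one formatted line per action, with its share of all tokens.
function formatTokenBreakdown(actions: ActionTokens[]): string[] {
  const total = actions.reduce((sum, a) => sum + a.totalTokens, 0);
  return actions.map((a, i) => {
    const share = total > 0 ? Math.round((a.totalTokens / total) * 100) : 0;
    return `${i + 1}. ${a.goal}: ${a.totalTokens} tokens (${share}%)`;
  });
}

const breakdown = formatTokenBreakdown([
  { goal: 'Click the search box', totalTokens: 300 },
  { goal: 'Press Enter key', totalTokens: 100 },
]);
console.log(breakdown.join('\n'));
// 1. Click the search box: 300 tokens (75%)
// 2. Press Enter key: 100 tokens (25%)
```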
