
Commit dbef14b

waleedlatif1 and claude authored
feat(knowledge): connectors, user exclusions, expanded tools & airtable integration (#3230)
* feat(knowledge): connectors, user exclusions, expanded tools & airtable integration
* improvements
* removed redundant util
* ack PR comments
* remove module level cache, use syncContext between paginated calls to avoid redundant schema fetches
* regen migrations, ack PR comments
* ack PR comment
* added tests
* ack comments
* ack comments
* feat(db): add knowledge connector migration after merge

  Generated migration 0162 for knowledge_connector and knowledge_connector_sync_log tables after resolving merge conflicts with feat/mothership-copilot.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(connectors): audit fixes for sync engine, connectors, and knowledge tools

  - Extract shared computeContentHash to connectors/utils.ts (dedup across 7 connectors)
  - Include error'd connectors in cron auto-retry query
  - Add syncContext caching for Confluence (cloudId, spaceId)
  - Batch Confluence label fetches with concurrency limit of 10
  - Enforce maxPages in Confluence v2 path
  - Clean up stale storage files on document update
  - Retry stuck documents (pending/failed) after sync completes
  - Soft-delete documents and reclaim tag slots on connector deletion
  - Add incremental sync support to ConnectorConfig interface
  - Fix offset:0 falsy check in list_documents tool

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf(connectors): deep audit: extract shared utils, fix pagination, optimize API calls

  - Extract shared htmlToPlainText to connectors/utils.ts (dedup Confluence + Google Drive)
  - Add syncContext caching for Jira cloudId, Notion/Linear/Google Drive cumulative limits
  - Fix cumulative maxPages/maxIssues/maxFiles enforcement across pagination pages
  - Bump Notion page_size from 20 to 100 (5x fewer API round-trips)
  - Batch Notion child page fetching with concurrency=5 (was serial N+1)
  - Bump Confluence v2 limit from 50 to 250 (v2 API supports it)
  - Pass syncContext through Confluence CQL path for cumulative tracking
  - Upgrade GitHub tree truncation warning to error level
  - Fix sync-engine test mock to include inArray export

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(connectors): extract tag helpers, fix Notion maxPages, rewrite broken tests

  - Add parseTagDate and joinTagArray helpers to connectors/utils.ts
  - Update all 7 connectors to use shared tag mapping helpers (removes 12+ duplication instances)
  - Fix Notion listFromParentPage cumulative maxPages check (was using local count)
  - Rewrite 3 broken connector route test files to use vi.hoisted() + static vi.mock() pattern instead of deprecated vi.doMock/vi.resetModules (all 86 tests now pass)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(connectors): add loading skeletons, delete pending state, and pause feedback

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(knowledge): escape LIKE wildcards, guard restore from un-deleting, fix offset=0

  - Escape %, _, \ in tag filter LIKE patterns to prevent incorrect matches
  - Add isNull(deletedAt) guard to restore operation to prevent un-deleting soft-deleted docs
  - Change offset check from falsy to != null so offset=0 is not dropped

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a368827 commit dbef14b


94 files changed: +23165 / -133 lines changed


.claude/commands/add-connector.md

Lines changed: 261 additions & 0 deletions
@@ -0,0 +1,261 @@
---
description: Add a knowledge base connector for syncing documents from an external source
argument-hint: <service-name> [api-docs-url]
---

# Add Connector Skill

You are an expert at adding knowledge base connectors to Sim. A connector syncs documents from an external source (Confluence, Google Drive, Notion, etc.) into a knowledge base.

## Your Task

When the user asks you to create a connector:
1. Use Context7 or WebFetch to read the service's API documentation
2. Create the connector directory and config
3. Register it in the connector registry

## Directory Structure

Create files in `apps/sim/connectors/{service}/`:
```
connectors/{service}/
├── index.ts         # Barrel export
└── {service}.ts     # ConnectorConfig definition
```
## ConnectorConfig Structure

```typescript
import { createLogger } from '@sim/logger'
import { {Service}Icon } from '@/components/icons'
import { fetchWithRetry } from '@/lib/knowledge/documents/utils'
import type { ConnectorConfig, ExternalDocument, ExternalDocumentList } from '@/connectors/types'

const logger = createLogger('{Service}Connector')

export const {service}Connector: ConnectorConfig = {
  id: '{service}',
  name: '{Service}',
  description: 'Sync documents from {Service} into your knowledge base',
  version: '1.0.0',
  icon: {Service}Icon,

  oauth: {
    required: true,
    provider: '{service}', // Must match OAuthService in lib/oauth/types.ts
    requiredScopes: ['read:...'],
  },

  configFields: [
    // Rendered dynamically by the add-connector modal UI
    // Supports 'short-input' and 'dropdown' types
  ],

  listDocuments: async (accessToken, sourceConfig, cursor) => {
    // Paginate via cursor, extract text, compute SHA-256 hash
    // Return { documents: ExternalDocument[], nextCursor?, hasMore }
  },

  getDocument: async (accessToken, sourceConfig, externalId) => {
    // Return ExternalDocument or null
  },

  validateConfig: async (accessToken, sourceConfig) => {
    // Return { valid: true } or { valid: false, error: 'message' }
  },

  // Optional: declare the tags this connector populates (see the tagDefinitions section)
  tagDefinitions: [],

  // Optional: map source metadata to semantic tag keys (translated to slots by sync engine)
  mapTags: (metadata) => {
    // Return Record<string, unknown> with keys matching tagDefinitions[].id
  },
}
```
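The skeleton above only says "paginate via cursor". As a hedged illustration of that contract, the sketch below pages through a hypothetical in-memory source using a cursor and enforces a cumulative document cap across pages through a state object carried between calls (similar in spirit to the `syncContext` the commit message describes). `FakeItem`, `listPage`, `listAll`, `PAGE_SIZE`, and `maxDocs` are illustrative names, not part of the real connector API, and a real `listDocuments` is async and calls the service through `fetchWithRetry`.

```typescript
// Hypothetical shapes for illustration only; the real types live in '@/connectors/types'.
interface FakeItem {
  id: string
  title: string
  body: string
}

interface PageResult {
  documents: FakeItem[]
  nextCursor?: string
  hasMore: boolean
}

// State carried across page calls within one sync (assumed, mirrors a sync context).
interface SyncState {
  fetched: number
}

const PAGE_SIZE = 2

// Returns one page starting at the offset encoded in `cursor` (real APIs usually hand
// back opaque tokens instead) and never exceeds `maxDocs` documents across all pages.
function listPage(
  source: FakeItem[],
  cursor: string | undefined,
  state: SyncState,
  maxDocs: number,
): PageResult {
  const start = cursor ? Number(cursor) : 0
  const remaining = Math.max(0, maxDocs - state.fetched)
  const end = Math.min(start + PAGE_SIZE, source.length, start + remaining)
  const documents = source.slice(start, end)
  state.fetched += documents.length

  const hasMore = end < source.length && state.fetched < maxDocs
  return { documents, nextCursor: hasMore ? String(end) : undefined, hasMore }
}

// Drives the pages the way the sync engine is described to: call until hasMore is false.
function listAll(source: FakeItem[], maxDocs: number): FakeItem[] {
  const state: SyncState = { fetched: 0 }
  const all: FakeItem[] = []
  let cursor: string | undefined
  let hasMore = true
  while (hasMore) {
    const page = listPage(source, cursor, state, maxDocs)
    all.push(...page.documents)
    cursor = page.nextCursor
    hasMore = page.hasMore
  }
  return all
}
```

Keeping the count in shared state rather than in a per-page local is what makes the cap cumulative; a local counter would reset on every page and let a sync overshoot the limit.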
## ConfigField Types

The add-connector modal renders these automatically, so no custom UI is needed.

```typescript
// Text input
{
  id: 'domain',
  title: 'Domain',
  type: 'short-input',
  placeholder: 'yoursite.example.com',
  required: true,
}

// Dropdown (static options)
{
  id: 'contentType',
  title: 'Content Type',
  type: 'dropdown',
  required: false,
  options: [
    { label: 'Pages only', id: 'page' },
    { label: 'Blog posts only', id: 'blogpost' },
    { label: 'All content', id: 'all' },
  ],
}
```
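`validateConfig` must return `{ valid: true }` or `{ valid: false, error }`. Before making any API call, a connector could run a cheap local check over its declared fields, as in the sketch below. `ConfigField` is an assumed shape inferred from the two examples above, and `checkConfigFields` is a hypothetical helper, not an existing Sim utility. A real `validateConfig` would still call the service to confirm the source is reachable.

```typescript
// Assumed ConfigField shape, inferred from the examples above; the real type may differ.
interface ConfigField {
  id: string
  title: string
  type: 'short-input' | 'dropdown'
  required?: boolean
  placeholder?: string
  options?: { label: string; id: string }[]
}

type ValidationResult = { valid: true } | { valid: false; error: string }

// Checks that required fields are present and that dropdown values match a declared option id.
function checkConfigFields(
  fields: ConfigField[],
  sourceConfig: Record<string, unknown>,
): ValidationResult {
  for (const field of fields) {
    const value = sourceConfig[field.id]
    const isEmpty = value == null || (typeof value === 'string' && value.trim() === '')

    if (isEmpty) {
      if (field.required) return { valid: false, error: `${field.title} is required` }
      continue // optional and unset: nothing else to check
    }

    if (field.type === 'dropdown' && field.options) {
      const allowed = field.options.map((option) => option.id)
      if (typeof value !== 'string' || !allowed.includes(value)) {
        return { valid: false, error: `${field.title} must be one of: ${allowed.join(', ')}` }
      }
    }
  }
  return { valid: true }
}
```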
## ExternalDocument Shape

Every document returned from `listDocuments`/`getDocument` has this shape. `sourceUrl` and `metadata` are optional; all other fields are required:

```typescript
interface ExternalDocument {
  externalId: string // Source-specific unique ID
  title: string // Document title
  content: string // Extracted plain text
  mimeType: 'text/plain' // Always text/plain (content is extracted)
  contentHash: string // SHA-256 of content (change detection)
  sourceUrl?: string // Link back to original (stored on document record)
  metadata?: Record<string, unknown> // Source-specific data (fed to mapTags)
}
```

## Content Hashing (Required)

The sync engine compares content hashes to detect new and changed documents, so every document needs a hex-encoded SHA-256 digest of its `content`:

```typescript
async function computeContentHash(content: string): Promise<string> {
  const data = new TextEncoder().encode(content)
  const hashBuffer = await crypto.subtle.digest('SHA-256', data)
  return Array.from(new Uint8Array(hashBuffer))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('')
}
```
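For unit tests or scripts running under Node.js, a synchronous variant built on `node:crypto` should produce the same lowercase hex digest, since both versions hash the UTF-8 bytes of the string. This is a sketch for local tooling under that assumption. Connectors themselves should keep using the async Web Crypto helper above. `computeContentHashSync` and `hasChanged` are hypothetical names.

```typescript
import { createHash } from 'node:crypto'

// Node-only, synchronous equivalent of computeContentHash: hex SHA-256 of the UTF-8 bytes.
function computeContentHashSync(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex')
}

// True when there is no stored hash (new document) or the stored hash differs (changed).
function hasChanged(storedHash: string | undefined, content: string): boolean {
  return storedHash !== computeContentHashSync(content)
}
```

Because the hash covers only `content`, metadata-only edits at the source (for example a relabel) will not register as a change unless that data is also reflected in the extracted text.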
## tagDefinitions: Declared Tag Definitions

Declare which tags the connector populates using semantic IDs. They are shown in the add-connector modal as opt-out checkboxes.
On connector creation, slots are **dynamically assigned** via `getNextAvailableSlot`; connectors never hardcode slot names.

```typescript
tagDefinitions: [
  { id: 'labels', displayName: 'Labels', fieldType: 'text' },
  { id: 'version', displayName: 'Version', fieldType: 'number' },
  { id: 'lastModified', displayName: 'Last Modified', fieldType: 'date' },
],
```

Each entry has:
- `id`: Semantic key matching a key returned by `mapTags` (e.g. `'labels'`, `'version'`)
- `displayName`: Human-readable name shown in the UI (e.g. "Labels", "Last Modified")
- `fieldType`: `'text'` | `'number'` | `'date'` | `'boolean'`, which determines the slot pool the tag draws from

Users can opt out of specific tags in the modal. Disabled IDs are stored in `sourceConfig.disabledTagIds`.
The assigned mapping (`semantic id → slot`) is stored in `sourceConfig.tagSlotMapping`.
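The opt-out and slot-assignment flow can be pictured roughly as follows. This is a sketch under stated assumptions: the slot names in `SLOT_POOLS` are made up, and `assignTagSlots` is a hypothetical stand-in for the real assignment logic built around `getNextAvailableSlot` (whose signature is not shown here). In this sketch, a tag whose pool is exhausted is simply skipped.

```typescript
type FieldType = 'text' | 'number' | 'date' | 'boolean'

interface TagDefinition {
  id: string
  displayName: string
  fieldType: FieldType
}

// Hypothetical slot pools per field type; real slot names and counts are defined by Sim.
const SLOT_POOLS: Record<FieldType, string[]> = {
  text: ['tag1', 'tag2', 'tag3'],
  number: ['number1', 'number2'],
  date: ['date1', 'date2'],
  boolean: ['boolean1'],
}

// Gives each enabled definition the first free slot in its field type's pool and returns
// the semantic id -> slot mapping (the role played by sourceConfig.tagSlotMapping).
function assignTagSlots(
  definitions: TagDefinition[],
  disabledTagIds: string[],
  usedSlots: Set<string>,
): Record<string, string> {
  const mapping: Record<string, string> = {}
  const taken = new Set(usedSlots) // copy so the caller's set is not mutated
  for (const def of definitions) {
    if (disabledTagIds.includes(def.id)) continue // user opted out in the modal
    const slot = SLOT_POOLS[def.fieldType].find((candidate) => !taken.has(candidate))
    if (!slot) continue // pool exhausted: skip this tag in the sketch
    taken.add(slot)
    mapping[def.id] = slot
  }
  return mapping
}
```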
## mapTags: Metadata to Semantic Keys

Maps source metadata to semantic tag keys. Required if `tagDefinitions` is set.
The sync engine calls this automatically and translates semantic keys to actual DB slots
using the `tagSlotMapping` stored on the connector.

Return keys must match the `id` values declared in `tagDefinitions`.

```typescript
mapTags: (metadata: Record<string, unknown>): Record<string, unknown> => {
  const result: Record<string, unknown> = {}

  // Validate arrays before use: metadata may be malformed, so keep only string entries
  const labels = Array.isArray(metadata.labels)
    ? metadata.labels.filter((label): label is string => typeof label === 'string')
    : []
  if (labels.length > 0) result.labels = labels.join(', ')

  // Validate numbers: guard against NaN
  if (metadata.version != null) {
    const num = Number(metadata.version)
    if (!Number.isNaN(num)) result.version = num
  }

  // Validate dates: guard against Invalid Date
  if (typeof metadata.lastModified === 'string') {
    const date = new Date(metadata.lastModified)
    if (!Number.isNaN(date.getTime())) result.lastModified = date
  }

  return result
}
```
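The translation from semantic keys to slots described above might look roughly like the hypothetical `applyTagSlotMapping` helper below. It is an illustrative sketch, not the sync engine's actual code, and the slot names used with it are made up. Semantic keys without a mapping (for example, tags the user disabled) are dropped.

```typescript
// Translates the semantic keys returned by mapTags into slot keys via tagSlotMapping.
// Keys with no mapping (e.g. disabled tags) and undefined values are dropped.
function applyTagSlotMapping(
  semanticTags: Record<string, unknown>,
  tagSlotMapping: Record<string, string>,
): Record<string, unknown> {
  const slotValues: Record<string, unknown> = {}
  for (const [semanticId, value] of Object.entries(semanticTags)) {
    // Own-property check so keys like 'toString' never resolve to prototype members.
    if (!Object.prototype.hasOwnProperty.call(tagSlotMapping, semanticId)) continue
    if (value === undefined) continue
    slotValues[tagSlotMapping[semanticId]] = value
  }
  return slotValues
}
```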
## External API Calls: Use `fetchWithRetry`

All external API calls must use `fetchWithRetry` from `@/lib/knowledge/documents/utils` instead of raw `fetch()`. It retries with exponential backoff on 429/502/503/504 responses and returns a standard `Response`, so existing `.ok`, `.json()`, and `.text()` checks work unchanged.

For `validateConfig` (user-facing, called on save), pass `VALIDATE_RETRY_OPTIONS` to cap the wait at ~7s. Background operations (`listDocuments`, `getDocument`) use the built-in defaults (5 retries, ~31s max).

```typescript
import { VALIDATE_RETRY_OPTIONS, fetchWithRetry } from '@/lib/knowledge/documents/utils'

const headers = { Authorization: `Bearer ${accessToken}` }

// Background sync: use the default retry budget
const response = await fetchWithRetry(url, { method: 'GET', headers })

// validateConfig: tighter retry budget because the user is waiting on save
const validateResponse = await fetchWithRetry(url, { method: 'GET', headers }, VALIDATE_RETRY_OPTIONS)
```
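The quoted budgets match a plain doubling schedule that starts at 1 second: 1 + 2 + 4 + 8 + 16 = 31s for 5 retries, and 1 + 2 + 4 = 7s for 3 retries. The sketch below reproduces that arithmetic. The 1-second initial delay, the assumption that `VALIDATE_RETRY_OPTIONS` means 3 retries, and every helper name here are assumptions for illustration, not values read from `fetchWithRetry` itself.

```typescript
// Status codes the section above lists as retryable.
const RETRYABLE_STATUSES = new Set([429, 502, 503, 504])

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status)
}

// Delay before retry number `attempt` (0-based), doubling from `initialDelayMs`.
// Assumption: the real implementation may add jitter or cap individual delays.
function backoffDelayMs(attempt: number, initialDelayMs = 1000): number {
  return initialDelayMs * 2 ** attempt
}

// Worst-case total wait when every retry is used: initialDelayMs * (2^maxRetries - 1).
function totalBackoffMs(maxRetries: number, initialDelayMs = 1000): number {
  let total = 0
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    total += backoffDelayMs(attempt, initialDelayMs)
  }
  return total
}
```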
## sourceUrl

If `ExternalDocument.sourceUrl` is set, the sync engine stores it on the document record. Always construct the full URL (not a relative path).
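Many APIs return relative links, such as a path under the site root. A small hypothetical helper like `toSourceUrl` can resolve them with the standard WHATWG `URL` constructor before setting `sourceUrl`. The base URL in the usage is a placeholder, and restricting the result to http(s) is a design choice of this sketch, not a documented requirement.

```typescript
// Hypothetical helper: resolves an API-provided path (or absolute URL) against the
// site's base URL and rejects anything that is not an absolute http(s) URL.
function toSourceUrl(baseUrl: string, pathOrUrl: string): string {
  const url = new URL(pathOrUrl, baseUrl) // absolute inputs ignore the base
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    throw new Error(`Unsupported protocol in source URL: ${url.protocol}`)
  }
  return url.toString()
}
```

For example, `toSourceUrl('https://yoursite.example.com', '/wiki/spaces/ENG/pages/123')` yields `https://yoursite.example.com/wiki/spaces/ENG/pages/123`.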
## Sync Engine Behavior (Do Not Modify)

The sync engine (`lib/knowledge/connectors/sync-engine.ts`) is connector-agnostic. It:
1. Calls `listDocuments` with pagination until `hasMore` is false
2. Compares `contentHash` to detect new/changed/unchanged documents
3. Stores `sourceUrl` and calls `mapTags` on insert/update automatically
4. Handles soft-delete of removed documents

You never need to modify the sync engine when adding a connector.
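Steps 2 and 4 come down to a hash comparison keyed by `externalId`. The hypothetical `planSync` below sketches that classification to show why each connector needs a stable `externalId` and a deterministic `contentHash`. It is not the actual code in `sync-engine.ts`, which also handles storage, tag mapping, and other concerns.

```typescript
// Minimal stand-in for the fields the comparison needs (assumed subset of ExternalDocument).
interface SyncedDoc {
  externalId: string
  contentHash: string
}

interface SyncPlan {
  toInsert: string[]
  toUpdate: string[]
  unchanged: string[]
  toSoftDelete: string[]
}

// Compares stored hashes (externalId -> contentHash) with the complete listing
// gathered from every listDocuments page.
function planSync(stored: Map<string, string>, listed: SyncedDoc[]): SyncPlan {
  const plan: SyncPlan = { toInsert: [], toUpdate: [], unchanged: [], toSoftDelete: [] }
  const seen = new Set<string>()

  for (const doc of listed) {
    seen.add(doc.externalId)
    const previousHash = stored.get(doc.externalId)
    if (previousHash === undefined) plan.toInsert.push(doc.externalId)
    else if (previousHash !== doc.contentHash) plan.toUpdate.push(doc.externalId)
    else plan.unchanged.push(doc.externalId)
  }

  // Anything stored but absent from the listing was removed at the source.
  stored.forEach((_hash, externalId) => {
    if (!seen.has(externalId)) plan.toSoftDelete.push(externalId)
  })
  return plan
}
```

The soft-delete step is only safe when the listing is complete. If pagination stops early, documents that were never fetched would look removed.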
## OAuth Credential Reuse

Connectors reuse the existing OAuth infrastructure. The `oauth.provider` must match an `OAuthService` from `apps/sim/lib/oauth/types.ts`. Check existing providers before adding a new one.

## Icon

The `icon` field on `ConnectorConfig` is used throughout the UI: in the connector list, in the add-connector modal, and as the document icon in the knowledge base table (replacing the generic file type icon for connector-sourced documents). The icon is read from `CONNECTOR_REGISTRY[connectorType].icon` at runtime, so there is no separate icon map to maintain.

If the service already has an icon in `apps/sim/components/icons.tsx` (from a tool integration), reuse it. Otherwise, ask the user to provide the SVG.

## Registering

Add an import and one registry entry to `apps/sim/connectors/registry.ts`:

```typescript
import { {service}Connector } from '@/connectors/{service}'

export const CONNECTOR_REGISTRY: ConnectorRegistry = {
  // ... existing connectors ...
  {service}: {service}Connector,
}
```

## Reference Implementation

See `apps/sim/connectors/confluence/confluence.ts` for a complete example with:
- Multiple config field types (text + dropdown)
- Label fetching and CQL search filtering
- Blogpost + page content types
- `mapTags` mapping labels, version, and dates to semantic keys

## Checklist

- [ ] Created `connectors/{service}/{service}.ts` with full ConnectorConfig
- [ ] Created `connectors/{service}/index.ts` barrel export
- [ ] `oauth.provider` matches an existing OAuthService in `lib/oauth/types.ts`
- [ ] `listDocuments` handles pagination and computes content hashes
- [ ] `sourceUrl` set on each ExternalDocument (full URL, not relative)
- [ ] `metadata` includes source-specific data for tag mapping
- [ ] `tagDefinitions` declared for each semantic key returned by `mapTags`
- [ ] `mapTags` implemented if source has useful metadata (labels, dates, versions)
- [ ] `validateConfig` verifies the source is accessible
- [ ] All external API calls use `fetchWithRetry` (not raw `fetch`)
- [ ] All optional config fields validated in `validateConfig`
- [ ] Icon exists in `components/icons.tsx` (or asked user to provide SVG)
- [ ] Registered in `connectors/registry.ts`

apps/docs/content/docs/en/tools/airtable.mdx

Lines changed: 33 additions & 0 deletions
@@ -204,4 +204,37 @@ Update multiple existing records in an Airtable table
| `recordCount` | number | Number of records updated |
| `updatedRecordIds` | array | List of updated record IDs |

### `airtable_list_bases`

List all bases the authenticated user has access to

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `bases` | json | Array of Airtable bases with id, name, and permissionLevel |
| `metadata` | json | Operation metadata including total bases count |

### `airtable_get_base_schema`

Get the schema of all tables, fields, and views in an Airtable base

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `baseId` | string | Yes | Airtable base ID \(starts with "app", e.g., "appXXXXXXXXXXXXXX"\) |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `tables` | json | Array of table schemas with fields and views |
| `metadata` | json | Operation metadata including total tables count |
