Commit db47da5

v0.7.10: models sorting, compress/decompress file block tools, new enrichment providers, deepseek models, db performance
2 parents 56a88a2 + 597d7ea commit db47da5

374 files changed

Lines changed: 70180 additions & 3139 deletions

.agents/skills/memory-load-check/SKILL.md

Lines changed: 26 additions & 0 deletions
@@ -49,10 +49,35 @@ Read these when doing a deeper pass:
   - cap downloads and parsed output separately
   - preserve partial results when a later item exceeds the cap
   - never read untrusted response bodies without a byte cap
+- KB connector file downloads in `apps/sim/connectors/utils.ts`
+  - `CONNECTOR_MAX_FILE_BYTES`: shared per-file cap (aligned with the manual KB upload limit)
+  - `readBodyWithLimit`: stream a download body to a Buffer with a hard byte cap (null on overflow)
+  - `stubOrSkipBySize`: listing-time skip when the reported size exceeds the cap
+  - `markSkipped` / `sizeLimitSkipReason`: surface oversized files as failed (skipped) KB rows
+  - `ConnectorFileTooLargeError`: thrown mid-download when the listing under-reported size
 - Large workflow value payloads
   - prefer durable references/manifests over inlining large arrays or files
   - materialize refs only behind an explicit byte budget

+## KB Connector File Size Handling
+
+The connector size pattern in `apps/sim/connectors/utils.ts` (`CONNECTOR_MAX_FILE_BYTES` + `readBodyWithLimit` + `stubOrSkipBySize`/`markSkipped`) exists for one risk: a knowledge-base connector downloading **arbitrary, user-controlled file bytes** that the source does not hard-cap. Apply it by that risk, not by the connector's name.
+
+Use the pattern when the connector downloads file content via a stream/`download_url` where the user controls the size:
+- file-storage connectors: Dropbox, OneDrive, SharePoint, Google Drive, S3, GitHub, GitLab, Azure DevOps
+- any connector that fetches a file via a download URL even if it is not a "storage" service (e.g. the Zoom transcript `.vtt`)
+
+For those, require all three:
+- stream the body with `readBodyWithLimit(resp, CONNECTOR_MAX_FILE_BYTES)` — never raw `response.text()`/`response.arrayBuffer()`
+- skip oversize at listing (`stubOrSkipBySize` with the reported size) and again at fetch time (overflow -> `markSkipped`), since the listing size can be missing or under-reported
+- never drop/truncate silently — oversized files become content-less failed rows carrying `skippedReason`, so they stay visible in the KB UI instead of vanishing from the index
+
+Skip the pattern when the source already bounds the payload:
+- pure API/structured-data connectors (Jira, Linear, Notion, Confluence, Sentry, Slack, Zendesk, Gmail, ...) — paginated JSON/text; apply normal pagination + concurrency bounds instead of a per-file byte cap
+- native-document connectors capped by the platform (Google Docs ~50 MB, Google Sheets via `MAX_ROWS`, Evernote ~25 MB/note) — a 100 MB cap can never fire, and wrapping a `response.json()`/Thrift parse in `readBodyWithLimit` is cargo-culting
+
+Litmus test: "Can a user make this one fetch arbitrarily large, with nothing upstream stopping it?" Yes -> use the pattern. No (platform hard-cap, or already paginated) -> a per-file byte cap adds noise, not safety. Borderline: a user-configured/self-hosted endpoint with no platform cap (e.g. Obsidian) — bound it only if the content is genuinely unbounded.
+
 ## Review Workflow

 1. Identify every changed data source:
1. Identify every changed data source:
@@ -96,6 +121,7 @@ Read these when doing a deeper pass:
 - fetches all pages from an external API before processing
 - reads an entire file, HTTP response, or stream without a max byte budget
 - checks size only after `Buffer.concat`, `arrayBuffer`, `text`, `JSON.parse`, or parse expansion
+- a KB connector silently drops or truncates an oversized file instead of recording it as a failed (skipped) row
 - chunks only after loading the complete dataset
 - paginates with unbounded/deep `OFFSET` on a mutable or large table
 - creates one queue job per row without batching or a queue-level concurrency key
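
The three requirements in the "KB Connector File Size Handling" section of the SKILL.md diff (listing-time skip, capped streaming read, visible skipped row) can be sketched roughly as below. This is a minimal illustration, not the code in `apps/sim/connectors/utils.ts`: every name ending in `Sketch`, the `FetchOutcome` type, and the 100 MB constant are assumptions made for the example, and the sketch returns a `Uint8Array` where the real `readBodyWithLimit` is described as returning a Buffer.

```typescript
// Hypothetical names throughout; the real helpers may differ in shape and behaviour.

// Assumed value: the skill text only mentions "a 100 MB cap".
const MAX_FILE_BYTES_SKETCH = 100 * 1024 * 1024;

type FetchOutcome =
  | { status: "ok"; bytes: Uint8Array }
  | { status: "skipped"; skippedReason: string };

/** Listing-time check: true only when a size was reported and it exceeds the cap. */
function exceedsCapAtListing(reportedSize: number | undefined, maxBytes: number): boolean {
  return typeof reportedSize === "number" && reportedSize > maxBytes;
}

/** Reads the body chunk by chunk and returns null once the total passes maxBytes. */
async function readBodyWithLimitSketch(
  resp: Response,
  maxBytes: number
): Promise<Uint8Array | null> {
  if (!resp.body) return new Uint8Array(0);
  const reader = resp.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done || value === undefined) break;
    total += value.byteLength;
    if (total > maxBytes) {
      await reader.cancel(); // stop pulling further bytes from the source
      return null;
    }
    chunks.push(value);
  }
  // Concatenate only after the whole body is known to fit under the cap.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}

/** Applies both checks; an oversized file yields a skipped outcome, never a silent drop. */
async function fetchFileSketch(
  reportedSize: number | undefined,
  download: () => Promise<Response>,
  maxBytes: number = MAX_FILE_BYTES_SKETCH
): Promise<FetchOutcome> {
  if (exceedsCapAtListing(reportedSize, maxBytes)) {
    // The download is never started when the listing already shows an oversized file.
    return {
      status: "skipped",
      skippedReason: `reported size ${reportedSize} exceeds ${maxBytes} bytes`,
    };
  }
  const bytes = await readBodyWithLimitSketch(await download(), maxBytes);
  if (bytes === null) {
    // The listing size was missing or under-reported; the cap fired mid-download.
    return { status: "skipped", skippedReason: `download exceeded ${maxBytes} bytes` };
  }
  return { status: "ok", bytes };
}
```

The size check sits inside the read loop, so memory use stays bounded by the cap plus one chunk. That is the difference from checking the length after `arrayBuffer()` or `text()`, which the red-flag list in the same file warns against.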

.github/CODEOWNERS

Lines changed: 11 additions & 0 deletions
@@ -27,3 +27,14 @@
 /apps/sim/app/workspace/*/home/hooks/preview/ @simstudioai/mothership
 /apps/sim/app/workspace/*/home/hooks/stream/ @simstudioai/mothership
 /apps/sim/hooks/queries/tasks.ts @simstudioai/mothership
+
+# Dependency manifests and package-manager config. Any change here — adding,
+# removing, or bumping a dependency, or altering install/security settings —
+# requires review to guard against supply-chain risk. (CODEOWNERS gates file
+# changes, the closest proxy GitHub offers for "new dependency added".)
+package.json @simstudioai/deps
+**/package.json @simstudioai/deps
+bun.lock @simstudioai/deps
+**/bun.lock @simstudioai/deps
+bunfig.toml @simstudioai/deps
+.npmrc @simstudioai/deps

.github/workflows/ci.yml

Lines changed: 23 additions & 23 deletions
@@ -90,26 +90,26 @@ jobs:
             ecr_repo_secret: ECR_REALTIME
     steps:
       - name: Checkout code
-        uses: actions/checkout@v4
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4

       - name: Configure AWS credentials
-        uses: aws-actions/configure-aws-credentials@v6
+        uses: aws-actions/configure-aws-credentials@e7f100cf4c008499ea8adda475de1042d6975c7b # v6
         with:
           role-to-assume: ${{ secrets.DEV_AWS_ROLE_TO_ASSUME }}
           aws-region: ${{ secrets.DEV_AWS_REGION }}

       - name: Login to Amazon ECR
         id: login-ecr
-        uses: aws-actions/amazon-ecr-login@v2
+        uses: aws-actions/amazon-ecr-login@d539f0932e70871a027e9d5a9d8fc38589180a64 # v2

       - name: Login to Docker Hub
-        uses: docker/login-action@v4
+        uses: docker/login-action@650006c6eb7dba73a995cc03b0b2d7f5ca915bee # v4
         with:
           username: ${{ secrets.DOCKERHUB_USERNAME }}
           password: ${{ secrets.DOCKERHUB_TOKEN }}

       - name: Set up Docker Buildx
-        uses: useblacksmith/setup-docker-builder@v1
+        uses: useblacksmith/setup-docker-builder@ab5c1da94f53f5cd75c1038092aa276dddfccbba # v1

       - name: Resolve ECR repo name
         id: ecr-repo
@@ -118,7 +118,7 @@ jobs:
           ECR_REPO: ${{ matrix.ecr_repo_secret == 'ECR_APP' && secrets.ECR_APP || matrix.ecr_repo_secret == 'ECR_MIGRATIONS' && secrets.ECR_MIGRATIONS || matrix.ecr_repo_secret == 'ECR_REALTIME' && secrets.ECR_REALTIME || '' }}

       - name: Build and push
-        uses: useblacksmith/build-push-action@v2
+        uses: useblacksmith/build-push-action@fb9e3e6a9299c78462bfadd0d93352c316adc9b8 # v2
         with:
           context: .
           file: ${{ matrix.dockerfile }}
@@ -155,34 +155,34 @@ jobs:
             ecr_repo_secret: ECR_REALTIME
     steps:
       - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6

       - name: Configure AWS credentials
-        uses: aws-actions/configure-aws-credentials@v6
+        uses: aws-actions/configure-aws-credentials@e7f100cf4c008499ea8adda475de1042d6975c7b # v6
         with:
           role-to-assume: ${{ github.ref == 'refs/heads/main' && secrets.AWS_ROLE_TO_ASSUME || secrets.STAGING_AWS_ROLE_TO_ASSUME }}
           aws-region: ${{ github.ref == 'refs/heads/main' && secrets.AWS_REGION || secrets.STAGING_AWS_REGION }}

       - name: Login to Amazon ECR
         id: login-ecr
-        uses: aws-actions/amazon-ecr-login@v2
+        uses: aws-actions/amazon-ecr-login@d539f0932e70871a027e9d5a9d8fc38589180a64 # v2

       - name: Login to Docker Hub
-        uses: docker/login-action@v4
+        uses: docker/login-action@650006c6eb7dba73a995cc03b0b2d7f5ca915bee # v4
         with:
           username: ${{ secrets.DOCKERHUB_USERNAME }}
           password: ${{ secrets.DOCKERHUB_TOKEN }}

       - name: Login to GHCR
         if: github.ref == 'refs/heads/main'
-        uses: docker/login-action@v4
+        uses: docker/login-action@650006c6eb7dba73a995cc03b0b2d7f5ca915bee # v4
         with:
           registry: ghcr.io
           username: ${{ github.repository_owner }}
           password: ${{ secrets.GITHUB_TOKEN }}

       - name: Set up Docker Buildx
-        uses: useblacksmith/setup-docker-builder@v1
+        uses: useblacksmith/setup-docker-builder@ab5c1da94f53f5cd75c1038092aa276dddfccbba # v1

       - name: Resolve ECR repo name
         id: ecr-repo
@@ -222,7 +222,7 @@ jobs:
           echo "tags=${TAGS}" >> $GITHUB_OUTPUT

       - name: Build and push images
-        uses: useblacksmith/build-push-action@v2
+        uses: useblacksmith/build-push-action@fb9e3e6a9299c78462bfadd0d93352c316adc9b8 # v2
         with:
           context: .
           file: ${{ matrix.dockerfile }}
@@ -254,17 +254,17 @@ jobs:

     steps:
       - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6

       - name: Login to GHCR
-        uses: docker/login-action@v4
+        uses: docker/login-action@650006c6eb7dba73a995cc03b0b2d7f5ca915bee # v4
         with:
           registry: ghcr.io
           username: ${{ github.repository_owner }}
           password: ${{ secrets.GITHUB_TOKEN }}

       - name: Set up Docker Buildx
-        uses: useblacksmith/setup-docker-builder@v1
+        uses: useblacksmith/setup-docker-builder@ab5c1da94f53f5cd75c1038092aa276dddfccbba # v1

       - name: Generate ARM64 tags
         id: meta
@@ -282,7 +282,7 @@ jobs:
           echo "tags=${TAGS}" >> $GITHUB_OUTPUT

       - name: Build and push ARM64 to GHCR
-        uses: useblacksmith/build-push-action@v2
+        uses: useblacksmith/build-push-action@fb9e3e6a9299c78462bfadd0d93352c316adc9b8 # v2
         with:
           context: .
           file: ${{ matrix.dockerfile }}
@@ -309,7 +309,7 @@ jobs:

     steps:
       - name: Login to GHCR
-        uses: docker/login-action@v4
+        uses: docker/login-action@650006c6eb7dba73a995cc03b0b2d7f5ca915bee # v4
         with:
           registry: ghcr.io
           username: ${{ github.repository_owner }}
@@ -349,10 +349,10 @@ jobs:
     outputs:
       docs_changed: ${{ steps.filter.outputs.docs }}
     steps:
-      - uses: actions/checkout@v6
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
         with:
           fetch-depth: 2 # Need at least 2 commits to detect changes
-      - uses: dorny/paths-filter@v4
+      - uses: dorny/paths-filter@fbd0ab8f3e69293af611ebaee6363fc25e6d187d # v4
         id: filter
         with:
           filters: |
@@ -379,14 +379,14 @@ jobs:
       contents: write
     steps:
       - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
         with:
           fetch-depth: 0

       - name: Setup Bun
-        uses: oven-sh/setup-bun@v2
+        uses: oven-sh/setup-bun@0c5077e51419868618aeaa5fe8019c62421857d6 # v2
         with:
-          bun-version: latest
+          bun-version: 1.3.13

       - name: Install dependencies
         run: bun install --frozen-lockfile

.github/workflows/companion-pr-check.yml

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ jobs:
   companion:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/github-script@v7
+      - uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b # v7
         env:
           CROSS_REPO_TOKEN: ${{ secrets.CROSS_REPO_TOKEN }}
         with:

.github/workflows/docs-embeddings.yml

Lines changed: 4 additions & 4 deletions
@@ -15,20 +15,20 @@ jobs:

     steps:
       - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6

       - name: Setup Bun
-        uses: oven-sh/setup-bun@v2
+        uses: oven-sh/setup-bun@0c5077e51419868618aeaa5fe8019c62421857d6 # v2
         with:
           bun-version: 1.3.13

       - name: Setup Node
-        uses: actions/setup-node@v6
+        uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6
         with:
           node-version: latest

       - name: Cache Bun dependencies
-        uses: actions/cache@v5
+        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5
         with:
           path: |
             ~/.bun/install/cache

.github/workflows/i18n.yml

Lines changed: 7 additions & 7 deletions
@@ -14,19 +14,19 @@ jobs:

     steps:
       - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
         with:
           ref: staging
           token: ${{ secrets.GH_PAT }}
           fetch-depth: 0

       - name: Setup Bun
-        uses: oven-sh/setup-bun@v2
+        uses: oven-sh/setup-bun@0c5077e51419868618aeaa5fe8019c62421857d6 # v2
         with:
           bun-version: 1.3.13

       - name: Cache Bun dependencies
-        uses: actions/cache@v5
+        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5
         with:
           path: |
             ~/.bun/install/cache
@@ -58,7 +58,7 @@ jobs:

       - name: Create Pull Request with translations
         if: steps.changes.outputs.changes == 'true'
-        uses: peter-evans/create-pull-request@v5
+        uses: peter-evans/create-pull-request@4e1beaa7521e8b457b572c090b25bd3db56bf1c5 # v5
         with:
           token: ${{ secrets.GH_PAT }}
           commit-message: "feat(i18n): update translations"
@@ -115,17 +115,17 @@ jobs:

     steps:
       - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
         with:
           ref: staging

       - name: Setup Bun
-        uses: oven-sh/setup-bun@v2
+        uses: oven-sh/setup-bun@0c5077e51419868618aeaa5fe8019c62421857d6 # v2
         with:
           bun-version: 1.3.13

       - name: Cache Bun dependencies
-        uses: actions/cache@v5
+        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5
         with:
           path: |
             ~/.bun/install/cache

.github/workflows/images.yml

Lines changed: 12 additions & 12 deletions
@@ -31,34 +31,34 @@ jobs:

     steps:
       - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6

       - name: Configure AWS credentials
-        uses: aws-actions/configure-aws-credentials@v6
+        uses: aws-actions/configure-aws-credentials@e7f100cf4c008499ea8adda475de1042d6975c7b # v6
         with:
           role-to-assume: ${{ github.ref == 'refs/heads/main' && secrets.AWS_ROLE_TO_ASSUME || github.ref == 'refs/heads/dev' && secrets.DEV_AWS_ROLE_TO_ASSUME || secrets.STAGING_AWS_ROLE_TO_ASSUME }}
           aws-region: ${{ github.ref == 'refs/heads/main' && secrets.AWS_REGION || github.ref == 'refs/heads/dev' && secrets.DEV_AWS_REGION || secrets.STAGING_AWS_REGION }}

       - name: Login to Amazon ECR
         id: login-ecr
-        uses: aws-actions/amazon-ecr-login@v2
+        uses: aws-actions/amazon-ecr-login@d539f0932e70871a027e9d5a9d8fc38589180a64 # v2

       - name: Login to Docker Hub
-        uses: docker/login-action@v4
+        uses: docker/login-action@650006c6eb7dba73a995cc03b0b2d7f5ca915bee # v4
         with:
           username: ${{ secrets.DOCKERHUB_USERNAME }}
           password: ${{ secrets.DOCKERHUB_TOKEN }}

       - name: Login to GHCR
         if: github.ref == 'refs/heads/main'
-        uses: docker/login-action@v4
+        uses: docker/login-action@650006c6eb7dba73a995cc03b0b2d7f5ca915bee # v4
         with:
           registry: ghcr.io
           username: ${{ github.repository_owner }}
           password: ${{ secrets.GITHUB_TOKEN }}

       - name: Set up Docker Buildx
-        uses: useblacksmith/setup-docker-builder@v1
+        uses: useblacksmith/setup-docker-builder@ab5c1da94f53f5cd75c1038092aa276dddfccbba # v1

       - name: Generate tags
         id: meta
@@ -90,7 +90,7 @@ jobs:
           echo "tags=${TAGS}" >> $GITHUB_OUTPUT

       - name: Build and push images
-        uses: useblacksmith/build-push-action@v2
+        uses: useblacksmith/build-push-action@fb9e3e6a9299c78462bfadd0d93352c316adc9b8 # v2
         with:
           context: .
           file: ${{ matrix.dockerfile }}
@@ -117,17 +117,17 @@ jobs:

     steps:
       - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6

       - name: Login to GHCR
-        uses: docker/login-action@v4
+        uses: docker/login-action@650006c6eb7dba73a995cc03b0b2d7f5ca915bee # v4
         with:
           registry: ghcr.io
           username: ${{ github.repository_owner }}
           password: ${{ secrets.GITHUB_TOKEN }}

       - name: Set up Docker Buildx
-        uses: useblacksmith/setup-docker-builder@v1
+        uses: useblacksmith/setup-docker-builder@ab5c1da94f53f5cd75c1038092aa276dddfccbba # v1

       - name: Generate ARM64 tags
         id: meta
@@ -136,7 +136,7 @@ jobs:
           echo "tags=${IMAGE}:latest-arm64,${IMAGE}:${{ github.sha }}-arm64" >> $GITHUB_OUTPUT

       - name: Build and push ARM64 to GHCR
-        uses: useblacksmith/build-push-action@v2
+        uses: useblacksmith/build-push-action@fb9e3e6a9299c78462bfadd0d93352c316adc9b8 # v2
         with:
           context: .
           file: ${{ matrix.dockerfile }}
@@ -160,7 +160,7 @@ jobs:

     steps:
       - name: Login to GHCR
-        uses: docker/login-action@v4
+        uses: docker/login-action@650006c6eb7dba73a995cc03b0b2d7f5ca915bee # v4
         with:
           registry: ghcr.io
           username: ${{ github.repository_owner }}
