
Commit 18b4cb1

mikolalysenko and claude committed
feat(vendor): hold patch blobs in memory — vendoring writes no .socket/blobs or temp files
Vendor flows (vendor, scan --vendor, --detached) no longer persist patch content anywhere on disk: a vendored project's .socket holds only manifest.json and vendor/.

- core: PatchSources.mem_blobs overlay, checked before the on-disk blob read in the apply pipeline's blob strategy.
- core: harvest_artifact_blobs: re-stage afterHash blobs from the committed vendor artifact itself (uuid-matched against the ledger, every blob self-verified by its own git-sha256), so in-sync re-runs and fresh clones of vendored projects stage with no network.
- cli: stage_vendor_sources_in_memory replaces the disk stager in all vendor flows; missing content is fetched per patch via the proxy-aware patch-view endpoint straight into memory.
- cli: DownloadParams.persist_blobs: scan passes !args.vendor so the scan --vendor download phase writes only the manifest.
- e2e: .socket-stays-lean assertions (manifest mode, detached, fresh clone) + no-blobs detached idempotency; core harvest unit tests (tgz, dir-shaped, stale-uuid, escaping-path fail-closed).
- docs: CLI contract "Patch sources stay in memory" section.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent 64c59f1 commit 18b4cb1

25 files changed

Lines changed: 996 additions & 248 deletions
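
The commit message describes `PatchSources.mem_blobs` as an overlay that is checked before the on-disk blob read. A minimal sketch of that lookup order follows, assuming a trimmed-down `Sources` struct and a `read_blob` helper; both names are hypothetical and are not taken from the socket-patch codebase, which may structure its blob strategy differently.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the real source set: only the two fields
/// needed to show the lookup order.
struct Sources<'a> {
    /// Directory holding on-disk blobs, one file per hex hash.
    blobs_path: &'a Path,
    /// Optional in-memory overlay keyed by the same hex hash.
    mem_blobs: Option<&'a HashMap<String, Vec<u8>>>,
}

/// Resolve a blob by hash: the memory overlay wins, and the disk is
/// only read when the overlay is absent or has no entry for the hash.
fn read_blob(sources: &Sources<'_>, hash: &str) -> io::Result<Vec<u8>> {
    if let Some(bytes) = sources.mem_blobs.and_then(|m| m.get(hash)) {
        return Ok(bytes.clone());
    }
    fs::read(sources.blobs_path.join(hash))
}
```

With `mem_blobs: None` the helper degrades to a plain disk read, which matches how the non-vendor call sites in this commit pass `mem_blobs: None`.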

crates/socket-patch-cli/CLI_CONTRACT.md

Lines changed: 9 additions & 1 deletion
@@ -82,7 +82,7 @@ Beyond the globals above, each subcommand defines a small set of local arguments
 
 `scan --sync` is sugar for `--apply --prune` — the canonical single-flag bot invocation. `scan --json --sync --yes` discovers, applies, and reconciles state in one pass.
 
-`scan --vendor` swaps the in-place apply for the vendor pipeline: discover → download (manifest written, as `--apply`) → vendor every patched dependency via the same engine as the `vendor` command (under the same lock). The whole manifest is vendored, so a package vendored at an older patch uuid is **re-vendored automatically** (its old uuid dir is removed — `vendor_stale_artifact_removed`); same-uuid re-runs are `already_vendored` skips. With `--prune`, GC runs **before** the vendor step so stale manifest entries don't fail vendoring with `package_not_installed`. JSON output gains a `download` sub-object (the download phase; no `applied` field — nothing is applied in place) and a `vendor` sub-object (a full vendor Envelope). `--dry-run` previews per-patch `would_vendor` | `would_revendor` (+`oldUuid`) | `already_vendored` without network downloads or disk writes. Interactive mode prompts "Download and vendor N patch(es)?".
+`scan --vendor` swaps the in-place apply for the vendor pipeline: discover → download (manifest written, as `--apply`) → vendor every patched dependency via the same engine as the `vendor` command (under the same lock). The whole manifest is vendored, so a package vendored at an older patch uuid is **re-vendored automatically** (its old uuid dir is removed — `vendor_stale_artifact_removed`); same-uuid re-runs are `already_vendored` skips. With `--prune`, GC runs **before** the vendor step so stale manifest entries don't fail vendoring with `package_not_installed`. JSON output gains a `download` sub-object (the download phase; no `applied` field — nothing is applied in place) and a `vendor` sub-object (a full vendor Envelope). The download phase writes only `.socket/manifest.json`; patch blobs are held in memory (see "Patch sources stay in memory" under the vendor contract). `--dry-run` previews per-patch `would_vendor` | `would_revendor` (+`oldUuid`) | `already_vendored` without network downloads or disk writes. Interactive mode prompts "Download and vendor N patch(es)?".
 
 `scan --vendor --detached` performs the same vendoring **without ever writing `.socket/manifest.json`**: records are fetched into memory (`download.detached: true`), the artifacts are built + wired, and the ledger entry carries `detached: true` plus an embedded copy of the patch record (`record`) as the verification source. Detached patches are invisible to apply/rollback/repair (nothing is in the manifest), exempt from `vendor`'s manifest reconcile, and exit via `remove <purl>` (which reverts them) or `vendor --revert`. Idempotent re-runs reuse the embedded record and skip the patch-view fetch entirely.
 
@@ -326,6 +326,14 @@ machines with **no socket-patch installed and no Socket API access** (registry a
 unvendored dependencies may still be needed). Every mechanism below was validated against the real
 package managers (`spikes/PHASE0-FINDINGS.txt`).
 
+**Patch sources stay in memory (v3.4)**: vendoring never writes `.socket/blobs/`, `.socket/diffs/`,
+or temporary patch files. Pre-existing `.socket/` artifacts (from a prior `apply`/`get`/`repair`)
+are read in place; already-vendored purls re-stage patch content from the committed artifact itself
+(uuid-matched against the ledger, every harvested blob self-verified by its afterHash — so in-sync
+re-runs and fresh clones of vendored projects need no network); anything still missing is fetched
+into memory via the patch-view endpoint. A vendored project's `.socket/` holds only
+`manifest.json` (omitted in detached mode) and `vendor/`.
+
 ### Path convention + patch-UUID recovery (stable)
 
 ```text

crates/socket-patch-cli/src/commands/apply.rs

Lines changed: 31 additions & 37 deletions
@@ -5,8 +5,8 @@ use socket_patch_core::crawlers::{
 };
 use socket_patch_core::manifest::operations::read_manifest;
 use socket_patch_core::manifest::schema::PatchRecord;
-use socket_patch_core::patch::apply::{MismatchPolicy,
-    apply_package_patch, verify_file_patch, ApplyResult, PatchSources, VerifyStatus,
+use socket_patch_core::patch::apply::{
+    apply_package_patch, verify_file_patch, ApplyResult, MismatchPolicy, PatchSources, VerifyStatus,
 };
 /// Files whose pre-apply content matched NEITHER hash and were (or would
 /// be) overwritten with the verified patched content — the promoted
@@ -94,10 +94,7 @@ async fn ensure_blobs_for_mismatches(
     }
     let (client, _) = get_api_client_with_overrides(args.common.api_client_overrides()).await;
     let _ = socket_patch_core::api::blob_fetcher::fetch_blobs_by_hash(
-        &needed,
-        blobs_path,
-        &client,
-        None,
+        &needed, blobs_path, &client, None,
     )
     .await;
 }
@@ -732,17 +729,13 @@ pub async fn run(args: ApplyArgs) -> i32 {
                     };
                     println!(
                         " {}{}",
-                        socket_patch_core::utils::purl::normalize_purl(
-                            &result.package_key
-                        ),
+                        socket_patch_core::utils::purl::normalize_purl(&result.package_key),
                         suffix
                     );
                 } else if all_files_already_patched(result) {
                     println!(
                         " {} (already patched)",
-                        socket_patch_core::utils::purl::normalize_purl(
-                            &result.package_key
-                        )
+                        socket_patch_core::utils::purl::normalize_purl(&result.package_key)
                     );
                 }
             }
@@ -1102,6 +1095,7 @@ async fn apply_patches_inner(
                 blobs_path: &blobs_path,
                 packages_path: Some(&packages_path),
                 diffs_path: Some(&diffs_path),
+                mem_blobs: None,
             };
             let result = apply_package_patch(
                 variant_purl,
@@ -1186,38 +1180,38 @@ async fn apply_patches_inner(
             blobs_path: &blobs_path,
             packages_path: Some(&packages_path),
             diffs_path: Some(&diffs_path),
+            mem_blobs: None,
         };
         // Local go redirects to a project-local patched copy under
         // `.socket/go-patches/` wired via a `go.mod` `replace` (the module
         // cache is `go.sum`-verified, so in-place patching can't build).
         // Everything else — npm/pypi/gem and cargo (vendored or registry
         // cache) — patches in place via `apply_package_patch`. Without the
         // `golang` feature `try_local_go_apply` is an inert `None`.
-        let result =
-                match try_local_go_apply(
-                    purl,
-                    pkg_path,
-                    patch,
-                    &sources,
-                    &args.common,
-                    mismatch_policy(args.force, args.common.strict),
-                )
+        let result = match try_local_go_apply(
+            purl,
+            pkg_path,
+            patch,
+            &sources,
+            &args.common,
+            mismatch_policy(args.force, args.common.strict),
+        )
+        .await
+        {
+            Some(r) => r,
+            None => {
+                apply_package_patch(
+                    purl,
+                    pkg_path,
+                    &patch.files,
+                    &sources,
+                    Some(&patch.uuid),
+                    args.common.dry_run,
+                    mismatch_policy(args.force, args.common.strict),
+                )
                 .await
-                {
-                    Some(r) => r,
-                    None => {
-                        apply_package_patch(
-                            purl,
-                            pkg_path,
-                            &patch.files,
-                            &sources,
-                            Some(&patch.uuid),
-                            args.common.dry_run,
-                            mismatch_policy(args.force, args.common.strict),
-                        )
-                        .await
-                    }
-                };
+            }
+        };
 
         warn_mismatch_overwrites(&result, &args.common);
         if !result.success {
@@ -1434,7 +1428,7 @@ mod tests {
             .enumerate()
             .map(|(i, status)| VerifyResult {
                 file: format!("package/f{i}.js"),
-                status: status.clone(),
+                status: *status,
                 message: None,
                 current_hash: None,
                 expected_hash: None,

crates/socket-patch-cli/src/commands/fetch_stage.rs

Lines changed: 191 additions & 0 deletions
@@ -7,6 +7,7 @@
 //! cache is `repair`'s job, keeping these commands read-only against
 //! `.socket/`).
 
+use std::collections::HashMap;
 use std::path::{Path, PathBuf};
 
 use socket_patch_core::api::blob_fetcher::{
@@ -36,6 +37,7 @@ impl StagedSources {
             blobs_path: &self.blobs,
             packages_path: Some(&self.packages),
             diffs_path: Some(&self.diffs),
+            mem_blobs: None,
         }
     }
 }
@@ -199,6 +201,7 @@ pub async fn stage_patch_sources(
         blobs_path: &stage_blobs,
         packages_path: Some(&stage_packages),
         diffs_path: Some(&stage_diffs),
+        mem_blobs: None,
     };
     let fetch_result =
         fetch_missing_sources(manifest, &sources, download_mode, &client, None).await;
@@ -244,3 +247,191 @@ pub async fn stage_patch_sources(
         _stage: Some(stage),
     }))
 }
+
+/// In-memory staged sources for the VENDOR flows.
+///
+/// Existing `.socket/` artifacts are read in place (never copied, never
+/// rewritten); patch content that is missing locally is fetched into
+/// MEMORY via the patch view endpoint — vendoring writes no
+/// `.socket/blobs` entries and no temporary files. The committed
+/// `.socket/vendor/` artifact is the patch; nothing else should land on
+/// disk.
+pub struct MemStagedSources {
+    blobs: PathBuf,
+    diffs: PathBuf,
+    packages: PathBuf,
+    mem: HashMap<String, Vec<u8>>,
+}
+
+impl MemStagedSources {
+    /// Borrow as the core pipeline's source set (memory overlay first,
+    /// on-disk artifacts as the read-only fallback).
+    pub fn as_patch_sources(&self) -> PatchSources<'_> {
+        PatchSources {
+            blobs_path: &self.blobs,
+            packages_path: Some(&self.packages),
+            diffs_path: Some(&self.diffs),
+            mem_blobs: Some(&self.mem),
+        }
+    }
+}
+
+/// The in-memory staging outcome (mirror of [`StageOutcome`]).
+pub enum MemStageOutcome {
+    Ready(MemStagedSources),
+    Unavailable,
+}
+
+/// Stage patch sources for a VENDOR run without writing anything:
+/// per-record availability follows the same rule as
+/// [`stage_patch_sources`] (all after-blobs on disk, or a diff/package
+/// archive on disk), and records with no usable local source have their
+/// full per-file content fetched into memory from the patch view
+/// endpoint (`blobContent`). Offline runs with missing sources are
+/// `Unavailable` with the same diagnostics as the disk stager.
+pub async fn stage_vendor_sources_in_memory(
+    common: &GlobalArgs,
+    manifest: &PatchManifest,
+    socket_dir: &Path,
+    project_root: &Path,
+) -> Result<MemStageOutcome, String> {
+    let blobs = socket_dir.join("blobs");
+    let diffs = socket_dir.join("diffs");
+    let packages = socket_dir.join("packages");
+
+    let missing_blobs = get_missing_blobs(manifest, &blobs).await;
+    let missing_diff_archives = get_missing_archives(manifest, &diffs).await;
+    let missing_package_archives = get_missing_archives(manifest, &packages).await;
+
+    let mut to_fetch: Vec<(&str, &str)> = manifest
+        .patches
+        .iter()
+        .filter_map(|(purl, record)| {
+            let all_blobs_present = record
+                .files
+                .values()
+                .all(|f| !missing_blobs.contains(&f.after_hash));
+            let diff_present = !missing_diff_archives.contains(&record.uuid);
+            let pkg_present = !missing_package_archives.contains(&record.uuid);
+            if all_blobs_present || diff_present || pkg_present {
+                None
+            } else {
+                Some((purl.as_str(), record.uuid.as_str()))
+            }
+        })
+        .collect();
+
+    if to_fetch.is_empty() {
+        return Ok(MemStageOutcome::Ready(MemStagedSources {
+            blobs,
+            diffs,
+            packages,
+            mem: HashMap::new(),
+        }));
+    }
+
+    // The committed vendor artifact IS the patched content: harvest its
+    // afterHash blobs into memory so in-sync re-runs and fresh clones of
+    // already-vendored projects stage with no network and no disk blobs.
+    let mut mem =
+        socket_patch_core::patch::vendor::harvest_artifact_blobs(project_root, &manifest.patches)
+            .await;
+    if !mem.is_empty() {
+        to_fetch.retain(|(purl, _)| {
+            manifest.patches.get(*purl).is_none_or(|record| {
+                !record.files.values().all(|f| {
+                    !missing_blobs.contains(&f.after_hash) || mem.contains_key(&f.after_hash)
+                })
+            })
+        });
+        if to_fetch.is_empty() {
+            return Ok(MemStageOutcome::Ready(MemStagedSources {
+                blobs,
+                diffs,
+                packages,
+                mem,
+            }));
+        }
+    }
+
+    if common.offline {
+        if !common.silent && !common.json {
+            eprintln!(
+                "Error: {} patch(es) have no local source and --offline is set:",
+                to_fetch.len()
+            );
+            for (purl, _) in to_fetch.iter().take(5) {
+                eprintln!(" - {}", purl);
+            }
+            if to_fetch.len() > 5 {
+                eprintln!(" ... and {} more", to_fetch.len() - 5);
+            }
+            eprintln!("Run \"socket-patch repair\" to download missing artifacts.");
+        }
+        return Ok(MemStageOutcome::Unavailable);
+    }
+
+    if !common.silent && !common.json {
+        println!(
+            "Fetching {} patch(es)' content (kept in memory)...",
+            to_fetch.len()
+        );
+    }
+
+    let (client, _) = get_api_client_with_overrides(common.api_client_overrides()).await;
+    let mut failed: Vec<&str> = Vec::new();
+    for (purl, uuid) in &to_fetch {
+        match client.fetch_patch(common.org.as_deref(), uuid).await {
+            Ok(Some(patch)) => {
+                let mut complete = true;
+                for (file, info) in &patch.files {
+                    let (Some(b64), Some(hash)) = (&info.blob_content, &info.after_hash) else {
+                        if !common.silent && !common.json {
+                            eprintln!(" [error] {purl}: no blob content served for {file}");
+                        }
+                        complete = false;
+                        break;
+                    };
+                    // Same key guard as the disk writer: the hash names the
+                    // lookup key the apply pipeline gates writes on.
+                    if hash.len() != 64 || !hash.bytes().all(|b| b.is_ascii_hexdigit()) {
+                        complete = false;
+                        break;
+                    }
+                    match super::get::base64_decode(b64) {
+                        Ok(bytes) => {
+                            mem.insert(hash.clone(), bytes);
+                        }
+                        Err(_) => {
+                            complete = false;
+                            break;
+                        }
+                    }
+                }
+                if !complete {
+                    failed.push(purl);
+                }
+            }
+            _ => failed.push(purl),
+        }
+    }
+    if !failed.is_empty() {
+        if !common.silent && !common.json {
+            eprintln!(
+                "Error: could not fetch patch content for {} patch(es):",
+                failed.len()
+            );
+            for purl in failed.iter().take(5) {
+                eprintln!(" - {}", purl);
+            }
+        }
+        return Ok(MemStageOutcome::Unavailable);
+    }
+
+    Ok(MemStageOutcome::Ready(MemStagedSources {
+        blobs,
+        diffs,
+        packages,
+        mem,
+    }))
+}
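
The staging rule in `stage_vendor_sources_in_memory` can be read in isolation: a record needs a network fetch only when it has no diff archive, no package archive, and at least one after-blob that is neither on disk nor in the harvested memory map. The sketch below collapses the function's two passes (initial filter, then `retain` after harvesting) into one predicate. The `Record` struct and the `purls_to_fetch` and `is_valid_blob_key` names are hypothetical simplifications for illustration, not the crate's real types.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, trimmed-down patch record: a uuid plus the afterHash
/// of every patched file.
struct Record {
    uuid: String,
    after_hashes: Vec<String>,
}

/// Same shape as the key guard in the fetch loop: a blob key must be
/// exactly 64 ASCII hex characters.
fn is_valid_blob_key(hash: &str) -> bool {
    hash.len() == 64 && hash.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Return the purls (sorted, for stable output) that have no usable
/// local source and therefore still need fetching.
fn purls_to_fetch<'a>(
    patches: &'a HashMap<String, Record>,
    missing_blobs: &HashSet<String>,
    missing_diffs: &HashSet<String>,
    missing_packages: &HashSet<String>,
    mem: &HashMap<String, Vec<u8>>,
) -> Vec<&'a str> {
    let mut out: Vec<&str> = patches
        .iter()
        .filter(|(_, record)| {
            // A blob counts as available if it is on disk or was harvested.
            let blobs_ok = record
                .after_hashes
                .iter()
                .all(|h| !missing_blobs.contains(h) || mem.contains_key(h));
            let diff_ok = !missing_diffs.contains(&record.uuid);
            let pkg_ok = !missing_packages.contains(&record.uuid);
            !(blobs_ok || diff_ok || pkg_ok)
        })
        .map(|(purl, _)| purl.as_str())
        .collect();
    out.sort_unstable();
    out
}
```

An empty result corresponds to the early `Ready` returns in the hunk above; a non-empty result under `--offline` corresponds to `Unavailable`.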
