Open
68 commits
6a74e16
Fix IOStallWatchdog blind-read handling: preserve timer + trip on per…
TjitsevdM May 16, 2026
722249a
Add one-ULP epsilon to _build_spikedata inferred length
TjitsevdM May 17, 2026
dff9697
Defensive cleanups: tmp cleanup, ravel, numpy NaN, device-index warn,…
TjitsevdM May 17, 2026
8a86aa8
Add HIGH-item tests for refactor/remove-globals; strip _GlobalsStub shim
TjitsevdM May 17, 2026
f769c2e
Add HIGH-item tests, batch 2: branch test coverage closeout
TjitsevdM May 18, 2026
f58dfde
Honor curation_history in Compiler via include_failed_units opt-in
TjitsevdM May 18, 2026
e25eefd
Add MED-priority tests for branch refactor/remove-globals coverage
TjitsevdM May 18, 2026
a57e74f
Resolve cluster→channel via cluster_info.tsv ch + templates fallback
TjitsevdM May 18, 2026
0a24732
Add MED-priority Core spikedata boundary tests
TjitsevdM May 18, 2026
dda9b16
Use np.lib.format.open_memmap for waveform memmap allocation
TjitsevdM May 18, 2026
a3da17a
Add MED-priority I/O, MCP, and Batch Jobs boundary tests
TjitsevdM May 18, 2026
99ded3a
Flush waveform memmap per-unit so writes are durable and visible to I…
TjitsevdM May 18, 2026
3cf885a
Add MED-priority boundary tests across I/O, MCP, batch, canary, curation
TjitsevdM May 18, 2026
5c2d849
Document channel-numbering assumption in compare_sorter docstring
TjitsevdM May 18, 2026
888636b
Consolidate save_traces: port trace_io polish, delete dead trace_io.py
TjitsevdM May 18, 2026
808ac0d
Add MED tests for watchdog/preflight NaN gaps + reference trace zero-…
TjitsevdM May 18, 2026
55acbb4
Add optional out_namespace to MCP concatenate_units
TjitsevdM May 18, 2026
6f9a9ef
Document destructive default in pcm_stack_threshold + None sentinel
TjitsevdM May 18, 2026
57c0d8a
Boundary guards: non-uniform ISI grid raises, PCM threshold preserve_…
TjitsevdM May 18, 2026
609aa09
Round-trip start_time through NWB; pre-read both attrs for pynwb path
TjitsevdM May 18, 2026
9714871
Add MCP MED-priority boundary tests + adapt _resampled_isi non-unifor…
TjitsevdM May 18, 2026
6945961
Expand _dump_dict schema: None, tuple, set, frozenset, string ndarray
TjitsevdM May 18, 2026
6adad53
Pin IOStallWatchdog blind-read trip contract (6 tests)
TjitsevdM May 18, 2026
0d91204
Defensive cleanups: classifier dedup, banner constants, KSE cluster_i…
TjitsevdM May 18, 2026
d903ac6
Pin WaveformExtractor pre-allocation + Phy channel_map contracts (13 …
TjitsevdM May 18, 2026
c69f7c2
Adapt NWB exporter tests to round-trip contract (commit 609aa09)
TjitsevdM May 18, 2026
0482cfa
Pin parallel-session source contracts: Compiler / save_traces / class…
TjitsevdM May 19, 2026
7fd6038
Pin parallel-session source contracts: preserve_nan / out_namespace /…
TjitsevdM May 19, 2026
121f20a
Pin _atomic_write_pickle tmp cleanup + _resolve_device_index warning …
TjitsevdM May 21, 2026
37078a0
Pin include_failed_units integration + plot_curation_bar deprecation-…
TjitsevdM May 21, 2026
8dd7d13
Sanitize numpy scalars + ndarrays for JSON in MCP; cap inline size
TjitsevdM May 21, 2026
cbdec22
Strict ValueError on config-param NaN/Inf in compute_inactivity_timeo…
TjitsevdM May 21, 2026
42fe7af
Raise ValueError on NaN/Inf threshold in PairwiseCompMatrix.to_networkx
TjitsevdM May 21, 2026
ef13649
Reject NaN/Inf threshold in PairwiseCompMatrix.to_networkx
TjitsevdM May 21, 2026
6457904
Add boundary tests: channel_raster N=0, spike_shuffle all-empty, get_…
TjitsevdM May 21, 2026
31d6d8d
Validate raw_data/raw_time shape match in _read_raw_arrays
TjitsevdM May 21, 2026
42d1341
Guard bin_size_ms vs window in align_to_events(kind='rate')
TjitsevdM May 21, 2026
1ef249c
Guard get_frac_active edges: inverted + wrong-shape inputs raise
TjitsevdM May 21, 2026
8948ba1
Reject double-__enter__ on all three watchdogs (Host/Gpu/IOStall)
TjitsevdM May 21, 2026
1eacab5
Pin _dump_dict rejection paths, NWB start_time loader, _sanitize_for_…
TjitsevdM May 21, 2026
d955e46
Short-circuit empty times in _resampled_isi (no more IndexError)
TjitsevdM May 21, 2026
b1428b9
Suppress shuffle_z_score all-NaN noise via narrow catch_warnings
TjitsevdM May 21, 2026
d6b39ad
Pin double-enter guard for GpuMemoryWatchdog and IOStallWatchdog (sib…
TjitsevdM May 21, 2026
b0ae8de
Reject empty-channel/non-2D input in _build_reference_trace
TjitsevdM May 21, 2026
6e6efde
Log warning per missing legacy WaveformConfig key in extractor init
TjitsevdM May 21, 2026
ee8161a
Pin _sanitize_for_json numpy-scalar coercion + MCP tool-schema contracts
TjitsevdM May 21, 2026
1fbc683
Add folder_count_mismatch finding to run_preflight
TjitsevdM May 21, 2026
925a50b
Reject S=0 in RateSliceStack.__init__ symmetric with T=0
TjitsevdM May 21, 2026
a8ad4bc
Pin remaining 4 triage items: _sanitize_for_json 0-D + cap, _resample…
TjitsevdM May 21, 2026
0782c28
Symmetric salvage warning in SpikeData.append
TjitsevdM May 21, 2026
3ca0ff9
Pin two self-flagged test gaps: run_preflight folder-count-mismatch +…
TjitsevdM May 21, 2026
0ade2ad
Idempotent delete_job on both paths in KubernetesBatchJobBackend
TjitsevdM May 21, 2026
18e9d0b
Vectorise _signal_reached_baseline via np.convolve
TjitsevdM May 21, 2026
0dff071
Track inode in LogInactivityWatchdog for rotation detection
TjitsevdM May 22, 2026
dd07ab7
Fix _sanitize_for_json 0-D ndarray TypeError; route through scalar br…
TjitsevdM May 22, 2026
a83bf26
Move Maxwell .h5 dispatch logic into maxwell_io.load_maxwell_with_fal…
TjitsevdM May 22, 2026
2cbad76
Refactor Tee: explicit _TeeWriter wrapper, no MethodType monkey-patch
TjitsevdM May 22, 2026
120555c
docs: batch-job base-image rebuild workflow + fix Sphinx docstring wa…
TjitsevdM May 22, 2026
e55e89f
test: pin RateData.frames empty-times guard and SpikeData length prec…
TjitsevdM May 23, 2026
3d8d890
spikedata: add boundary guards for raster offset, oversized kernels, …
TjitsevdM May 24, 2026
6d2df3d
rt_sort: hard-code keep_good_only=False instead of reading Kilosort c…
TjitsevdM May 24, 2026
295e165
test: edge-case batch — boundary contracts, log finders, curation, pc…
TjitsevdM May 24, 2026
062943b
style: apply black to test files
TjitsevdM May 24, 2026
8574931
test: fix CI — gate Sanitize-for-json tests on MCP_SERVER_AVAILABLE; …
TjitsevdM May 24, 2026
a60ee49
test: stub fit_gplvm boundary test instead of running JAX on degenera…
TjitsevdM May 24, 2026
de9cd32
test: drop sparse-stPR test causing native abort in CI; fix burst tes…
TjitsevdM May 24, 2026
6e80797
test: drop suspect-cluster of MED tests causing CI native abort
TjitsevdM May 24, 2026
f91cf6c
ci: switch pytest to -v to identify which test triggers the Linux hea…
TjitsevdM May 24, 2026
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -25,6 +25,6 @@ jobs:
           pip install scikit-learn networkx pandas matplotlib tqdm

       - name: Run tests
-        run: pytest -q
+        run: pytest -v --tb=short -p no:cacheprovider


18 changes: 16 additions & 2 deletions docs/source/guides/batch_jobs.rst
@@ -219,8 +219,18 @@ Build reusable base images for CPU and GPU workloads:

 .. code-block:: bash

-   docker build -f docker/analysis-base/Dockerfile.cpu -t spikelab/analysis-base:cpu .
-   docker build -f docker/analysis-base/Dockerfile.gpu -t spikelab/analysis-base:gpu .
+   bash scripts/build_base_image.sh cpu spikelab/analysis-base:cpu
+   bash scripts/build_base_image.sh gpu spikelab/analysis-base:gpu

+The base image bakes in the SpikeLab source via ``COPY src ./src`` and
+``pip install -e .``. It is a frozen snapshot — published SpikeLab releases do
+not update an existing image automatically. Rebuild whenever the library
+source has changed and you need that change reflected on the cluster.
+
+When iterating on a feature branch, build under a developer-scoped tag (e.g.,
+``ghcr.io/<org>/spikelab-analysis-base:${USER}-$(git rev-parse --short HEAD)``)
+and pass it explicitly via ``--image`` so concurrent developers do not clobber
+each other's shared ``:cpu`` / ``:gpu`` tags.
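
A minimal sketch of the developer-scoped tag scheme described above, written in Python rather than shell. The helper names `developer_tag` and `current_short_sha` are hypothetical and not part of SpikeLab; the only assumption beyond the text is that characters outside the Docker tag alphabet in the user name are replaced with `-`.

```python
import re
import subprocess


def developer_tag(repository: str, user: str, short_sha: str) -> str:
    """Compose '<repository>:<user>-<short_sha>' (hypothetical helper)."""
    if not user or not short_sha:
        raise ValueError("user and short_sha must be non-empty")
    # Docker tags may contain letters, digits, '_', '.' and '-';
    # anything else in the user name is replaced with '-'.
    safe_user = re.sub(r"[^A-Za-z0-9_.-]", "-", user)
    return f"{repository}:{safe_user}-{short_sha}"


def current_short_sha() -> str:
    """Return the output of `git rev-parse --short HEAD` for the current repo."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Because the tag embeds both the user and the commit, two developers building from the same branch still get distinct image names, which is the clobbering problem the paragraph above is guarding against.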

Temporary images
^^^^^^^^^^^^^^^^
@@ -232,6 +242,10 @@ Build and push a temporary image for a single run:
    bash scripts/build_temp_image.sh gpu ghcr.io/<org>/spikelab-analysis-temp:<tag>
    bash scripts/push_temp_image.sh ghcr.io/<org>/spikelab-analysis-temp:<tag>

+This layers analysis-time files on top of an existing ``analysis-base`` image
+without rebuilding it. Use this when only the analysis script changed; if
+``src/spikelab/`` itself changed, rebuild the base image first (see above).
+
 Reference this tag in the ``ContainerSpec`` when creating your ``JobSpec``.


27 changes: 27 additions & 0 deletions scripts/build_base_image.sh
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail

if [[ $# -lt 2 ]]; then
  echo "Usage: $0 <cpu|gpu> <image-tag>"
  echo "Example: $0 cpu ghcr.io/acme/spikelab-analysis-base:dev-abc1234"
  exit 1
fi

profile="$1"
image_tag="$2"

case "$profile" in
  cpu) dockerfile="docker/analysis-base/Dockerfile.cpu" ;;
  gpu) dockerfile="docker/analysis-base/Dockerfile.gpu" ;;
  *)
    echo "Error: profile must be 'cpu' or 'gpu', got '$profile'"
    exit 1
    ;;
esac

docker build \
  -f "${dockerfile}" \
  -t "${image_tag}" \
  .

echo "BUILT_IMAGE=${image_tag}"
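
The script ends by printing a `BUILT_IMAGE=<tag>` line, which a calling tool could pick up. The following is a sketch of one way to do that from Python; `parse_built_image` is a hypothetical helper, not something this PR adds, and it assumes the caller has captured the script's standard output as text.

```python
def parse_built_image(stdout: str) -> str:
    """Extract the tag from the last 'BUILT_IMAGE=<tag>' line (hypothetical helper)."""
    prefix = "BUILT_IMAGE="
    tag = None
    for line in stdout.splitlines():
        # Build output may precede the marker line, so keep the last match.
        if line.startswith(prefix):
            tag = line[len(prefix):].strip()
    if not tag:
        raise ValueError("no BUILT_IMAGE=<tag> line found in build output")
    return tag
```

Taking the last matching line is a deliberate choice here: the marker is echoed after `docker build` returns, so anything earlier in the stream that happens to start with the same prefix is ignored.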
26 changes: 26 additions & 0 deletions src/spikelab/batch_jobs/INSTRUCTIONS.md
@@ -85,12 +85,38 @@ These scripts are in the SpikeLab repository under `scripts/` and `docker/`. The
- `python scripts/generate_job_config.py --image <image-tag> --profile <cpu|gpu> --output configs/batch-temp-job.yaml`
5. Confirm image is pullable from target cluster/namespace before deploy.

### When SpikeLab source has changed (developer iteration)

The `build_temp_image.sh` workflow above layers analysis code on top of an existing `analysis-base` image. It does **not** capture changes to `src/spikelab/` itself. If the user has modified the SpikeLab library (e.g., they are on a feature branch with new methods that the submitted script depends on), the `analysis-base` image must be rebuilt first — otherwise the running container exposes a stale API and the job will fail with `AttributeError` or run against outdated behavior.

In that case, rebuild and push a **developer-scoped base image** before submitting, and pass it explicitly via `--image`:

```bash
# From SpikeLab repo root. Use ${USER:-${USERNAME}} for Linux/Mac/Windows compatibility.
USER_TAG="ghcr.io/<org>/spikelab-analysis-base:${USER:-${USERNAME}}-$(git rev-parse --short HEAD)"

bash scripts/build_base_image.sh cpu "${USER_TAG}" # or 'gpu'
bash scripts/push_temp_image.sh "${USER_TAG}"

# Submit using the freshly built image
spikelab-batch-jobs deploy-job \
--profile <profile> \
--job-config <path> \
--image "${USER_TAG}"
```

Notes:
- The Dockerfile uses `COPY src ./src`, so **uncommitted edits in `src/spikelab/` are also baked into the image**. This is useful for fast iteration but can be surprising — confirm `git status` reflects the state you intend to ship.
- Use a developer-scoped tag (username + short SHA) rather than the shared `:cpu`/`:gpu` tags so concurrent developers do not clobber each other's images.
- The shared `ghcr.io/braingeneers/spikelab-analysis-base:cpu` / `:gpu` tags are static snapshots — they do **not** track new SpikeLab releases automatically. Always rebuild when the library source has changed locally.
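
The staleness rule in these notes (local changes under `src/spikelab/` mean the base image must be rebuilt) can be sketched as a small check over `git status --porcelain` output. The function names are hypothetical and not part of SpikeLab, and the parser assumes the short porcelain format with unquoted paths: two status characters, a space, then the path, with renames written as `old -> new`.

```python
from __future__ import annotations


def changed_paths(porcelain: str) -> list[str]:
    """Return the paths listed in `git status --porcelain` output."""
    paths: list[str] = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # too short to hold 'XY <path>'
        entry = line[3:]  # drop the two status characters and the space
        # Renames appear as 'old -> new'; keep both sides.
        paths.extend(part.strip() for part in entry.split(" -> "))
    return paths


def base_image_is_stale(porcelain: str, prefix: str = "src/spikelab/") -> bool:
    """True if any changed path lies under the library source tree."""
    return any(path.startswith(prefix) for path in changed_paths(porcelain))
```

Pass the command output unmodified: stripping leading whitespace from the whole string would shift the columns of a first line such as ` M src/spikelab/...` and break the fixed-width parse.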

## Fixed Workflow

1. **Preflight checks**
- Run `kubectl version --client`.
- Run `kubectl config current-context`.
- Validate registry/image tag exists and is pushed.
- If `git status` shows changes to `src/spikelab/`, the cluster-side image is stale relative to local code. Rebuild and push a developer-scoped base image before submitting (see "When SpikeLab source has changed" under Container Prep) and pass the resulting tag via `--image`.
- Optionally verify S3 access if asked by the user.
2. **Validate inputs**
- Ensure `--job-config` is present.
28 changes: 22 additions & 6 deletions src/spikelab/batch_jobs/backend_k8s.py
@@ -82,17 +82,33 @@ def apply_manifest(self, manifest_path_or_str: str) -> str:
         return payload["metadata"]["name"]

     def delete_job(self, name: str) -> None:
-        """Delete a job and its pods."""
+        """Delete a job and its pods. Idempotent: missing jobs are a no-op.
+
+        Matches the ``kubectl --ignore-not-found=true`` semantic on
+        the fallback path so the two delete paths behave the same
+        way for the missing-job case. Previously the Python
+        kubernetes-client path propagated ``ApiException(404)``
+        verbatim while the kubectl path exited cleanly.
+        """
         if self._batch_api is None:
             self._run_kubectl(
                 ["delete", "job", name, "-n", self.namespace, "--ignore-not-found=true"]
             )
             return
-        self._batch_api.delete_namespaced_job(
-            name=name,
-            namespace=self.namespace,
-            body=client.V1DeleteOptions(propagation_policy="Background"),
-        )
+        try:
+            self._batch_api.delete_namespaced_job(
+                name=name,
+                namespace=self.namespace,
+                body=client.V1DeleteOptions(propagation_policy="Background"),
+            )
+        except client.exceptions.ApiException as exc:
+            if exc.status == 404:
+                # Missing job — idempotent no-op, matches kubectl
+                # ``--ignore-not-found`` behaviour. Any other API
+                # error (403 Forbidden, 500 Server Error, etc.)
+                # still propagates.
+                return
+            raise

     def job_status(self, name: str) -> str:
         """Return one of Pending/Running/Complete/Failed/Unknown."""
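
The idempotent-delete contract in `delete_job` can be illustrated without a cluster. This is a sketch under stated assumptions, not the PR's test code: `ApiException` below is a local stand-in for the kubernetes-client exception, and `FakeBatchApi` is a hypothetical fake that only models "job exists", "job missing" and "forbidden".

```python
class ApiException(Exception):
    """Local stand-in for the kubernetes-client ApiException (assumption)."""

    def __init__(self, status: int):
        super().__init__(f"API error {status}")
        self.status = status


class FakeBatchApi:
    """Hypothetical fake API that tracks a set of job names."""

    def __init__(self, jobs, forbidden=()):
        self.jobs = set(jobs)
        self.forbidden = set(forbidden)

    def delete_namespaced_job(self, name, namespace):
        if name in self.forbidden:
            raise ApiException(403)
        if name not in self.jobs:
            raise ApiException(404)
        self.jobs.remove(name)


def delete_job(api, name, namespace="default"):
    """Delete a job; a missing job (404) is treated as already deleted."""
    try:
        api.delete_namespaced_job(name=name, namespace=namespace)
    except ApiException as exc:
        if exc.status == 404:
            return  # idempotent no-op
        raise  # 403, 500, etc. still surface to the caller
```

Calling `delete_job` twice on the same name succeeds both times, while a 403 still raises, which mirrors the behaviour the docstring describes for the client path and the `kubectl --ignore-not-found=true` path.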
7 changes: 0 additions & 7 deletions src/spikelab/data_loaders/data_exporters.py
@@ -270,13 +270,6 @@ def export_spikedata_to_nwb(
         when prefer_pynwb=False.
     """
     ensure_h5py()
-    if sd.start_time != 0:
-        warnings.warn(
-            f"Exporting event-centered SpikeData (start_time={sd.start_time}) "
-            "to NWB. The NWB format does not store start_time, so spike times "
-            "are written as-is. On reload, start_time will default to 0.",
-            UserWarning,
-        )
     counts = [len(t) for t in sd.train]
     flat_ms = np.concatenate(sd.train) if sum(counts) else np.array([], float)
     flat_s = times_from_ms(flat_ms, "s", fs_Hz=None)