
Commit 7d81d90

put offload folder under tpu_inference
Signed-off-by: Juncheng Gu <jcgu@google.com>
1 parent: 0d39925

19 files changed (+34, -38 lines)

.buildkite/features/KV_Cache_Offload.yml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ steps:
  commands:
  - |
  .buildkite/scripts/run_in_docker.sh \
- python3 -m pytest -s -v /workspace/tpu_inference/tests/distributed/offload/tpu_offload_accuracy_test.py
+ python3 -m pytest -s -v /workspace/tpu_inference/tests/offload/tpu_offload_accuracy_test.py
  - label: "Record correctness test result for KV Cache Offload"
  key: "record_KV_Cache_Offload_CorrectnessTest"
  depends_on: "KV_Cache_Offload_CorrectnessTest"

.buildkite/pipeline_jax.yml

Lines changed: 3 additions & 3 deletions
@@ -122,7 +122,7 @@ steps:
  --ignore=/workspace/tpu_inference/tests/e2e \
  --ignore=/workspace/tpu_inference/tpu_inference/mock \
  --ignore=/workspace/tpu_inference/tests/layers/vllm/test_compressed_tensors_moe.py \
- --ignore=/workspace/tpu_inference/tests/distributed/offload \
+ --ignore=/workspace/tpu_inference/tests/offload \
  --cov-config=/workspace/tpu_inference/.coveragerc --cov tpu_inference --cov-report term-missing --cov-fail-under=69

  - label: "JAX unit tests - kernels"
@@ -269,9 +269,9 @@ steps:
  commands:
  - |
  .buildkite/scripts/run_in_docker.sh \
- python3 -m pytest -s -v -x /workspace/tpu_inference/tests/distributed/offload/ \
+ python3 -m pytest -s -v -x /workspace/tpu_inference/tests/offload/ \
  /workspace/tpu_inference/tests/kernels/host_dma_test.py \
- --ignore=/workspace/tpu_inference/tests/distributed/offload/tpu_offload_accuracy_test.py
+ --ignore=/workspace/tpu_inference/tests/offload/tpu_offload_accuracy_test.py
  # -----------------------------------------------------------------
  # NOTIFICATION STEP
  # -----------------------------------------------------------------
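
Because the same test directory is referenced from several CI steps, a path missed during a move like this one can linger in a config. The sketch below is a hypothetical helper (`find_stale_test_paths` and `STALE_TEST_DIR` are illustrative names, not part of the repository) that flags lines still mentioning the pre-move `tests/distributed/offload` directory. It assumes the config is available as plain text and only does a regex scan; it does not parse YAML.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: the pre-move test directory replaced by this commit.
STALE_TEST_DIR = re.compile(r"tests/distributed/offload\b")


def find_stale_test_paths(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for lines that still use the old test directory."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if STALE_TEST_DIR.search(line):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # A half-migrated snippet: the first line uses the new path, the second does not.
    pipeline_snippet = (
        "python3 -m pytest -s -v -x /workspace/tpu_inference/tests/offload/ \\\n"
        "--ignore=/workspace/tpu_inference/tests/distributed/offload/tpu_offload_accuracy_test.py\n"
    )
    for lineno, line in find_stale_test_paths(pipeline_snippet):
        print(f"line {lineno}: {line}")
```

Run over the updated `pipeline_jax.yml` text, the helper should report nothing; any hit points at a step that still targets the removed directory.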

examples/offload/gke/benchmarks/deploy-cpu-offload.yaml

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ spec:
  imagePullPolicy: Always
  command: ["/bin/sh", "-c"]
  args:
- - "vllm serve meta-llama/Llama-3.3-70B-Instruct --kv-transfer-config '{\"kv_connector\":\"TPUOffloadConnector\",\"kv_role\":\"kv_both\",\"kv_connector_module_path\":\"tpu_inference.distributed.offload.tpu_offload_connector\"}' --port 8000 --enable-chunked-prefill --tensor-parallel-size 8 --seed 42 --enable_prefix_caching --gpu-memory-utilization 0.9"
+ - "vllm serve meta-llama/Llama-3.3-70B-Instruct --kv-transfer-config '{\"kv_connector\":\"TPUOffloadConnector\",\"kv_role\":\"kv_both\",\"kv_connector_module_path\":\"tpu_inference.offload.tpu_offload_connector\"}' --port 8000 --enable-chunked-prefill --tensor-parallel-size 8 --seed 42 --enable_prefix_caching --gpu-memory-utilization 0.9"
  env:
  - name: HUGGING_FACE_HUB_TOKEN
  valueFrom:
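
The only change in these manifests is the `kv_connector_module_path` value embedded in the `--kv-transfer-config` JSON. As a sketch, assuming you generate such command lines from a script rather than editing escaped JSON by hand, the argument can be built with `json.dumps` and `shlex.quote`. The keys and values below are copied from the manifests; the helper name `kv_transfer_config_arg` is hypothetical.

```python
import json
import shlex

# Values taken from the manifests in this commit; only the module path changed.
KV_TRANSFER_CONFIG = {
    "kv_connector": "TPUOffloadConnector",
    "kv_role": "kv_both",
    "kv_connector_module_path": "tpu_inference.offload.tpu_offload_connector",
}


def kv_transfer_config_arg(config: dict) -> str:
    """Serialize the config as compact JSON and quote it for a POSIX shell."""
    payload = json.dumps(config, separators=(",", ":"))
    return "--kv-transfer-config " + shlex.quote(payload)


if __name__ == "__main__":
    print(kv_transfer_config_arg(KV_TRANSFER_CONFIG))
```

The printed argument is single-quoted JSON with the same key order as the `deploy-cpu-offload.yaml` command above, without the YAML-level backslash escaping. Keeping the module path in a single constant means a future move touches one line instead of every manifest.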

examples/offload/gke/pod_tpu_commons_cpu_offload.yaml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ spec:
  - --tensor_parallel_size=8
  - --max_model_len=1024
  - --kv-transfer-config
- - '{"kv_connector":"TPUOffloadConnector","kv_connector_module_path":"tpu_inference.distributed.offload.tpu_offload_connector","kv_role":"kv_both"}'
+ - '{"kv_connector":"TPUOffloadConnector","kv_connector_module_path":"tpu_inference.offload.tpu_offload_connector","kv_role":"kv_both"}'
  env:
  - name: HUGGING_FACE_HUB_TOKEN
  valueFrom:

examples/offload/gke/pod_tpu_commons_cpu_offload_verification.yaml

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ spec:
  - --max_model_len=1024
  - --seed=42
  - --kv-transfer-config
- - '{"kv_connector":"TPUOffloadConnector","kv_connector_module_path":"tpu_inference.distributed.offload.tpu_offload_connector","kv_role":"kv_both"}'
+ - '{"kv_connector":"TPUOffloadConnector","kv_connector_module_path":"tpu_inference.offload.tpu_offload_connector","kv_role":"kv_both"}'
  env:
  - name: HUGGING_FACE_HUB_TOKEN
  valueFrom:

examples/offload/gke/pod_tpu_host_offload_unit_tests.yaml

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ spec:
  command:
  - /bin/bash
  - -c
- - "pytest -sv tests/distributed/offload/"
+ - "pytest -sv tests/offload/"
  env:
  - name: HUGGING_FACE_HUB_TOKEN
  valueFrom:

tests/distributed/offload/tpu_offload_accuracy_test.py renamed to tests/offload/tpu_offload_accuracy_test.py

Lines changed: 1 addition & 2 deletions
@@ -40,8 +40,7 @@ def kv_transfer_config():
  return KVTransferConfig(
  kv_connector="TPUOffloadConnector",
  kv_role="kv_both",
- kv_connector_module_path=
- "tpu_inference.distributed.offload.tpu_offload_connector",
+ kv_connector_module_path="tpu_inference.offload.tpu_offload_connector",
  )


tests/distributed/offload/tpu_offload_connector_scheduler_test.py renamed to tests/offload/tpu_offload_connector_scheduler_test.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
  from vllm.v1.core.sched.output import CachedRequestData, SchedulerOutput
  from vllm.v1.request import Request

- from tpu_inference.distributed.offload.tpu_offload_connector import (
+ from tpu_inference.offload.tpu_offload_connector import (
  RequestTracker, TPUOffloadConnectorScheduler)

  _DEFAULT_BLOCK_SIZE = 16

tests/distributed/offload/tpu_offload_connector_worker_test.py renamed to tests/offload/tpu_offload_connector_worker_test.py

Lines changed: 4 additions & 5 deletions
@@ -15,13 +15,12 @@
  from jax.sharding import Mesh, NamedSharding, PartitionSpec
  from vllm.distributed.kv_transfer.kv_connector.v1.base import KVConnectorRole

- from tpu_inference.distributed.offload.tpu_offload_connector import (LoadSpec,
- SaveSpec)
- from tpu_inference.distributed.offload.tpu_offload_connector import \
+ from tpu_inference.logger import init_logger
+ from tpu_inference.offload.tpu_offload_connector import LoadSpec, SaveSpec
+ from tpu_inference.offload.tpu_offload_connector import \
  TPUOffloadConnector as CPUOffloadingConnector
- from tpu_inference.distributed.offload.tpu_offload_connector import (
+ from tpu_inference.offload.tpu_offload_connector import (
  TPUOffloadConnectorMetadata, TPUReqMeta)
- from tpu_inference.logger import init_logger
  from tpu_inference.runner.tpu_runner import TPUModelRunner

  logger = init_logger(__name__)
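
Downstream code that imported from `tpu_inference.distributed.offload` needs the same prefix change made in these test files. A minimal sketch, assuming a plain text substitution is acceptable for your code base: the hypothetical helper `migrate_imports` rewrites the old dotted prefix to `tpu_inference.offload` wherever it appears as a whole name, including inside strings such as `kv_connector_module_path`. It does not parse Python, so comments and docstrings are rewritten too, and import blocks may need re-sorting afterwards (as happened with the `init_logger` import above).

```python
import re

OLD_PACKAGE = "tpu_inference.distributed.offload"
NEW_PACKAGE = "tpu_inference.offload"

# Match the old package only as a whole dotted name, so that e.g.
# "tpu_inference.distributed.offloading" would be left untouched.
_OLD_PREFIX = re.compile(r"\b" + re.escape(OLD_PACKAGE) + r"\b")


def migrate_imports(source: str) -> str:
    """Rewrite references to the pre-move package path in Python source text."""
    return _OLD_PREFIX.sub(NEW_PACKAGE, source)


if __name__ == "__main__":
    before = (
        "from tpu_inference.distributed.offload.cpu_backend import LocalCPUBackend\n"
        "from tpu_inference.distributed.offload.utils import CpuChunkId\n"
    )
    print(migrate_imports(before), end="")
```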

tests/distributed/offload/tpu_offload_cpu_backend_test.py renamed to tests/offload/tpu_offload_cpu_backend_test.py

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@

  import pytest

- from tpu_inference.distributed.offload.cpu_backend import LocalCPUBackend
- from tpu_inference.distributed.offload.utils import CpuChunkId
+ from tpu_inference.offload.cpu_backend import LocalCPUBackend
+ from tpu_inference.offload.utils import CpuChunkId


  # Helper to create a mock jax array with a specific size in bytes
