Releases: AI-Hypercomputer/xpk
v1.12.0
v1.11.0
v1.10.0
What's Changed
Improvements
- Clarify path installation is only for installing from source by @bvandermoon in #1170
Bug fixes
- Fix circular import issue by @FIoannides in #1166
New Contributors
- @bvandermoon made their first contribution in #1170
Full Changelog: v1.9.0...v1.10.0
v1.9.0
What's Changed
New Features
- Add support for n2-standard-4 machine type. by @cpgaffney1 in #1128
- feat: Support Super-slicing Pathways workload by @jamOne- in #1119
- Add NATIVE_CLUSTER_TOOLKIT_ENABLED feature flag by @scaliby in #1133
- Add gcluster binary to Dependency Auto Download (DAD) mechanism by @scaliby in #1137
- feat: --no-use-parallel-containers in workload create by @jamOne- in #1136
- Add --dependency-auto-download to gcluster executions by @scaliby in #1139
- feat: add private control plane endpoint support with custom subnet by @kryvokhyzha in #1124
- Launch Native Cluster Toolkit Execution by @scaliby in #1158
Improvements
- chore: Migrate gsutil usage to gcloud storage by @gurusai-voleti in #1064
- Add consent to delete old workload on duplicate by @scaliby in #1126
- Improve workload scheduling error message by @scaliby in #1131
- Extract CommandRunner base class to its own file by @scaliby in #1138
- Fix mtc updates by @FIoannides in #1115
- Increase kueue_manager cpu and memory by @SikaGrr in #1140
- Fix variable name for super slicing check by @jamOne- in #1142
- Move deployment dir computation to CommandRunner by @scaliby in #1145
- Implement NativeCommandRunner by @scaliby in #1146
- Update check for topology requirement by @FIoannides in #1150
- Improve error message for host-maintenance-interval error by @scaliby in #1153
- Update cluster toolkit + fix zip downloads by @scaliby in #1156
- Remove docker as a prerequisite for using xpk in docs by @scaliby in #1159
Bug fixes
- fix: correct placement of jobset exclusive placement annotations for Pathways by @jamOne- in #1122
- Fix formatting of Pathways workload message by @jamOne- in #1129
- ci: remove caching from GitHub Actions to deflake presubmits by @jamOne- in #1127
- fix: Update placement policy condition to include reservation by @jamOne- in #1130
- fix: xpk workload list Accelerator VMs counting by @jamOne- in #1132
- gcluster DAD flag by @scaliby in #1144
- fix: stacktrace-explorer sidecar hangs after main container exits by @kryvokhyzha in #1134
- fix(workload): resolve Kubernetes v0.15+ condition sorting issue by @jamOne- in #1149
- Fix --num-slices fallback calculation by @scaliby in #1152
- Update zone for GPU test by @FIoannides in #1100
- Update jobset used by GPUs from version 0.7.2 to 0.8.1 by @FIoannides in #1151
- Fix master ipv4 cidr psc by @FIoannides in #1154
- Fix inspector slice logs by @scaliby in #1155
- fix: JobSet Status Resolution by @jamOne- in #1161
New Contributors
- @gurusai-voleti made their first contribution in #1064
- @cpgaffney1 made their first contribution in #1128
Full Changelog: v1.8.0...v1.9.0
v1.8.0
What's Changed
New Features
- Pathwaysjob CRD migration by @FIoannides in #1099
- Enable Crane by @SikaGrr in #1120
Improvements
- Update docs to accommodate for DAD by @scaliby in #1102
- Refactoring workload list parsing to Python by @jamOne- in #1089
- Refactor pathways workload scheduling to use Jinja template by @jamOne- in #1116
Bug fixes
- fix: workload list priorityClassName for super-slicing workloads by @jamOne- in #1105
- Revert "Pathwaysjob CRD migration (#1099)" by @jamOne- in #1107
- fix: xpk workload list should sum podSet counts by @jamOne- in #1106
- Pathwaysjob migration by @FIoannides in #1108
- fix aggregate reservation accelerator type matching for DWS Calendar by @kryvokhyzha in #1114
- fix: relax Kueue version check for sub and super-slicing workloads by @jamOne- in #1111
- fix: skip reservation capacity assessment when no nodepools need to be created by @jamOne- in #1125
New Contributors
- @kryvokhyzha made their first contribution in #1114
Full Changelog: v1.7.0...v1.8.0
v1.7.0
What's Changed
New Features
- Add DEPENDENCY_AUTO_DOWNLOAD flag by @scaliby in #1088
- Add script to compute dependencies checksums by @scaliby in #1091
- Add dependency auto download downloader logic by @scaliby in #1094
- Integrate deps downloading with the rest by @scaliby in #1095
- Launch DAD + Update dependencies by @scaliby in #1101
Improvements
- Align download and install directory by @wstcliyu in #1090
- Add .multi_agent to .gitignore by @jamOne- in #1092
- feat: add flag to disable DAD by @scaliby in #1097
- Ignore nodepool creation errors by @scaliby in #1096
Bug fixes
Full Changelog: v1.6.0...v1.7.0
v1.6.0
What's Changed
New Features
- Reservations capacity assessment by @jamOne- in #1057
- feat: add tpu7 support by @bhuvanpkaruturi in #1072
- feat: allow Numa aware workloads with super-slicing by @jamOne- in #1080
- Add customizable binaries support by @scaliby in #1085
Improvements
Bug fixes
- Add TELEMETRY_TRASH_EXECUTION to goldens and verify-goldens by @jamOne- in #1071
- Fix b/485965692: Support custom topologies in superslicing by @jamOne- in #1082
New Contributors
- @bhuvanpkaruturi made their first contribution in #1072
Full Changelog: v1.5.0...v1.6.0
v1.5.0
What's Changed
New Features
- feat: Increase number of concurrent NPs to be created to 100 by @jamOne- in #1068
- docs: Add Super-slicing docs by @jamOne- in #1069
Improvements
- Parallelize make goldens execution by @jamOne- in #1061
- make verify-goldens by @jamOne- in #1062
- Add crane workload integration test by @SikaGrr in #1063
- fix: Super-slicing cluster documentation by @jamOne- in #1070
Full Changelog: v1.4.0...v1.5.0
v1.3.0
What's Changed
Improvements
- Record used command flags in telemetry by @scaliby in #1019
- CommandsTester autopatching setup by @jamOne- in #1025
- refactor: Introduce ReservationLink dataclasses and update function signatures by @jamOne- in #1029
- make verify: remove installation by @jamOne- in #1033
- Fix pylint by @jamOne- in #1035
Bug fixes
Full Changelog: v1.2.0...v1.3.0
v1.2.0
What's Changed
Improvements
Bug fixes
- Fix Super-slicing Numa aware conflict by @jamOne- in #1001
- Fix Super-slicing on 1 cube by @jamOne- in #1002
- Upgrade any existing nodepools after Lustre driver installation by @SikaGrr in #1005
- Set big controller resources for Super-slicing by @jamOne- in #1007
- Fix workload resources in pathways and super-slicing with v7x by @jamOne- in #1009
- Limit the coreDNS replica count to the desired number of default pool… by @SikaGrr in #1010
Full Changelog: v1.1.0...v1.2.0