Rust library for deploying FlexServ model servers onto Tapis Pods and HPC via Tapis Jobs.
The library is the main product. The Actix-web binary is a thin wrapper and intentionally minimal.
- **Backend module** (`backend.rs`)
  - `Backend` enum: `Transformers`, `VLlm`, `SGLang`, `TrtLlm`
  - `BackendParameterSet` (command + params + env)
  - `BuildBackendParameterSet` trait: `build_params_for_pod(&self, server)`, `build_params_for_hpc(&self, server)`
  - Per-backend builders: `TransformersParameterSetBuilder`, `VLlmParameterSetBuilder`, `SGLangParameterSetBuilder`, `TrtLlmParameterSetBuilder`
- **Server module** (`server.rs`)
  - `FlexServInstance`: tenant URL, user, model id, optional revision/HF token, backend
  - `FlexServInstanceBuilder`: validated builder
  - `ModelConfig`, `TapisConfig`, `ValidationError`
- **Deployment module** (`deployment/mod.rs`, `deployment/pod.rs`, `deployment/hpc.rs`)
  - `FlexServDeployment` trait: async `create`/`start`/`stop`/`terminate`/`monitor`
  - `FlexServPodDeployment` with `PodDeploymentOptions`
  - `FlexServHPCDeployment` with `HpcDeploymentOptions`
    - `new(server, tapis_token, options)` for job submission
    - `from_existing(tapis_token, job_uuid)` for existing jobs
    - `job_status()` helper for lightweight status polling
  - `DeploymentResult` and `DeploymentError`
In your project’s `Cargo.toml`:

```toml
[dependencies]
flexserv-deployer = { path = "../FlexServ-Deployer" } # adjust path as needed
```

Then import the public types:

```rust
use flexserv_deployer::{
    Backend,
    DeploymentError,
    DeploymentResult,
    FlexServDeployment,
    FlexServInstance,
    FlexServPodDeployment,
    PodDeploymentOptions,
};
```

This example:

- builds a `FlexServInstance`
- wraps it in `FlexServPodDeployment`
- calls `create()` to create a volume and pod

```rust
use flexserv_deployer::{
    Backend, DeploymentError, DeploymentResult, FlexServDeployment,
    FlexServInstance, FlexServPodDeployment,
};

#[tokio::main]
async fn main() -> Result<(), DeploymentError> {
    env_logger::init();

    let tenant_url = std::env::var("TAPIS_TENANT_URL").expect("TAPIS_TENANT_URL");
    let tapis_token = std::env::var("TAPIS_TOKEN").expect("TAPIS_TOKEN");

    // Model must already exist on the attached volume at /app/models/<model_dir_name>.
    let model_id = std::env::var("FLEXSERV_MODEL_ID")
        .unwrap_or_else(|_| "no-model-yet".to_string());

    let server = FlexServInstance::new(
        tenant_url,
        "your_tapis_username".to_string(),
        model_id,
        None,                           // model_revision
        std::env::var("HF_TOKEN").ok(), // optional HF token
        None,                           // default_embedding_model
        Backend::Transformers {
            command: vec!["python".to_string()],
        },
    );

    let mut deployment = FlexServPodDeployment::new(server, tapis_token);
    let result = deployment.create().await?;

    match result {
        DeploymentResult::PodResult {
            pod_id,
            volume_id,
            pod_url,
            tapis_user,
            tapis_tenant,
            model_id,
            ..
        } => {
            println!("Pod created: {}", pod_id);
            println!("Volume: {}", volume_id);
            println!("Tapis user: {}", tapis_user);
            println!("Tapis tenant: {}", tapis_tenant);
            if let Some(url) = pod_url {
                println!("Pod URL (for inference): {}", url);
            }
            let auth_token = model_id.replace('/', "_");
            println!("Auth token for pod: {}", auth_token);
        }
        DeploymentResult::HPCResult { .. } => unreachable!("pod deployment returns PodResult"),
    }
    Ok(())
}
```

Use `PodDeploymentOptions` with `with_options`:
```rust
let server = FlexServInstance::new(
    tenant_url,
    "your_tapis_username".to_string(),
    "openai-community/gpt2".to_string(),
    None,
    std::env::var("HF_TOKEN").ok(),
    None,
    Backend::Transformers { command: vec!["python".to_string()] },
);

let options = PodDeploymentOptions {
    deployment_id: Some("550e8400-e29b-41d4-a716-446655440000".to_string()),
    volume_size_mb: Some(20 * 1024),
    image: Some("tapis/flexserv:1.0".to_string()),
    cpu_request: Some(2000),
    mem_limit_mb: Some(16384),
    gpus: Some(0),
    flexserv_secret: Some("mysecret-".to_string()),
    ..Default::default()
};

let mut deployment = FlexServPodDeployment::with_options(server, tapis_token, options);
let result = deployment.create().await?;
```

To manage an existing pod, reattach with `from_existing`:

```rust
let server = FlexServInstance::new(/* same config as create */);
let mut deployment = FlexServPodDeployment::from_existing(
    server,
    tapis_token,
    pod_id,
    volume_id,
);

deployment.start().await?;
deployment.monitor().await?;
deployment.stop().await?;
deployment.terminate().await?;
```

All methods return `Result<DeploymentResult, DeploymentError>`.
For HPC, use `FlexServHPCDeployment` with `HpcDeploymentOptions` to submit and manage jobs:

```rust
use flexserv_deployer::{
    Backend, FlexServDeployment, FlexServHPCDeployment, FlexServInstance, HpcDeploymentOptions,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let tenant_url = std::env::var("TAPIS_TENANT_URL")
        .unwrap_or_else(|_| "https://public.tapis.io".to_string());
    let tapis_user = std::env::var("TAPIS_USER").expect("TAPIS_USER is required");
    let tapis_token = std::env::var("TAPIS_TOKEN").expect("TAPIS_TOKEN is required");

    let server = FlexServInstance::new(
        tenant_url,
        tapis_user,
        "Qwen/Qwen3.5-0.8B".to_string(),
        None,
        std::env::var("HF_TOKEN").ok(),
        None,
        Backend::Transformers { command: vec![] },
    );

    let options = HpcDeploymentOptions::new(
        std::env::var("TAPIS_HPC_APP_ID")?,
        std::env::var("TAPIS_HPC_APP_VERSION")?,
        std::env::var("TAPIS_HPC_EXEC_SYSTEM_ID")?,
        std::env::var("TAPIS_HPC_EXEC_SYSTEM_LOGICAL_QUEUE")?,
        std::env::var("TAPIS_HPC_MAX_MINUTES")?.parse::<i32>()?,
        std::env::var("TAPIS_HPC_ALLOCATION")?,
    );

    let mut deployment = FlexServHPCDeployment::new(server, tapis_token, options);
    let created = deployment.create().await?;
    println!("Submitted: {created:#?}");

    let monitored = deployment.monitor().await?;
    println!("Monitor: {monitored:#?}");
    Ok(())
}
```

To manage an existing HPC job UUID:

```rust
use flexserv_deployer::{FlexServDeployment, FlexServHPCDeployment};

let mut deployment = FlexServHPCDeployment::from_existing(tapis_token, job_uuid);
deployment.tenant_url = Some(tenant_url); // required when using from_existing

let status = deployment.job_status().await?;
let resubmitted = deployment.start().await?; // resubmit from prior UUID
let cancelled = deployment.stop().await?;    // cancel current job
let snapshot = deployment.monitor().await?;  // status + full job details
```

Once you have:

- `pod_url` from `DeploymentResult::PodResult`
- `auth_token = model_id.replace('/', "_")`

use any HTTP client (e.g. `reqwest`) to call the pod:

- Health: `GET {pod_url}/v1/flexserv/health`
- Models: `GET {pod_url}/v1/models`
- Completions/Chat: `POST {pod_url}/v1/completions` / `POST {pod_url}/v1/chat/completions`

Headers: `Authorization: Bearer <auth_token>`
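The URL and header values above can be assembled like this (a minimal sketch: the pod URL is a placeholder, and the commented-out `reqwest` call is one possible client choice, not a required dependency):

```rust
/// Derive the pod auth token the same way as above: '/' in the model id
/// becomes '_' (i.e. `model_id.replace('/', "_")`).
fn auth_header(model_id: &str) -> String {
    format!("Bearer {}", model_id.replace('/', "_"))
}

/// Build the health-check URL from the `pod_url` in `PodResult`.
fn health_url(pod_url: &str) -> String {
    format!("{}/v1/flexserv/health", pod_url.trim_end_matches('/'))
}

fn main() {
    let pod_url = "https://example-pod.tapis.io"; // placeholder, not a real pod URL
    let model_id = "openai-community/gpt2";

    println!("GET {}", health_url(pod_url));
    println!("Authorization: {}", auth_header(model_id));

    // With reqwest (one option), the health call would look roughly like:
    // let resp = reqwest::Client::new()
    //     .get(health_url(pod_url))
    //     .header("Authorization", auth_header(model_id))
    //     .send()
    //     .await?;
}
```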
There are two main test categories:
- Unit tests – internal logic (backend, server, deployment).
- Integration tests – call real TAPIS Pods APIs.
At the repo root:

```shell
# Unit tests only
./run-tests.sh unit

# All tests (unit + integration)
./run-tests.sh all

# Single integration tests (pass parameters instead of exporting envs)
./run-tests.sh create https://tacc.tapis.io "<TAPIS_TOKEN>"
./run-tests.sh start https://tacc.tapis.io "<TAPIS_TOKEN>" "<your-pod-id>" "<your-volume-id>"
./run-tests.sh stop https://tacc.tapis.io "<TAPIS_TOKEN>" "<your-pod-id>" "<your-volume-id>"
./run-tests.sh monitor https://tacc.tapis.io "<TAPIS_TOKEN>" "<your-pod-id>" ["<your-volume-id>"]
./run-tests.sh terminate https://tacc.tapis.io "<TAPIS_TOKEN>" "<your-pod-id>" "<your-volume-id>"
```

If no argument is provided, the script defaults to `unit`.

You can also invoke Cargo directly:

```shell
# Unit tests
cargo test --lib

# All tests (unit + integration)
cargo test

# Run a single integration test with output
TAPIS_TENANT_URL=https://tacc.tapis.io TAPIS_TOKEN=<jwt> \
cargo test --test pod_create_integration -- --nocapture
```

The Rust integration tests under `tests/` still read env vars (`TAPIS_TENANT_URL`, `TAPIS_TOKEN`, `POD_ID`, `VOLUME_ID`, etc.) and skip themselves if they are missing. `run-tests.sh` simply maps its CLI arguments onto those env vars for you. You can also bypass the script and export the env vars manually if you prefer.
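The arg-to-env mapping the script performs can be sketched as follows (illustrative only, not `run-tests.sh`'s actual contents; `pod_create_integration` is the one test name confirmed above, and the `pod_<mode>_integration` pattern for the other modes is an assumption — the command is echoed here rather than executed):

```shell
# Sketch: positional CLI args become the env vars the Rust integration
# tests read; the mode selects which cargo invocation to run.
run_tests() {
  MODE="${1:-unit}"   # defaults to unit when no argument is given
  export TAPIS_TENANT_URL="${2:-}" TAPIS_TOKEN="${3:-}" POD_ID="${4:-}" VOLUME_ID="${5:-}"
  case "$MODE" in
    unit) echo "cargo test --lib" ;;
    all)  echo "cargo test" ;;
    *)    echo "cargo test --test pod_${MODE}_integration -- --nocapture" ;;
  esac
}

run_tests create https://tacc.tapis.io "<jwt>"
# → cargo test --test pod_create_integration -- --nocapture
```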
Examples are in `examples/`:

- `create_pod.rs` – create volume + pod using env: `TAPIS_TENANT_URL`, `TAPIS_TOKEN`, `FLEXSERV_MODEL_ID` (default `no-model-yet`), optional `HF_TOKEN`
- `hpc_create.rs` – submit and monitor an HPC FlexServ job
- `hpc_poll_status.rs` – poll job status via `job_status()`, then call `monitor()`
- `hpc_resubmit_job.rs` – resubmit an existing job UUID via `start()`
- `hpc_cancel_job.rs` – cancel an existing job via `stop()`
- `pod_monitor.rs` – monitor an existing pod
- `pod_start_stop.rs` – start then stop an existing pod
- `terminate_pod.rs` – terminate an existing pod and volume (`POD_ID`, `VOLUME_ID`)
- `call_pod.rs` – call a running pod’s HTTP API
- `hash_demo.rs` – demonstrate deployment hash generation

Example:

```shell
export TAPIS_TENANT_URL=https://tacc.tapis.io
export TAPIS_TOKEN=<your-jwt>
export FLEXSERV_MODEL_ID=openai-community/gpt2

cargo run --example create_pod
```

Notes:

- Pods expect models pre-populated on the volume under `/app/models/<model_dir_name>`. The deployer does not download models at pod startup.
- For `FlexServHPCDeployment::from_existing`, set `deployment.tenant_url` before calling `job_status`/`start`/`stop`/`monitor`.
- An empty `flexserv_secret`/`FLEXSERV_SECRET` is allowed but insecure; production deployments should use a strong secret.
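For the last point, one way to generate a strong secret (a sketch assuming `openssl` is available; any high-entropy random string works):

```shell
# 32 random bytes, hex-encoded -> a 64-character secret.
FLEXSERV_SECRET="$(openssl rand -hex 32)"
export FLEXSERV_SECRET
echo "${#FLEXSERV_SECRET}"  # → 64
```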