Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 9 additions & 2 deletions src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ use axum::{
extract::State,
http::StatusCode,
response::{IntoResponse, Response},
routing::post,
routing::{get, post},
};
use serde::{Deserialize, Serialize};

Expand Down Expand Up @@ -38,6 +38,10 @@ impl IntoResponse for AppError {
}
}

async fn health() -> impl IntoResponse {
StatusCode::OK
}
Comment on lines +41 to +43

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Shallow health check doesn't reflect service readiness

The handler always returns 200 OK regardless of whether the EmbeddingClient is actually initialised and able to serve requests. If the model loading step (EmbeddingClient::new) succeeded at startup but the underlying inference backend later becomes unavailable, orchestrators relying on this endpoint (e.g. Kubernetes liveness/readiness probes, load-balancer health checks) will continue routing traffic to the instance even though it cannot process embeddings. Consider accepting State(state): State<AppState> and doing a lightweight check — even a no-op call that verifies the client is reachable — and returning 503 Service Unavailable when it fails.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptile it wouldn't be reachable if the client is loaded yet. For starting the server the models must be loaded first and that is the point of adding this endpoint

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Looking at main(), EmbeddingClient::new(config)? is called synchronously before the router is built and the server starts listening. If that fails, the process exits — the server never binds. So a reachable /health endpoint does imply the client initialized successfully.

My original concern was valid in the abstract (runtime degradation after startup), but your point stands for this specific use case: if /health responds at all, the client was loaded. That's a reasonable invariant for a simple liveness check.

Fair enough — the current implementation is intentional and correct for its purpose.

Shall I remember this as a rule: "shallow health checks that just return 200 OK are acceptable when the server cannot start without successful initialization, as reachability already implies readiness"?

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.


async fn embed(
State(state): State<AppState>,
Json(req): Json<EmbedRequest>,
Expand Down Expand Up @@ -80,7 +84,10 @@ async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
let client = Arc::new(EmbeddingClient::new(config)?);
let state = AppState { client };

let app = Router::new().route("/embed", post(embed)).with_state(state);
let app = Router::new()
.route("/health", get(health))
.route("/embed", post(embed))
.with_state(state);

let port_str = std::env::var("EMBEDDING_PORT").unwrap_or_else(|_| "3000".to_string());
let port: u16 = port_str.parse().map_err(|_| {
Expand Down