MLServer fails to load models from OCI image mounts on KServe #2352

@Snomaan6846

Description

MLServer fails to load models when deployed on KServe using OCI model images (the model-car sidecar). The model files are present in /mnt/models, but the runtimes fail to load them.

Environment

  • MLServer Version: 1.7.0+
  • Platform: KServe on Kubernetes
  • Storage: OCI Image via model-car sidecar
  • Affected Runtimes: XGBoost, LightGBM, CatBoost, and potentially others

Steps to Reproduce

  1. Deploy an InferenceService with OCI image storage:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: xgboost-model
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
      storageUri: oci://quay.io/my-registry/xgboost-model:latest
      runtime: mlserver
```

  2. Check the MLServer logs.
  3. The model fails to load despite the files being present.

Expected Behavior

The model should load successfully from /mnt/models, as mounted by the model-car sidecar.

Actual Behavior

Error 1: Path Resolution Failure

```
XGBoostError: filesystem error: cannot make canonical path: No such file or directory [/mnt/models/model.json]
```

Python can read the file, but C++ libraries fail:

```python
import os

path = "/mnt/models/model.json"
print(os.path.exists(path))    # True
print(os.path.realpath(path))  # /proc/123/root/... or bind mount path

# Plain Python file I/O works:
with open(path, "r") as f:
    f.read()  # ✅ Success

# Loading through XGBoost's C++ core fails:
import xgboost as xgb
xgb.Booster(model_file=path)  # ❌ Raises XGBoostError
```
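
A user-side workaround that sidesteps the native path canonicalization is to hand XGBoost something other than the mounted path. This is a hedged sketch, not a fix in MLServer itself: the temp-copy variant relies only on the standard library, and the in-memory variant assumes an XGBoost version whose `Booster` accepts a `bytearray` for `model_file`.

```python
import shutil
import tempfile

import xgboost as xgb

SOURCE = "/mnt/models/model.json"

# Variant 1: copy the model off the bind-mounted path onto a regular
# filesystem that the native loader can canonicalize.
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
    shutil.copyfile(SOURCE, tmp.name)
booster = xgb.Booster(model_file=tmp.name)

# Variant 2: bypass filesystem access in native code entirely by
# loading from an in-memory buffer (recent XGBoost versions).
with open(SOURCE, "rb") as f:
    booster = xgb.Booster(model_file=bytearray(f.read()))
```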

Error 2: Race Condition at Startup

MLServer starts "successfully" but no models are loaded:

```
2026-01-23 11:58:14,866 [mlserver.rest] INFO - HTTP server running on http://0.0.0.0:8080
2026-01-23 11:58:14,889 [mlserver.metrics] INFO - Metrics server running on http://0.0.0.0:8082
2026-01-23 11:58:14,891 [mlserver.grpc] INFO - gRPC server running on http://0.0.0.0:8081
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
```

No model-loading logs appear. The pod reports ready, but the model is not available:

```console
$ curl http://localhost:8080/v2/models/my-model/ready
# Returns {"error":"Model my-model not found"}

$ kubectl exec <pod> -- ls /mnt/models
# Files are present: model-settings.json, model.bst
```

Issue: MLServer starts before the model-car sidecar has mounted the files, finds no models, and never retries (see the wait-loop sketch below).
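
One way to close this race would be to gate startup on the mount actually being populated. A minimal sketch of such a wait loop follows; the directory name comes from this issue, while the timeout value and the gating itself are assumptions, not current MLServer behavior.

```python
import os
import time

MODEL_DIR = "/mnt/models"  # mount populated by the model-car sidecar
TIMEOUT_S = 120            # hypothetical startup budget

# Block until the sidecar has written at least one file, or give up.
deadline = time.monotonic() + TIMEOUT_S
while time.monotonic() < deadline:
    if os.path.isdir(MODEL_DIR) and os.listdir(MODEL_DIR):
        break
    time.sleep(1)
else:
    raise TimeoutError(f"no model files appeared in {MODEL_DIR}")
```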

Root Cause

  1. Bind Mount Issue: The model-car sidecar exposes files via bind mounts or proc-based paths (/proc/<pid>/root/...) that C++/native libraries cannot canonicalize (see the sketch after this list)
  2. Race Condition: The model-car sidecar mounts the files after MLServer starts, so MLServer's initial model scan finds nothing
  3. Symlink Resolution: os.path.realpath() returns paths that are inaccessible to C++ code
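
The canonicalization failure can be approximated from Python (3.10+) with strict path resolution, which resolves every component the way C++ std::filesystem::canonical does. Whether it fails in a given pod depends on how the mount is set up, so treat this as a diagnostic sketch rather than a guaranteed reproduction:

```python
import os

path = "/mnt/models/model.json"

# A plain existence check only stats the final path; it succeeds.
print(os.path.exists(path))  # True

# Strict resolution follows every symlink along the way, mirroring
# the C++ canonicalization, and raises if any component (e.g. a
# /proc/<pid>/root link) cannot be resolved.
try:
    print(os.path.realpath(path, strict=True))
except OSError as exc:
    print(f"canonicalization failed: {exc}")
```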

Impact

  • ❌ Blocks deployments on KServe with OCI images
  • ❌ Impacts all runtimes using C++/native libraries (XGBoost, LightGBM, CatBoost)

Additional Context

  • Python file I/O works fine with these mounts
  • The issue is specific to native libraries that try to canonicalize paths

Labels

bug kserve deployment high-priority xgboost runtime
