Dynamically deploy models to match request targets #1
Signed-off-by: Phoevos Kalemkeris <phoevos.kalemkeris@ucl.ac.uk>
Revamp configuration logic using Pydantic models for better validation and maintainability and extend settings with options related to model deployment through the gateway. Signed-off-by: Phoevos Kalemkeris <phoevos.kalemkeris@ucl.ac.uk>
Introduce model records to keep track of deployed models, their deployment type, and idle TTLs. Signed-off-by: Phoevos Kalemkeris <phoevos.kalemkeris@ucl.ac.uk>
Purge containers based on deployment type, implementing the following removal strategy for each one:
- Static: Never remove
- Manual: Remove if the TTL specified during manual deployment has been exceeded
- Auto: Remove if the model has been idle for longer than the TTL specified during auto-deployment, according to the database model record

Signed-off-by: Phoevos Kalemkeris <phoevos.kalemkeris@ucl.ac.uk>
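The removal strategy in this commit message could be sketched roughly as follows. This is a minimal illustration, not the ripper's actual code: the names `DeploymentType` and `should_remove`, the parameter list, and the choice to measure the manual TTL from deployment time are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class DeploymentType(Enum):
    """Hypothetical enum mirroring the three deployment types."""

    STATIC = "static"
    MANUAL = "manual"
    AUTO = "auto"


def should_remove(
    deployment_type: DeploymentType,
    now: datetime,
    deployed_at: datetime,
    last_used_at: Optional[datetime],
    ttl_seconds: Optional[int],
) -> bool:
    """Decide whether a model container should be purged."""
    if deployment_type is DeploymentType.STATIC:
        return False  # static deployments are never removed
    if ttl_seconds is None:
        return False  # assumption: a missing TTL means "keep"
    ttl = timedelta(seconds=ttl_seconds)
    if deployment_type is DeploymentType.MANUAL:
        # Fixed TTL, assumed here to be measured from deployment time.
        return now - deployed_at > ttl
    # AUTO: idle TTL, measured from the last recorded use
    # (falling back to deployment time if the model was never used).
    reference = last_used_at or deployed_at
    return now - reference > ttl


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
hour_ago = now - timedelta(hours=1)
print(should_remove(DeploymentType.AUTO, now, hour_ago, now - timedelta(seconds=30), 60))
```

Keeping the decision in one pure function makes each branch easy to unit test without touching Docker or the database.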
Downgrade mlflow to avoid compatibility issues with CMS MLflow server. Signed-off-by: Phoevos Kalemkeris <phoevos.kalemkeris@ucl.ac.uk>
Rename internal services to avoid clashes with services in the CMS network. If we move to Docker Swarm or Kubernetes in the future, we should be able to use FQDNs to avoid conflicts, but for now this appears to be the simplest solution:
* minio -> object-store
* postgres -> db
* rabbitmq -> queue

Signed-off-by: Phoevos Kalemkeris <phoevos.kalemkeris@ucl.ac.uk>
Update integration tests to work with the new config. Tweak config handling and service discovery to fix integration tests:
* Explicitly pass config.json path when loading config
* Ensure the API can work with IPs as model identifiers since we're forced to use them in the integration tests environment (i.e. accessing containers from the localhost)

Signed-off-by: Phoevos Kalemkeris <phoevos.kalemkeris@ucl.ac.uk>
The integration tests update the config.json file when running, adding the MLflow and MinIO connection details. These are specific to the local testing environment and likely the given run, and therefore don't need to be committed to the repository. This commit adds the config.json file to the .gitignore to prevent accidental commits in the future. Signed-off-by: Phoevos Kalemkeris <phoevos.kalemkeris@ucl.ac.uk>
8736059 to 0eb9b94
Provide an admin API to expose on-demand model configuration management, including creation, updating, retrieval, listing, and soft-deletion. Configurations are now stored in the database, with versioning support. The Python client is also updated to support these operations. Signed-off-by: Phoevos Kalemkeris <phoevos.kalemkeris@ucl.ac.uk>
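To make the versioning and soft-deletion behaviour concrete, here is a small in-memory sketch. It is an assumption-laden stand-in for the database-backed implementation: `ConfigVersion`, `ConfigStore`, their methods, and the `idle_ttl` setting are hypothetical names, and the real schema and API may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConfigVersion:
    """One stored version of an on-demand model configuration (hypothetical)."""

    model_name: str
    version: int
    settings: dict
    deleted: bool = False


@dataclass
class ConfigStore:
    """In-memory stand-in for the database table (hypothetical)."""

    _versions: Dict[str, List[ConfigVersion]] = field(default_factory=dict)

    def upsert(self, model_name: str, settings: dict) -> ConfigVersion:
        # Every create or update appends a new version instead of overwriting.
        history = self._versions.setdefault(model_name, [])
        entry = ConfigVersion(model_name, len(history) + 1, dict(settings))
        history.append(entry)
        return entry

    def get(self, model_name: str) -> Optional[ConfigVersion]:
        # Only the latest version is visible, and only if it is not soft-deleted.
        history = self._versions.get(model_name, [])
        if not history or history[-1].deleted:
            return None
        return history[-1]

    def list_active(self) -> List[ConfigVersion]:
        return [h[-1] for h in self._versions.values() if h and not h[-1].deleted]

    def soft_delete(self, model_name: str) -> bool:
        # Mark the latest version as deleted but keep the full history.
        latest = self.get(model_name)
        if latest is None:
            return False
        latest.deleted = True
        return True


store = ConfigStore()
store.upsert("example-model", {"idle_ttl": 300})
store.upsert("example-model", {"idle_ttl": 600})
store.soft_delete("example-model")
print(store.get("example-model"), len(store._versions["example-model"]))
```

Soft deletion hides the configuration from reads while keeping earlier versions available for auditing.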
0eb9b94 to fec32bf
Pull request overview
This pull request introduces dynamic model deployment capabilities to the CogStack Model Gateway, enabling automatic deployment of models when they're requested. The changes include:
- A comprehensive refactoring of the configuration system from a simple dictionary-based approach to a Pydantic-validated schema
- Introduction of three deployment types (AUTO, MANUAL, STATIC) with different lifecycle management strategies
- New database models for tracking deployed models and on-demand model configurations
- Auto-deployment functionality that can deploy models on-demand when requests target them
- Enhanced ripper service to support multiple TTL strategies (fixed TTL for manual deployments, idle TTL for auto deployments)
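As a rough illustration of the first bullet (moving from a plain dictionary to a validated, hierarchical schema), the sketch below uses standard-library dataclasses as a dependency-free stand-in for Pydantic models. It is not the PR's configuration schema: `AutoDeploySettings`, `GatewaySettings`, `load_settings`, and every field name are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class AutoDeploySettings:
    """Hypothetical nested section of the config."""

    enabled: bool = False
    idle_ttl_seconds: int = 3600

    def __post_init__(self) -> None:
        # Validate eagerly so a bad config fails at load time, not mid-request.
        if not isinstance(self.enabled, bool):
            raise ValueError("enabled must be a boolean")
        ttl = self.idle_ttl_seconds
        if isinstance(ttl, bool) or not isinstance(ttl, int) or ttl <= 0:
            raise ValueError("idle_ttl_seconds must be a positive integer")


@dataclass(frozen=True)
class GatewaySettings:
    """Hypothetical top-level config holding nested sections."""

    auto_deploy: AutoDeploySettings = field(default_factory=AutoDeploySettings)


def load_settings(raw: Dict[str, Any]) -> GatewaySettings:
    """Build validated settings from a parsed JSON dictionary."""
    unknown = set(raw) - {"auto_deploy"}
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return GatewaySettings(auto_deploy=AutoDeploySettings(**raw.get("auto_deploy", {})))


settings = load_settings({"auto_deploy": {"enabled": True, "idle_ttl_seconds": 120}})
print(settings.auto_deploy.idle_ttl_seconds)
```

With Pydantic, type coercion and error reporting come built in; the dataclass version only shows the shape of the idea.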
Key Changes
- Refactored configuration system with Pydantic validation and hierarchical structure
- Added model lifecycle management with database tracking for usage and idle time
- Implemented auto-deployment with health checking and concurrent deployment protection
- Updated all services (gateway, scheduler, ripper) to use the new config and model management systems
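The combination of health checking and concurrent deployment protection mentioned above can be sketched with a per-model `asyncio.Lock`. This is one possible approach under stated assumptions, not the code in `auto_deploy.py`: the `AutoDeployer` class, its callbacks, and the timing parameters are hypothetical, and a lock like this only protects requests handled within a single process.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class AutoDeployer:
    """Deploy a model at most once even when many requests arrive together."""

    def __init__(
        self,
        deploy: Callable[[str], Awaitable[None]],
        is_healthy: Callable[[str], Awaitable[bool]],
        poll_interval: float = 0.01,
        timeout: float = 1.0,
    ) -> None:
        self._deploy = deploy
        self._is_healthy = is_healthy
        self._poll_interval = poll_interval
        self._timeout = timeout
        self._locks: Dict[str, asyncio.Lock] = {}

    async def ensure_deployed(self, model_name: str) -> None:
        # One lock per model: callers for the same model queue up here.
        lock = self._locks.setdefault(model_name, asyncio.Lock())
        async with lock:
            if await self._is_healthy(model_name):
                return  # already running, or deployed by an earlier waiter
            await self._deploy(model_name)
            # Fail with a timeout error if the model never becomes healthy.
            await asyncio.wait_for(self._wait_until_healthy(model_name), self._timeout)

    async def _wait_until_healthy(self, model_name: str) -> None:
        while not await self._is_healthy(model_name):
            await asyncio.sleep(self._poll_interval)


async def demo() -> int:
    deployed = set()
    calls = []

    async def deploy(name: str) -> None:
        calls.append(name)
        await asyncio.sleep(0.01)  # simulate container startup
        deployed.add(name)

    async def is_healthy(name: str) -> bool:
        return name in deployed

    deployer = AutoDeployer(deploy, is_healthy)
    await asyncio.gather(*(deployer.ensure_deployed("example-model") for _ in range(5)))
    return len(calls)


print(asyncio.run(demo()))
```

Re-checking health after acquiring the lock is what lets the waiting callers skip a second deployment.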
Reviewed changes
Copilot reviewed 48 out of 51 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/unit/ripper/test_main.py | Comprehensive test coverage for new multi-TTL ripper logic with different deployment types |
| tests/unit/common/test_models.py | New file with extensive tests for ModelManager and OnDemandModelConfig |
| tests/unit/common/test_config.py | Updated tests for Pydantic-based config system |
| tests/unit/common/test_tasks.py | Fixed logging import path |
| tests/unit/client/test_client.py | Updated client tests for renamed timeout parameters and new API endpoints |
| tests/integration/test_api.py | Added comprehensive integration tests for deployment and on-demand config APIs |
| tests/integration/utils.py | Added utilities for managing deployed containers in tests |
| tests/conftest.py | Added shared db_manager fixture |
| cogstack_model_gateway/common/config/ | Complete config system refactor with Pydantic models |
| cogstack_model_gateway/common/models.py | New model management system with database tracking |
| cogstack_model_gateway/common/containers.py | Enhanced container discovery and management utilities |
| cogstack_model_gateway/common/tracking.py | Extended tracking client with model type resolution |
| cogstack_model_gateway/gateway/core/auto_deploy.py | New auto-deployment implementation with health checks |
| cogstack_model_gateway/gateway/routers/ | Updated model and admin API routers |
| cogstack_model_gateway/ripper/main.py | Enhanced ripper with multi-TTL support |
| cogstack_model_gateway/scheduler/main.py | Updated to use new config system |
| cogstack_model_gateway/migrations/versions/ | New database migrations for model tracking |
| docker-compose.yaml | Service name updates and config file volume mounts |
| config.json | New JSON-based configuration file |
    mock_client_instance.request.return_value = mock_response

    async with GatewayClient(base_url="http://test-gateway.com") as client:
        config = await client.update_on_demand_config(
Copilot AI, Dec 16, 2025
Variable config is not used.
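One common way to resolve this kind of finding is to assert on the returned value rather than dropping the assignment. The sketch below is illustrative only: it uses `unittest.mock.AsyncMock` as a stand-in for `GatewayClient`, and the call arguments and response fields are assumptions, since the real signature is not shown in the snippet.

```python
import asyncio
from unittest.mock import AsyncMock


async def test_update_on_demand_config() -> None:
    # Stand-in for GatewayClient; the real client and its arguments may differ.
    client = AsyncMock()
    client.update_on_demand_config.return_value = {
        "model_name": "example-model",
        "idle_ttl": 600,
    }

    config = await client.update_on_demand_config("example-model", idle_ttl=600)

    # Asserting on the result gives `config` a purpose and checks the response.
    assert config["idle_ttl"] == 600
    client.update_on_demand_config.assert_awaited_once_with("example-model", idle_ttl=600)


asyncio.run(test_update_on_demand_config())
```

If the return value is not interesting for the test, the alternative is to await the call without binding it to a name.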
        },
    )

    initial_count = count_deployed_model_containers()
Copilot AI, Dec 16, 2025
Variable initial_count is not used.
    assert idle_seconds < 1001.0

    # Test is_model_idle returns False for recently used model
    model = model_manager.create_model(
Copilot AI, Dec 16, 2025
This assignment to 'model' is unnecessary as it is redefined before this value is used.
    task_uuid = response.json()["uuid"]

    # Completion should take at least a few seconds for container startup
    task = wait_for_task_completion(task_uuid, tm, expected_status=Status.SUCCEEDED)
Copilot AI, Dec 16, 2025
This assignment to 'task' is unnecessary as it is redefined before this value is used.
    # for 'autogenerate' support
    # from myapp import mymodel
    # target_metadata = mymodel.Base.metadata
    from cogstack_model_gateway.common.models import Model  # noqa: E402, F401
Copilot AI, Dec 16, 2025
Import of 'Model' is not used.
No description provided.