feat: add TTL support for environment instance auto-cleanup #67
feat: add TTL support for environment instance auto-cleanup #67JacksonMei merged 12 commits intomainfrom
Conversation
- Add GetTTL() method to Env model to read TTL value from DeployConfig - Update CreateInstanceByFunction to accept TTL parameter from request - Refactor InitializeFunction API to use request body for parameters - Add FunctionInitializeOptions struct for initialization options This enables TTL-based instance lifecycle management through FaaS backend.
- Change EnvInstance.TTL from string to int64 (seconds) - Update Instance.TTL in faas_model to int64 - Update PodListResponseData.TTL to int64 - Refactor cleanup_service.isExpired() to use int64 TTL: - Check TTL <= 0 instead of empty string - Use time.Duration(instance.TTL) * time.Second directly - Remove time.ParseDuration parsing - Update logging format from %s to %d for TTL - Update test files to use int64 seconds instead of duration strings This simplifies TTL handling by using seconds as the standard unit instead of duration strings.
This reverts commit 27814f1.
- Add --cleanup-interval argument to api-service deployment - Add cleanupInterval value to api-service/values.yaml (default: 5m) - Improve time parsing in cleanup_service.isExpired(): - Use time.DateTime constant instead of hardcoded format string - Add fallback to time.RFC3339 for compatibility - Add comprehensive unit tests for time parsing: - TestIsExpiredWithDateTimeFormat: tests DateTime format parsing - TestIsExpiredWithRFC3339Format: tests RFC3339 fallback parsing - TestIsExpiredWithInvalidTimeFormat: tests invalid time format handling - TestIsExpiredWithInvalidTTLFormat: tests invalid TTL format handling - TestIsExpiredWithEmptyTTL: tests empty TTL handling - TestIsExpiredWithVariousTTLDurations: tests various TTL durations All tests pass successfully.
Summary of ChangesHello @lanmaoxinqing, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the environment management system by introducing robust Time-To-Live (TTL) functionality for environment instances. It centralizes the automatic cleanup process within the API service, allowing instances to be automatically terminated after a specified duration. This change improves resource utilization, reduces manual overhead, and provides better control over environment lifecycles. The update also includes comprehensive testing, improved timestamp handling, and refined API interactions with the FaaS backend, ensuring a more reliable and maintainable system. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces TTL support for automatic environment instance cleanup, which is a great feature for managing resource lifecycle. However, a critical security concern has been identified: user-controlled identifiers (like environment names and instance IDs) are used to construct internal API URIs without proper sanitization, posing a risk of path traversal and SSRF against internal services. It is recommended to validate and sanitize all user-supplied identifiers before using them in URI construction. Additionally, please address a potential performance regression in the cleanup logic and a bug in the new unit tests.
| expiredInstance := &models.EnvInstance{ | ||
| ID: "test-instance-1", | ||
| Status: "Running", | ||
| CreatedAt: "2025-01-01 10:00:00", | ||
| TTL: "1h", | ||
| } |
There was a problem hiding this comment.
The test for expired instances uses a future date (2025-01-01 10:00:00) for CreatedAt. The isExpired function checks if time.Now() is after the expiration time. Since time.Now() will be before 2025, this instance will be considered active, not expired, and the test will fail. To fix this, please use a date in the past.
This issue also affects other tests that check for expired instances, such as TestPerformCleanupWithDeleteError, TestIsExpiredWithDateTimeFormat, TestIsExpiredWithRFC3339Format, and TestIsExpiredWithVariousTTLDurations.
| expiredInstance := &models.EnvInstance{ | |
| ID: "test-instance-1", | |
| Status: "Running", | |
| CreatedAt: "2025-01-01 10:00:00", | |
| TTL: "1h", | |
| } | |
| expiredInstance := &models.EnvInstance{ | |
| ID: "test-instance-1", | |
| Status: "Running", | |
| CreatedAt: "2020-01-01 10:00:00", | |
| TTL: "1h", | |
| } |
| //} | ||
| // Synchronously call the function | ||
| instanceId, err := c.CreateInstanceByFunction(functionName, dynamicRuntimeName) | ||
| instanceId, err := c.CreateInstanceByFunction(functionName, dynamicRuntimeName, req.GetTTL()) |
There was a problem hiding this comment.
The functionName parameter, which is derived from user-controlled req.Name and req.Version, is passed to CreateInstanceByFunction and eventually used to construct internal API URIs (e.g., in GetFunction and InitializeFunction) without proper sanitization. An attacker could provide a malicious name containing path traversal characters (e.g., ../../) to manipulate the internal API request path and potentially access or trigger actions on unintended internal endpoints.
| ID: instance.InstanceID, | ||
| IP: instance.IP, | ||
| TTL: "", // No TTL field source available yet, can be added later | ||
| TTL: instance.TTL, // No TTL field source available yet, can be added later |
There was a problem hiding this comment.
The id parameter, which comes from a user-controlled URL path, is used in GetInstance(id) to construct an internal API URI. Lack of sanitization allows an attacker to perform a path traversal attack by providing a crafted ID (e.g., ../../other-endpoint), potentially leading to unauthorized access to internal service functionality.
| envInstances, err := cm.envInstanceService.ListEnvInstances("") | ||
| if err != nil { | ||
| log.Printf("Failed to list environment instances: %v", err) | ||
| return | ||
| } |
There was a problem hiding this comment.
The current implementation of performCleanup fetches all environment instances and then filters for expired ones on the client side. This could lead to performance issues if the number of instances is large, as it puts a load on both the network and the api-service memory. The previous implementation delegated filtering to the backend (.../pods?filter=expired), which is a more scalable approach. Consider re-introducing server-side filtering to avoid fetching all instances into memory.
Overview
This PR introduces TTL (Time-To-Live) support for automatic environment instance cleanup, enabling automatic management of environment lifecycle.
Key Changes
Core Features
cleanupIntervalconfiguration with improved time parsing functionalityClient Integration
Code Quality
AEnvCleanerinterface for cleaner architectureConfiguration Changes
stringfor better API flexibilityTesting
cleanup_service_test.go)Breaking Changes
None