
Adds LLM-based test case generation API endpoints#195

Open
happylittle7 wants to merge 1 commit into dev from feat/llm-generate

Conversation

@happylittle7
Contributor

Overview

Introduces integration with an external LLM service to automate test data generation for programming problems, enhancing the problem creation workflow and reducing manual effort.

Details

  • Configures LLM test generation service settings, including endpoint and timeout.
  • Implements service logic to handle solution uploads, test case generation, and service health checks.
  • Provides new API endpoints for:
    • Health checks of the LLM testgen service
    • Automated test case generation based on problem data
    • Customizable test case generation using API parameters
    • Saving generated test cases directly to a problem
  • Updates routing to expose these endpoints.
  • Ensures permission checks and consistent API response formatting.
  • Supports both direct input/output generation and generation using provided solution code.

Motivation

Automates a previously manual and error-prone process, streamlines problem authoring, and provides a scalable foundation for future AI-assisted features.

- Configure LLM test generation settings in settings.py
- Implement LLM test generation logic in llm_testgen.py
- Add API views for LLM test generation in llm_testgen.py
- Update URLs to include LLM test generation endpoints

Copilot AI left a comment


Pull request overview

This PR introduces LLM-based automated test case generation functionality, integrating with an external LLM service to reduce manual effort in creating test data for programming problems.

Key Changes:

  • Adds new API endpoints for health checks, test case generation, and saving generated test cases
  • Implements service layer to handle LLM API communication including solution uploads and test generation
  • Configures LLM service settings with endpoint URL and timeout values

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 20 comments.

File Description
problems/views/llm_testgen.py Implements four API view classes for health checks, automatic generation, custom generation, and saving test cases with permission checks
problems/services/llm_testgen.py Provides service layer functions for uploading solutions, generating test cases, and health checks of the external LLM service
problems/urls.py Adds URL routing for the four new LLM test generation endpoints
back_end/settings.py Adds configuration for LLM service URL and timeout settings

Critical Issues Found:

  • Multiple database model field mismatches (Test_cases uses path fields not data fields, Problem_subtasks missing expected fields)
  • Security concerns with hardcoded production IP addresses and missing rate limiting
  • Missing input validation and potential race conditions in test case creation
  • Lack of test coverage for new endpoints


"""

import logging
import json

Copilot AI Dec 28, 2025


The 'json' module is imported but never used in this file. Remove this unused import to keep the code clean.

Suggested change
import json

Comment on lines +359 to +364
existing_count = Test_cases.objects.filter(subtask_id=subtask).count()

# Create the test case
Test_cases.objects.create(
subtask_id=subtask,
idx=existing_count + 1,

Copilot AI Dec 28, 2025


Potential race condition when computing 'idx' for test cases. The code queries the count and then creates the test case in two separate operations. If multiple requests create test cases for the same subtask concurrently, they could compute the same 'idx' value, leading to a database constraint violation on the unique constraint (subtask_id, idx). Consider using database-level sequence generation or wrapping this in a transaction with appropriate locking.

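A minimal sketch of one way to address this. The `next_idx` helper is hypothetical (not in the PR); the model and field names (`Test_cases`, `subtask_id`, `idx`) follow the diff above, and the locking pattern shown in the comment assumes Django's ORM.

```python
def next_idx(existing_indices):
    """Pure helper: allocate the next 1-based idx given the indices already taken.

    Using max()+1 rather than count()+1 also tolerates gaps left by deleted
    test cases, which count()+1 does not.
    """
    return max(existing_indices, default=0) + 1

# In the service this must run inside a transaction that locks the subtask's
# existing rows, so two concurrent requests cannot both observe the same set
# of indices. With Django (assumed), roughly:
#
#   with transaction.atomic():
#       taken = (Test_cases.objects
#                .select_for_update()
#                .filter(subtask_id=subtask)
#                .values_list('idx', flat=True))
#       Test_cases.objects.create(subtask_id=subtask, idx=next_idx(taken), ...)
```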
response.raise_for_status()
result = response.json()

logger.info(f'Upload solution response: {result}')

Copilot AI Dec 28, 2025


The code logs the full API response payload which may contain sensitive information or large data that could flood logs. Consider using logger.debug for detailed payload logging rather than logger.info, and potentially sanitize or truncate the logged data.

Suggested change
logger.info(f'Upload solution response: {result}')
# Log the response payload at debug level, truncated to avoid flooding logs
result_str = str(result)
max_log_length = 1000
if len(result_str) > max_log_length:
    result_str = result_str[:max_log_length] + '... [truncated]'
logger.debug(f'Upload solution response: {result_str}')

Comment on lines +34 to +35
'javascript': 'python', # LLM service does not support JS; fall back to Python for now
'js': 'python',

Copilot AI Dec 28, 2025


The JavaScript to Python fallback for unsupported languages is problematic. It silently converts JavaScript code to be executed as Python, which will likely fail during execution. Instead, this should return an error indicating that the language is not supported by the LLM service.

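A sketch of the error-raising alternative the comment suggests. The helper, exception class, and the exact set of languages the LLM service accepts are all assumptions, not taken from the PR.

```python
# Assumed allow-list of languages the LLM testgen service accepts.
SUPPORTED_LANGUAGES = {
    'python': 'python',
    'py': 'python',
    'cpp': 'cpp',
    'c': 'c',
}

class UnsupportedLanguageError(ValueError):
    """Raised when the client requests a language the LLM service cannot run."""

def resolve_language(lang: str) -> str:
    """Map a client-supplied language name to the service's identifier,
    raising instead of silently substituting a different language."""
    key = lang.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise UnsupportedLanguageError(
            f"Language '{lang}' is not supported by the LLM testgen service"
        )
    return SUPPORTED_LANGUAGES[key]
```

The view layer can catch `UnsupportedLanguageError` and return a 400 response, so the caller learns immediately that JavaScript solutions cannot be used, rather than getting an opaque execution failure later.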
subtasks.append({
'id': st.subtask_no,
'name': f'Subtask {st.subtask_no}',
'desc': st.description or '',

Copilot AI Dec 28, 2025


The Problem_subtasks model has 'description' as a potential field based on its usage pattern in the service (line 288), but this field doesn't exist in the model definition. The model only has 'weight', 'time_limit_ms', 'memory_limit_mb', and timestamp fields. This will cause an AttributeError at runtime.

Suggested change
'desc': st.description or '',
'desc': getattr(st, 'description', '') or '',

Comment on lines +12 to +16

logger = logging.getLogger(__name__)

# LLM test data generation service settings
LLM_TESTGEN_API_URL = getattr(settings, 'LLM_TESTGEN_API_URL', 'http://34.81.90.111:8001')

Copilot AI Dec 28, 2025


The LLM service URL should be validated when loaded from settings. If the URL is malformed or points to an invalid location, requests will fail with unclear errors. Consider adding validation to ensure the URL is properly formatted and optionally check service availability on startup.

Suggested change
logger = logging.getLogger(__name__)
# LLM test data generation service settings
LLM_TESTGEN_API_URL = getattr(settings, 'LLM_TESTGEN_API_URL', 'http://34.81.90.111:8001')

from urllib.parse import urlparse

logger = logging.getLogger(__name__)

# LLM test data generation service settings
DEFAULT_LLM_TESTGEN_API_URL = 'http://34.81.90.111:8001'

def _validate_llm_service_url(url: str) -> str:
    """
    Validate and normalize the base URL of the LLM service.
    Logs a warning and falls back to the default when the configured URL is invalid.
    """
    if not url:
        logger.warning(
            "LLM_TESTGEN_API_URL is empty or not set; falling back to default '%s'.",
            DEFAULT_LLM_TESTGEN_API_URL,
        )
        return DEFAULT_LLM_TESTGEN_API_URL
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        logger.warning(
            "Invalid LLM_TESTGEN_API_URL '%s'; falling back to default '%s'.",
            url,
            DEFAULT_LLM_TESTGEN_API_URL,
        )
        return DEFAULT_LLM_TESTGEN_API_URL
    # Strip trailing slashes to avoid duplicate '/' when composing URLs
    return url.rstrip("/")

LLM_TESTGEN_API_URL = _validate_llm_service_url(
    getattr(settings, 'LLM_TESTGEN_API_URL', DEFAULT_LLM_TESTGEN_API_URL)
)

Comment on lines +350 to +351
'score': 100 // len(set(t.get('subtask_id', 1) for t in testcases)),
'description': f'Subtask {subtask_no}'

Copilot AI Dec 28, 2025


The Problem_subtasks model does not have a 'score' field as seen in the model definition (it has 'weight' instead), and there's no 'description' field either. The defaults dictionary should use 'weight' instead of 'score', and should not reference non-existent 'description' field. Additionally, the field used in line 288 ('num_testcases') doesn't exist in the Problem_subtasks model.

Suggested change
'score': 100 // len(set(t.get('subtask_id', 1) for t in testcases)),
'description': f'Subtask {subtask_no}'
'weight': 100 // len(set(t.get('subtask_id', 1) for t in testcases)),

Comment on lines +362 to +367
Test_cases.objects.create(
subtask_id=subtask,
idx=existing_count + 1,
input_data=input_data,
expected_output=output_data
)

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Test_cases model uses 'input_path' and 'output_path' fields for file storage, not 'input_data' and 'expected_output' as used here. According to the model definition (lines 290-295), test cases store file paths, not the actual data content. This will cause a database error when trying to create test cases.


data = request.data
testcases = data.get('testcases', [])
create_subtasks = data.get('create_subtasks', True)

Copilot AI Dec 28, 2025


The parameter 'create_subtasks' is accepted in the request but never actually used in the logic. The code always creates subtasks via get_or_create regardless of this parameter's value. Either remove this unused parameter from the API contract or implement the conditional logic to respect it.

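A sketch of what honoring the flag could look like. The helper is hypothetical; a dict stands in for the database lookup, and the placeholder dict value stands in for a `Problem_subtasks.objects.create(...)` call.

```python
def resolve_subtask(existing_subtasks, subtask_no, create_subtasks):
    """Return the subtask for subtask_no, honoring the create_subtasks flag.

    existing_subtasks maps subtask_no -> subtask object (a stand-in for a DB
    query). When the subtask is missing and creation is disabled, raise
    instead of silently creating it as the current get_or_create path does.
    """
    if subtask_no in existing_subtasks:
        return existing_subtasks[subtask_no]
    if not create_subtasks:
        raise LookupError(
            f'Subtask {subtask_no} does not exist and create_subtasks is false'
        )
    subtask = {'subtask_no': subtask_no}  # placeholder for an ORM create call
    existing_subtasks[subtask_no] = subtask
    return subtask
```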
'ok': True,
'mode': result.get('mode'),
'testcases': result.get('data', {}).get('testcases', []),
'raw_response': result.get('data')

Copilot AI Dec 28, 2025


The API response includes 'raw_response' which may expose internal details or sensitive information from the LLM service. Consider whether this detailed response data should be included in production responses, or if it should only be available in debug mode or to admin users.

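One way to gate the field, sketched: build the response through a helper that attaches the raw payload only when explicitly allowed. The helper name is hypothetical; the response keys match the snippet above, and the caller would pass something like `include_raw=settings.DEBUG` or an admin-permission check.

```python
def build_generation_response(result, include_raw=False):
    """Shape the API response from the LLM service result, attaching the raw
    payload only when the caller is explicitly allowed to see it."""
    payload = {
        'ok': True,
        'mode': result.get('mode'),
        'testcases': result.get('data', {}).get('testcases', []),
    }
    if include_raw:
        # Internal service details: expose only in debug mode / to admins.
        payload['raw_response'] = result.get('data')
    return payload
```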
