Updates checksum calculation from MD5 to SHA256#160
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the checksum algorithm from MD5 to SHA256 for problem testcase archive verification, improving security and cryptographic strength. The changes span backend services, API endpoints, sandbox client integration, and documentation.
Key Changes:
- Introduced utility functions
get_problem_testcase_hash()andget_problem_testcase_path()to retrieve SHA256 hashes and file paths for testcase archives - Updated sandbox submission payload to include SHA256 hash instead of placeholder TODO value
- Modified API endpoint documentation and implementation to return SHA256 checksums instead of MD5
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
submissions/sandbox_client.py |
Integrated get_problem_testcase_hash() to fetch and validate SHA256 hashes before submission; added error handling for missing testcase packages; refactored settings access and payload assembly |
problems/views/api.py |
Updated ProblemTestCaseChecksumView to compute and return SHA256 checksums instead of MD5 for sandbox verification |
problems/services/storage.py |
Added two new utility functions for retrieving SHA256 hash and file path of testcase archives; leverages existing _sha256_of_fileobj() helper |
docs/problems.MD |
Updated documentation to reflect SHA256 checksum usage in API responses and endpoint descriptions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| file_content = submission.source_code.encode('utf-8') | ||
| # 6. 準備程式碼檔案 | ||
| source_code = submission.source_code or '' | ||
| file_extension = get_file_extension(submission.language_type) |
There was a problem hiding this comment.
The get_file_extension function is being called with submission.language_type, which is a numeric code (0=C, 1=C++, 2=Python, etc.). However, get_file_extension expects a language string ('c', 'cpp', 'python', etc.) as its parameter. The converted language string is already available in the language variable from line 85. This should be changed to use language instead of submission.language_type.
| file_extension = get_file_extension(submission.language_type) | |
| file_extension = get_file_extension(language) |
| file_extension = get_file_extension(submission.language_type) | ||
| files = { | ||
| 'file': (filename, BytesIO(file_content), 'text/plain') | ||
| 'file': (f'solution{file_extension}', BytesIO(source_code.encode('utf-8')), 'text/plain') |
There was a problem hiding this comment.
The filename is missing a dot before the extension. The current code generates filenames like 'solutionpy' or 'solutioncpp' instead of 'solution.py' or 'solution.cpp'. This should be changed to f'solution.{file_extension}'.
| 'file': (f'solution{file_extension}', BytesIO(source_code.encode('utf-8')), 'text/plain') | |
| 'file': (f'solution.{file_extension}', BytesIO(source_code.encode('utf-8')), 'text/plain') |
| except Problems.DoesNotExist: | ||
| logger.error(f'Problem not found: problem_id={submission.problem_id}') | ||
| raise ValueError(f'Problem {submission.problem_id} not found') | ||
|
|
||
| except requests.RequestException as e: | ||
| logger.error(f'Sandbox API error: {str(e)}') | ||
| raise | ||
|
|
||
| except Exception as e: | ||
| logger.error(f'Unexpected error submitting to sandbox: {str(e)}') | ||
| raise | ||
| raise ValueError(f"Problem {submission.problem_id} does not exist") |
There was a problem hiding this comment.
The error handling has been reduced compared to the previous implementation. The old code had specific handling for requests.RequestException and a general Exception catch with appropriate logging. Without these handlers, network errors and other exceptions will propagate without being logged, making debugging difficult. Consider re-adding error handling for requests.RequestException and ValueError (for missing testcase packages) with appropriate logging.
| response = requests.post( | ||
| url, | ||
| f'{settings.SANDBOX_API_URL}/api/v1/submissions', | ||
| data=data, | ||
| files=files, | ||
| headers=headers, | ||
| timeout=SANDBOX_TIMEOUT | ||
| timeout=getattr(settings, 'SANDBOX_TIMEOUT', 30) | ||
| ) | ||
|
|
||
| # 7. 檢查回應 | ||
| response.raise_for_status() | ||
| result = response.json() | ||
|
|
||
| logger.info(f'Sandbox response: {result}') | ||
| return result | ||
| return response.json() |
There was a problem hiding this comment.
The informative logging statements that were present in the old code have been removed. The old implementation logged the submission details before sending the request and the sandbox response upon success. These logs are valuable for debugging and monitoring. Consider re-adding at least a log statement before sending the request (e.g., logger.info(f'Submitting to Sandbox: submission_id={submission.id}, problem_id={submission.problem_id}')) and after receiving a successful response.
| @@ -147,12 +147,12 @@ curl "http://127.0.0.1:8000/problem/30/test-case" -H "Authorization: Bearer $TOK | |||
|
|
|||
There was a problem hiding this comment.
This line still references "MD5" when describing the checksum verification process. Since the PR updates checksum calculation from MD5 to SHA256 throughout the system, this should be updated to say "SHA256" instead of "MD5" for consistency with the rest of the changes.
Overview
Motivation
Switching to SHA256 enhances cryptographic strength when verifying testcase files, addressing security best practices and preventing potential hash collisions.
Details
No backward compatibility issues are expected, but clients must expect SHA256 hashes in responses.