Skip to content

Updates checksum calculation from MD5 to SHA256#160

Open
happylittle7 wants to merge 1 commit intodevfrom
fix/checksum
Open

Updates checksum calculation from MD5 to SHA256#160
happylittle7 wants to merge 1 commit intodevfrom
fix/checksum

Conversation

@happylittle7
Copy link
Copy Markdown
Contributor

Overview

  • Replaces all usage of MD5 checksum for problem testcase archives with SHA256 to improve security and integrity verification.
  • Updates API endpoints and documentation to reflect the new checksum algorithm.
  • Introduces utility functions for retrieving the SHA256 hash and file path of testcase archives.
  • Integrates the SHA256 hash into the payload sent to the sandbox during code submission, ensuring accurate validation.

Motivation

Switching to SHA256 enhances cryptographic strength when verifying testcase files, addressing security best practices and preventing potential hash collisions.

Details

  • Documentation and API responses now reference SHA256 instead of MD5.
  • The backend now computes and exposes SHA256 hashes for testcases.
  • Submissions sent to the sandbox include the correct SHA256 hash, ensuring consistency between backend and sandbox verification.
  • Adds error handling for missing testcase archives during submission.

No backward compatibility issues are expected, but clients must expect SHA256 hashes in responses.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the checksum algorithm from MD5 to SHA256 for problem testcase archive verification, improving security and cryptographic strength. The changes span backend services, API endpoints, sandbox client integration, and documentation.

Key Changes:

  • Introduced utility functions get_problem_testcase_hash() and get_problem_testcase_path() to retrieve SHA256 hashes and file paths for testcase archives
  • Updated sandbox submission payload to include SHA256 hash instead of placeholder TODO value
  • Modified API endpoint documentation and implementation to return SHA256 checksums instead of MD5

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
submissions/sandbox_client.py Integrated get_problem_testcase_hash() to fetch and validate SHA256 hashes before submission; added error handling for missing testcase packages; refactored settings access and payload assembly
problems/views/api.py Updated ProblemTestCaseChecksumView to compute and return SHA256 checksums instead of MD5 for sandbox verification
problems/services/storage.py Added two new utility functions for retrieving SHA256 hash and file path of testcase archives; leverages existing _sha256_of_fileobj() helper
docs/problems.MD Updated documentation to reflect SHA256 checksum usage in API responses and endpoint descriptions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

file_content = submission.source_code.encode('utf-8')
# 6. 準備程式碼檔案
source_code = submission.source_code or ''
file_extension = get_file_extension(submission.language_type)
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_file_extension function is being called with submission.language_type, which is a numeric code (0=C, 1=C++, 2=Python, etc.). However, get_file_extension expects a language string ('c', 'cpp', 'python', etc.) as its parameter. The converted language string is already available in the language variable from line 85. This should be changed to use language instead of submission.language_type.

Suggested change
file_extension = get_file_extension(submission.language_type)
file_extension = get_file_extension(language)

Copilot uses AI. Check for mistakes.
file_extension = get_file_extension(submission.language_type)
files = {
'file': (filename, BytesIO(file_content), 'text/plain')
'file': (f'solution{file_extension}', BytesIO(source_code.encode('utf-8')), 'text/plain')
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename is missing a dot before the extension. The current code generates filenames like 'solutionpy' or 'solutioncpp' instead of 'solution.py' or 'solution.cpp'. This should be changed to f'solution.{file_extension}'.

Suggested change
'file': (f'solution{file_extension}', BytesIO(source_code.encode('utf-8')), 'text/plain')
'file': (f'solution.{file_extension}', BytesIO(source_code.encode('utf-8')), 'text/plain')

Copilot uses AI. Check for mistakes.
Comment on lines 133 to +134
except Problems.DoesNotExist:
logger.error(f'Problem not found: problem_id={submission.problem_id}')
raise ValueError(f'Problem {submission.problem_id} not found')

except requests.RequestException as e:
logger.error(f'Sandbox API error: {str(e)}')
raise

except Exception as e:
logger.error(f'Unexpected error submitting to sandbox: {str(e)}')
raise
raise ValueError(f"Problem {submission.problem_id} does not exist")
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling has been reduced compared to the previous implementation. The old code had specific handling for requests.RequestException and a general Exception catch with appropriate logging. Without these handlers, network errors and other exceptions will propagate without being logged, making debugging difficult. Consider re-adding error handling for requests.RequestException and ValueError (for missing testcase packages) with appropriate logging.

Copilot uses AI. Check for mistakes.
Comment on lines 122 to +131
response = requests.post(
url,
f'{settings.SANDBOX_API_URL}/api/v1/submissions',
data=data,
files=files,
headers=headers,
timeout=SANDBOX_TIMEOUT
timeout=getattr(settings, 'SANDBOX_TIMEOUT', 30)
)

# 7. 檢查回應
response.raise_for_status()
result = response.json()

logger.info(f'Sandbox response: {result}')
return result
return response.json()
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The informative logging statements that were present in the old code have been removed. The old implementation logged the submission details before sending the request and the sandbox response upon success. These logs are valuable for debugging and monitoring. Consider re-adding at least a log statement before sending the request (e.g., logger.info(f'Submitting to Sandbox: submission_id={submission.id}, problem_id={submission.problem_id}')) and after receiving a successful response.

Copilot uses AI. Check for mistakes.
Comment thread docs/problems.MD
@@ -147,12 +147,12 @@ curl "http://127.0.0.1:8000/problem/30/test-case" -H "Authorization: Bearer $TOK

Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line still references "MD5" when describing the checksum verification process. Since the PR updates checksum calculation from MD5 to SHA256 throughout the system, this should be updated to say "SHA256" instead of "MD5" for consistency with the rest of the changes.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants