docs #21
Open: poojathakur00 wants to merge 4 commits into main from docs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| # Creating New Challenges for LeetGPU | ||
|
|
||
| LeetGPU challenges are low-level GPU programming tasks focused on writing custom CUDA, Triton, or TinyGrad kernels. They evaluate both functional correctness and performance under real GPU constraints. | ||
|
|
||
| This guide provides instructions for creating new GPU programming challenges for LeetGPU. It covers the complete process from concept to submission. | ||
|
|
||
| ## Challenge Structure | ||
|
|
||
| Each challenge follows this directory structure: | ||
|
|
||
| ``` | ||
| challenges/<difficulty>/<number>_<name>/ | ||
| ├── challenge.html # Problem description and examples | ||
| ├── challenge.py # Reference implementation and test cases | ||
| └── starter/ # Starter templates for each framework | ||
| ├── starter.cu # CUDA template | ||
| ├── starter.mojo # Mojo template | ||
| ├── starter.pytorch.py # PyTorch template | ||
| ├── starter.tinygrad.py # TinyGrad template | ||
| └── starter.triton.py # Triton template | ||
| ``` | ||
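The layout above can be checked mechanically before a challenge is submitted. The following is a minimal sketch, not part of the LeetGPU tooling: the helper names `missing_files` and `scaffold` are hypothetical, and the file list is copied from the directory tree above.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above.
REQUIRED_FILES = [
    "challenge.html",
    "challenge.py",
    "starter/starter.cu",
    "starter/starter.mojo",
    "starter/starter.pytorch.py",
    "starter/starter.tinygrad.py",
    "starter/starter.triton.py",
]


def missing_files(challenge_dir: Path) -> list[str]:
    """Return the required files that are absent (hypothetical helper)."""
    return [rel for rel in REQUIRED_FILES if not (challenge_dir / rel).is_file()]


def scaffold(challenge_dir: Path) -> None:
    """Create empty placeholders for every required file (hypothetical helper)."""
    for rel in REQUIRED_FILES:
        path = challenge_dir / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
```

Calling `scaffold(Path("challenges/easy/1_example"))` would create the empty files, and `missing_files` on the same path would then return an empty list. The path used here is only an illustration.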
|
|
||
| ### `challenge.html` Template | ||
|
|
||
|
|
||
| # [Challenge Name] | ||
|
|
||
| ## Description | ||
|
|
||
| [Provide a clear, concise explanation of what the algorithm or function is supposed to do. Include input and output specifications, if necessary.] | ||
|
|
||
| ### Mathematical Formulation | ||
|
|
||
| [If applicable, provide the mathematical formula using LaTeX notation] | ||
|
|
||
| $$ | ||
| \text{[Your formula here]} | ||
| $$ | ||
|
|
||
| ## Implementation Requirements | ||
|
|
||
| - **No External Libraries:** Solutions must be implemented using only the native features of the chosen framework. No external libraries or additional frameworks are permitted. | ||
| - **Function Signature:** The solve function signature is fixed and must not be modified. Implement your solution according to the provided signature. | ||
| - **Output Variable:** Results must be written to the designated output parameter: `[output_parameter_name]` | ||
|
|
||
|
|
||
|
|
||
| ## Examples | ||
|
|
||
| ### Example 1 | ||
| **Input:** | ||
| ``` | ||
| [Provide specific input values] | ||
| ``` | ||
|
|
||
| **Expected Output:** | ||
| ``` | ||
| [Show the corresponding output values] | ||
| ``` | ||
|
|
||
| ### Example 2 | ||
| **Input:** | ||
| ``` | ||
| [Provide different input values] | ||
| ``` | ||
|
|
||
| **Expected Output:** | ||
| ``` | ||
| [Show the corresponding output values] | ||
| ``` | ||
|
|
||
| ## Constraints | ||
|
|
||
| - **Input Size:** [Specify the range of input dimensions, e.g., "1 ≤ N ≤ 1,000,000"] | ||
| - **Value Range:** [Specify the range of input values, e.g., "-1000.0 ≤ input[i] ≤ 1000.0"] | ||
| - **Memory Limits:** [If applicable, specify any memory constraints] | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,201 @@ | ||
| # Starter Code Creation Process for LeetGPU Challenges | ||
|
Contributor
This is wrong. Rework this. |
||
|
|
||
| Starter code is a template file that provides the basic structure and function signatures for implementing GPU-accelerated algorithms in LeetGPU challenges. It gives users a runnable foundation while leaving the core algorithmic logic as their task. | ||
|
|
||
| ## Major Components | ||
|
|
||
| - **Function Signatures:** Standardized `solve` function with consistent parameters across all frameworks | ||
| - **Framework-Specific Templates:** CUDA, Triton, Mojo, PyTorch, and TinyGrad implementations | ||
| - **Memory Management:** Proper device pointer handling and memory allocation patterns | ||
| - **Kernel Structure:** Basic kernel function templates with grid/block sizing | ||
| - **Error Handling:** Bounds checking and synchronization primitives | ||
|
|
||
|
|
||
| ### Identify Framework Requirements | ||
|
|
||
| Each framework has specific requirements: | ||
|
Contributor
This isn't necessary. |
||
|
|
||
| **CUDA:** | ||
| - Kernel functions with the `__global__` qualifier (for easy problems) | ||
| - `extern "C"` solve function for framework integration | ||
| - Proper memory management and synchronization | ||
| - Grid and block size configuration | ||
|
|
||
|
|
||
| **Triton:** | ||
| - `@triton.jit` decorator for kernel compilation | ||
| - Pointer type conversions for data types | ||
| - Block size and grid calculations | ||
| - Compliance with the restriction that PyTorch must not be used in Triton programs | ||
|
|
||
| **Mojo:** | ||
| - `@export` decorator for framework integration | ||
| - Proper GPU imports and memory types | ||
| - Device context management | ||
| - Function parameter types | ||
|
|
||
| **PyTorch/TinyGrad:** | ||
| - Tensor-based function signatures | ||
| - GPU tensor parameters | ||
| - Simple, direct implementations | ||
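The grid and block size calculations listed for CUDA and Triton usually come down to a ceiling division: launch enough blocks that every element is covered, even when the input size is not a multiple of the block size. A minimal Python sketch of that arithmetic follows; the function name `blocks_per_grid` and the default of 256 threads per block are assumptions chosen for illustration.

```python
def blocks_per_grid(size: int, threads_per_block: int = 256) -> int:
    """Smallest number of blocks whose threads cover `size` elements."""
    if size < 0 or threads_per_block <= 0:
        raise ValueError("size must be >= 0 and threads_per_block must be > 0")
    # Integer ceiling division: rounds up whenever size is not a multiple
    # of threads_per_block, so the last (partial) block is still launched.
    return (size + threads_per_block - 1) // threads_per_block
```

For 1,000 elements and 256 threads per block this gives 4 blocks, which means 24 threads in the last block fall outside the data and must be masked off by a bounds check in the kernel.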
|
|
||
|
|
||
| ## Easy Problems | ||
|
|
||
| ### CUDA Starter Template | ||
|
|
||
| ```cuda | ||
| #include <cuda_runtime.h> | ||
|
|
||
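| // Element type (float) is assumed here; match it to the challenge. | ||
| __global__ void kernel_name(const float* input, float* output, int size) { | ||
|     // TODO: Implement kernel logic (guard accesses with idx < size) | ||
| } | ||
|
|
||
| // input, output are device pointers (i.e. pointers to memory on the GPU) | ||
| extern "C" void solve(const float* input, float* output, int size) { | ||
|
|
||
|     // define grid, block size | ||
|     int threadsPerBlock = 256; | ||
|     int blocksPerGrid = (size + threadsPerBlock - 1) / threadsPerBlock; | ||
|     kernel_name<<<blocksPerGrid, threadsPerBlock>>>(input, output, size); | ||
|     cudaDeviceSynchronize(); | ||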
| } | ||
| ``` | ||
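To make the role of the bounds check concrete, the launch can be emulated on the CPU. The sketch below is plain Python, not CUDA: `simulate_launch` is a hypothetical name, the doubling operation stands in for real kernel logic, and the tiny block size of 4 is chosen only to keep the example readable.

```python
def simulate_launch(inp, out, size, threads_per_block=4):
    """Emulate a 1D kernel launch sequentially; returns the threads launched."""
    blocks = (size + threads_per_block - 1) // threads_per_block
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            # Same index formula a CUDA kernel would use:
            # blockIdx.x * blockDim.x + threadIdx.x
            idx = block_idx * threads_per_block + thread_idx
            if idx < size:  # bounds check: surplus threads do nothing
                out[idx] = inp[idx] * 2.0
    return blocks * threads_per_block
```

With 5 elements and 4 threads per block, 2 blocks (8 threads) are launched and the last 3 threads are skipped by the `idx < size` guard. Without that guard they would read and write past the end of the buffers.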
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
| ### Triton Starter Template | ||
|
|
||
| ```python | ||
| # The use of PyTorch in Triton programs is not allowed for the purposes of fair benchmarking. | ||
| import triton | ||
| import triton.language as tl | ||
|
|
||
| @triton.jit | ||
| def kernel_name(input_ptr, output_ptr, input_size, BLOCK_SIZE: tl.constexpr): | ||
|     input_ptr = input_ptr.to(tl.pointer_type(tl.float32)) | ||
|     output_ptr = output_ptr.to(tl.pointer_type(tl.float32)) | ||
|
|
||
|     # TODO: Implement kernel logic | ||
|     # Use tl.program_id(0) to get the program (block) index | ||
|     # Use tl.arange(0, BLOCK_SIZE) to get element offsets within the block | ||
|
|
||
| # input_ptr, output_ptr are raw device pointers | ||
| def solve(input_ptr: int, output_ptr: int, input_size: int): | ||
|     # define grid, block size | ||
|     BLOCK_SIZE = 1024 | ||
|     grid = (triton.cdiv(input_size, BLOCK_SIZE),) | ||
|     kernel_name[grid](input_ptr, output_ptr, input_size, BLOCK_SIZE) | ||
| ``` | ||
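Triton programs operate on a whole block of elements at once rather than one element per thread, so the usual pattern is to compute a vector of offsets for the current program and a mask that disables the out-of-range ones. The sketch below imitates that pattern with plain Python lists. It does not use Triton, and `program_offsets` is a hypothetical helper name.

```python
def program_offsets(pid: int, block_size: int, n: int):
    """Offsets and validity mask for one program, as plain Python lists."""
    # Corresponds to: pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    offsets = [pid * block_size + i for i in range(block_size)]
    # Corresponds to: offsets < n, used as the mask for loads and stores
    mask = [off < n for off in offsets]
    return offsets, mask
```

For example, with a block size of 4 and 6 elements, program 1 covers offsets 4 to 7, and only the first two are inside the data.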
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
| ### Mojo Starter Template | ||
|
|
||
| ```mojo | ||
| from gpu.host import DeviceContext | ||
| from gpu.id import block_dim, block_idx, thread_idx | ||
| from memory import UnsafePointer | ||
| from math import ceildiv | ||
|
|
||
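| # Element type (Float32) is assumed here; match it to the challenge. | ||
| fn kernel_name(input: UnsafePointer[Float32], output: UnsafePointer[Float32], size: Int32): | ||
|     # TODO: Implement kernel logic | ||
|     # Use thread_idx.x to get thread index within block | ||
|     # Use block_idx.x to get block index | ||
|     pass | ||
|
|
||
| # input, output are device pointers (i.e. pointers to memory on the GPU) | ||
| @export | ||
| def solve(input: UnsafePointer[Float32], output: UnsafePointer[Float32], size: Int32): | ||
|     # calculate threads per block and number of blocks | ||
|     var BLOCK_SIZE: Int32 = 256 | ||
|     var ctx = DeviceContext() | ||
|     var num_blocks = ceildiv(size, BLOCK_SIZE) | ||
|
|
||
|     ctx.enqueue_function[kernel_name]( | ||
|         input, output, size, | ||
|         grid_dim = num_blocks, | ||
|         block_dim = BLOCK_SIZE | ||
|     ) | ||
|
|
||
|     ctx.synchronize() | ||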
| ``` | ||
|
|
||
| ### PyTorch Starter Template | ||
|
|
||
| ```python | ||
| import torch | ||
|
|
||
| def solve(input: torch.Tensor, output: torch.Tensor, size: int): | ||
|     # TODO: Implement solution using PyTorch operations | ||
|     pass | ||
| ``` | ||
|
|
||
| ### TinyGrad Starter Template | ||
|
|
||
| ```python | ||
| import tinygrad | ||
|
|
||
| def solve(input, output, size): | ||
|     # TODO: Implement solution using TinyGrad operations | ||
|     pass | ||
| ``` | ||
|
|
||
|
|
||
| ## Medium and Hard Problems | ||
|
|
||
| ### CUDA Starter Template | ||
|
|
||
| ```cuda | ||
| #include <cuda_runtime.h> | ||
|
|
||
| // input, output are device pointers (i.e. pointers to memory on the GPU) | ||
| // Element type (float) is assumed here; match it to the challenge. | ||
| extern "C" void solve(const float* input, float* output, int size) { | ||
|
|
||
| } | ||
| ``` | ||
|
|
||
| ### Triton Starter Template | ||
|
|
||
| ```python | ||
| # The use of PyTorch in Triton programs is not allowed for the purposes of fair benchmarking. | ||
| import triton | ||
| import triton.language as tl | ||
|
|
||
| # input_ptr, output_ptr are raw device pointers | ||
| def solve(input_ptr: int, output_ptr: int, input_size: int): | ||
|     pass | ||
| ``` | ||
|
|
||
|
|
||
| ### Mojo Starter Template | ||
|
|
||
| ```mojo | ||
| from gpu.host import DeviceContext | ||
| from gpu.id import block_dim, block_idx, thread_idx | ||
| from memory import UnsafePointer | ||
| from math import ceildiv | ||
|
|
||
| @export | ||
| def solve(input: UnsafePointer[Float32], output: UnsafePointer[Float32], size: Int32): | ||
|
|
||
|     pass | ||
| ``` | ||
|
|
||
| ### PyTorch Starter Template | ||
|
|
||
| ```python | ||
| import torch | ||
|
|
||
| def solve(input: torch.Tensor, output: torch.Tensor, size: int): | ||
|     # TODO: Implement solution using PyTorch operations | ||
|     pass | ||
| ``` | ||
|
|
||
| ### TinyGrad Starter Template | ||
|
|
||
| ```python | ||
| import tinygrad | ||
|
|
||
| def solve(input, output, size): | ||
|     # TODO: Implement solution using TinyGrad operations | ||
|     pass | ||
| ``` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # Testing Guide for LeetGPU Challenges | ||
|
|
||
| This guide covers how to create test cases and validate your challenges to ensure they work correctly across all frameworks. | ||
|
|
||
| ## Table of Contents | ||
|
|
||
| 1. [Test Case Types](#test-case-types) | ||
| 2. [Test Case Design Principles](#test-case-design-principles) | ||
| 3. [Debugging Test Issues](#debugging-test-issues) | ||
|
|
||
| ## Test Case Types | ||
|
|
||
| ### 1. Example Test (`generate_example_test`) | ||
| - **Purpose**: Simple test case that matches the example in `challenge.html` | ||
| - **Complexity**: Low - should be easy to understand and verify manually | ||
| - **Size**: Small (typically 3-10 elements) | ||
| - **Values**: Simple, predictable values | ||
|
|
||
| ### 2. Functional Tests (`generate_functional_test`) | ||
| - **Purpose**: Comprehensive test suite covering various scenarios | ||
| - **Complexity**: Medium - includes edge cases and typical usage | ||
| - **Size**: Varied (small to medium) | ||
| - **Values**: Diverse, including edge cases | ||
|
|
||
| ### 3. Performance Test (`generate_performance_test`) | ||
| - **Purpose**: Large test case for performance evaluation | ||
| - **Complexity**: High - tests scalability and efficiency | ||
| - **Size**: Large (typically 1M+ elements) | ||
| - **Values**: Random or structured large datasets | ||
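The three generator methods named above can be sketched side by side. This is an illustration under stated assumptions, not the LeetGPU base class: `SketchChallenge` and `_case` are hypothetical names, plain Python lists stand in for GPU tensors, and `generate_functional_test` is assumed to return a list of cases.

```python
from typing import Any, Dict, List


class SketchChallenge:
    """Hypothetical stand-in for a challenge class; lists replace GPU tensors."""

    perf_size = 1_000_000  # "typically 1M+ elements"

    def _case(self, values: List[float]) -> Dict[str, Any]:
        # Every case carries the input, a zeroed output buffer, and the size.
        return {"input": values, "output": [0.0] * len(values), "N": len(values)}

    def generate_example_test(self) -> Dict[str, Any]:
        # Small, hand-checkable values that would match challenge.html.
        return self._case([1.0, 2.0, 3.0, 4.0])

    def generate_functional_test(self) -> List[Dict[str, Any]]:
        # Assumed to return several cases: tiny, edge-valued, and typical inputs.
        return [
            self._case([0.0]),
            self._case([-1000.0, 1000.0]),
            self._case([float(i % 7 - 3) for i in range(1000)]),
        ]

    def generate_performance_test(self) -> Dict[str, Any]:
        # One large case; values follow a simple repeating pattern.
        return self._case([float(i % 100) for i in range(self.perf_size)])
```

Keeping the dictionary keys identical across all three generators means a single solution signature can consume any of the cases.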
|
|
||
| ## Test Case Design Principles | ||
|
|
||
| ### 1. Coverage | ||
| - **Input ranges**: Test minimum, maximum, and typical values | ||
| - **Input sizes**: Test small, medium, and large inputs | ||
| - **Data patterns**: Test edge cases, special values, and random data | ||
| - **Error conditions**: Test boundary conditions and invalid inputs | ||
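For GPU kernels, the input sizes most likely to expose bugs sit just below, at, and just above multiples of the block size, because that is where the final partial block appears or disappears. The sketch below generates such sizes. The helper name `boundary_sizes` and its defaults are assumptions for illustration.

```python
def boundary_sizes(block_size: int = 256, max_size: int = 10_000) -> list[int]:
    """Input sizes clustered around block-size boundaries, plus the extremes."""
    candidates = {
        1, 2,                                  # minimum and near-minimum
        block_size - 1, block_size, block_size + 1,
        2 * block_size - 1, 2 * block_size, 2 * block_size + 1,
        max_size,                              # upper bound from the constraints
    }
    # Drop anything outside the allowed range and return in ascending order.
    return sorted(n for n in candidates if 1 <= n <= max_size)
```

Each returned size can then feed one functional test case, so the suite covers both exact multiples of the block size and sizes that leave a partial block.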
|
|
||
| ### 2. Determinism | ||
| - **Reproducible**: Tests should produce the same results every time | ||
| - **Seeded randomness**: Use fixed seeds for random test cases | ||
| - **Clear expectations**: Expected outputs should be well-defined | ||
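Seeded randomness can be illustrated with the standard library alone. In the sketch below, `make_random_input` is a hypothetical helper and the seed and value range are arbitrary. With PyTorch tensors the analogous step would be calling `torch.manual_seed` before generating data.

```python
import random


def make_random_input(n: int, seed: int = 1234,
                      low: float = -100.0, high: float = 100.0) -> list[float]:
    """Generate n pseudo-random floats that are identical on every run."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    return [rng.uniform(low, high) for _ in range(n)]
```

Two calls with the same seed return the same list, so a failing test can be reproduced exactly, while changing the seed yields a different but equally reproducible dataset.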
|
|
||
| ### 3. Efficiency | ||
| - **Fast execution**: Tests should run quickly for development | ||
| - **Memory efficient**: Avoid unnecessarily large test cases | ||
| - **Scalable**: Performance tests should be appropriately sized | ||
|
|
||
| ## Debugging Test Issues | ||
|
|
||
| ### Common Issues and Solutions | ||
|
|
||
| #### 1. Memory Issues | ||
| ```python | ||
| # Problem: CUDA out of memory | ||
| # Solution: Reduce test case sizes | ||
| # Requires: import torch; from typing import Any, Dict | ||
| def generate_performance_test(self) -> Dict[str, Any]: | ||
|     # Reduce size if memory issues occur | ||
|     size = 100_000 # Instead of 1_000_000 | ||
|     return { | ||
|         "input": torch.empty(size, device="cuda", dtype=torch.float32).uniform_(-100.0, 100.0), | ||
|         "output": torch.empty(size, device="cuda", dtype=torch.float32), | ||
|         "N": size | ||
|     } | ||
| ``` | ||
|
|
||
| #### 2. Precision Issues | ||
| ```python | ||
| # Problem: Floating point precision errors | ||
| # Solution: Adjust tolerances | ||
| def __init__(self): | ||
|     super().__init__( | ||
|         name="Complex Algorithm", | ||
|         atol=1e-03, # Increase tolerance for complex algorithms | ||
|         rtol=1e-03, | ||
|         num_gpus=1, | ||
|         access_tier="free" | ||
|     ) | ||
| ``` | ||
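To see what `atol` and `rtol` control, it helps to write the comparison out. The sketch below assumes the checker applies the same elementwise rule as `torch.allclose`, namely `|actual - expected| <= atol + rtol * |expected|`. Whether LeetGPU uses exactly this rule is an assumption, and `within_tolerance` is a hypothetical name.

```python
def within_tolerance(actual, expected, atol: float = 1e-3, rtol: float = 1e-3) -> bool:
    """Elementwise closeness check: |a - e| <= atol + rtol * |e| for every pair."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))
```

The absolute term dominates for values near zero and the relative term dominates for large values, which is why algorithms that accumulate many floating-point operations on large magnitudes often need a looser `rtol`.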
|
|
||
| #### 3. Shape Mismatch Issues | ||
| ```python | ||
| # Problem: Tensor shape mismatches | ||
| # Solution: Add shape validation | ||
| def reference_impl(self, input: torch.Tensor, output: torch.Tensor, N: int): | ||
|     # Validate shapes | ||
|     assert input.shape == (N,), f"Expected input shape ({N},), got {input.shape}" | ||
|     assert output.shape == (N,), f"Expected output shape ({N},), got {output.shape}" | ||
|
|
||
|     # Rest of implementation... | ||
| ``` | ||
|
|
||
| ### Debugging Checklist | ||
|
|
||
| - [ ] Reference implementation produces correct results | ||
| - [ ] All test cases have required parameters | ||
| - [ ] Tensor shapes match expectations | ||
| - [ ] Data types are consistent (float32) | ||
| - [ ] Tolerances are appropriate for the algorithm | ||
| - [ ] Performance test size is reasonable | ||
| - [ ] Edge cases are covered | ||
| - [ ] Random test cases use appropriate ranges | ||
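A few of the checklist items can be automated. The sketch below checks that a test case has the required parameters and that buffer lengths match `N`. The key names come from the debugging snippets above, while `check_test_case` itself is a hypothetical helper that uses plain sequences in place of tensors.

```python
REQUIRED_KEYS = ("input", "output", "N")  # names used in the snippets above


def check_test_case(case: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the case passes."""
    problems = [f"missing parameter: {key}" for key in REQUIRED_KEYS if key not in case]
    if problems:
        return problems  # length checks need all keys to be present
    n = case["N"]
    for key in ("input", "output"):
        if len(case[key]) != n:
            problems.append(f"{key} has length {len(case[key])}, expected {n}")
    return problems
```

Running this over every generated case before submission catches missing parameters and size mismatches early, before they surface as opaque failures on the GPU.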
|
|
||
| --- | ||
|
|
||
| *This testing guide ensures your challenges are robust, well-tested, and ready for production use.* |
omit