docs/Getting_started.md
# Creating New Challenges for LeetGPU

LeetGPU challenges are low-level GPU programming tasks focused on writing custom kernels in CUDA, Mojo, or Triton, or equivalent implementations in PyTorch and TinyGrad. They evaluate both functional correctness and performance under real GPU constraints.

This guide provides instructions for creating new GPU programming challenges for LeetGPU. It covers the complete process from concept to submission.

## Challenge Structure

Each challenge follows this directory structure:

```
challenges/<difficulty>/<number>_<name>/
├── challenge.html # Problem description and examples
├── challenge.py # Reference implementation and test cases
└── starter/ # Starter templates for each framework
├── starter.cu # CUDA template
├── starter.mojo # Mojo template
├── starter.pytorch.py # PyTorch template
├── starter.tinygrad.py # TinyGrad template
└── starter.triton.py # Triton template
```
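
The `challenge.py` file supplies the reference implementation and the test-case generators. As a rough sketch of its shape (assuming the hooks referenced in the testing guide, `reference_impl`, `generate_example_test`, and so on; the actual base class lives in the LeetGPU repository), it might look like:

```python
# Hypothetical sketch only: the class name and the "Vector Scale" challenge
# are illustrative. The real file subclasses the repository's challenge base
# class and passes name/atol/rtol/num_gpus/access_tier to super().__init__,
# as in the testing guide's examples.
from typing import Any, Dict
import torch

class VectorScaleChallenge:
    def __init__(self):
        self.name = "Vector Scale"
        self.atol = 1e-05          # absolute tolerance for output comparison
        self.rtol = 1e-05          # relative tolerance for output comparison
        self.num_gpus = 1
        self.access_tier = "free"

    def reference_impl(self, input: torch.Tensor, output: torch.Tensor, N: int):
        # Ground truth used to validate user submissions
        output.copy_(input * 2.0)

    def generate_example_test(self) -> Dict[str, Any]:
        # Small, hand-checkable case matching the example in challenge.html
        input = torch.tensor([1.0, 2.0, 3.0], device="cuda")
        return {"input": input, "output": torch.empty_like(input), "N": 3}
```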

### `challenge.html` template


# [Challenge Name]

## Description

[Provide a clear, concise explanation of what the algorithm or function is supposed to do. Include input and output specifications, if necessary.]

### Mathematical Formulation

[If applicable, provide the mathematical formula using LaTeX notation]

$$
\text{[Your formula here]}
$$

## Implementation Requirements

- **No External Libraries:** Solutions must be implemented using only native features. No external libraries or frameworks are permitted.
- **Function Signature:** The solve function signature is fixed and must not be modified. Implement your solution according to the provided signature.
- **Output Variable:** Results must be written to the designated output parameter: `[output_parameter_name]`



## Examples

### Example 1
**Input:**
```
[Provide specific input values]
```

**Expected Output:**
```
[Show the corresponding output values]
```

### Example 2
**Input:**
```
[Provide different input values]
```

**Expected Output:**
```
[Show the corresponding output values]
```

## Constraints

- **Input Size:** [Specify the range of input dimensions, e.g., "1 ≤ N ≤ 1,000,000"]
- **Value Range:** [Specify the range of input values, e.g., "-1000.0 ≤ input[i] ≤ 1000.0"]
- **Memory Limits:** [If applicable, specify any memory constraints]
docs/Starter_Codes.md
# Starter Code Creation Process for LeetGPU Challenges
Starter code is a template file that provides the basic structure and function signatures for implementing GPU-accelerated algorithms in LeetGPU challenges. It gives users a runnable foundation while leaving the core algorithmic logic as their task.

## Major Components

- **Function Signatures:** Standardized `solve` function with consistent parameters across all frameworks
- **Framework-Specific Templates:** CUDA, Triton, Mojo, PyTorch, and TinyGrad implementations
- **Memory Management:** Proper device pointer handling and memory allocation patterns
- **Kernel Structure:** Basic kernel function templates with grid/block sizing
- **Error Handling:** Bounds checking and synchronization primitives


## Framework Requirements

Each framework has specific requirements:

**CUDA:**
- Kernel functions with the `__global__` qualifier (for easy problems)
- `extern "C"` solve function for framework integration
- Proper memory management and synchronization
- Grid and block size configuration


**Triton:**
- `@triton.jit` decorator for kernel compilation
- Pointer type conversions for data types
- Block size and grid calculations
- Compliance with the no-PyTorch restriction (for fair benchmarking)

**Mojo:**
- `@export` decorator for framework integration
- Proper GPU imports and memory types
- Device context management
- Function parameter types

**PyTorch/TinyGrad:**
- Tensor-based function signatures
- GPU tensor parameters
- Simple, direct implementations


## Easy Problems

### CUDA Starter Template

```cuda
#include <cuda_runtime.h>

__global__ void kernel_name(const float* input, float* output, int size) {
    // TODO: Implement kernel logic
}

// input, output are device pointers (i.e. pointers to memory on the GPU)
extern "C" void solve(const float* input, float* output, int size) {
    // Define grid and block size, rounding up so every element is covered
    int threadsPerBlock = 256;
    int blocksPerGrid = (size + threadsPerBlock - 1) / threadsPerBlock;

    kernel_name<<<blocksPerGrid, threadsPerBlock>>>(input, output, size);
    cudaDeviceSynchronize();
}
```
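
The `(size + threadsPerBlock - 1) / threadsPerBlock` expression is ceiling division: it rounds up so that every element is assigned a thread, which is also why kernel bodies should begin with a bounds check such as `if (idx >= size) return;`.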

### Triton Starter Template

```python
# The use of PyTorch in Triton programs is not allowed for the purposes of fair benchmarking.
import triton
import triton.language as tl

@triton.jit
def kernel_name(input_ptr, output_ptr, size, BLOCK_SIZE: tl.constexpr):
    input_ptr = input_ptr.to(tl.pointer_type(tl.float32))
    output_ptr = output_ptr.to(tl.pointer_type(tl.float32))

    # TODO: Implement kernel logic
    # Use tl.program_id(0) to get this program's block index
    # Use tl.arange(0, BLOCK_SIZE) to get offsets within the block

# input_ptr, output_ptr are raw device pointers
def solve(input_ptr, output_ptr, size):
    # Define grid and block size
    BLOCK_SIZE = 1024
    grid = (triton.cdiv(size, BLOCK_SIZE),)
    kernel_name[grid](input_ptr, output_ptr, size, BLOCK_SIZE)
```
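
In Triton, each program instance (indexed by `tl.program_id(0)`) handles one `BLOCK_SIZE`-wide tile of the input; a mask such as `offsets < size` on `tl.load` and `tl.store` guards the final, possibly partial tile.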

### Mojo Starter Template

```mojo
from gpu.host import DeviceContext
from gpu.id import block_dim, block_idx, thread_idx
from memory import UnsafePointer
from math import ceildiv

fn kernel_name(input: UnsafePointer[Float32], output: UnsafePointer[Float32], size: Int32):
    # TODO: Implement kernel logic
    # Use thread_idx.x to get the thread index within the block
    # Use block_idx.x to get the block index
    pass

# input, output are device pointers (i.e. pointers to memory on the GPU)
@export
def solve(input: UnsafePointer[Float32], output: UnsafePointer[Float32], size: Int32):
    # Calculate threads per block and number of blocks
    var BLOCK_SIZE: Int32 = 256
    var num_blocks = ceildiv(size, BLOCK_SIZE)
    var ctx = DeviceContext()

    ctx.enqueue_function[kernel_name](
        input, output, size,
        grid_dim = num_blocks,
        block_dim = BLOCK_SIZE
    )

    ctx.synchronize()
```

### PyTorch Starter Template

```python
import torch

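# input, output are tensors already on the GPU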
def solve(input, output, size):
# TODO: Implement solution using PyTorch operations
pass
```

### TinyGrad Starter Template

```python
import tinygrad

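# input, output are tinygrad Tensors already on the GPU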
def solve(input, output, size):
# TODO: Implement solution using TinyGrad operations
pass
```


## Medium and Hard Problems
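
For medium and hard problems, only a bare `solve` entry point is provided: kernel definitions, launch configuration, and synchronization are left entirely to the user.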

### CUDA Starter Template

```cuda
#include <cuda_runtime.h>

// input, output are device pointers (i.e. pointers to memory on the GPU)
extern "C" void solve(input, output, size) {

}
```

### Triton Starter Template

```python
# The use of PyTorch in Triton programs is not allowed for the purposes of fair benchmarking.
import triton
import triton.language as tl

# input_ptr, output_ptr are raw device pointers
def solve(input_ptr, output_ptr, size):
    # TODO: Implement the solution
    pass
```


### Mojo Starter Template

```mojo
from gpu.host import DeviceContext
from gpu.id import block_dim, block_idx, thread_idx
from memory import UnsafePointer
from math import ceildiv

# input, output are device pointers (i.e. pointers to memory on the GPU)
@export
def solve(input: UnsafePointer[Float32], output: UnsafePointer[Float32], size: Int32):
    # TODO: Implement the solution
    pass
```

### PyTorch Starter Template

```python
import torch

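# input, output are tensors already on the GPU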
def solve(input, output, size):
# TODO: Implement solution using PyTorch operations
pass
```

### TinyGrad Starter Template

```python
import tinygrad

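# input, output are tinygrad Tensors already on the GPU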
def solve(input, output, size):
# TODO: Implement solution using TinyGrad operations
pass
```
docs/TESTING_GUIDE.md
# Testing Guide for LeetGPU Challenges

This guide covers how to create test cases and validate your challenges to ensure they work correctly across all frameworks.

## Table of Contents

1. [Test Case Types](#test-case-types)
2. [Test Case Design Principles](#test-case-design-principles)
3. [Debugging Test Issues](#debugging-test-issues)

## Test Case Types

### 1. Example Test (`generate_example_test`)
- **Purpose**: Simple test case that matches the example in `challenge.html`
- **Complexity**: Low - should be easy to understand and verify manually
- **Size**: Small (typically 3-10 elements)
- **Values**: Simple, predictable values

### 2. Functional Tests (`generate_functional_test`)
- **Purpose**: Comprehensive test suite covering various scenarios
- **Complexity**: Medium - includes edge cases and typical usage
- **Size**: Varied (small to medium)
- **Values**: Diverse, including edge cases

### 3. Performance Test (`generate_performance_test`)
- **Purpose**: Large test case for performance evaluation
- **Complexity**: High - tests scalability and efficiency
- **Size**: Large (typically 1M+ elements)
- **Values**: Random or structured large datasets

## Test Case Design Principles

### 1. Coverage
- **Input ranges**: Test minimum, maximum, and typical values
- **Input sizes**: Test small, medium, and large inputs
- **Data patterns**: Test edge cases, special values, and random data
- **Error conditions**: Test boundary conditions and invalid inputs

### 2. Determinism
- **Reproducible**: Tests should produce the same results every time
- **Seeded randomness**: Use fixed seeds for random test cases
- **Clear expectations**: Expected outputs should be well-defined
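
As a sketch of seeded, reproducible test generation (the list-of-dicts return shape is an assumption, inferred from the test-case convention in the debugging examples below), a functional test generator might look like:

```python
# Hypothetical sketch; the {"input", "output", "N"} dictionary layout follows
# the debugging examples later in this guide.
from typing import Any, Dict, List
import torch

def generate_functional_test(self) -> List[Dict[str, Any]]:
    torch.manual_seed(42)  # fixed seed: identical tensors on every run
    tests = []
    for size in (1, 16, 1024):  # minimum, small, and medium input sizes
        input = torch.empty(size, device="cuda", dtype=torch.float32).uniform_(-1000.0, 1000.0)
        tests.append({"input": input, "output": torch.empty_like(input), "N": size})
    return tests
```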

### 3. Efficiency
- **Fast execution**: Tests should run quickly for development
- **Memory efficient**: Avoid unnecessarily large test cases
- **Scalable**: Performance tests should be appropriately sized

## Debugging Test Issues

### Common Issues and Solutions

#### 1. Memory Issues
```python
# Problem: CUDA out of memory
# Solution: Reduce test case sizes
def generate_performance_test(self) -> Dict[str, Any]:
# Reduce size if memory issues occur
size = 100_000 # Instead of 1_000_000
return {
"input": torch.empty(size, device="cuda", dtype=torch.float32).uniform_(-100.0, 100.0),
"output": torch.empty(size, device="cuda", dtype=torch.float32),
"N": size
}
```

#### 2. Precision Issues
```python
# Problem: Floating point precision errors
# Solution: Adjust tolerances
def __init__(self):
super().__init__(
name="Complex Algorithm",
atol=1e-03, # Increase tolerance for complex algorithms
rtol=1e-03,
num_gpus=1,
access_tier="free"
)
```
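
Assuming a `torch.allclose`-style comparison, an element passes when `|actual - expected| <= atol + rtol * |expected|`, so raising either tolerance absorbs more accumulated floating-point error.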

#### 3. Shape Mismatch Issues
```python
# Problem: Tensor shape mismatches
# Solution: Add shape validation
def reference_impl(self, input: torch.Tensor, output: torch.Tensor, N: int):
# Validate shapes
assert input.shape == (N,), f"Expected input shape ({N},), got {input.shape}"
assert output.shape == (N,), f"Expected output shape ({N},), got {output.shape}"

# Rest of implementation...
```

### Debugging Checklist

- [ ] Reference implementation produces correct results
- [ ] All test cases have required parameters
- [ ] Tensor shapes match expectations
- [ ] Data types are consistent (float32)
- [ ] Tolerances are appropriate for the algorithm
- [ ] Performance test size is reasonable
- [ ] Edge cases are covered
- [ ] Random test cases use appropriate ranges
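
Before submitting, a quick local run of the reference implementation catches most of these issues. A minimal sketch, reusing the hypothetical challenge class from the getting-started guide:

```python
# Hypothetical sanity check against the sketched VectorScaleChallenge class;
# substitute your actual challenge class and test generators.
challenge = VectorScaleChallenge()
case = challenge.generate_example_test()
challenge.reference_impl(case["input"], case["output"], case["N"])
print(case["output"])  # should match the expected output in challenge.html
```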

---

*This testing guide ensures your challenges are robust, well-tested, and ready for production use.*