diff --git a/docs/Getting_started.md b/docs/Getting_started.md
new file mode 100644
index 0000000..17c21e6
--- /dev/null
+++ b/docs/Getting_started.md
@@ -0,0 +1,78 @@
# Creating New Challenges for LeetGPU

LeetGPU challenges are low-level GPU programming tasks focused on writing custom kernels in CUDA, Triton, Mojo, PyTorch, or TinyGrad. They evaluate both functional correctness and performance under real GPU constraints.

This guide provides instructions for creating new GPU programming challenges for LeetGPU. It covers the complete process from concept to submission.

## Challenge Structure

Each challenge follows this directory structure:

```
challenges/<difficulty>/<id>_<challenge_name>/
├── challenge.html          # Problem description and examples
├── challenge.py            # Reference implementation and test cases
└── starter/                # Starter templates for each framework
    ├── starter.cu          # CUDA template
    ├── starter.mojo        # Mojo template
    ├── starter.pytorch.py  # PyTorch template
    ├── starter.tinygrad.py # TinyGrad template
    └── starter.triton.py   # Triton template
```

### challenge.html template

# [Challenge Name]

## Description

[Provide a clear, concise explanation of what the algorithm or function is supposed to do. Include input and output specifications, if necessary.]

### Mathematical Formulation

[If applicable, provide the mathematical formula using LaTeX notation]

$$
\text{[Your formula here]}
$$

## Implementation Requirements

- **No External Libraries:** Solutions must be implemented using only native features. No external libraries or frameworks are permitted.
- **Function Signature:** The solve function signature is fixed and must not be modified. Implement your solution according to the provided signature.
- **Output Variable:** Results must be written to the designated output parameter: `[output_parameter_name]`

## Examples

### Example 1
**Input:**
```
[Provide specific input values]
```

**Expected Output:**
```
[Show the corresponding output values]
```

### Example 2
**Input:**
```
[Provide different input values]
```

**Expected Output:**
```
[Show the corresponding output values]
```

## Constraints

- **Input Size:** [Specify the range of input dimensions, e.g., "1 ≤ N ≤ 1,000,000"]
- **Value Range:** [Specify the range of input values, e.g., "-1000.0 ≤ input[i] ≤ 1000.0"]
- **Memory Limits:** [If applicable, specify any memory constraints]

diff --git a/docs/Starter_Codes.md b/docs/Starter_Codes.md
new file mode 100644
index 0000000..67b40ea
--- /dev/null
+++ b/docs/Starter_Codes.md
@@ -0,0 +1,201 @@
# Starter Code Creation Process for LeetGPU Challenges

Starter code is a template file that provides the basic structure and function signatures for implementing GPU-accelerated algorithms in LeetGPU challenges. It gives users a runnable foundation while leaving the core algorithmic logic as their task.
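
To make that division of labor concrete, here is how a user might complete the PyTorch starter shown later in this guide for a hypothetical ReLU challenge (the challenge, signature, and tensor names are illustrative, not taken from a real challenge):

```python
import torch

# Hypothetical completed solution: the starter supplies the fixed `solve`
# signature, and the user fills in only the body.
def solve(input: torch.Tensor, output: torch.Tensor, N: int):
    # Results must be written into the designated output tensor in place
    output.copy_(torch.relu(input))
```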
+ +## Major Components + +- **Function Signatures:** Standardized `solve` function with consistent parameters across all frameworks +- **Framework-Specific Templates:** CUDA, Triton, Mojo, PyTorch, and TinyGrad implementations +- **Memory Management:** Proper device pointer handling and memory allocation patterns +- **Kernel Structure:** Basic kernel function templates with grid/block sizing +- **Error Handling:** Bounds checking and synchronization primitives + + +### Identify Framework Requirements + +Each framework has specific requirements: + +**CUDA:** +- Kernel functions with `__global__` qualifier(for easy problems) +- `extern "C"` solve function for framework integration +- Proper memory management and synchronization +- Grid and block size +- + + +**Triton:** +- `@triton.jit` decorator for kernel compilation +- Pointer type conversions for data types +- Block size and grid calculations +- PyTorch restriction compliance + +**Mojo:** +- `@export` decorator for framework integration +- Proper GPU imports and memory types +- Device context management +- Function parameter types + +**PyTorch/TinyGrad:** +- Tensor-based function signatures +- GPU tensor parameters +- Simple, direct implementations + + +## Easy Problems + +### CUDA Starter Template + +```cuda +#include + +__global__ void kernel_name() { +} + +// input, output are device pointers (i.e. pointers to memory on the GPU) +extern "C" void solve(input, output,size) { + + // define grid, block size + kernel_name<<>>(input, output, size); + cudaDeviceSynchronize(); +} +``` + + + + + +### Triton Starter Template + +```python +# The use of PyTorch in Triton programs is not allowed for the purposes of fair benchmarking. +import triton +import triton.language as tl + +@triton.jit +def kernel_name(input_ptr, output_ptr, input size, block size): + input_ptr = input_ptr.to(tl.pointer_type(tl.float32)) + output_ptr = output_ptr.to(tl.pointer_type(tl.float32)) + + # TODO: Implement kernel logic + # Use tl.program_id(0) to get block index + # Use tl.program_id(1) to get thread ndex within block + +# input_ptr, output_ptr are raw device pointers +def solve(input_ptr, output_ptr, input size): + # define grid, block size + kernel_name[grid](input_ptr, output_ptr, input size, block size) +``` + + + + + +### Mojo Starter Template + +```mojo +from gpu.host import DeviceContext +from gpu.id import block_dim, block_idx, thread_idx +from memory import UnsafePointer +from math import ceildiv + +fn kernel_name(input, output, size): + # TODO: Implement kernel logic + # Use thread_idx() to get thread index within block + # Use block_idx() to get block index + pass + +# input, output are device pointers (i.e. pointers to memory on the GPU) +@export +def solve(input, output, size): + #calculate threads per block + var ctx = DeviceContext() + + ctx.enqueue_function[kernel_name]( + input, output, size, + grid_dim = num_blocks, + block_dim = BLOCK_SIZE + ) + + ctx.synchronize() +``` + +### PyTorch Starter Template + +```python +import torch + +def solve(input, output, size): + # TODO: Implement solution using PyTorch operations + pass +``` + +### TinyGrad Starter Template + +```python +import tinygrad + +def solve(input, output, size): + # TODO: Implement solution using TinyGrad operations + pass +``` + + +## Medium and Hard Problems + +### CUDA Starter Template + +```cuda +#include + +// input, output are device pointers (i.e. 

## Medium and Hard Problems

### CUDA Starter Template

```cuda
#include <cuda_runtime.h>

// input, output are device pointers (i.e. pointers to memory on the GPU)
extern "C" void solve(const float* input, float* output, int size) {

}
```

### Triton Starter Template

```python
# The use of PyTorch in Triton programs is not allowed for the purposes of fair benchmarking.
import triton
import triton.language as tl

# input_ptr, output_ptr are raw device pointers
def solve(input_ptr, output_ptr, size):
    pass
```

### Mojo Starter Template

```mojo
from gpu.host import DeviceContext
from gpu.id import block_dim, block_idx, thread_idx
from memory import UnsafePointer
from math import ceildiv

@export
def solve(input: UnsafePointer[Float32], output: UnsafePointer[Float32], size: Int32):
    pass
```

### PyTorch Starter Template

```python
import torch

def solve(input, output, size):
    # TODO: Implement solution using PyTorch operations
    pass
```

### TinyGrad Starter Template

```python
import tinygrad

def solve(input, output, size):
    # TODO: Implement solution using TinyGrad operations
    pass
```

diff --git a/docs/TESTING_GUIDE.md b/docs/TESTING_GUIDE.md
new file mode 100644
index 0000000..31a5acb
--- /dev/null
+++ b/docs/TESTING_GUIDE.md
@@ -0,0 +1,111 @@
# Testing Guide for LeetGPU Challenges

This guide covers how to create test cases and validate your challenges to ensure they work correctly across all frameworks.

## Table of Contents

1. [Test Case Types](#test-case-types)
2. [Test Case Design Principles](#test-case-design-principles)
3. [Debugging Test Issues](#debugging-test-issues)

## Test Case Types

### 1. Example Test (`generate_example_test`)
- **Purpose**: Simple test case that matches the example in `challenge.html`
- **Complexity**: Low - should be easy to understand and verify manually
- **Size**: Small (typically 3-10 elements)
- **Values**: Simple, predictable values

### 2. Functional Tests (`generate_functional_test`)
- **Purpose**: Comprehensive test suite covering various scenarios
- **Complexity**: Medium - includes edge cases and typical usage
- **Size**: Varied (small to medium)
- **Values**: Diverse, including edge cases

### 3. Performance Test (`generate_performance_test`)
- **Purpose**: Large test case for performance evaluation
- **Complexity**: High - tests scalability and efficiency
- **Size**: Large (typically 1M+ elements)
- **Values**: Random or structured large datasets

## Test Case Design Principles

### 1. Coverage
- **Input ranges**: Test minimum, maximum, and typical values
- **Input sizes**: Test small, medium, and large inputs
- **Data patterns**: Test edge cases, special values, and random data
- **Error conditions**: Test boundary conditions and invalid inputs

### 2. Determinism
- **Reproducible**: Tests should produce the same results every time
- **Seeded randomness**: Use fixed seeds for random test cases (see the sketch below)
- **Clear expectations**: Expected outputs should be well-defined

### 3. Efficiency
- **Fast execution**: Tests should run quickly for development
- **Memory efficient**: Avoid unnecessarily large test cases
- **Scalable**: Performance tests should be appropriately sized
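
To make the determinism and efficiency principles concrete, here is a sketch of a seeded functional test generator. It follows the method and field conventions of the challenge template later in these docs; the sizes and value ranges are purely illustrative:

```python
import torch

def generate_functional_test(self):
    # Fixed seed keeps the "random" cases identical on every run
    torch.manual_seed(42)
    dtype = torch.float32
    test_cases = []
    # Small-to-medium sizes keep the suite fast while still covering edge cases
    for size in (1, 16, 1024):
        test_cases.append({
            "input": torch.empty(size, device="cuda", dtype=dtype).uniform_(-100.0, 100.0),
            "output": torch.empty(size, device="cuda", dtype=dtype),
            "N": size,
        })
    return test_cases
```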
## Debugging Test Issues

### Common Issues and Solutions

#### 1. Memory Issues
```python
# Problem: CUDA out of memory
# Solution: Reduce test case sizes
def generate_performance_test(self) -> Dict[str, Any]:
    # Reduce size if memory issues occur
    size = 100_000  # Instead of 1_000_000
    return {
        "input": torch.empty(size, device="cuda", dtype=torch.float32).uniform_(-100.0, 100.0),
        "output": torch.empty(size, device="cuda", dtype=torch.float32),
        "N": size
    }
```

#### 2. Precision Issues
```python
# Problem: Floating point precision errors
# Solution: Adjust tolerances
def __init__(self):
    super().__init__(
        name="Complex Algorithm",
        atol=1e-03,  # Increase tolerance for complex algorithms
        rtol=1e-03,
        num_gpus=1,
        access_tier="free"
    )
```

#### 3. Shape Mismatch Issues
```python
# Problem: Tensor shape mismatches
# Solution: Add shape validation
def reference_impl(self, input: torch.Tensor, output: torch.Tensor, N: int):
    # Validate shapes
    assert input.shape == (N,), f"Expected input shape ({N},), got {input.shape}"
    assert output.shape == (N,), f"Expected output shape ({N},), got {output.shape}"

    # Rest of implementation...
```

### Debugging Checklist

- [ ] Reference implementation produces correct results
- [ ] All test cases have required parameters
- [ ] Tensor shapes match expectations
- [ ] Data types are consistent (float32)
- [ ] Tolerances are appropriate for the algorithm
- [ ] Performance test size is reasonable
- [ ] Edge cases are covered
- [ ] Random test cases use appropriate ranges

---

*This testing guide ensures your challenges are robust, well-tested, and ready for production use.*
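
Before wiring a challenge into the harness, a quick standalone check of the reference implementation can catch most of the issues above. A minimal sketch for a hypothetical ReLU-style challenge follows; the helper name and tolerances are illustrative:

```python
import torch

def check_reference(reference_impl, atol=1e-05, rtol=1e-05):
    # Run a hypothetical ReLU-style reference_impl on small, hand-checkable
    # values and compare against torch's own operator.
    input = torch.tensor([-1.0, 0.0, 2.5], device="cuda")
    output = torch.empty_like(input)
    reference_impl(input, output, N=3)
    assert torch.allclose(output, torch.relu(input), atol=atol, rtol=rtol)
```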
diff --git a/docs/challenge_template.py b/docs/challenge_template.py
new file mode 100644
index 0000000..08f5387
--- /dev/null
+++ b/docs/challenge_template.py
@@ -0,0 +1,136 @@
import ctypes
from typing import Any, List, Dict
import torch
from core.challenge_base import ChallengeBase

class Challenge(ChallengeBase):
    def __init__(self):
        super().__init__(
            name="[CHALLENGE_NAME]",  # e.g., "ReLU", "Softmax", "Multi-Head Attention"
            atol=1e-05,  # Absolute tolerance for testing. 1e-05 is a good default.
            rtol=1e-05,  # Relative tolerance for testing. 1e-05 is a good default.
            num_gpus=1,  # Number of GPUs required.
            access_tier="free"  # Access tier
        )

    def reference_impl(self, *args, **kwargs):
        """
        Reference implementation of the algorithm/function.

        Common patterns:
        - Assert input shapes and properties (dtype, device)
        - Implement the core algorithm logic
        - Use output.copy_(result) to write results

        Example signature patterns:
        - Simple: (input: torch.Tensor, output: torch.Tensor, N: int)
        - Complex: (Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor, output: torch.Tensor, N: int, d_model: int, h: int)
        """
        # TODO: Add input assertions
        # assert input.shape == expected_shape
        # assert input.dtype == expected_dtype
        # assert input.device == expected_device

        # TODO: Implement core algorithm logic
        # result = your_algorithm_implementation()

        # TODO: Copy result to output tensor
        # output.copy_(result)
        pass

    def get_solve_signature(self) -> Dict[str, Any]:
        """
        Define the C function signature for the solver.

        Common ctypes patterns:
        - Tensor pointers: ctypes.POINTER(ctypes.c_float)
        - Integers: ctypes.c_int
        - Floats: ctypes.c_float
        """
        return {
            # TODO: Define your function signature
            # "input": ctypes.POINTER(ctypes.c_float),
            # "output": ctypes.POINTER(ctypes.c_float),
            # "N": ctypes.c_int,
            # Add other parameters as needed
        }

    def generate_example_test(self) -> Dict[str, Any]:
        """
        Generate a simple example test case.
        Usually small, hand-crafted data for basic demonstration.
        """
        dtype = torch.float32

        # TODO: Create example input tensors
        # input_tensor = torch.tensor([...], device="cuda", dtype=dtype)
        # output_tensor = torch.empty(shape, device="cuda", dtype=dtype)

        return {
            # TODO: Return test case dictionary
            # "input": input_tensor,
            # "output": output_tensor,
            # "N": size,
            # Add other parameters as needed
        }

    def generate_functional_test(self) -> List[Dict[str, Any]]:
        """
        Generate comprehensive functional test cases.

        Common test patterns:
        - Edge cases (zeros, negatives, single elements)
        - Boundary conditions
        - Various sizes
        - Random data
        - Special mathematical cases
        """
        dtype = torch.float32
        test_cases = []

        # TODO: Add basic test case
        # test_cases.append({
        #     "input": torch.tensor([...], device="cuda", dtype=dtype),
        #     "output": torch.empty(shape, device="cuda", dtype=dtype),
        #     "N": size
        # })

        # TODO: Add edge cases
        # - All zeros
        # - All negatives
        # - Single element
        # - Large values
        # - Small values
        # - Mixed positive/negative

        # TODO: Add random test cases
        # test_cases.append({
        #     "input": torch.empty(size, device="cuda", dtype=dtype).uniform_(min_val, max_val),
        #     "output": torch.empty(size, device="cuda", dtype=dtype),
        #     "N": size
        # })

        return test_cases

    def generate_performance_test(self) -> Dict[str, Any]:
        """
        Generate a large-scale performance test case.
        Usually uses large tensors with random data.
        """
        dtype = torch.float32

        # TODO: Set appropriate size for performance testing
        # Common sizes: 25000000, 500000, 1024x1024, etc.
        N = 1000000  # Adjust based on your challenge

        # TODO: Create large tensors for performance testing
        # input_tensor = torch.empty(N, device="cuda", dtype=dtype).uniform_(min_val, max_val)
        # output_tensor = torch.empty(N, device="cuda", dtype=dtype)

        return {
            # TODO: Return performance test case
            # "input": input_tensor,
            # "output": output_tensor,
            # "N": N,
            # Add other parameters as needed
        }
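
# ---------------------------------------------------------------------------
# Illustration (not part of the template): one way the TODOs above are
# typically resolved, shown for a hypothetical element-wise ReLU challenge.
# This module-level sketch reuses the imports at the top of the file; the
# function name, values, and sizes are assumptions for demonstration only.
def _example_generate_example_test() -> Dict[str, Any]:
    dtype = torch.float32
    # Small, hand-checkable values spanning negative, zero, and positive inputs
    input_tensor = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0], device="cuda", dtype=dtype)
    output_tensor = torch.empty(5, device="cuda", dtype=dtype)
    return {"input": input_tensor, "output": output_tensor, "N": 5}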