Commit 79f478a

Expand course site syllabus and resources via Codex
1 parent 6e70316 commit 79f478a

2 files changed

Lines changed: 102 additions & 1 deletion

css/style.css

Lines changed: 37 additions & 0 deletions
@@ -92,6 +92,41 @@ main {
   color: #18222d;
 }
 
+.toc-grid,
+.catalog-grid {
+  display: grid;
+  grid-template-columns: repeat(2, minmax(0, 1fr));
+  gap: 1.2rem;
+  margin: 1.2rem 0 1.6rem;
+}
+
+.toc-card,
+.catalog-card {
+  padding: 1rem 1.1rem;
+  background: #fffdf8;
+  border-left: 4px solid #6f1d1b;
+  box-shadow: 0 10px 20px rgba(24, 34, 45, 0.05);
+}
+
+.toc-card h3,
+.catalog-card h3 {
+  margin-top: 0;
+  margin-bottom: 0.6rem;
+  color: #6f1d1b;
+}
+
+.toc-card ol,
+.toc-card ul,
+.catalog-card ul {
+  margin: 0;
+  padding-left: 1.2rem;
+}
+
+.catalog-card li,
+.toc-card li {
+  margin-bottom: 0.55rem;
+}
+
 .media {
   margin: 0;
 }
@@ -135,6 +170,8 @@ footer p {
     grid-template-columns: 1fr;
   }
 
+  .toc-grid,
+  .catalog-grid,
   .dual {
     grid-template-columns: 1fr;
   }


index.html

Lines changed: 65 additions & 1 deletion
@@ -49,7 +49,7 @@ <h2>Course Overview</h2>
 design.
 </p>
 <p>
-The class is co-taught with <strong>Professor Sripathi Pai of the University of Rochester</strong>.
+The class is co-taught with <strong>Professor Sreepathi Pai of the University of Rochester</strong>.
 It is explicitly project-centered: student-designed primitives will be
 tested in realistic ML and HPC settings, and the course is intended to
 support paper writing and public artifact release when the work matures
@@ -64,6 +64,32 @@ <h2>Course Overview</h2>
 
 <section id="syllabus" class="panel alt">
 <h2>Syllabus Snapshot</h2>
+<div class="toc-grid">
+<div class="toc-card">
+<h3>Table of Contents</h3>
+<ol>
+<li>Course organization and project expectations</li>
+<li>Number systems, floating-point, and tool foundations</li>
+<li>GPU performance fundamentals and throughput modeling</li>
+<li>Formal GPU execution models, races, and schedule-sensitive bugs</li>
+<li>AWS Trainium and Neuron/NKI experimentation</li>
+<li>Compiler and language systems: Tilus, Mojo, MLIR, MLIR-AIR</li>
+<li>Profiling, tracing, and performance-measurement workflows</li>
+<li>Verification, race-checking, and floating-point analysis</li>
+<li>Student paper presentations and visiting research talks</li>
+<li>Semester-long project development and artifact release</li>
+</ol>
+</div>
+<div class="toc-card">
+<h3>Semester Shape</h3>
+<ul>
+<li>Early weeks emphasize abstractions: numeric representation, correctness reasoning, and cost models.</li>
+<li>Middle weeks shift into concrete GPU and Trainium experimentation, with profiling and tool use.</li>
+<li>Later weeks increasingly revolve around paper discussion, project reviews, and system-building.</li>
+<li>Short student presentations are threaded throughout to connect reading with active experimentation.</li>
+</ul>
+</div>
+</div>
 <p>
 The detailed course document lays out a semester that moves from basic
 numerical and performance foundations toward concrete GPU experiments on
@@ -130,6 +156,44 @@ <h2>Resources and Logistics</h2>
 archived course materials and the semester documents in the repository.
 This homepage is meant to provide the compact public-facing summary.
 </p>
+<div class="catalog-grid">
+<article class="catalog-card">
+<h3>Software Tools</h3>
+<p>
+The shared course document points students toward a hands-on stack
+of systems for writing, checking, and profiling GPU primitives.
+</p>
+<ul>
+<li><strong>AWS Trainium + Neuron/NKI</strong>: the main accelerator experimentation path in the syllabus, including NKI kernels, Neuron Explorer, profiling traces, and attention and matrix-multiplication tutorials.</li>
+<li><strong>CHPC GPU workflow</strong>: CUDA-capable campus systems, `nvcc`, `nvidia-smi`, `nsys`, and batch allocation workflows for NVIDIA profiling.</li>
+<li><strong>Faial</strong>: a race and cost-analysis direction used in the course to reason about warp-level behavior and correctness/performance interactions.</li>
+<li><strong>GKLEE</strong>: symbolic and concolic GPU bug-finding, used as a reference point for race exposure and schedule-sensitive failures.</li>
+<li><strong>Tilus</strong>: a tile-level GPGPU language for low-precision computation, treated as a language-design case study for structured primitive construction.</li>
+<li><strong>Mojo</strong>: discussed as an emerging systems language for high-performance kernel and HPC-oriented experimentation.</li>
+<li><strong>MLIR and MLIR-AIR</strong>: compiler infrastructure and accelerator-lowering frameworks used to connect loop nests, transformations, and hardware realization.</li>
+<li><strong>AIR2CUDA and related tooling</strong>: software artifacts used to inspect lowering pathways from MLIR-AIR-style flows toward GPU code generation.</li>
+<li><strong>NVBit and custom instrumentation</strong>: dynamic GPU instrumentation ideas, including barrier-focused tooling and low-level runtime inspection.</li>
+<li><strong>Vercors, CIVL, and FP analysis tools</strong>: formal and numeric-analysis tools for proving race freedom, checking semantics, and studying floating-point error.</li>
+</ul>
+</article>
+<article class="catalog-card">
+<h3>Papers by Topic</h3>
+<p>
+The readings in the shared syllabus cluster naturally into a few
+recurring themes.
+</p>
+<ul>
+<li><strong>Performance and throughput modeling</strong>: papers such as <em>uiCA</em>, <em>Facile</em>, the shared-memory atomic bottleneck work, and modular static cost analysis build the vocabulary for predicting and explaining kernel throughput.</li>
+<li><strong>GPU execution cost and productivity</strong>: works such as NPBench, data-centric Python, and CUDA cost-model papers connect user productivity, performance portability, and evaluation-cost reasoning.</li>
+<li><strong>Race detection and GPU verification</strong>: the syllabus groups FastTrack, FSE 2010 SMT-based GPU verification, GKLEE, GPUVerify, HiRace, Memory Access Protocols, and Vercors as complementary approaches to proving or detecting correctness properties.</li>
+<li><strong>Formal semantics and Hoare-style reasoning</strong>: materials such as Hoare logic for GPU programs, memory-model readings, and CIVL point students toward specification-first reasoning instead of purely empirical debugging.</li>
+<li><strong>Floating-point rigor</strong>: the background includes Goldberg’s classic essay, floating-point error-analysis work, Herbie-style rewriting, and scalable rigorous FP analysis, tying numerical semantics directly to kernel trustworthiness.</li>
+<li><strong>Scheduling, mapping, and specialization</strong>: software pipelining, warp specialization, distributed tensor mapping, and distributed Fourier mapping papers capture the scheduling side of making kernels and tensor systems fast.</li>
+<li><strong>Compiler and accelerator design</strong>: MLIR, MLIR-AIR, Tilus, and recent accelerator-lowering work show how modern compiler structures can encode performance intent and hardware structure more systematically.</li>
+<li><strong>Project-facing frontier systems</strong>: RenderMan XPU, tritonBLAS, ParallelKittens, ProofWright, GEAK, TileGym, and Tensor Core survey material serve as examples of current systems that students can study, reimplement, or benchmark against.</li>
+</ul>
+</article>
+</div>
 </section>
 </main>
 