@@ -7,6 +7,8 @@ Terry Sun; Arch Linux, Intel i5-4670, GTX 750
77
88This project contains a simplified graphics pipeline implemented in CUDA.
99
10+ ![ ] ( renders/demo.gif )
11+
1012## Pipeline Overview
1113
12141 . Vertex shader: applies a model-view-projection transformation to each
@@ -16,51 +18,129 @@ This project contains a simplied graphics pipeline implemented in CUDA.
1618 primitives (triangles). Parallelized across primitives.
1719
18203 . Geometry shader: after primitives are assembled, the geometry shader
19- generates more (or fewer) triangles for each existing triangle, up to a fixed
20- N for each triangle. Examples of this are (fixed-number) tesselation and
21- backface culling (both implemented).
21+ performs additional primitive generation (or deletion), up to a fixed factor
22+ (4) per original primitive.
23+
24+ 1 . Backface culling: triangles that face away from the camera are removed.
25+ ` thrust::remove_if ` stream compaction is used to filter these out before
26+ rasterization occurs.
2227
23- 3 . Rasterization: uses a scanline algorithm to determine which fragments are
28+ 2 . Tessellation shading/smoothing: Each triangle is subdivided into 4 smaller
29+ triangles, with interpolated normals.
30+
31+ 4 . Rasterization: uses a scanline algorithm to determine which fragments are
2432 covered by a particular primitive, performs depth testing, and stores into a
2533 depth buffer. Uses an axis-aligned bounding box for optimization, barycentric
2634 coordinate checking to test coverage, and CUDA ` atomicMin ` to avoid race
2735 conditions when doing depth testing. Parallelized across primitives.
2836
29- 4 . Fragment shading: computes color of each pixel using Lambert (diffuse)
37+ 5 . Fragment shading: computes color of each pixel using Lambert (diffuse)
3038 shading. Interpolates normals within a triangle. Parallelized across
3139 fragments.
3240
41+ 6 . Copy to screen/frame buffer.
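The depth test in the rasterization stage above relies on an atomic minimum so that two primitives covering the same fragment cannot race. The following is a minimal host-side C++ sketch of that idea, not the project's kernel: it emulates CUDA `atomicMin` with a compare-and-swap loop on `std::atomic`, and the names `quantizeDepth`, `atomicMinU32`, `clearDepth`, and `depthTest` are hypothetical. It also assumes depth is normalized to [0, 1] and stored as an unsigned integer.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using DepthBuffer = std::vector<std::atomic<std::uint32_t>>;

// Map a depth in [0, 1] to an unsigned integer so it can be compared atomically.
// (Assumption: the real pipeline may encode depth differently.)
std::uint32_t quantizeDepth(float z) {
    double clamped = z < 0.0f ? 0.0 : (z > 1.0f ? 1.0 : static_cast<double>(z));
    return static_cast<std::uint32_t>(clamped * 4294967295.0);
}

// Emulates CUDA atomicMin: stores min(slot, candidate) and returns the old value.
std::uint32_t atomicMinU32(std::atomic<std::uint32_t>& slot, std::uint32_t candidate) {
    std::uint32_t seen = slot.load();
    // Retry until the slot already holds something <= candidate or our write lands.
    while (candidate < seen && !slot.compare_exchange_weak(seen, candidate)) {
    }
    return seen;
}

// Reset every fragment's depth to "infinitely far".
void clearDepth(DepthBuffer& depth) {
    for (auto& d : depth) {
        d.store(std::numeric_limits<std::uint32_t>::max());
    }
}

// Returns true if this fragment was strictly closer than what was stored before.
bool depthTest(DepthBuffer& depth, std::size_t index, float z) {
    std::uint32_t q = quantizeDepth(z);
    return q < atomicMinU32(depth[index], q);
}
```

In a kernel parallelized across primitives, each thread would call something like `depthTest` for every fragment it covers and only keep its fragment data when the call succeeds.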
42+
3343## Features
3444
35- ### Geometry shader + backface culling
45+ ### Geometry shader
46+
47+ ** Backface culling** . Triangles which do not face the camera are removed before
48+ rasterization. Triangles are tested by computing the cross product between the
49+ edges (v0-v1, v0-v2); by convention, triangles which face away from the front of
50+ the model will be defined such that this cross product has a negative z
51+ component. ([source][bfc-wiki])
52+
53+ [bfc-wiki]: https://en.wikipedia.org/wiki/Back-face_culling
54+
55+ Back-facing triangles are removed by stream compaction. This improves warp
56+ coherency: every thread that reaches the scanline function handles a
57+ front-facing triangle, so fewer threads in a warp exit early with no work.
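As a rough illustration of the test and the compaction step described above, here is a minimal CPU-side C++ sketch. It uses `std::remove_if` as a stand-in for `thrust::remove_if`, assumes vertices are already in screen space, and the names `Triangle`, `crossZ`, `isBackFacing`, and `cullBackFaces` are hypothetical rather than taken from the project.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 v0, v1, v2; };

// z component of (v1 - v0) x (v2 - v0); only x and y of the edges contribute.
float crossZ(const Triangle& t) {
    float ax = t.v1.x - t.v0.x, ay = t.v1.y - t.v0.y;
    float bx = t.v2.x - t.v0.x, by = t.v2.y - t.v0.y;
    return ax * by - ay * bx;
}

// By the convention described above, a negative z component means back-facing.
bool isBackFacing(const Triangle& t) { return crossZ(t) < 0.0f; }

// Compact the array so only front-facing triangles remain; returns the new count.
std::size_t cullBackFaces(std::vector<Triangle>& tris) {
    auto newEnd = std::remove_if(tris.begin(), tris.end(), isBackFacing);
    tris.erase(newEnd, tris.end());
    return tris.size();
}
```

On the GPU, the same predicate could be passed to `thrust::remove_if` over a device array, with the returned end iterator giving the number of surviving primitives.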
58+
59+ ![ ] ( renders/backface.gif )
60+
61+ (In this example the back direction is fixed relative to the model in order to
62+ demonstrate missing faces. In practice, backface culling would be invisible to
63+ the viewer.)
64+
65+ **Tessellation geometry shading**. A second geometry shader divides each triangle
66+ into 4 smaller triangles (see left, below), generating three new vertices per
67+ existing triangle. The vertex transformation must be applied again to each new
68+ vertex, which blurs the boundary between pipeline stages. (I considered moving
69+ the entire vertex shader into the geometry shader, but kept it separate as an
70+ optimization so that the original vertices are transformed only once.)
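One way to picture the 1-to-4 split is midpoint subdivision, sketched below in plain C++. This is an assumption about how the three new vertices are placed (at edge midpoints, with averaged and renormalized normals); the sketch also skips the re-application of the vertex transformation, and the names `Vertex`, `midVertex`, and `subdivide` are hypothetical.

```cpp
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vertex { Vec3 pos; Vec3 nor; };
struct Triangle { Vertex a, b, c; };

Vec3 add(Vec3 p, Vec3 q) { return {p.x + q.x, p.y + q.y, p.z + q.z}; }
Vec3 scale(Vec3 p, float s) { return {p.x * s, p.y * s, p.z * s}; }

// Returns v scaled to unit length; a zero vector is returned unchanged.
Vec3 normalized(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return len > 0.0f ? scale(v, 1.0f / len) : v;
}

// New vertex halfway along an edge, with an interpolated unit normal.
Vertex midVertex(const Vertex& p, const Vertex& q) {
    return {scale(add(p.pos, q.pos), 0.5f), normalized(add(p.nor, q.nor))};
}

// Three corner triangles plus the middle one; winding order is preserved.
std::array<Triangle, 4> subdivide(const Triangle& t) {
    Vertex ab = midVertex(t.a, t.b);
    Vertex bc = midVertex(t.b, t.c);
    Vertex ca = midVertex(t.c, t.a);
    return {Triangle{t.a, ab, ca}, Triangle{ab, t.b, bc},
            Triangle{ca, bc, t.c}, Triangle{ab, bc, ca}};
}
```

The last element of the returned array is the middle triangle, which is the one highlighted in the renders.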
71+
72+ Below: the middle triangle of the 4 generated triangles is colored lightly to
73+ show the pattern of tessellation.
74+
75+ ![ ] ( renders/tri-subdiv.png )
76+ ![ ] ( renders/suzanne-subdiv.png )
3677
3778### Color interpolation
3879
3980For every point, its normal is interpolated from its relative distance from the
40813 vertices of its triangle. This is calculated using barycentric coordinates.
41- This normal is then used to calculate a Lambert (diffuse) shading, which is
42- smooth.
82+ This normal is then used to calculate a Lambert (diffuse) shading (plus a small
83+ amount of ambient lighting).
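The interpolation and shading described above can be sketched as follows in plain C++. This is only an illustration under assumptions: the helper names (`barycentric`, `interpolateNormal`, `lambert`) are hypothetical, the light direction is assumed to be unit length, and the ambient term is passed in by the caller because the source does not state its value.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Twice the signed area of triangle (a, b, c).
float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Barycentric weights of p with respect to (a, b, c); they sum to 1.
Vec3 barycentric(Vec2 p, Vec2 a, Vec2 b, Vec2 c) {
    float area = signedArea2(a, b, c);
    return {signedArea2(p, b, c) / area,
            signedArea2(a, p, c) / area,
            signedArea2(a, b, p) / area};
}

// Weighted blend of the three vertex normals, renormalized to unit length.
Vec3 interpolateNormal(Vec3 w, Vec3 n0, Vec3 n1, Vec3 n2) {
    Vec3 n = {w.x * n0.x + w.y * n1.x + w.z * n2.x,
              w.x * n0.y + w.y * n1.y + w.z * n2.y,
              w.x * n0.z + w.y * n1.z + w.z * n2.z};
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return len > 0.0f ? Vec3{n.x / len, n.y / len, n.z / len} : n;
}

// Lambert term max(0, n . l) plus a caller-chosen ambient term, clamped to 1.
float lambert(Vec3 n, Vec3 lightDir, float ambient) {
    float nDotL = n.x * lightDir.x + n.y * lightDir.y + n.z * lightDir.z;
    return std::min(1.0f, ambient + std::max(0.0f, nDotL));
}
```

Using the same normal at all three vertices reproduces the flat-shaded look, which is the difference visible in the comparison images.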
84+
85+ Comparison of non-interpolated and interpolated normals:
86+
87+ ![ ] ( renders/suzanne-nosmooth.png )
88+ ![ ] ( renders/suzanne-smooth.png )
89+
90+ Animation of a light moving across the screen:
91+
92+ ![ ] ( renders/light.gif )
4393
4494### Antialiasing
4595
46- TODO: image
96+ Four fragments are generated for every pixel, spaced evenly within the pixel.
97+ The parallelization of this process varies between stages. In some stages (the
98+ fragment shader), one thread is launched per fragment; in others (scanline), a
99+ single thread handles all four fragments in succession. Future work could
100+ compare different ways of parallelizing this multi-fragment approach.
47101
48- Four fragments are generated for every pixel, spaced evenly within the pixel. In
49- general this is parallelized
102+ At the very end of the pipeline, as fragments are translated into colors for the
103+ frame buffer, the four fragments associated with each pixel are averaged together.
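A minimal C++ sketch of this resolve step is shown below. It assumes a memory layout in which the four fragments of a pixel are stored consecutively, and a 2x2 sample pattern at the quarter points of the pixel; both are assumptions, and the names `sampleOffset` and `resolvePixels` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };
struct Vec2 { float x, y; };

constexpr std::size_t kSamplesPerPixel = 4;

// Sub-pixel position of sample s (0..3), evenly spaced on a 2x2 grid.
Vec2 sampleOffset(std::size_t s) {
    return {0.25f + 0.5f * static_cast<float>(s % 2),
            0.25f + 0.5f * static_cast<float>(s / 2)};
}

// Average each group of four consecutive fragment colors into one pixel color.
std::vector<Color> resolvePixels(const std::vector<Color>& fragments) {
    std::vector<Color> pixels(fragments.size() / kSamplesPerPixel, Color{0.0f, 0.0f, 0.0f});
    for (std::size_t p = 0; p < pixels.size(); ++p) {
        for (std::size_t s = 0; s < kSamplesPerPixel; ++s) {
            const Color& f = fragments[p * kSamplesPerPixel + s];
            pixels[p].r += f.r * 0.25f;
            pixels[p].g += f.g * 0.25f;
            pixels[p].b += f.b * 0.25f;
        }
    }
    return pixels;
}
```

A pixel on a triangle edge where two of the four fragments are covered ends up at half intensity, which is what softens the edge.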
50104
51105### Scissor test
52106
53107Clipping optimization. Define a `glm::vec4(xmin, xmax, ymin, ymax)` window in
54108which to render to the screen. During scanline rasterization, this test
55109discards data outside of this window.
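One plausible way to apply the window is to intersect each triangle's axis-aligned bounding box with it before scanning, as in the C++ sketch below. The project may instead test each fragment individually; this is an assumption, as are the names `Scissor`, `PixelBox`, and `scissoredBounds`. The field order mirrors the `(xmin, xmax, ymin, ymax)` layout given above, with a plain struct in place of `glm::vec4`.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };
struct Scissor { float xmin, xmax, ymin, ymax; };
struct PixelBox { int x0, x1, y0, y1; bool empty; };

// Bounding box of triangle (a, b, c), clipped to the scissor window.
PixelBox scissoredBounds(Vec2 a, Vec2 b, Vec2 c, const Scissor& s) {
    float xmin = std::max(std::min({a.x, b.x, c.x}), s.xmin);
    float xmax = std::min(std::max({a.x, b.x, c.x}), s.xmax);
    float ymin = std::max(std::min({a.y, b.y, c.y}), s.ymin);
    float ymax = std::min(std::max({a.y, b.y, c.y}), s.ymax);
    // If the clipped range is inverted, the triangle lies entirely outside the window.
    bool empty = xmin > xmax || ymin > ymax;
    return {static_cast<int>(std::floor(xmin)), static_cast<int>(std::ceil(xmax)),
            static_cast<int>(std::floor(ymin)), static_cast<int>(std::ceil(ymax)), empty};
}
```

A rasterizer thread could return immediately when `empty` is set and otherwise loop only over the clipped range.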
56110
57- ## Internals
111+ ![ ] ( renders/suzanne-clipped.png )
112+
113+ ## Performance
114+
115+ ### Breakdown by Stage
116+
117+ ![ ] ( data/percent-stacked.png )
118+
119+ By far the longest stage is the rasterization/scanline function. Many of the
120+ other stages are inconsequential in comparison, as expected: very little
121+ computational work is done in primitive assembly (mostly memory access) or in
122+ the vertex shader. It surprised me that the fragment shader is the second
123+ largest stage, and by the margin shown in the chart.
58124
59- ![ ] ( renders/suzanne-red.png )
125+ Future work could use a more optimized scanline function, or a different
126+ rasterization technique altogether.
127+
128+ ### Backface Culling
129+
130+ These measurements were taken *with* triangle subdivision active.
131+
132+ File | Triangles Orig | Triangles after BFC | % Removed | Time Orig | Time After | % Speedup
133+ --------|----------------|---------------------|-----------|-----------|------------|----------
134+ Cow | 23216 | 9352 | 59.7% | 13.21 | 12.94 | 2.02%
136+ Suzanne | 3862 | 2576 | 33.3% | 22.2 | 20.83 | 6.17%
137+
138+
139+ ## Internals
60140
61141![ ] ( renders/suzanne-normals.png )
62142
63- # Bloopers
143+ ## Bloopers
64144
65145![ ] ( renders/cow-oops1.png )
66146