A software CPU implementation of the Direct3D family of APIs.
| API | Status |
|---|---|
| Direct3D 11 | Feature-complete core |
| Direct3D 9 | Work in progress |
| Direct3D 7/8 | Planned |
Direct3D 11
- Vertex, Pixel, Compute, Geometry, Hull, Domain shaders
- JIT pipeline: DXBC bytecode → C++20 → clang++/MSVC → native .dylib/.dll
- Full SM4.0 / SM5.0 instruction set (arithmetic, integer, bitwise, control flow, atomics, derivatives, bit manipulation)
- All shader stages share the same JIT, codegen, and runtime — frontend is pluggable
- Tiled rasterizer with 28.4 fixed-point edge functions
- 2×2 quad pixel shader execution (correct derivatives, auto-LOD)
- MSAA: 1×/2×/4×/8×/16×, per-sample coverage/depth/stencil, per-sample shading (`SV_SampleIndex`), `SV_Coverage` input/output, `LD_MS`, `ResolveSubresource`
- Early-Z with optional Hi-Z acceleration; late-Z when PS uses `discard`, writes `SV_Depth`, or uses UAVs
- `SV_ViewportArrayIndex`, `SV_RenderTargetArrayIndex`
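As an illustration of the 28.4 fixed-point edge functions mentioned in the list above, here is a minimal sketch. All names (`Fixed2`, `to_fixed_28_4`, `edge`, `covers_pixel_center`) are hypothetical and not taken from the project's source, and the sketch deliberately omits the top-left fill rule that a conforming rasterizer applies to pixels lying exactly on an edge.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical names for illustration; not taken from the project's source.
struct Fixed2 { int32_t x, y; };          // 28.4 fixed point: 4 fractional bits

constexpr int32_t kSubpixelScale = 16;    // 2^4 subpixel positions per pixel

// Snap a floating-point coordinate to the 1/16-pixel grid.
inline int32_t to_fixed_28_4(float v) {
    return static_cast<int32_t>(std::lround(v * static_cast<float>(kSubpixelScale)));
}

// Signed edge function: twice the signed area of triangle (a, b, p).
// Products are widened to 64 bits so large coordinates cannot overflow.
inline int64_t edge(Fixed2 a, Fixed2 b, Fixed2 p) {
    return static_cast<int64_t>(b.x - a.x) * (p.y - a.y)
         - static_cast<int64_t>(b.y - a.y) * (p.x - a.x);
}

// Tests the centre of pixel (px, py) against the triangle, accepting either
// winding. Assumption: points exactly on an edge count as covered (no
// top-left rule in this sketch).
inline bool covers_pixel_center(Fixed2 v0, Fixed2 v1, Fixed2 v2, int px, int py) {
    const Fixed2 p{ px * kSubpixelScale + kSubpixelScale / 2,
                    py * kSubpixelScale + kSubpixelScale / 2 };
    const int64_t e0 = edge(v0, v1, p);
    const int64_t e1 = edge(v1, v2, p);
    const int64_t e2 = edge(v2, v0, p);
    return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
}
```

Because all three edge values are exact integers, two triangles sharing an edge agree on which side every sample lies, which is the main reason to rasterize in fixed point rather than floating point.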
- Geometry Shader, stream output, adjacency topologies, `DrawAuto`
- Hull Shader + Domain Shader (tri/quad/isoline domains, all partition modes)
- Instanced geometry shaders (`DCL_GS_INSTANCE_COUNT`)
- 1D / 2D / 3D / Cube textures, mipmap chains, `GenerateMips`
- Point, bilinear, trilinear filtering; all address modes (wrap, mirror, clamp, border, mirror-once)
- Anisotropic filtering (correct per-pixel footprint, up to 16×)
- `SampleLevel`, `SampleGrad`, `SampleBias`, `SampleCmp`, `SampleCmpLevelZero`, `Gather`, `GatherCmp`, `GatherPO`
- BC compressed textures: BC1, BC2, BC3, BC4, BC5, BC7 (BC6H not yet decoded; returns black)
- sRGB read/write
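For readers unfamiliar with what sRGB read/write involves, the following sketch shows the standard sRGB transfer functions: decode to linear on texture read, encode on render-target write. The function names are hypothetical, and a production path would more likely use lookup tables for 8-bit data; this is not a description of how the project implements it.

```cpp
#include <cmath>

// Hypothetical helper names; the formulas are the standard sRGB transfer
// functions, applied per colour channel (alpha is left untouched).

// sRGB-encoded value in [0, 1] -> linear value in [0, 1] (texture read).
inline float srgb_to_linear(float c) {
    return c <= 0.04045f ? c / 12.92f
                         : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Linear value in [0, 1] -> sRGB-encoded value in [0, 1] (render-target write).
inline float linear_to_srgb(float c) {
    return c <= 0.0031308f ? c * 12.92f
                           : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}
```

Filtering and blending operate on the linear values, so the decode has to happen before filtering and the encode after blending.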
- Depth/stencil: all comparison functions, all stencil ops, Hi-Z
- Blending: all blend factors/ops, dual-source blending, logic ops
- Multi-render-target (up to 8), per-RT write masks, clip/cull distances
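To make the blending and per-RT write-mask items above concrete, here is a minimal sketch of one common blend configuration (source factor `SRC_ALPHA`, destination factor `INV_SRC_ALPHA`, operation `ADD`) followed by a write mask. The type and function names are hypothetical; only the mask bit values (red = 1, green = 2, blue = 4, alpha = 8) mirror `D3D11_COLOR_WRITE_ENABLE`.

```cpp
#include <cstdint>

// Hypothetical types and names for illustration only.
struct Rgba { float r, g, b, a; };

// Bit values mirror D3D11_COLOR_WRITE_ENABLE_RED/GREEN/BLUE/ALPHA.
constexpr uint8_t kWriteR = 1, kWriteG = 2, kWriteB = 4, kWriteA = 8;

// result = src * src.a + dst * (1 - src.a), applied to all four channels.
inline Rgba blend_src_alpha(Rgba src, Rgba dst) {
    const float a  = src.a;
    const float ia = 1.0f - src.a;
    return { src.r * a + dst.r * ia,
             src.g * a + dst.g * ia,
             src.b * a + dst.b * ia,
             src.a * a + dst.a * ia };
}

// Channels whose mask bit is clear keep the value already in the render target.
inline Rgba apply_write_mask(Rgba blended, Rgba dst, uint8_t mask) {
    return { (mask & kWriteR) ? blended.r : dst.r,
             (mask & kWriteG) ? blended.g : dst.g,
             (mask & kWriteB) ? blended.b : dst.b,
             (mask & kWriteA) ? blended.a : dst.a };
}
```

With multiple render targets, each target would carry its own mask, and each blended result would pass through `apply_write_mask` independently.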
- Thread group shared memory (TGSM), `GroupMemoryBarrierWithGroupSync`
- Work-stealing thread pool for TGSM-free shaders; barrier-synchronised group pool for shaders with TGSM
- Append/consume buffers, structured buffers, raw buffers, typed UAVs
- Atomic operations (32-bit int/uint, compare-exchange, exchange)
- Indexed, instanced, indirect draw and dispatch (`DrawInstancedIndirect`, `DispatchIndirect`)
- All primitive topologies including adjacency and patch lists
- `D3D11_QUERY_EVENT`: CPU/GPU sync point
- `D3D11_QUERY_TIMESTAMP` / `D3D11_QUERY_TIMESTAMP_DISJOINT`: timing
- `D3D11_QUERY_PIPELINE_STATISTICS`: VS/PS/etc. invocation counts
- `D3D11_QUERY_OCCLUSION` / `D3D11_QUERY_OCCLUSION_PREDICATE`
- `D3D11_QUERY_SO_STATISTICS` / `D3D11_QUERY_SO_OVERFLOW_PREDICATE`
The following return E_NOTIMPL or produce no-op results:
- BC6H texture decoding: `DXGI_FORMAT_BC6H_UF16` / `SF16` texels decode to black. Requires a full BC6H bit-field decoder
- Deferred contexts: `CreateDeferredContext`, `CreateDeferredContext1/2/3`. Would require a full command-list record/replay path
- `ID3DDeviceContextState` (`CreateDeviceContextState`): D3D11.1 context snapshot/swap; not implemented
- Tiled resources: `UpdateTileMappings`, `CopyTileMappings`, `CopyTiles`, `ResizeTilePool` (D3D11.2 sparse resources)
- Fences: `CreateFence`, `OpenSharedFence` (D3D11.4 / D3D12-style timeline semaphores)
- Cross-process shared resources: `OpenSharedResource`, `OpenSharedResource1`, `OpenSharedResourceByName`
- Class linkage / shader subroutines: `CreateClassLinkage`; the `INTERFACE_CALL` SM5 opcode is not handled
- Performance counters: `CreateCounter`, `CheckCounter`, `CheckCounterInfo`
- Predicates: `CreatePredicate` (GPU predicated rendering)
- `MSAD4`: the SM5 masked sum-of-absolute-differences instruction
- `EVAL_SNAPPED` / `EVAL_SAMPLE_INDEX` / `EVAL_CENTROID`: SM5 PS interpolation-mode override opcodes
- DXGI factory extras: `CreateSwapChainForCoreWindow`, `CreateSwapChainForComposition`, `GetWindowAssociation`, `CreateSoftwareAdapter`, stereo/occlusion status events
- DXGI device extras: `CreateSurface` (DXGI surface sharing), `QueryResourceResidency`, `SetGPUThreadPriority`
Direct3D 9 (work in progress)
- SM2 and SM3 (vs_2_0 / ps_2_0 / vs_3_0 / ps_3_0): same JIT pipeline as D3D11 — token stream → SM4-shaped IR → C++20 → native code
- Broad SM3 opcode coverage: MOV/ADD/MUL/MAD/DP3/DP4/MIN/MAX, NRM, POW, SINCOS, ABS, LRP, CMP, SLT/SGE, LIT, DST, SGN, CRS, DP2ADD, TEXLD/TEXLDL/TEXLDB/TEXLDP/TEXLDD, TEXKILL, REP/ENDREP, LOOP/ENDLOOP, CALL/CALLNZ/LABEL (subroutine inlining), MOVA, matrix macros (M3x2–M4x4), all source/dest modifiers
- Predicated flow control: SETP, IFC, BREAKC/BREAKP (predicate register p0 mapped to temp)
- SM2 output registers: `oPos` (RASTOUT[0]) → SV_Position, `oD0`/`oD1` (ATTROUT) → COLOR, `oT0`–`oT7` (TEXCRDOUT) → TEXCOORD; inferred from a write scan when no DCL is present
- `D3DSPR_MISCTYPE`: vPos (→ SV_Position screen-space) and vFace (→ ±1.0 front/back via MOVC)
- Compile-time DEF/DEFI constant folding; sampler/texture binding reconstruction from DCL
- Implicit-LOD SAMPLE dispatches to `sw_sample_3d_grad` / `sw_sample_cube_grad` / `sw_sample_2d_grad` per texture type; screen-space derivatives computed from all UV components across the 2×2 quad
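The last item above relies on screen-space derivatives taken across a 2×2 pixel quad. A minimal sketch of that idea for a 2D texture follows. The quad layout, the type names, and the isotropic LOD formula are assumptions chosen for illustration; they are not the signatures of the `sw_sample_*_grad` functions.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical types for illustration only.
struct Float2 { float u, v; };
struct QuadDerivs { Float2 ddx, ddy; };

// Assumed quad layout: 0 = top-left, 1 = top-right, 2 = bottom-left,
// 3 = bottom-right. Coarse derivatives use one difference per axis.
inline QuadDerivs coarse_derivatives(const Float2 (&uv)[4]) {
    return { { uv[1].u - uv[0].u, uv[1].v - uv[0].v },
             { uv[2].u - uv[0].u, uv[2].v - uv[0].v } };
}

// Isotropic mip LOD: log2 of the longer texel-space footprint axis.
// Returns 0 when the footprint is degenerate or the texture is magnified.
inline float mip_lod(const QuadDerivs& d, float texWidth, float texHeight) {
    const float xu = d.ddx.u * texWidth,  xv = d.ddx.v * texHeight;
    const float yu = d.ddy.u * texWidth,  yv = d.ddy.v * texHeight;
    const float rho2 = std::max(xu * xu + xv * xv, yu * yu + yv * yv);
    if (rho2 <= 1.0f) return 0.0f;
    return 0.5f * std::log2(rho2);   // equals log2(sqrt(rho2))
}
```

Using both U and V in each footprint axis matters for rotated or sheared mappings, where a single-component estimate would pick too sharp a mip level.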
- VS: WVP transform; normal transform + normalize; full Blinn-Phong lighting (ambient, diffuse, specular) for up to 8 directional/point/spot lights; full material colour source routing (`D3DRS_COLORVERTEX`, `D3DMCS_COLOR1/2/MATERIAL` per component); vertex fog (LINEAR/EXP/EXP2); `D3DTS_TEXTURE[i]` transform (COUNT1–4, PROJECTED)
- PS: all 8 TSS stages chained; color/alpha ops: SELECTARG1/2, MODULATE, MODULATE2X, ADD, ADDSIGNED, SUBTRACT, LERP; args: D3DTA_TEXTURE (white fallback), D3DTA_DIFFUSE, D3DTA_CURRENT, D3DTA_COMPLEMENT, D3DTA_ALPHAREPLICATE
- Custom VS + FF PS and FF VS + custom PS correctly coexist
- Vertex fog (`D3DRS_FOGVERTEXMODE`): LINEAR/EXP/EXP2 computed from eye-space Z in the FF VS; custom SM3 VS writes fog factor to `oFog` (RASTOUT[1])
- Table fog (`D3DRS_FOGTABLEMODE`): computed per-pixel from eye-space depth (1/perspW) in the rasterizer
- Applied to all active render targets after the PS executes: `lerp(fogColor, color, fogFactor)`
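The fog items above can be sketched as follows, using the standard Direct3D 9 fog formulas for the three modes and the `lerp(fogColor, color, fogFactor)` blend quoted in the list. The type and function names are hypothetical, and the sketch does not distinguish between vertex fog and table fog; only the depth value fed in would differ.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical types and names for illustration only.
enum class FogMode { Linear, Exp, Exp2 };

struct FogParams {
    FogMode mode;
    float start, end;   // used by Linear
    float density;      // used by Exp and Exp2
};

struct Color { float r, g, b; };

// Returns the fog factor in [0, 1]: 1 means no fog, 0 means fully fogged.
inline float fog_factor(const FogParams& p, float depth) {
    float f = 1.0f;
    switch (p.mode) {
    case FogMode::Linear:
        f = (p.end - depth) / (p.end - p.start);
        break;
    case FogMode::Exp:
        f = std::exp(-depth * p.density);
        break;
    case FogMode::Exp2: {
        const float t = depth * p.density;
        f = std::exp(-t * t);
        break;
    }
    }
    return std::clamp(f, 0.0f, 1.0f);
}

// lerp(fogColor, color, fogFactor): factor 0 yields fogColor, 1 yields color.
inline Color apply_fog(Color fogColor, Color color, float f) {
    return { fogColor.r + (color.r - fogColor.r) * f,
             fogColor.g + (color.g - fogColor.g) * f,
             fogColor.b + (color.b - fogColor.b) * f };
}
```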
- `SetRenderState`: depth/stencil, blend, cull, fill, scissor, alpha test, separate alpha blend, per-RT write masks, two-sided stencil, slope depth bias, lighting/specular/ambient, fog
- `SetTexture` / `SetSamplerState` (stages 0–7)
- `SetTransform` / `MultiplyTransform` (512 slots: world, view, projection, texture, bone matrices)
- `SetTextureStageState` (all 8 stages), `SetMaterial`, `SetLight` / `LightEnable` (up to 8 lights)
- `SetClipPlane` (6 planes), `SetClipStatus`
- `SetVertexDeclaration`, `SetFVF`, `SetStreamSource` (16 slots), `SetStreamSourceFreq` (GPU instancing), `SetIndices`
- `SetVertexShaderConstantF/I/B`, `SetPixelShaderConstantF/I/B`
- `CreateStateBlock` / `BeginStateBlock` / `EndStateBlock`
- `SetStreamSourceFreq` with `D3DSTREAMSOURCE_INDEXEDDATA | N` (stream 0) and `D3DSTREAMSOURCE_INSTANCEDATA | stepRate` (per-instance streams)
- Vertex declaration elements on per-instance streams automatically classified as `PerInstanceData` with the declared step rate
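A sketch of how a `SetStreamSourceFreq` setting might be decoded into a per-vertex or per-instance classification is shown below. The two flag values match `D3DSTREAMSOURCE_INDEXEDDATA` and `D3DSTREAMSOURCE_INSTANCEDATA` from the D3D9 headers; everything else (`StreamClass`, `StreamFrequency`, `decode_stream_freq`, `instance_element`) is a hypothetical name, not the project's actual code.

```cpp
#include <cstdint>

// Flag values as defined in d3d9types.h, redeclared so the sketch is
// self-contained.
constexpr uint32_t kIndexedData  = 1u << 30;  // D3DSTREAMSOURCE_INDEXEDDATA
constexpr uint32_t kInstanceData = 2u << 30;  // D3DSTREAMSOURCE_INSTANCEDATA

// Hypothetical classification types for illustration only.
enum class StreamClass { PerVertexData, PerInstanceData };

struct StreamFrequency {
    StreamClass cls;
    uint32_t    value;  // instance count (indexed stream) or step rate (instance stream)
};

constexpr StreamFrequency decode_stream_freq(uint32_t setting) {
    const uint32_t value = setting & ~(kIndexedData | kInstanceData);
    if (setting & kInstanceData)
        return { StreamClass::PerInstanceData, value };
    return { StreamClass::PerVertexData, value };
}

// Element of a per-instance stream read by a given instance: the stream
// advances once every stepRate instances (a step rate of 0 is treated as 1).
constexpr uint32_t instance_element(uint32_t instanceIndex, uint32_t stepRate) {
    return instanceIndex / (stepRate == 0 ? 1u : stepRate);
}
```

For example, `decode_stream_freq(kIndexedData | 100)` describes geometry drawn 100 times, while `decode_stream_freq(kInstanceData | 1)` describes a stream that advances once per instance.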
- `DrawPrimitive`, `DrawIndexedPrimitive`, `DrawPrimitiveUP`, `DrawIndexedPrimitiveUP`
- All D3D9 primitive topologies including `D3DPT_TRIANGLEFAN`
- `Clear` (colour, depth, stencil), `Present`
- `UpdateSurface`, `UpdateTexture` (pixel copy; all resources are system memory)
- `IDirect3DTexture9`: mipmap levels, `LockRect` / `UnlockRect`, `D3DFMT_R8G8B8` 24-bpp expand
- `IDirect3DCubeTexture9`: 6-face flat backing buffer, `GetCubeMapSurface`, cube-map sampling
- `IDirect3DVolumeTexture9` / `IDirect3DVolume9`: contiguous mip storage, `LockBox` / `UnlockBox`, 3D texture sampling
- `IDirect3DVertexBuffer9`, `IDirect3DIndexBuffer9`
- `IDirect3DSurface9` (render target, depth-stencil, offscreen)
- Private data (`SetPrivateData` / `GetPrivateData` / `FreePrivateData`) on all resources
- Broad `D3DFORMAT` coverage: BGRA/RGBA families, depth formats, DXT/BC compressed, float HDR, signed bump-map formats, luminance formats, `D3DFMT_R3G3B2` (3+3+2-bit packed, expanded on upload)
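As an example of what "expanded on upload" can mean for `D3DFMT_R3G3B2`, the sketch below widens one packed byte (red in bits 7–5, green in bits 4–2, blue in bits 1–0) to 8 bits per channel. Expansion by bit replication is an assumption chosen for illustration; the project may scale the channels differently. All names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical types and names for illustration only.
struct Rgb8 { uint8_t r, g, b; };

// Widen a 3-bit value to 8 bits by repeating its bit pattern (7 -> 255).
constexpr uint8_t expand3(uint8_t v) {
    return static_cast<uint8_t>((v << 5) | (v << 2) | (v >> 1));
}

// Widen a 2-bit value to 8 bits by repeating its bit pattern (3 -> 255).
constexpr uint8_t expand2(uint8_t v) {
    return static_cast<uint8_t>(v * 0x55);
}

// Unpack one R3G3B2 texel: RRRGGGBB, most significant bit first.
constexpr Rgb8 decode_r3g3b2(uint8_t packed) {
    return { expand3((packed >> 5) & 0x7),
             expand3((packed >> 2) & 0x7),
             expand2(packed & 0x3) };
}
```

Bit replication maps the minimum and maximum of each channel exactly to 0 and 255, which a plain left shift would not.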
- `Direct3DCreate9` and `Direct3DCreate9Ex`: both return a full `IDirect3D9Ex` implementation
- `IDirect3DDevice9Ex`: `CheckDeviceState` (returns S_OK; no lost device), `ResetEx`, `PresentEx`, `Create*Ex` surface variants
- `GetAdapterCount` (1 software adapter), `GetAdapterIdentifier`, `GetAdapterModeCount` / `EnumAdapterModes`, `GetAdapterModeCountEx` / `EnumAdapterModesEx`, `GetDeviceCaps`, `ValidateDevice`, `CheckDeviceType` / `CheckDeviceFormat`
- `CreateQuery`: EVENT (always ready), OCCLUSION (returns 0), TIMESTAMP (returns 0)
- Clip plane enable in FF VS/PS
- `DOTPRODUCT3`, `BLENDDIFFUSEALPHA`, `BUMPENVMAP*` TSS ops
- `ProcessVertices`
D3D11
- Triangle
- Textured Cube
- Instanced Cubes
- Floor: Aliased vs Mipmapped
- DirectX SDK Samples: Tutorial 10
- DirectX SDK Samples: DecalTessellation11
750+ tests across three categories:
- Unit tests — device, resources, views, states, formats, shader compilation, draw and compute pipelines, SM3 translator
- Golden tests — pixel-exact comparison against reference images for D3D11 and D3D9 rendering paths
- Perf tests — draw and compute benchmarks
- D3D11.3 Functional Spec — rasterization rules, fixed-point precision, LOD calculation, texture filtering
- D3D11 API Reference — API contracts, parameter rules, struct/enum definitions
- Parsing DXBC — DXBC container layout
- `d3d11TokenizedProgramFormat.hpp` (Windows SDK): opcode definitions, operand encoding, token layout for SM4/SM5 bytecode
- DirectX9 Graphics Reference









