Caution
This is not a library, and it is not production-ready. Imla is a for-fun experiment exploring an alternative Jetpack Compose rendering approach — capturing the Compose root and effect layers through OpenGL and HardwareBuffers. Don't ship it.
Imla (Ukrainian for "Haze", pronounced [ˈimlɑ] (eem-lah)) is an experimental
GPU-accelerated backdrop blur for Jetpack Compose. It captures the Compose root
into a HardwareBuffer, imports it zero-copy as an OpenGL ES texture, and runs
blur, tint, noise, clip, and progressive-mask passes that sample that shared
backdrop. Targets Android 6 (API 23) and up.
- Backdrop blur — separable Gaussian, adjustable radius, gamma-correct (blurred in linear light to avoid dark seams).
- Progressive blur — a brush mask drives blur strength per pixel (crisp → blurred gradients).
- Shape masking — clip a layer to any Shape outline.
- Tinting — color composited over the blurred backdrop.
- Noise blend — frosted-glass grain, lightness-driven (strongest in mid-tones, eased off in shadows and highlights).
- Composite rendering — stack independent effect layers; each samples the shared backdrop and composites by z-order.
- Rotation-correct — 3-axis graphicsLayer rotation stays aligned with the sampled backdrop.
- Hardware-accelerated — HardwareBuffer capture and zero-copy GL import, on-GPU end-to-end.
- Supports Android 6 (API 23) onwards.
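To make the "separable" and "gamma-correct" parts of the blur concrete, here is a minimal CPU sketch in plain Kotlin. It is not Imla's shader code: the grayscale image representation, the function names, and the choice of sigma = radius / 3 are all assumptions made for illustration. It decodes sRGB to linear light, runs a horizontal and then a vertical 1D Gaussian pass, and encodes back.

```kotlin
import kotlin.math.exp
import kotlin.math.pow

// sRGB transfer functions (IEC 61966-2-1), channel values in 0..1.
fun srgbToLinear(c: Double): Double =
    if (c <= 0.04045) c / 12.92 else ((c + 0.055) / 1.055).pow(2.4)

fun linearToSrgb(c: Double): Double =
    if (c <= 0.0031308) c * 12.92 else 1.055 * c.pow(1.0 / 2.4) - 0.055

// Normalized 1D Gaussian weights for taps -radius..radius.
// sigma = radius / 3 is an assumption of this sketch, not a value taken from Imla.
fun gaussianKernel(radius: Int): DoubleArray {
    require(radius >= 0) { "radius must be non-negative" }
    if (radius == 0) return doubleArrayOf(1.0)
    val sigma = radius / 3.0
    val raw = DoubleArray(2 * radius + 1) { i ->
        val x = (i - radius).toDouble()
        exp(-(x * x) / (2.0 * sigma * sigma))
    }
    val sum = raw.sum()
    return DoubleArray(raw.size) { raw[it] / sum }
}

// One 1D blur pass over a line of linear-light values, clamping at the edges.
fun blurLine(line: DoubleArray, kernel: DoubleArray): DoubleArray {
    val radius = kernel.size / 2
    return DoubleArray(line.size) { x ->
        var acc = 0.0
        for (k in kernel.indices) {
            val sx = (x + k - radius).coerceIn(0, line.size - 1)
            acc += line[sx] * kernel[k]
        }
        acc
    }
}

// Separable blur of a grayscale image given as rows of sRGB values in 0..1:
// decode to linear, blur rows, blur columns, encode back to sRGB.
fun blurSrgb(image: Array<DoubleArray>, radius: Int): Array<DoubleArray> {
    val kernel = gaussianKernel(radius)
    val h = image.size
    val w = image[0].size
    val linear = Array(h) { y -> DoubleArray(w) { x -> srgbToLinear(image[y][x]) } }
    val horizontal = Array(h) { y -> blurLine(linear[y], kernel) }
    val out = Array(h) { DoubleArray(w) }
    for (x in 0 until w) {
        val column = DoubleArray(h) { y -> horizontal[y][x] }
        val blurred = blurLine(column, kernel)
        for (y in 0 until h) out[y][x] = linearToSrgb(blurred[y])
    }
    return out
}
```

An even mix of black and white averaged in linear light encodes back to roughly 0.735 in sRGB rather than 0.5, which is why blurring in linear light avoids the dark seams mentioned above.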
Each tile below is a single Modifier.effectLayer { … } sampling one shared
backdrop — rendered on a calibration grid so the effect is easy to read.
Wrap a screen — or any region — in ImlaHost, then declare blur regions with the
effectGroup / effectLayer modifiers. No renderer or OpenGL objects are passed
to children; they register through the host.

    ImlaHost {
        Box(
            modifier = Modifier
                .fillMaxSize()
                .effectGroup() // the backdrop that effect layers sample from
        ) {
            FeedContent()
            TopBar(
                modifier = Modifier.effectLayer {
                    backdropBlur(radius = 12.dp)
                    tint(Color.White.copy(alpha = 0.1f))
                    noise(alpha = 0.15f)
                    clip(RoundedCornerShape(16.dp))
                }
            )
        }
    }

Public API: ImlaHost, Modifier.effectGroup(), Modifier.effectLayer { ... }, EffectLayerScope, and EffectLayerBoundsProvider.
ImlaHost wraps everything inside it into a single GPU-backed surface and
renders all content and effects through it. Children never touch the renderer —
they register through the host's SceneRegistry.
flowchart TB
Host["ImlaHost { }"]
Content["content() — your screen UI"]
Group["Modifier.effectGroup()<br/>backdrop the layers sample"]
Layer["Modifier.effectLayer { }<br/>blur · tint · noise · clip"]
Registry[("SceneRegistry")]
Renderer["SceneRenderer"]
Surface["single output SurfaceView<br/>API 29+ · SurfaceControl (zero-copy)<br/>API 23–28 · blit into Surface"]
Host --> Content
Host --> Surface
Content --> Group
Content --> Layer
Group -. registers .-> Registry
Layer -. registers .-> Registry
Registry --> Renderer
Renderer -- presents --> Surface
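The registration flow in the diagram can be pictured with a small plain-Kotlin sketch. The names `SketchRegistry` and `LayerEntry`, and their fields, are hypothetical stand-ins and not Imla's actual `SceneRegistry` API; the point is only that layers add and remove themselves, and the renderer reads a z-ordered snapshot each frame.

```kotlin
// Hypothetical stand-ins for illustration; these are not Imla's real types.
data class LayerEntry(val id: Int, val zIndex: Float, val blurRadiusDp: Float)

class SketchRegistry {
    private val layers = LinkedHashMap<Int, LayerEntry>()
    private var nextId = 0

    // Called when an effect layer appears; returns a handle used to unregister it.
    fun register(zIndex: Float, blurRadiusDp: Float): Int {
        val id = nextId++
        layers[id] = LayerEntry(id, zIndex, blurRadiusDp)
        return id
    }

    // Called when the layer leaves the tree.
    fun unregister(id: Int) {
        layers.remove(id)
    }

    // Back-to-front snapshot for the renderer. sortedBy is stable, so layers
    // with equal zIndex keep their registration order.
    fun drawOrder(): List<LayerEntry> = layers.values.sortedBy { it.zIndex }
}
```

This sketch is single-threaded. A registry that is written from the UI thread and read from a GL thread would also need synchronization or an immutable snapshot handoff, which is left out here.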
The output is a single SurfaceView on all supported APIs (Compose's
AndroidExternalSurface is itself backed by a SurfaceView). What differs is how
the renderer presents into it:
- API 29+: the SurfaceView is driven via SurfaceControl; the renderer hands HardwareBuffers straight to SurfaceFlinger with no present pass or eglSwapBuffers (zero-copy).
- API 23–28: there is no SurfaceControl, so the renderer blits into the SurfaceView's Surface (obtained through AndroidExternalSurface).
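The split between the two present paths comes down to one API-level check. A minimal sketch, assuming a hypothetical `PresentPath` enum and `presentPathFor` helper (neither is part of Imla); on a device the argument would come from `Build.VERSION.SDK_INT`:

```kotlin
// Hypothetical names for illustration only.
enum class PresentPath { SURFACE_CONTROL, BLIT_AND_SWAP }

fun presentPathFor(apiLevel: Int): PresentPath {
    require(apiLevel >= 23) { "below the minimum API level stated in this README" }
    // The README describes SurfaceControl presentation on API 29 and up,
    // and a blit into the SurfaceView's Surface on API 23-28.
    return if (apiLevel >= 29) PresentPath.SURFACE_CONTROL else PresentPath.BLIT_AND_SWAP
}
```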
The root content is captured once per frame and shared by every effect layer that samples it.
sequenceDiagram
participant UI as Compose main thread
participant CAP as Capture thread
participant GL as GL thread
participant OUT as Output (SurfaceControl / EGL)
UI->>UI: record content into a GraphicsLayer (draw pass)
UI->>UI: vsync-align capture (postOnAnimation)
UI->>CAP: HardwareRenderer.syncAndDraw → HardwareBuffer
CAP-->>UI: HardwareBuffer ready (async)
UI->>GL: hand off latest-only snapshot, requestRender
GL->>GL: import HardwareBuffer zero-copy (EGLImage)
Note over GL: noise (once) → root draw →<br/>per slot: [stencil clip] prepare → separable blur →<br/>composite (tint · noise · mask fused) → content
GL->>OUT: present — API 29+ SurfaceControl (zero-copy) · else blit + swap
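The "latest-only snapshot" handoff in the diagram means the GL thread only ever sees the newest captured frame, and older unconsumed frames are dropped. One common way to build that is a single-slot mailbox over an `AtomicReference`. The sketch below is an assumption about the general pattern, not Imla's implementation; `LatestOnlySlot` and `onDropped` are hypothetical names.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical single-slot mailbox: the producer overwrites, the consumer drains.
class LatestOnlySlot<T : Any>(private val onDropped: (T) -> Unit = {}) {
    private val slot = AtomicReference<T?>(null)

    // Producer side: publish the newest value. If an older value was never
    // consumed, it is handed to onDropped so its resources can be released.
    fun publish(value: T) {
        slot.getAndSet(value)?.let(onDropped)
    }

    // Consumer side: take the newest value, or null if nothing new arrived.
    fun take(): T? = slot.getAndSet(null)
}
```

With real frames, `onDropped` would be the place to return a buffer to its owner, so a slow consumer never builds up a queue of stale captures.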
The Compose root is drawn into a single shared HardwareBuffer (a RenderNode
rendered single-buffered off the main thread), and every effect layer samples that
one backdrop. Capture lives in SingleBufferRenderer; the buffer is leased to the
GL thread via BufferLease.
The GL thread imports it zero-copy with eglCreateImageFromHardwareBuffer +
glEGLImageTargetTexture2DOES (no glReadPixels round-trip) in
OpenGLHardwareBufferTexture2D, driven by CapturedFrameImporter.
On API 29+ the finished HardwareBuffer goes straight to SurfaceFlinger (see
ScenePresenter and the buffer ring SceneHwBufferRing), so pixels stay in
GPU/shared memory across capture → effects → present. This path is the main thing
being experimented with here, and is still being tuned.
The OpenGL abstractions are reused from a sister experiment, desugar-64/android-opengl-renderer — a playground for learning graphics and OpenGL, with helpers for setting up GL data and calls.
The renderer is fully dynamic: it pushes vertex data every frame. Flexible, but it adds overhead — a future optimization target.
- The separable Gaussian blur pass is the dominant GPU cost; it scales with the blur radius and the on-screen area being blurred.
- Per-frame cost is driven mostly by render-pass / FBO switches rather than the blur math itself, so timing varies by scene.
- Expect bugs and untested edge cases. This is a for-fun experiment, not a hardened library; it has rough edges and unhandled cases. In particular, it has not been tested hosting content that itself contains other SurfaceViews (video players, maps, camera previews, nested Imla hosts); surface-compositing scenarios like these may misbehave.
- HardwareBuffer capture is not always the fastest path. Across the 4 devices I tested on, the hardware-rendering capture path (RenderNode → HardwareBuffer → zero-copy GL import) is fast on 3. On the 4th, a low-end budget Android 13 tablet, it runs slower than capturing the UI into an external texture. The cause is not yet pinned down.
- No atlas grouping yet. Effect layers render as separate draws/passes rather than batched into a shared atlas. It's the main remaining GL-side optimization, deferred because it has to integrate with the bucketed capture buffers used for resizable content.
- Dynamic per-frame vertex push. The renderer uploads vertex data every frame (see Rendering Abstraction) — flexible, but more overhead than a static/instanced path.
- Convoluted internal architecture. The code grew through iteration and is more tangled than it needs to be — capture/import/present spread across many small types, with seams left over from past experiments. It works, but expect rough edges.
- Render the Compose root through OpenGL behind a single host surface;
- Capture and draw child content as GL textures;
- Validate rotated slot geometry and root-space backdrop sampling;
- Real separable Gaussian blur passes;
- Progressive masks, stencil clips, tint, noise, and cumulative backdrop sampling;
- Atlas grouping for effect layers (needs integration with the bucketed capture buffers).
- Present via hardware Bitmap, no SurfaceView. Wrap the result HardwareBuffer as a hardware Bitmap (Bitmap.wrapHardwareBuffer, API 29+) and draw it in the Compose layout. It then composites through the normal Android (HWUI) path, so the host need not redirect the whole layout through our renderer. Open question: is it faster than the SurfaceView present, especially on low-end devices?
- Compute-shader effect path. Run the separable blur as a GLES 3.1 compute shader instead of fragment passes into FBOs, skipping rasterization and some FBO ping-pong. Open questions: faster on mobile GPUs? compute-support / min-API floor?
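The performance notes earlier say the blur pass scales with radius and blurred area, and the compute-shader idea targets that same pass. A rough texture-fetch count shows why the blur is done separably in the first place. This is back-of-the-envelope arithmetic under simple assumptions (one fetch per tap, radius in pixels, no downsampling or linear-filter tap merging), not a measurement of Imla; the function names are hypothetical.

```kotlin
// Fetches for a full 2D kernel: (2r + 1)^2 taps per pixel.
fun fetches2D(widthPx: Int, heightPx: Int, radiusPx: Int): Long {
    val taps = 2L * radiusPx + 1
    return widthPx.toLong() * heightPx * taps * taps
}

// Fetches for a separable blur: two 1D passes of (2r + 1) taps per pixel.
fun fetchesSeparable(widthPx: Int, heightPx: Int, radiusPx: Int): Long {
    val taps = 2L * radiusPx + 1
    return widthPx.toLong() * heightPx * 2L * taps
}
```

For a 100 × 50 px region at radius 12 px this gives 3,125,000 fetches for the 2D kernel against 250,000 for the separable version. Both grow linearly with area, but the separable cost grows linearly with radius instead of quadratically. It does not model the render-pass and FBO switch overhead that the notes above identify as a major per-frame cost.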
This project is open to suggestions and contributions. Feel free to open issues or submit pull requests on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.
For project development updates and history, refer to this Twitter thread.







