Realizing Native INT8 Compute for Diffusion Transformers on Consumer GPUs: A Fused INT8 GEMM Kernel for Ideogram 4.0

Asaria, Salomone, Gandhi · June 12, 2026 · SYSTEMS · VISION

A fused Triton kernel that properly drives the INT8 tensor cores on consumer Ampere GPUs. It delivers a ~1.1× end-to-end speedup and makes 1024px generation feasible on a single RTX 3090.
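For readers unfamiliar with the arithmetic, here is a minimal pure-Python sketch of what an INT8 GEMM with a fused dequantization epilogue computes. It is not the authors' Triton kernel. Several details are assumptions for illustration only: symmetric per-row activation scales, per-output-channel weight scales, clamping to [-127, 127], and the hypothetical helper names `quantize_rows` and `int8_gemm_dequant`.

The sketch shows two things. The inner product over K is exact integer arithmetic, which is the part INT8 tensor cores accelerate while accumulating in INT32. The two scale multiplies happen once per output element, and a fused kernel can fold them into its epilogue rather than running a separate dequantization pass.

```python
# Reference semantics (not a GPU kernel) of an INT8 GEMM with fused dequantization.
# ASSUMPTIONS, not taken from the paper: symmetric quantization, per-row scales for
# activations, per-output-channel scales for weights, values clamped to [-127, 127].

def quantize_rows(x):
    """Hypothetical helper: symmetric per-row INT8 quantization -> (int rows, scales)."""
    q, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        s = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero on all-zero rows
        q.append([max(-127, min(127, round(v / s))) for v in row])
        scales.append(s)
    return q, scales


def transpose(m):
    return [list(col) for col in zip(*m)]


def int8_gemm_dequant(a_q, a_s, w_q, w_s):
    """Hypothetical helper: C[i][j] = a_s[i] * w_s[j] * sum_k a_q[i][k] * w_q[k][j]."""
    m, k, n = len(a_q), len(a_q[0]), len(w_q[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0  # exact integer sum; an INT32 accumulator on tensor cores
            for kk in range(k):
                acc += a_q[i][kk] * w_q[kk][j]
            # "Fused" epilogue: rescale to floating point before writing the output.
            row.append(acc * a_s[i] * w_s[j])
        out.append(row)
    return out


# Usage: quantize a small activation matrix x (M x K) and weight matrix w (K x N).
x = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.75]]
w = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
x_q, x_s = quantize_rows(x)
wt_q, w_s = quantize_rows(transpose(w))  # per-output-channel = per-column of w
w_q = transpose(wt_q)
y = int8_gemm_dequant(x_q, x_s, w_q, w_s)

# Full-precision reference for comparison.
ref = [[sum(x[i][k] * w[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
```

A real kernel tiles these same loops over blocks of M, N, and K to keep operands in on-chip memory. The tiling changes data movement, not the numerics shown here.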
