Realizing Native INT8 Compute for Diffusion Transformers on Consumer GPUs: A Fused INT8 GEMM Kernel for Ideogram 4.0
A fused Triton kernel that properly drives the INT8 tensor cores on consumer Ampere GPUs, giving a ~1.1× end-to-end speedup and making 1024px generation feasible on a single RTX 3090.
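The summary does not spell out the quantization scheme, so here is a minimal NumPy reference sketch of the arithmetic a fused INT8 GEMM commonly computes. It does not reproduce the paper's Triton kernel. It assumes symmetric per-token activation scales and per-output-channel weight scales. It also assumes the "fused" part is an epilogue that rescales the int32 accumulator (and adds bias) in the same pass. The helper names `quantize_per_row` and `int8_gemm_dequant` are hypothetical.

```python
import numpy as np


def quantize_per_row(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-row INT8 quantization (assumed scheme): x ~= q * scale."""
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Avoid division by zero for all-zero rows.
    scale = np.where(amax > 0, amax / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def int8_gemm_dequant(a_q, a_scale, w_q, w_scale, bias=None):
    """Reference for one fused pass (hypothetical layout, w is (N, K) like nn.Linear):
    int8 x int8 products accumulated in int32, then a float rescale epilogue."""
    acc = a_q.astype(np.int32) @ w_q.astype(np.int32).T  # (M, N) int32 accumulator
    # Per-token scale (M, 1) times per-output-channel scale (1, N).
    out = acc.astype(np.float32) * a_scale * w_scale.T
    if bias is not None:
        out += bias
    return out


# Usage: compare against the float32 GEMM on random data.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64)).astype(np.float32)  # activations (M, K)
w = rng.standard_normal((8, 64)).astype(np.float32)  # weights (N, K)
a_q, a_s = quantize_per_row(a)
w_q, w_s = quantize_per_row(w)
out = int8_gemm_dequant(a_q, a_s, w_q, w_s)
ref = a @ w.T
rel_err = np.linalg.norm(out - ref) / np.linalg.norm(ref)
print(f"relative error vs float32: {rel_err:.4f}")
```

On the GPU, the point of fusing is that the int32 accumulator never round-trips through global memory before rescaling. The NumPy version only shows the math, not the memory behavior.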