Quantizing Ideogram 4.0 onto a 3090: an INT8 build that matches FP8 and a 4-bit GGUF that beats NF4
INT8 and GGUF post-training quantization of Ideogram 4.0 on Ampere consumer GPUs, with the evidence behind each number.
Quick summary: Ideogram released two builds of its new model, a high-quality version meant for massive data center GPUs and a lower-quality version for consumer cards. We took the high-quality version, shrank it to the memory footprint of the consumer build, and kept its performance intact.


