Judging single-image 3D generation without humans (and why cheap proxies fall short)

A reproducible, human-free way to tell whether one generated 3D mesh is better than another, and a warning about the cheap automatic proxies people reach for instead.

Quick summary: Single-image-to-3D generators are improving fast, but there is no agreed, human-free way to say which of two generated meshes is better. We built a protocol around a fixed multi-view render rig and two independent vision-language judges, with a mandatory position-bias correction, and the two judges agree substantially (Cohen’s κ = 0.66) with no human labels. We then used that protocol as the reference and asked whether the cheap proxies people actually use (mesh geometry-validity statistics and render-space CLIP) can stand in for it. They cannot: geometry validity is a weak signal and render-CLIP is at chance. Worse, the proxies fail in a specifically misleading way, and we show below exactly where. We also report the things that didn’t work, because that is the useful part.
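To make the two statistical ingredients in that summary concrete, here is a minimal sketch of a position-bias correction followed by Cohen's κ between two judges. The summary does not spell out how the correction is done, so the rule used here is an assumption: each judge sees the pair in both orders, and a preference counts only if it survives the swap; otherwise the comparison is recorded as a tie. The function names `debias` and `cohens_kappa` and the label strings are illustrative, not taken from the protocol's code.

```python
from collections import Counter


def debias(verdict_ab, verdict_ba):
    """Combine one judge's verdicts from both presentation orders.

    verdict_ab: answer when shown (A, B); verdict_ba: answer when shown (B, A).
    Each is 'first', 'second', or 'tie'. Returns 'A', 'B', or 'tie'.
    Assumed rule: a preference counts only if it is the same in both orders.
    """
    shown_ab = {"first": "A", "second": "B", "tie": "tie"}
    shown_ba = {"first": "B", "second": "A", "tie": "tie"}
    pick_1 = shown_ab[verdict_ab]
    pick_2 = shown_ba[verdict_ba]
    return pick_1 if pick_1 == pick_2 else "tie"


def cohens_kappa(labels_1, labels_2):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_1) != len(labels_2) or not labels_1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_1)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(x == y for x, y in zip(labels_1, labels_2)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    c1, c2 = Counter(labels_1), Counter(labels_2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical label; kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical raw verdicts: (order A-B, order B-A) per mesh pair, per judge.
judge_1_raw = [("first", "second"), ("first", "first"), ("second", "first")]
judge_2_raw = [("first", "second"), ("tie", "tie"), ("second", "first")]

judge_1 = [debias(ab, ba) for ab, ba in judge_1_raw]
judge_2 = [debias(ab, ba) for ab, ba in judge_2_raw]
kappa = cohens_kappa(judge_1, judge_2)
```

In this toy input, the second pair for judge 1 picks whichever mesh is shown first in both orders, which is exactly the pattern a position-bias correction is meant to neutralize, so it collapses to a tie. κ then measures agreement beyond what the two judges' label frequencies would produce by chance, which is why it is a stricter figure than raw percent agreement.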