Quite crappy? It seems to me like the current SOTA is working plenty well enough...

crote · on May 9, 2024

I sure haven't seen any of those "plenty good" results yet - current SOTA seems to be about as useful as semi-coherently gluing together random Google results. Good enough to perhaps replace a minimum-wage worker, not good enough to provide actual value when you care about the quality of the result.

This paper seems to suggest that significant advancement in domain knowledge acquisition and retention is exactly the problem, since you seem to need exponentially more data due to a lack of generalization. What's the point of a model which can perfectly quote Shakespeare if you're a programmer trying to refactor a proprietary codebase and it fails to make a link to whatever garbage it picked up from StackOverflow?

Filligree · on May 9, 2024

In CLIP.

Replacing CLIP with an LLM is the current meta in image generation models, specifically because of that lack of generalisation. This isn’t a surprise to anyone.