‘Visual’ AI models might not see anything at all: « The latest round of language models, like GPT-4o and Gemini 1.5 Pro, are touted as “multimodal,” able to understand images and audio as well as text. But a new study makes clear that they don’t really see the way you might expect. »
Although these companies' claims are artfully couched, it's clear they want to convey that the model sees, in some sense of the word.