‘Visual’ AI models might not see anything at all: « The latest round of language models, like GPT-4o and Gemini 1.5 Pro, are touted as “multimodal,” able to understand images and audio as well as text. But a new study makes clear that they don’t really see the way you might expect. »


Although these companies' claims are artfully couched, it's clear they want to convey that the model sees, in some sense of the word.

Read more on: Text, Audio, GPT-4o

Related news:

Transformative FiBa soft actuators pave the way for future soft robotics - EurekAlert

From Text to Trajectory: How MIT’s AI Masters Language-Guided Navigation

Stranded in Space? NASA Doesn’t See the Starliner Astronauts That Way.