What is ChatGPT4o - omni?

Written by:

Johannes Olsson

CEO & Founder

GPT-4o - Omni

GPT-4o, where 'o' stands for "omni" or "everything" as it translates to. The model is a step towards a much more natural interaction between humans and ChatGPT. It is a so-called multimodal model — it accepts text, audio, and image as input and generates text, audio, and image as output. It can respond to audio input in as little as 1/4 of a second, which is similar to human response time in a conversation.

Faster and Better

It matches GPT-4 Turbo performance on English text and code, with significant improvements on text in other languages, while being much faster and the API is 50% cheaper. GPT-4o is particularly better at understanding images and audio compared to existing models.

GPT-4o voice

Before GPT-4o, you could use Voice to talk to ChatGPT but it couldn't directly observe tone, multiple speakers, or background sounds, and it couldn't generate laughter, singing, or express emotions like GPT4o can.

What else can GPT-4o do?

In videos that OpenAI has shown, you can also interrupt the AI mid-conversation to, for example, ask a new question, which makes the flow of the conversation much more natural.

Skrivet: 2024-05-31
Updated: 2024-10-31

Genom att klicka på "Acceptera" samtycker du till lagringen av cookies på din enhet.