I believe you are mistaken. The new model doesn't first transcribe audio into text. It's fully multimodal end to end, meaning it accepts audio directly as input.
@applemei · 4 months ago
How do you know that?
@DicksonPau96 · 4 months ago
@@applemei This is the key selling point of the 4o model. It's an end-to-end multimodal model.
@DicksonPau96 · 4 months ago
You can check it out in the system card of 4o. Here is what it says about the model in the opening paragraph: "GPT-4o is an autoregressive omni model, which accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network."
Of the two numbers 9.11 and 9.8, which one is bigger? GPT says: "Between the two numbers 9.11 and 9.8, **9.8** is larger. This is because 9.8 is a greater value than 9.11 when comparing the decimal parts."
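For reference, the comparison GPT is asked about here is easy to verify programmatically; a minimal sketch using Python's `decimal` module for exact decimal arithmetic:

```python
from decimal import Decimal

# Compare 9.11 and 9.8 as exact decimal values (avoids binary float surprises).
a, b = Decimal("9.11"), Decimal("9.8")
larger = max(a, b)
print(larger)  # 9.8, since the fractional part 0.80 > 0.11
```

The common model mistake is treating "11" as larger than "8" digit-by-digit, when aligned fractional parts make 9.80 > 9.11.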