https://www.geeknetic.es/Noticia/36691/La-nueva-integracion-de-voz-de-ChatGPT-unifica-conversacion-texto-y-controles-en-un-mismo-flujo-y-convierte-el-chat-en-una-experiencia-casi-natural.html
For years we have talked to AIs by typing into a text box, as if the chat were an endless form. Voice was always there, but in “demo” mode: something you switched on for a while, tried out, were briefly impressed by, and then abandoned for the keyboard.
With the latest ChatGPT voice integration, that feeling changes considerably. It is no longer a separate mode with its own interface, but another layer of the usual chat. And in day-to-day use, that matters much more than it seems.
From a “special mode” to something that is always there
Until now, activating voice mode pulled you out of your conversation and onto that full-screen “floating orb” view. Visually it was attractive, but it also broke your flow: either you talked or you typed. There was no middle ground.
With the new integration, voice lives inside the normal chat. There is a button; you press it and start talking without leaving the conversation. While you speak, the system transcribes what you say line by line, as if you were typing very quickly. When it finishes answering you out loud, the text stays there, ready to be reread or quoted later.
The practical difference is clear: you no longer have to switch mental modes. You can be typing, throw out a phrase by voice because you can’t be bothered to type it, and go back to the keyboard for the next reply. It feels more like a real conversation, where you alternate between sending a voice note, writing something short, or simply talking.
A small interface change that alters how you use it
What is most surprising is not the technology itself, but how this small interface change alters the way you use it. The voice button stops being a hidden function and becomes a natural gesture: you have your phone in your hand, both hands are busy, or you are walking, and instead of juggling to type, you tap the icon and speak.
The integration also keeps the visual side. If you ask for a map, it shows it directly in the chat; if you ask about the weather, it answers out loud and displays a table with the week’s forecast, complete with a little intro chime before it speaks. It is a trivial detail, but it makes the response feel less robotic and closer to the idea of an “assistant” than a simple text reader.
On the desktop the feeling is different, but just as useful. Having the chat open in a tab and being able to ask a quick question by voice, without changing views or modes, turns ChatGPT into something that accompanies you while you work rather than a place you enter and leave.
The experience: smoother than Alexa or Siri, with text to back it up
In real use, the new voice feels much more natural than the classic Alexa or Siri experience. Not so much because of the quality of the voice (which has also improved) but because of how the interaction works. You can interrupt a response halfway through, change the subject, rephrase as you go… and everything is reflected as text in the chat itself.
That supporting text is key. With voice alone, it would be easy to miss details or forget exactly what you were told. With the transcript in front of you, you can go back, copy data, reread instructions, or review a summary without asking for it to be repeated. It is a combination that shows up many traditional assistants, which are used to answering once and disappearing.
On top of that, the integration with the rest of ChatGPT’s functions is still there: you can ask it to summarize the news and read it aloud while showing you links; you can ask for long explanations and, if you get tired of listening, keep reading on your own; you can even turn on the camera so that it “sees” what is in front of you and comments on it, all without leaving the same conversation thread.
The seams: images that never arrive and maps that are not “live”
Not everything is perfect, and there are still rough edges. One of the clearest examples is generating images by voice. In theory, you just ask for something like “make me a picture of X” and wait for it to appear. In practice, there are cases where the AI promises it is creating the image, apologizes for the wait… and the image never arrives. It is the typical integration failure: the voice model does its part, but something gets stuck on the way to the image generator.
Something similar happens with maps: what it shows are static graphics, with no way to zoom, pan, or open a map app directly. It is useful as a quick reference, but still far from that “show me how to get there” fantasy with a live route integrated into your phone.
These are details that do not ruin the experience, but they are a reminder that this is a first iteration. The foundation is very well resolved; now the rest of the pieces need to be connected to the same standard.
The really interesting thing about this change is that voice stops feeling like a demo. Before, you had to decide “now I’m going to try voice mode”; now you simply speak when it suits you and type when you feel like it. Mobile, above all, is where it makes the most sense: anyone who has tried to write something even slightly long on a small screen knows how tiring it is.
That is the most important thing about this update: it is not a spectacular feature to show off in a video, but a piece that fits into everyday life and changes the way you relate to the tool.
