ChatGPT Expands with New Voice and Image Functions

OpenAI, the company led by Sam Altman, has updated its flagship product, ChatGPT. The popular chatbot has seen its capabilities expanded, and from now on it ‘will see, hear and speak.’ Specifically, as the company has announced, “we are beginning to deploy new voice and image functions in ChatGPT.” The update provides a more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you are talking about. “Voice and image give you more ways to use ChatGPT in your life. Take a photo of a point of interest while you travel and have a live conversation about what you find interesting. When you are at home, take photos of the refrigerator and pantry to know what’s for dinner,” the company offered as examples.

Over the next two weeks, Plus and Enterprise users will gain access to ChatGPT’s voice and image functions. Voice will be available on iOS and Android, while images will be enabled on all platforms.

Gradual rollout

OpenAI’s objective is to build “safe” and “beneficial” artificial intelligence. For this reason, the company believes that its tools should be made available gradually, as this “allows us to introduce improvements and refine risk mitigation over time while preparing everyone for more powerful systems in the future.” This strategy, it has stressed, is even more important with advanced voice and vision models.

The new voice technology – capable of creating realistic synthetic voices from a few seconds of real speech – “opens the doors to many creative and accessibility-focused applications.” However, these capabilities also present new risks, such as the possibility of malicious actors impersonating public figures or committing fraud.

That is why OpenAI has chosen to limit this technology to a specific use case: voice chat. “The voice chat has been created with voice actors that we have worked with directly.” The company is also collaborating in similar ways with others. For example, Spotify is using this technology to pilot its voice translation feature, which helps podcast creators expand the reach of their storytelling by translating podcasts into other languages in their own voice.

Vision models also bring new challenges, ranging from the model speculating about people to users relying on its interpretation of images in high-risk areas. “Before widespread deployment, we tested the model with red teamers for risk in areas such as extremism and scientific proficiency, and with a diverse set of alpha testers.” This research, the company says, allowed it to settle on some key details for responsible use.

A useful and safe vision

Like other ChatGPT features, vision aims to help users in their daily lives, and it does this best when it can see what you see. The approach has been informed directly by OpenAI’s work with Be My Eyes, a free mobile application for blind people and people with low vision, to understand the feature’s uses and limitations. “Users have told us that they find it valuable to have general conversations about images where people appear in the background, for example, if someone appears on TV while you are trying to figure out how to adjust the remote control.”

OpenAI has also taken technical steps to significantly limit ChatGPT’s ability to analyze and make direct claims about people, since ChatGPT is not always accurate and these systems must respect people’s privacy. “Real-world use and feedback will help us further improve these safeguards without the tool ceasing to be useful,” the company said.

Transparency about model limitations

Users may come to depend on ChatGPT for specialized topics, for example, in fields such as research. “We are transparent about the limitations of the model and discourage higher risk use cases without proper verification.” In addition, the company has noted that the model is competent at transcribing text in English but does not work well with some other languages, especially those written in non-Roman scripts. “We advise our non-English speaking users not to use ChatGPT for this purpose.”