GPT-4 Now Has Voice and Vision

Sep 26

Exciting news! OpenAI has begun rolling out new voice and image capabilities in ChatGPT, making interactions with the AI assistant more natural and intuitive. Here are the key new features:

Voice Conversations

Users can now hold spoken, back-and-forth conversations with ChatGPT, making it more convenient and accessible to use. To enable this feature, go to Settings → New Features in the mobile app and opt into voice conversations. You can then choose from five different voices.

Image Sharing

ChatGPT's new image feature lets users share images for a wide range of tasks, whether that's troubleshooting a grill, planning a meal from the contents of a fridge, or analyzing a complex graph for work. In the mobile app, a drawing tool lets users highlight a specific part of an image so ChatGPT can focus its help there.

OpenAI is deploying these features gradually, in line with its commitment to building AI that is safe and beneficial. A gradual rollout gives it time to refine the features and their safeguards based on real-world use before expanding access further.

The voice feature is powered by a new text-to-speech model that produces realistic synthetic voices. Because realistic voice generation could be misused, for example for impersonation or fraud, OpenAI is restricting it to specific use cases such as voice chat. The vision models that handle image input have undergone extensive testing to support responsible and safe use, and technical measures limit ChatGPT's ability to analyze and make direct statements about people, in order to respect individuals' privacy.

User feedback plays a crucial role in improving the safety and usefulness of these features. OpenAI has collaborated with Be My Eyes, a mobile app for blind and low-vision people, to understand how vision-based features are used in practice and where they fall short.

OpenAI is transparent about the model's limitations, particularly on specialized topics and when transcribing non-English text. Voice and image features are rolling out to Plus and Enterprise users first, with access for other user groups, including developers, expected soon after.

The rollout of voice and image capabilities marks a notable step toward making ChatGPT more interactive, accessible, and helpful across many domains. By deploying gradually and thoughtfully, OpenAI aims to keep the technology safe and beneficial as access widens. Stay tuned for further updates to ChatGPT.

You can learn more about these new features here: https://openai.com/blog/chatgpt-can-now-see-hear-and-speak

Richard Cawood

Richard is an award-winning portrait photographer, creative media professional, and educator currently based in Dubai, UAE.

http://www.2ndLightPhotography.com