Can ChatGPT transcribe audio?

Can ChatGPT transcribe audio? Listen up to find out.

In the past few years, advances in AI software’s ability to understand language have brought transcription services to life. ChatGPT has remained in the spotlight since its public release as a hugely popular language model chatbot. The latest version is powered by the multimodal GPT-4 model and is by far the most advanced ChatGPT we’ve seen yet. With its list of language-based capabilities growing with each update, you might be wondering whether ChatGPT can transcribe audio.

Can ChatGPT transcribe audio?

Yes, ChatGPT can transcribe audio using a speech-to-text function powered by OpenAI’s Whisper API.

After a user uploads an audio file, ChatGPT will put this file through a speech recognition algorithm, which will process the speech and create a corresponding text output. Currently, the Whisper API supports the following file types: mp3, mp4, mpeg, mpga, m4a, wav, and webm. However, file uploads are presently restricted to 25 MB.
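If you are building your own app, it can help to check a file against these limits before uploading it. The snippet below is a minimal sketch rather than an official tool: the function name `check_audio_file` is hypothetical, and it assumes that “25 MB” means 25 × 1024 × 1024 bytes. The file types are the ones listed above, so check OpenAI’s current documentation before relying on them.

```python
from pathlib import Path

# File types and size limit as stated in this article.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_audio_file(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    # Compare the extension case-insensitively and without the leading dot.
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, over the {MAX_UPLOAD_BYTES}-byte limit"
        )
    return problems
```

For a file on disk, you could pass `os.path.getsize(path)` as the size. A recording that is too large would need to be compressed or split into smaller parts first.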

When you send audio to OpenAI’s Whisper API, you get user-friendly speech-to-text functionality inside your own app. This API is separate from the ChatGPT API. In addition, Whisper supports many languages, including Arabic, Greek, Polish, Swahili, Hindi, Malay, Tagalog, Hebrew, Marathi, Urdu, Kannada, and Welsh.

Whisper, the model behind this feature, was trained on a huge amount of speech data. As a result, ChatGPT’s speech-to-text function can understand and transcribe more than 50 languages at a level that meets industry-standard benchmarks. It can also translate audio from many of these languages into English.
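For developers, here is a rough idea of what a call to the Whisper API could look like in Python. This is a sketch under stated assumptions, not OpenAI’s reference code: it assumes version 1.x of the official `openai` package, the `whisper-1` model name, and a hypothetical helper called `transcribe_audio`. The client is passed in as an argument so the function itself has no hard dependency on the package.

```python
from typing import Any


def transcribe_audio(client: Any, path: str, translate_to_english: bool = False) -> str:
    """Send one audio file to the Whisper API and return the resulting text.

    `client` is expected to be an `openai.OpenAI` instance (assumption:
    version 1.x of the official `openai` Python package).
    """
    # The translations endpoint returns English text; the transcriptions
    # endpoint keeps the language spoken in the recording.
    if translate_to_english:
        endpoint = client.audio.translations
    else:
        endpoint = client.audio.transcriptions
    with open(path, "rb") as audio_file:
        result = endpoint.create(model="whisper-1", file=audio_file)
    return result.text
```

With the package installed and an API key stored in the `OPENAI_API_KEY` environment variable, you would create the client with `from openai import OpenAI` and `client = OpenAI()`, then call `transcribe_audio(client, "interview.mp3")`. The file name here is only an example.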

You can use the speech-to-text function through ChatGPT on your PC or laptop. In addition, you can now access these features on the ChatGPT app for iOS, which puts speech-to-text transcription within easy reach. OpenAI continues to change how we think about transcription and how efficiently it can be done.

How accurate is ChatGPT Speech to Text?

ChatGPT is well known for its outstanding capabilities in natural language processing (NLP), so we can expect relatively high levels of accuracy. However, no speech-to-text tool will ever hit one hundred percent accuracy. The Whisper API has some natural limitations, including the quality of the audio file, the speaker’s diction and pronunciation, and any interfering background noise.
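If you want to put a number on accuracy for your own recordings, one common approach is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into a correct reference, divided by the number of words in the reference. The article’s claims are not based on this code. It is a minimal sketch, and the function name `word_error_rate` and the simple lowercase, split-on-spaces normalization are our own assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

A result of 0.0 means the transcript matches the reference word for word. Because extra inserted words also count as errors, the value can go above 1.0 for very poor transcripts.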

You can also use ChatGPT to break your transcription down into summaries, key points, and even related topics.

What is a multimodal AI?

Multimodal AI is artificial intelligence that can interpret more than one type of media, such as text, audio, images, and video.

Large language models (LLMs) are, by themselves, text-based. ChatGPT exhibits multimodality in its ability to receive audio and then output text based on the content of that audio. There are many steps in between, each standing on the shoulders of previous technological achievements. Voice assistants such as Siri, Cortana, Hey Google, and Alexa have become adept at turning your voice into text data. Producing original artwork, music, poems, and videos, however, is something no one yet claims to have mastered.

Final Thoughts

If you found yourself wondering whether you can use ChatGPT to transcribe audio files, the answer is yes. Furthermore, we can expect greater accuracy and functionality in natural language processing as the model continues to develop, and we can expect to see it used in a variety of industries, such as healthcare, education, and finance.

If you’re interested in exploring other options, AI-based transcription tools that have caught the attention of users recently include Otter AI and Trint.