ChatGPT can transcribe audio, and here’s how to do it

ChatGPT voice control is finally here

Image shows the ChatGPT logo next to a speech bubble on a green background below the PC Guide logo

You can trust PC GuideOur team of experts use a combination of independent consumer research, in-depth testing where appropriate – which will be flagged as such, and market analysis when recommending products, software and services. Find out how we test here.

Last Updated on

Recent advances in artificial intelligence have allowed software to understand natural human language. This ability, known as natural language processing (NLP) has allowed transcription services to really come alive. Principally among them, ChatGPT has remained in the spotlight since its release in November 2022, and with its recent updates, ChatGPT can transcribe audio. The latest version is powered by the multimodal model (GPT-4) and is by far the most advanced AI bot we’ve seen yet. Its list of language-based capabilities continues to grow with each update, so in this article, we’ve decided to discuss its impressive speech-to-text feature and how to use it.

✓ Quick answer

Transcribing audio with ChatGPT

ChatGPT is now able to transcribe audio with its new speech-to-text feature. You can either utilize this feature through the mobile app using the microphone icon on the keyboard or you can input audio files using OpenAI’s Whisper API.

How to use ChatGPT to transcribe audio

Transcribing audio on the ChatGPT mobile app

ChatGPT can transcribe audio with a speech-to-text function that is powered by OpenAI’s Whisper API. While using the ChatGPT app for iOS or Android, you can ‘talk to ChatGPT’ by tapping the audio waveform icon. You’ll find this icon on the right-hand side of the prompt window. You may need to enable your phone’s microphone first, but once you’ve done that you should be good to go.

A smartphone screen displaying the chatgpt interface with a dark mode theme, where a user has typed a prompt asking if chatgpt can transcribe audio.

Once you’ve tapped on the microphone icon, the blue screen that’s pictured below should appear, and then all you have to do to stop recording is tap again. It might take a few seconds for the audio to process, then it will transcribe what you’ve said and send it as a message. Easy!

Voice recording interface on a messaging app screen with a blue record button displayed, now featuring that chatgpt can transcribe audio.

How to upload audio files to ChatGPT

If you’d like to upload a prerecorded audio file to ChatGPT, you’ll want the OpenAI API, not the ChatGPT interface. You can find out more about the API, and how to run it, on the OpenAI website.

After a user uploads an audio file, ChatGPT will put this file through a speech recognition algorithm, which will process the speech and create a corresponding text output. Currently, the Whisper API supports the following file types: mp3, mp4, mpeg, mpga, m4a, wav, and webm. However, file uploads are presently restricted to 25 MB.

When you input prompts to OpenAI’s Whisper API, you get user-friendly speech-to-text functionality inside your own app! This is separate from the ChatGPT API. In addition, OpenAI Whisper comes with dialects and language support for Arabic, Greek, Polish, Swahili, Hindi, Malay, Tagalog, Hebrew, Marathi, Urdu, Kannada, and Welsh.

ChatGPT was trained on huge amounts of speech data. Therefore, ChatGPT Speech to Text is capable of understanding and transcribing over 50 languages to industry-standard benchmarks. Furthermore, it can translate and transcribe audio files from many languages into English.

What devices can use the speech-to-text feature?

You can use the speech-to-text function through ChatGPT on your PC or laptop. In addition, you can now access these features on the ChatGPT app for IOS. This makes the convenience and possibilities of speech-to-text transcription readily available. OpenAI continues to evolve the ways in which we view transcription and how it can be done efficiently.

Essential AI Tools

Editor’s pick
Only $0.00019 per word!

Content Guardian – AI Content Checker – One-click, Eight Checks

8 Market leading AI Content Checkers in ONE click. The only 8-in-1 AI content detector platform in the world. We integrate with leading AI content detectors to give unparalleled confidence that your content appear to be written by a human.
EXCLUSIVE DEAL 10,000 free bonus credits

Jasper AI

On-brand AI content wherever you create. 100,000+ customers creating real content with Jasper. One AI tool, all the best models.


10x Your Content Output With AI. Key features – No duplicate content, full control, in built AI content checker. Free trial available.

Experience the full power of an AI content generator that delivers premium results in seconds. 8 million users enjoy writing blogs 10x faster, effortlessly creating higher converting social media posts or writing more engaging emails. Sign up for a free trial.


Create SEO-optimized and plagiarism-free content for your blogs, ads, emails, and website 10X faster. Start for free. No credit card required.

How accurate is ChatGPT speech-to-text?

ChatGPT is infamous for its outstanding capabilities in natural language processing (NLP). However, no speech-to-text transcription tool will ever hit the one hundred percent accuracy mark. As a result, we can expect relatively high levels of accuracy. Despite this, there are some natural limitations to this Whisper API including the quality of the audio file, the diction and pronunciation, and any interfering background noise.

You can further use ChatGPT to help pull apart your transcription to create summaries, key points, and even related topics. If you’d like to find out more about the performance of ChatGPT’s functions, check out our ChatGPT review where put the AI chatbot to the test.

AI transcription alternatives

Although ChatGPT’s audio transcription capabilities are impressive and work well, it may be useful to consider other audio AI transcription alternatives. See the list below for ChatGPT audio transcription alternatives.

What is a multimodal AI?

Multimodal AI is artificial intelligence that can interpret more than one type of media. Different types of digital media include text, audio, image, and video. This list is roughly in order of least to most difficult to generate a convincing AI version of, as video is essentially dozens of images, plus motion and consistency between frames, in addition to audio. At this time, no company has achieved photorealistic AI video that would convince most people most of the time, although Runway ML is leading in this respect.

Large language models (LLMs) are, by themselves, text-based. ChatGPT exhibits multimodality in its ability to receive audio or image files, and then output text or images based on that content. There are so many steps in between, standing on the shoulders of previous technological achievements. Siri, Cortana, Hey Google or Alexa have perfected turning your voice into text data. However, the production of original artwork such as music, poems, and videos, is something no one yet claims to master.

Final Thoughts

If you found yourself wondering if you can use ChatGPT to transcribe audio files, then the answer is yes. Furthermore, We can expect greater accuracy and functionality of natural language processing as the model continues to develop and to see how it can be used in a variety of industries from healthcare, to education, and finance.

If you’re interested in exploring other options, some other AI-based transcription tools that have caught the attention of users recently include Otter AI and alternatively Trint.

Marla writes across a wide range of topics across PC Guide, including AI, PC hardware, and news on the latest tech releases. She's a passionate writer that's interested in the future of technology.