Home > AI

ChatGPT can extract text from an image, here’s how

The simple process of extracting text from an image using ChatGPT

Reviewed By: Kevin Pocock

Last Updated on April 1, 2024
You can trust PC Guide: Our team of experts use a combination of independent consumer research, in-depth testing where appropriate - which will be flagged as such, and market analysis when recommending products, software and services. Find out how we test here.

Information is abundantly available in various forms in the digital age, including text and images. While text is easily accessible and understood by computers, extracting valuable information from images has traditionally been challenging. However, advancements in artificial intelligence have revolutionized this process. One such breakthrough is the ability of ChatGPT, a state-of-the-art language model developed by OpenAI, being able to extract text from images. But how do you get ChatGPT to extract text from an image? And, how does it work?

Quick Answer

ChatGPT can now extract text from an image using the GPT-4 visual language model. To access this function users have to be subscribed to ChatGPT plus, and know how to input an image with the paperclip icon.

How to extract text from an image with ChatGPT

Extracting text from an image using ChatGPT is a simple process. However, before we get started it is important to note that this feature is only available for users subscribed to one of ChatGPT’s paid plans (Plus/Team/Enterprise). The file inputting function arrived as part of the GPT-4 upgrade and is not available for users on the free version of the app. So, if you’re hoping to access this feature but are not subscribed, your first step would be to set up a ChatGPT Plus account, here’s our simple guide where we explain the process.

Now, let’s dive in.

Step

1

Log into  your account

Open ChatGPT in your Web browser or the app, and log into your account.

Step

2

Select the paperclip icon

Select the paperclip icon found on bottom left side of the page.

Step

3

Select image

Once the paperclip icon is selected, your devices files should pop up. From there you can navigate through your files and select the image you want to extract text from.

Step

4

Type prompt

You’re image will appear, from there you can type a prompt that relates to the text in the image. For example, “Extract the text from this image”.

Essential AI Tools

Editor’s pick
Only $0.00019 per word!

Content Guardian – AI Content Checker – One-click, Eight Checks

8 Market leading AI Content Checkers in ONE click. The only 8-in-1 AI content detector platform in the world. We integrate with leading AI content detectors to give unparalleled confidence that your content appear to be written by a human.
EXCLUSIVE DEAL 10,000 free bonus credits

Jasper AI

On-brand AI content wherever you create. 100,000+ customers creating real content with Jasper. One AI tool, all the best models.
TRY FOR FREE

WordAI

10x Your Content Output With AI. Key features – No duplicate content, full control, in built AI content checker. Free trial available.
TRY FOR FREE

Copy.ai

Experience the full power of an AI content generator that delivers premium results in seconds. 8 million users enjoy writing blogs 10x faster, effortlessly creating higher converting social media posts or writing more engaging emails. Sign up for a free trial.
TRY FOR FREE

Writesonic

Create SEO-optimized and plagiarism-free content for your blogs, ads, emails, and website 10X faster. Start for free. No credit card required.

How ChatGPT extracts text from images

ChatGPT extracts text from images with the help of OpenAI’s Code Interpreter. It is a Python-based ChatGPT plugin that enhances the generative AI tool’s abilities. Thanks to the GPT-4 VLM (visual language model), ChatGPT converts images to text with the aid of computer vision. A specific kind of computer vision is used, called optical character recognition technology (OCR technology). This deep learning tech recognizes a subject, like alphabetical letters or human faces, present in an image. It then converts this visual data (pixels) into a machine-readable format.

Using a GPT (Generative Pre-trained Transformer) like ChatGPT’s GPT-4 for data extraction via image recognition is an advanced computer process only possible with artificial intelligence. OCR software uses computer vision models to interface between what a human would subjectively say they can see, and what a computer can objectively process in some usable way.

This process presents a different content creation use for the chatbot other than the standard text input prompts. It proves the growing uses of complex large language model (LLM) algorithms and convolutional neural networks (CNNs).

Here’s how the image-to-text extraction works:

  • Image Processing: The first step is to preprocess the image and prepare it for analysis. This may involve resizing, enhancing contrast, and noise reduction.
  • Text Detection: ChatGPT employs advanced object detection techniques to identify regions in the image that likely contain text. This involves identifying shapes and patterns that resemble letters and words.
  • Feature Extraction: Once potential text regions are detected, ChatGPT extracts relevant features from these regions, such as font styles, sizes, and orientations. This information helps in reconstructing the text accurately.
  • Contextual Analysis: The extracted features are fed into the language model, where ChatGPT uses its contextual understanding of language to decipher the text. This step ensures that the extracted text makes sense within the context of the visual image.
  • Post-Processing: After text extraction from the image input, a post-processing step may be applied to refine the output, correct errors, and improve overall accuracy.

Using ChatGPT to extract text from images – limitations

While ChatGPT’s text extraction from images represents a significant advancement in natural language processing (NLP) AI, there are still challenges to address. When inputting images into ChatGPT it is important to ensure they are of high quality. The text extraction accuracy may vary based on image quality, fonts, and other factors. So blurry screenshots with small text might prove hard for ChatGPT to analyze and extract. Continued research and development in machine learning will likely lead to improvements in performance and reliability. 

Additionally, a limitation of this function, as previously discussed in this article, is it only being readily available for users on a paid subscription to ChatGPT Plus. Sadly the image input paperclip icon cannot be used in GPT 3.5, found in the free version of ChatGPT.

Conclusion

OpenAI’s GPT-4 update came with a whole host of exciting new features available for ChatGPT Plus subscribers. Among them, is the ability to have ChatGPT extract text from user inputted images. This handy feature allows users to have ChatGPT convert images into text with the help of computer vision – GPT-4’s Visual Language model.

Following the simple steps outlined above provides you with the option of using AI to extract possibly creating a more efficient working environment and saving time. If you’d like to learn more about ChatGPT’s image-analyzing capabilities, then check out our ChatGPT image-analyzing feature guide.

Gloria is a tech and AI writer for PC Guide. She is interested in what new technology means for the future of digital and broadcast journalism.