Text Recognition from Video Feed using ESP32-CAM

Today, we will build a project that performs text recognition on a video feed from the ESP32-CAM. We will use the OpenCV image processing library and write the code in Python.

Posted at: 27 - Jun - 2025

Category: ESP32 Projects

Author: xeohacker

Hello, dear tech savvies! We hope everything is going fine with you. Today we're back with another interesting project. Have you ever wondered how useful it would be to have a reader that can pick out text from pictures and videos? Think about a self-driving car that reads road signs carefully and heads in the right direction. Or imagine an AI bot that can read what is written on images uploaded to social media, so that vulgar posts can be filtered even when they are in picture format. Or imagine a caregiver robot that reads medicine bottle labels and gives patients their medicines on time. Now you understand how important it is for AI solutions to recognize text, right?

In this project, we are going to build such a text reader. The main component of our project is an ESP32-CAM. We will pair it with Python's OpenCV library. The Python code will read text from the video feed and print it in the output terminal.

Introduction to the ESP32-CAM

The ESP32-CAM is a powerful yet affordable development board that combines the ESP32 microcontroller with an integrated camera module, making it an excellent choice for IoT and vision-based applications. Whether you're building a wireless security camera, a QR code scanner, or an AI-powered image recognition system, the ESP32-CAM provides a compact and cost-effective solution.

One of its standout features is built-in WiFi and Bluetooth connectivity, allowing it to stream video or capture images remotely. Despite its small size, it packs a punch with a dual-core processor, support for microSD card storage, and compatibility with various camera sensors (such as the OV2640). However, since it lacks built-in USB-to-serial functionality, flashing firmware requires an external FTDI adapter.

System Architecture

Overview

This system consists of an ESP32-CAM module capturing images and serving them over a web server. A separate Python-based OpenCV application fetches the images, processes them for Optical Character Recognition (OCR) using EasyOCR, and displays the results.

Components

ESP32-CAM Module

Captures images at 800x600 resolution.
Hosts a web server on port 80 to serve the images.
Connects to a Wi-Fi network as a station.
Provides image data when requested via an HTTP GET request.

Python OpenCV & EasyOCR Client

Requests images from the ESP32-CAM web server via HTTP GET requests.
Decodes the image and preprocesses it (resizing & grayscale conversion).
Performs OCR using EasyOCR.
Displays the real-time camera feed and extracted text.

Workflow

Step 1: ESP32-CAM Setup & Image Hosting

The ESP32-CAM initializes and configures the camera settings.
It connects to the Wi-Fi network.
It starts an HTTP web server that serves JPEG images via the endpoint http://<ESP32-CAM-IP>/cam-hi.jpg.
When a request is received on /cam-hi.jpg, the ESP32-CAM captures an image and returns it as a response.
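
If you want to try the Python side before the hardware is wired up, a stand-in for this endpoint can help. The following is a minimal sketch using only the Python standard library. The names FakeCamHandler, start_fake_cam, and fetch are made up for illustration, and the payload is just a few placeholder bytes between JPEG markers, not a real picture.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder payload: JPEG start/end markers around filler bytes (not viewable).
FAKE_JPEG = b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9"


class FakeCamHandler(BaseHTTPRequestHandler):
    """Answers GET /cam-hi.jpg with an image/jpeg body, 404 for anything else."""

    def do_GET(self):
        if self.path != "/cam-hi.jpg":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(FAKE_JPEG)))
        self.end_headers()
        self.wfile.write(FAKE_JPEG)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_fake_cam(port=0):
    """Starts the stand-in server on a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), FakeCamHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(host, port, path="/cam-hi.jpg"):
    """Sends one GET request and returns (status, content_type, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=2)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Type"), resp.read()
    finally:
        conn.close()
```

Calling start_fake_cam() and then fetch("127.0.0.1", server.server_address[1]) exercises the same request and response pattern as the real board, which makes it easier to tell network problems apart from camera problems later.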

Step 2: Image Retrieval and Processing (Python OpenCV)

The Python script continuously fetches images from the ESP32-CAM.
The image is converted from a raw HTTP response into an OpenCV-compatible format.
It is resized to 400x300 for faster processing.
It is converted to grayscale to improve OCR accuracy.
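
Resizing 800x600 to 400x300 halves both dimensions, so the 4:3 aspect ratio is preserved. If you later change the camera resolution, a fixed 400x300 target could stretch the image. The sketch below computes a target size that keeps the aspect ratio; fit_within is a hypothetical helper, not part of OpenCV, and its result is meant to be passed to cv2.resize.

```python
def fit_within(width, height, max_width, max_height):
    """Returns the largest (w, h) that fits in the box and keeps the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Use the smaller of the two scale factors so neither side overflows the box.
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(800, 600, 400, 300))   # the case used in this project
print(fit_within(1280, 720, 400, 300))  # a 16:9 frame gets a shorter height
```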

Step 3: OCR and Text Extraction

EasyOCR processes the grayscale image to recognize text.
Detected text is printed to the console.
The processed image feed is displayed using OpenCV.

Step 4: User Interaction

The user can view the real-time video feed.
The recognized text is displayed in the terminal.
The script can be terminated by pressing 'q'.

List of components

Components	Quantity
ESP32-CAM WiFi + Bluetooth Camera Module	1
FTDI USB to Serial Converter 3V3-5V	1
Male-to-female jumper wires	4
Female-to-female jumper wire	1
MicroUSB data cable	1

Circuit diagram

The following is the circuit diagram for this project:

Fig: Circuit diagram

ESP32-CAM WiFi + Bluetooth Camera Module	FTDI USB to Serial Converter 3V3-5V (Voltage selection button should be in 5V position)
5V	VCC
GND	GND
U0T	RX
U0R	TX
IO0	GND (FTDI or ESP32-CAM)

Programming

If this is your first project with an ESP32 board, you first need to install the ESP32 board package. You will also need to download and install the ESP32-CAM library (esp32cam). For your computer to communicate with the board, the CP210x USB driver and the FTDI driver must be properly installed. Here is a detailed tutorial that shows how to get started with the ESP32-CAM.

ESP32-CAM code

#include <WebServer.h>
#include <WiFi.h>
#include <esp32cam.h>

const char* WIFI_SSID = "SSID";
const char* WIFI_PASS = "password";

WebServer server(80);

static auto hiRes = esp32cam::Resolution::find(800, 600);

// Capture one frame and send it to the client as a JPEG.
void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));

  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}

// Handler for /cam-hi.jpg: switch to high resolution, then serve a frame.
void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}

void setup(){
  Serial.begin(115200);
  Serial.println();

  // Camera configuration
  {
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);

    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }

  // Connect to Wi-Fi in station mode and wait for the connection
  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  // Print the address to use in the Python script
  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println(" /cam-hi.jpg");

  server.on("/cam-hi.jpg", handleJpgHi);
  server.begin();
}

void loop()
{
  server.handleClient();
}

After uploading the code, disconnect the IO0 pin of the camera from GND. Then press the RST button. The following messages will appear in the Serial Monitor.

Fig: Code successfully uploaded to ESP32-CAM

You have to copy the IP address and paste it into the following part of your Python code.

Fig: Copy-pasting the URL to the Python script
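
A typo in the pasted address is an easy mistake to make here. As a small safeguard, the sketch below builds the URL from the IP address and rejects malformed input using Python's standard ipaddress module. The helper name build_cam_url is an assumption for illustration; the original script simply hard-codes the URL.

```python
import ipaddress


def build_cam_url(ip_text, path="/cam-hi.jpg"):
    """Builds the image URL from the IP address printed by the ESP32-CAM."""
    # ip_address() raises ValueError for anything that is not a valid address.
    ip = ipaddress.ip_address(ip_text.strip())
    return f"http://{ip}{path}"


ESP32_CAM_URL = build_cam_url("192.168.1.101")
print(ESP32_CAM_URL)
```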

Code breakdown

#include <WebServer.h>
#include <WiFi.h>
#include <esp32cam.h>

#include <WebServer.h>: Adds support for creating a lightweight HTTP server.
#include <WiFi.h>: Allows the ESP32 to connect to Wi-Fi networks.
#include <esp32cam.h>: Provides functions to control the ESP32-CAM module, including camera initialization and capturing images.

const char* WIFI_SSID = "SSID";
const char* WIFI_PASS = "password";

WIFI_SSID and WIFI_PASS: Define the SSID and password of the Wi-Fi network that the ESP32 will connect to.

WebServer server(80);

WebServer server(80): Creates an HTTP server instance that listens on port 80 (default HTTP port).

static auto hiRes = esp32cam::Resolution::find(800, 600);

esp32cam::Resolution::find: Looks up a camera resolution for the requested size:

hiRes: High resolution (800x600).

void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));

  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}

esp32cam::capture: Captures a frame from the camera.
Failure Handling: If no frame is captured, it logs a failure and sends a 503 error response.
Logging Success: Prints the resolution and size of the captured image.
Serving the Image:

Sets the content length and MIME type as image/jpeg.
Writes the image data directly to the client.

void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}

handleJpgHi: Switches the camera to high resolution using esp32cam::Camera.changeResolution(hiRes) and calls serveJpg.
Error Logging: If the resolution change fails, it logs a failure message to the Serial Monitor.

void setup(){
  Serial.begin(115200);
  Serial.println();

  {
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);

    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }

  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println(" /cam-hi.jpg");

  server.on("/cam-hi.jpg", handleJpgHi);
  server.begin();
}

∙ Serial Initialization:

Initializes the serial port for debugging.
Sets baud rate to 115200.

∙ Camera Configuration:

Sets pins for the AI Thinker ESP32-CAM module.
Configures the default resolution, buffer count, and JPEG quality (80%).
Attempts to initialize the camera and log the status.

∙ Wi-Fi Setup:

Connects to the specified Wi-Fi network in station mode.
Waits for the connection and logs the device's IP address.

∙ Web Server Routes:

Maps the URL endpoint /cam-hi.jpg to the handleJpgHi handler.

∙ Server Start:

Starts the web server.

void loop()
{
  server.handleClient();
}

server.handleClient(): Continuously listens for incoming HTTP requests and serves responses based on the defined endpoints.

Summary of Workflow

The ESP32-CAM connects to Wi-Fi and starts a web server.
The URL endpoint (/cam-hi.jpg) lets the user request images at high resolution.
The camera captures an image and serves it to the client as a JPEG.
The system continuously handles new client requests.

Python code

import cv2
import requests
import numpy as np
import easyocr
import time

# Replace with your ESP32-CAM IP
ESP32_CAM_URL = "http://192.168.1.101/cam-hi.jpg"

# Initialize EasyOCR reader
reader = easyocr.Reader(['en'], gpu=False)

def capture_image():
    """ Captures an image from the ESP32-CAM """
    try:
        start_time = time.time()
        response = requests.get(ESP32_CAM_URL, timeout=2)  # Reduced timeout for faster response
        if response.status_code == 200:
            img_arr = np.frombuffer(response.content, np.uint8)
            img = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)
            print(f"[INFO] Image received in {time.time() - start_time:.2f} seconds")
            return img
        else:
            print("[Error] Failed to get image from ESP32-CAM.")
            return None
    except Exception as e:
        print(f"[Error] {e}")
        return None

print("[INFO] Starting text recognition...")

while True:
    frame = capture_image()
    if frame is None:
        continue  # Skip this iteration if the image wasn't retrieved

    # Resize image for faster processing
    frame_resized = cv2.resize(frame, (400, 300))

    # Convert to grayscale (better OCR accuracy)
    gray = cv2.cvtColor(frame_resized, cv2.COLOR_BGR2GRAY)

    # Process image with EasyOCR
    start_time = time.time()
    results = reader.readtext(gray, detail=0, paragraph=True)
    print(f"[INFO] OCR processed in {time.time() - start_time:.2f} seconds")

    if results:
        detected_text = " ".join(results)
        print(f"[INFO] Recognized Text: {detected_text}")

    # Display the image feed
    cv2.imshow("ESP32-CAM Feed", frame_resized)

    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup
cv2.destroyAllWindows()

Code Breakdown: ESP32-CAM Text Recognition Using EasyOCR

This Python script captures images from an ESP32-CAM, processes them, and extracts text using EasyOCR. Below is a detailed breakdown of each part of the code.

Importing Required Libraries

import cv2          # OpenCV for image processing and display
import requests     # To send HTTP requests to the ESP32-CAM
import numpy as np  # NumPy for handling image arrays
import easyocr      # EasyOCR for text recognition
import time         # For measuring performance time

cv2 (OpenCV) → Used for decoding, processing, and displaying images.
requests → Fetches the image from the ESP32-CAM.
numpy → Converts the image data into a format usable by OpenCV.
easyocr → Runs Optical Character Recognition (OCR) on the image.
time → Measures execution time for optimization.

Define ESP32-CAM IP Address

ESP32_CAM_URL = "http://192.168.1.101/cam-hi.jpg"

The ESP32-CAM hosts an image at this URL.
Ensure your ESP32-CAM and PC are on the same network.

Initialize EasyOCR

reader = easyocr.Reader(['en'], gpu=False)

EasyOCR is initialized with English ('en') as the recognition language.
gpu=False ensures it runs on the CPU (Set gpu=True if using a GPU for faster processing).

Function to Capture Image from ESP32-CAM

def capture_image():
    """ Captures an image from the ESP32-CAM """
    try:
        start_time = time.time()
        response = requests.get(ESP32_CAM_URL, timeout=2)  # Reduced timeout for faster response

Sends an HTTP GET request to fetch an image.
timeout=2 → Gives up after 2 seconds, so the script does not hang if the ESP32-CAM stops responding.

        if response.status_code == 200:
            img_arr = np.frombuffer(response.content, np.uint8)
            img = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)
            print(f"[INFO] Image received in {time.time() - start_time:.2f} seconds")
            return img

If HTTP response is successful (200 OK):

Convert raw binary data (response.content) into a NumPy array.
Use cv2.imdecode() to convert it into an OpenCV image.
Print how long the image retrieval took.
Return the image.
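
The decoding steps above assume the response body really is a JPEG. A truncated transfer or an HTML error page would make cv2.imdecode return None instead of an image. As a cheap check before decoding, the sketch below looks for the standard JPEG start and end markers. The helper name looks_like_jpeg is hypothetical, and the check is only a heuristic, not a full validation.

```python
JPEG_START = b"\xff\xd8"  # SOI (start of image) marker
JPEG_END = b"\xff\xd9"    # EOI (end of image) marker


def looks_like_jpeg(data):
    """Returns True if data starts with SOI and contains an EOI marker after it."""
    return data.startswith(JPEG_START) and data.rfind(JPEG_END) >= len(JPEG_START)


print(looks_like_jpeg(b"\xff\xd8...pixels...\xff\xd9"))  # complete frame
print(looks_like_jpeg(b"\xff\xd8...pixels..."))          # cut off mid-transfer
```

In the script, this could be called on response.content before np.frombuffer, returning None early when the check fails.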

        else:
            print("[Error] Failed to get image from ESP32-CAM.")
            return None

If the ESP32-CAM responds with any other status code (for example, 503 after a failed capture), it prints an error message and returns None.

    except Exception as e:
        print(f"[Error] {e}")
        return None

Handles connection errors (e.g., ESP32-CAM offline, network issues).
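
When the camera is briefly unreachable, retrying immediately tends to flood the terminal with errors. One possible refinement, sketched below under the assumption that the fetch function returns None on failure (as capture_image does), is to wait a little longer after each failed attempt. The name fetch_with_retry and its parameters are made up for this example.

```python
import time


def fetch_with_retry(fetch, attempts=3, first_delay=0.5, sleep=time.sleep):
    """Calls fetch() until it returns something other than None.

    Waits first_delay seconds after the first failure and doubles the wait
    after each further failure. Returns None if every attempt fails.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts:  # no point in waiting after the last attempt
            sleep(delay)
            delay *= 2
    return None
```

In the main loop, frame = fetch_with_retry(capture_image) would take the place of the direct capture_image() call. The sleep parameter exists only so the waiting can be replaced in tests.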

Start Text Recognition

print("[INFO] Starting text recognition...")

Logs a message when the program starts.

Main Loop: Capturing & Processing Images

while True:
    frame = capture_image()
    if frame is None:
        continue  # Skip this iteration if the image wasn't retrieved

Continuously fetch images from ESP32-CAM.
If None (failed to capture), skip processing and retry.

Resize & Convert the Image to Grayscale

    # Resize image for faster processing
    frame_resized = cv2.resize(frame, (400, 300))

    # Convert to grayscale (better OCR accuracy)
    gray = cv2.cvtColor(frame_resized, cv2.COLOR_BGR2GRAY)

Resizing to (400, 300) → Speeds up OCR processing by halving the 800x600 frame in each dimension; very small text may become harder to read.
Converting to grayscale → Removes color information that text recognition does not need, which can improve OCR accuracy.
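
For the curious, the grayscale value is a weighted sum of the three color channels. OpenCV documents the weights as 0.299 for red, 0.587 for green, and 0.114 for blue. The sketch below applies that formula to a single pixel in plain Python. The function bgr_to_gray is only an illustration, and its results may differ by one from OpenCV's output because of rounding.

```python
def bgr_to_gray(b, g, r):
    """Weighted sum for one pixel, given in OpenCV's B, G, R channel order."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


print(bgr_to_gray(255, 255, 255))  # white stays at full brightness
print(bgr_to_gray(255, 0, 0))      # pure blue becomes a dark gray
print(bgr_to_gray(0, 255, 0))      # pure green becomes a much lighter gray
```

Green contributes the most to the result, so green text on a blue background keeps good contrast in grayscale, while colors with a similar weighted sum can blend together.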

Perform OCR (Text Recognition)

    start_time = time.time()
    results = reader.readtext(gray, detail=0, paragraph=True)
    print(f"[INFO] OCR processed in {time.time() - start_time:.2f} seconds")

Calls reader.readtext(gray, detail=0, paragraph=True).

detail=0 → Returns only the recognized text.
paragraph=True → Merges nearby text boxes into paragraph-level results.

Logs how long OCR processing takes.
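
If you want to discard weak detections, EasyOCR can also return a confidence score for each piece of text. The sketch below assumes readtext is called with detail=1 and paragraph=False, in which case each result is a (box, text, confidence) triple. The sample list is hand-written for illustration and is not real OCR output; confident_text is a hypothetical helper.

```python
def confident_text(results, min_confidence=0.5):
    """Keeps the text of detections whose confidence meets the threshold."""
    return [text for _box, text, conf in results if conf >= min_confidence]


# Hand-written sample in the (box, text, confidence) shape, illustrative values only.
sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "ESP32-CAM", 0.93),
    ([[10, 60], [80, 60], [80, 90], [10, 90]], "0CR", 0.21),
]

print(confident_text(sample))
```

The 0.5 threshold is an arbitrary starting point; a suitable value depends on your lighting and the size of the text.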

    if results:
        detected_text = " ".join(results)
        print(f"[INFO] Recognized Text: {detected_text}")

If text is detected, print the recognized text.
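
Because OCR runs on every frame, the same text is printed again and again while the camera stays pointed at it. One way to quiet the terminal, sketched below, is to remember the last text and report only changes. ChangeDetector is a made-up helper name, and ignoring case and extra spaces is a design choice you may want to adjust.

```python
class ChangeDetector:
    """Reports text only when it differs from the previously reported text."""

    def __init__(self):
        self._last = None

    def update(self, text):
        """Returns the text if it is new, otherwise None."""
        # Collapse whitespace and ignore case so tiny OCR variations don't count.
        normalized = " ".join(text.split()).casefold()
        if not normalized or normalized == self._last:
            return None
        self._last = normalized
        return text


detector = ChangeDetector()
for seen in ["STOP", "stop ", "Exit", "Exit"]:
    changed = detector.update(seen)
    if changed is not None:
        print(f"[INFO] Recognized Text: {changed}")
```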

Display the Image (Optional)

cv2.imshow("ESP32-CAM Feed", frame_resized)

Opens a real-time preview window of the ESP32-CAM feed.

    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

Press 'q' to exit the loop and stop the program.

Cleanup

cv2.destroyAllWindows()

Closes all OpenCV windows when the program exits.

Setting Up Python Environment

Install Dependencies:

Create a virtual environment:
python -m venv ocr_env

source ocr_env/bin/activate # Linux/Mac

ocr_env\Scripts\activate # Windows

Install required libraries:

pip install opencv-python numpy easyocr requests

After setting up the Python environment, run the Python code to capture images from the ESP32-CAM and perform text recognition using EasyOCR.

Let’s test the setup!

Run the Python code and point your camera at some text. The text will be detected.

Fig: Sample

You will see the recognized text in the terminal output.

Fig: Detected text shown

Wrapping It Up

Congratulations! You've successfully built a real-time OCR system using ESP32-CAM and Python. With this setup, your ESP32-CAM captures images and serves them to your Python script on request, where OpenCV and EasyOCR extract text from the visuals. Whether you're automating data entry, reading license plates, or enhancing accessibility, this project lays the foundation for countless applications.

Now that you have it running, why not take it a step further? You could improve accuracy with better lighting, add pre-processing filters, or even integrate the results into a database or web dashboard. The possibilities are endless!

If you run into any issues or have ideas for improvements, feel free to experiment, tweak the code, and keep learning. Happy coding!


Syed Zain Nasir
