
Hello, dear tech savvies! We hope everything is going fine with you. Today we’re back with another interesting project. Have you ever wondered how useful it would be to have a reader that can pick out text from pictures and videos? Think about a self-driving car that reads road signs accurately and heads in the right direction. Or imagine an AI bot that reads what is written on images uploaded to social media, so that vulgar posts can be filtered even when they are in picture format. Or imagine a caregiver robot that reads the labels on medicine bottles and gives patients their medicines on time. Now you understand how important it is for AI solutions to recognize text, right?
Today, we are going to tackle exactly this task. The main component of our project is an ESP32-CAM. We will pair it with a Python script that uses the OpenCV and EasyOCR libraries. The script reads text from the camera feed and prints it in the output terminal.
Introduction to the ESP32-CAM
The ESP32-CAM is a powerful yet affordable development board that combines the ESP32 microcontroller with an integrated camera module, making it an excellent choice for IoT and vision-based applications. Whether you're building a wireless security camera, a QR code scanner, or an AI-powered image recognition system, the ESP32-CAM provides a compact and cost-effective solution.
One of its standout features is built-in WiFi and Bluetooth connectivity, allowing it to stream video or capture images remotely. Despite its small size, it packs a punch with a dual-core processor, support for microSD card storage, and compatibility with various camera sensors (such as the OV2640). However, since it lacks built-in USB-to-serial functionality, flashing firmware requires an external FTDI adapter.
System Architecture
Overview
This system consists of an ESP32-CAM module capturing images and serving them over a web server. A separate Python-based OpenCV application fetches the images, processes them for Optical Character Recognition (OCR) using EasyOCR, and displays the results.
Components
ESP32-CAM Module
Captures images at 800x600 resolution.
Hosts a web server on port 80 to serve the images.
Connects to a Wi-Fi network as a station.
Provides image data when requested via an HTTP GET request.
Python OpenCV & EasyOCR Client
Requests images from the ESP32-CAM web server via HTTP GET requests.
Decodes the image and preprocesses it (resizing & grayscale conversion).
Performs OCR using EasyOCR.
Displays the real-time camera feed and extracted text.
Workflow
Step 1: ESP32-CAM Setup & Image Hosting
The ESP32-CAM initializes and configures the camera settings.
It connects to the Wi-Fi network.
It starts an HTTP web server that serves JPEG images via the endpoint http://<ESP32-CAM IP address>/cam-hi.jpg. When a request is received on /cam-hi.jpg, the ESP32-CAM captures an image and returns it as a response.
Step 2: Image Retrieval and Processing (Python OpenCV)
The Python script continuously fetches images from the ESP32-CAM.
The image is converted from a raw HTTP response into an OpenCV-compatible format.
It is resized to 400x300 for faster processing.
It is converted to grayscale to improve OCR accuracy.
Step 3: OCR and Text Extraction
EasyOCR processes the grayscale image to recognize text.
Detected text is printed to the console.
The processed image feed is displayed using OpenCV.
Step 4: User Interaction
The user can view the real-time video feed.
The recognized text is displayed in the terminal.
The script can be terminated by pressing 'q'.
List of components

Components | Quantity |
---|---|
ESP32-CAM WiFi + Bluetooth Camera Module | 1 |
FTDI USB to Serial Converter 3V3-5V | 1 |
Male-to-female jumper wires | 4 |
Female-to-female jumper wire | 1 |
MicroUSB data cable | 1 |
Circuit diagram
The following is the circuit diagram for this project:

Fig: Circuit diagram

ESP32-CAM WiFi + Bluetooth Camera Module | FTDI USB to Serial Converter 3V3-5V (Voltage selection button should be in 5V position) |
---|---|
5V | VCC |
GND | GND |
U0T | RX |
U0R | TX |
IO0 | GND (FTDI or ESP32-CAM), only while uploading |
Programming
If this is your first project with an ESP32 board, you need to install the ESP32 board package first. You will also need to download and install the ESP32-CAM library. For the computer to talk to the board, the cp210x USB driver and the FTDI driver must be properly installed. Here is a detailed tutorial that shows how to get started with the ESP32-CAM.
ESP32-CAM code
#include <WebServer.h>
#include <WiFi.h>
#include <esp32cam.h>

const char* WIFI_SSID = "SSID";
const char* WIFI_PASS = "password";

WebServer server(80);

static auto hiRes = esp32cam::Resolution::find(800, 600);

// Capture one frame and send it to the client as a JPEG
void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));

  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}

// Handler for /cam-hi.jpg: select the high resolution, then serve a frame
void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}

void setup()
{
  Serial.begin(115200);
  Serial.println();

  {
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);

    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }

  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  // Print the address to use in the Python script
  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println(" /cam-hi.jpg");

  server.on("/cam-hi.jpg", handleJpgHi);
  server.begin();
}

void loop()
{
  server.handleClient();
}
After uploading the code, disconnect the IO0 pin of the camera from GND, then press the RST button on the board. The following messages will appear in the Serial Monitor.

Fig: Code successfully uploaded to ESP32-CAM
Copy the IP address printed there and paste it into the ESP32_CAM_URL line of your Python code.

Fig: Copy-pasting the URL to the Python script
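A typo in the pasted address is an easy mistake to make at this step. As a minimal sketch (the helper name build_cam_url is our own and not part of the original script), the URL can be assembled from the IP address with a quick validity check using Python's standard ipaddress module:

```python
import ipaddress

CAM_ENDPOINT = "/cam-hi.jpg"  # route registered by the ESP32-CAM sketch


def build_cam_url(ip_text, endpoint=CAM_ENDPOINT):
    """Return the image URL for an IPv4 address, or raise ValueError.

    build_cam_url is a hypothetical helper, not part of the original script.
    """
    # IPv4Address() raises a ValueError subclass on typos such as "192.168.1.999"
    ip = ipaddress.IPv4Address(ip_text.strip())
    return f"http://{ip}{endpoint}"


# Example address only; replace it with the one printed in the Serial Monitor
ESP32_CAM_URL = build_cam_url("192.168.1.101")
print(ESP32_CAM_URL)
```

If the address is malformed, the script stops immediately with a clear error instead of timing out on every request.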
Code breakdown
#include <WebServer.h>
#include <WiFi.h>
#include <esp32cam.h>
#include <WebServer.h>: Adds support for creating a lightweight HTTP server.
#include <WiFi.h>: Allows the ESP32 to connect to Wi-Fi networks.
#include <esp32cam.h>: Provides functions to control the ESP32-CAM module, including camera initialization and capturing images.
const char* WIFI_SSID = "SSID";
const char* WIFI_PASS = "password";
WIFI_SSID and WIFI_PASS: Define the SSID and password of the Wi-Fi network that the ESP32 will connect to.
WebServer server(80);
WebServer server(80): Creates an HTTP server instance that listens on port 80 (default HTTP port).
static auto hiRes = esp32cam::Resolution::find(800, 600);
esp32cam::Resolution::find: Looks up a supported camera resolution for the requested size.
hiRes: The high-resolution setting (800x600) used for every capture.
void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));

  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}
esp32cam::capture: Captures a frame from the camera.
Failure Handling: If no frame is captured, it logs a failure and sends a 503 error response.
Logging Success: Prints the resolution and size of the captured image.
Serving the Image:
Sets the content length and MIME type as image/jpeg.
Writes the image data directly to the client.
void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}
handleJpgHi: Switches the camera to high resolution using esp32cam::Camera.changeResolution(hiRes) and calls serveJpg.
Error Logging: If the resolution change fails, it logs a failure message to the Serial Monitor.
void setup()
{
  Serial.begin(115200);
  Serial.println();

  {
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);

    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }

  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println(" /cam-hi.jpg");

  server.on("/cam-hi.jpg", handleJpgHi);
  server.begin();
}
∙ Serial Initialization:
Initializes the serial port for debugging.
Sets baud rate to 115200.
∙ Camera Configuration:
Sets pins for the AI Thinker ESP32-CAM module.
Configures the default resolution, buffer count, and JPEG quality (80%).
Attempts to initialize the camera and log the status.
∙ Wi-Fi Setup:
Connects to the specified Wi-Fi network in station mode.
Waits for the connection and logs the device's IP address.
∙ Web Server Routes:
Maps the URL endpoint /cam-hi.jpg to the handleJpgHi handler.
∙ Server Start:
Starts the web server.
void loop()
{
  server.handleClient();
}
server.handleClient(): Continuously listens for incoming HTTP requests and serves responses based on the defined endpoints.
Summary of Workflow
The ESP32-CAM connects to Wi-Fi and starts a web server.
The URL endpoint /cam-hi.jpg lets the user request images at high resolution.
The camera captures an image and serves it to the client as a JPEG.
The system continuously handles new client requests.
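If you want to try the Python side before the hardware is wired up, the behaviour summarised above can be imitated on your PC. The following is only a sketch under our own assumptions: the names make_handler and start_mock_camera are hypothetical, the 404 reply for other paths is our choice, and the image bytes are whatever JPEG file you supply. It answers GET /cam-hi.jpg with a JPEG and replies 503 when there is no image, like the ESP32-CAM sketch does on a failed capture.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(jpeg_bytes):
    """Build a request handler that imitates the ESP32-CAM endpoint."""

    class MockCamHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/cam-hi.jpg":
                self.send_error(404)  # assumption: unknown routes get 404
                return
            if not jpeg_bytes:
                # Mirrors the 503 reply the sketch sends on a failed capture
                self.send_response(503)
                self.send_header("Content-Length", "0")
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
            self.send_header("Content-Length", str(len(jpeg_bytes)))
            self.end_headers()
            self.wfile.write(jpeg_bytes)

        def log_message(self, *args):
            pass  # keep the console quiet

    return MockCamHandler


def start_mock_camera(jpeg_bytes, port=0):
    """Start the mock server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), make_handler(jpeg_bytes))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, after calling start_mock_camera(open("sample.jpg", "rb").read(), port=8080) with a photo of your own, the Python client can be pointed at http://127.0.0.1:8080/cam-hi.jpg.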
Python code
import cv2
import requests
import numpy as np
import easyocr
import time
# Replace with your ESP32-CAM IP
ESP32_CAM_URL = "http://192.168.1.101/cam-hi.jpg"
# Initialize EasyOCR reader
reader = easyocr.Reader(['en'], gpu=False)
def capture_image():
    """ Captures an image from the ESP32-CAM """
    try:
        start_time = time.time()
        response = requests.get(ESP32_CAM_URL, timeout=2)  # Short timeout so a stalled request fails fast
        if response.status_code == 200:
            img_arr = np.frombuffer(response.content, np.uint8)
            img = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)
            print(f"[INFO] Image received in {time.time() - start_time:.2f} seconds")
            return img
        else:
            print("[Error] Failed to get image from ESP32-CAM.")
            return None
    except Exception as e:
        print(f"[Error] {e}")
        return None

print("[INFO] Starting text recognition...")
while True:
    frame = capture_image()
    if frame is None:
        continue  # Skip this iteration if the image wasn't retrieved
    # Resize image for faster processing
    frame_resized = cv2.resize(frame, (400, 300))
    # Convert to grayscale (single channel for OCR)
    gray = cv2.cvtColor(frame_resized, cv2.COLOR_BGR2GRAY)
    # Process image with EasyOCR
    start_time = time.time()
    results = reader.readtext(gray, detail=0, paragraph=True)
    print(f"[INFO] OCR processed in {time.time() - start_time:.2f} seconds")
    if results:
        detected_text = " ".join(results)
        print(f"[INFO] Recognized Text: {detected_text}")
    # Display the image feed
    cv2.imshow("ESP32-CAM Feed", frame_resized)
    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# Cleanup
cv2.destroyAllWindows()
Code Breakdown: ESP32-CAM Text Recognition Using EasyOCR
This Python script captures images from an ESP32-CAM, processes them, and extracts text using EasyOCR. Below is a detailed breakdown of each part of the code.
Importing Required Libraries
import cv2 # OpenCV for image processing and display
import requests # To send HTTP requests to the ESP32-CAM
import numpy as np # NumPy for handling image arrays
import easyocr # EasyOCR for text recognition
import time # For measuring performance time
cv2 (OpenCV) → Used for decoding, processing, and displaying images.
requests → Fetches the image from the ESP32-CAM.
numpy → Converts the image data into a format usable by OpenCV.
easyocr → Runs Optical Character Recognition (OCR) on the image.
time → Measures execution time for optimization.
Define ESP32-CAM IP Address
ESP32_CAM_URL = "http://192.168.1.101/cam-hi.jpg"
The ESP32-CAM hosts an image at this URL.
Ensure your ESP32-CAM and PC are on the same network.
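One quick way to check the "same network" condition is to compare the two addresses. The sketch below is an illustration only: the function name on_same_subnet is ours, the addresses are examples, and the /24 prefix (netmask 255.255.255.0) is an assumption that matches many home routers but may not match yours.

```python
import ipaddress


def on_same_subnet(pc_ip, cam_ip, prefix_len=24):
    """True if both IPv4 addresses fall in the same network.

    prefix_len=24 is an assumed netmask of 255.255.255.0.
    """
    # strict=False lets us pass a host address instead of a network address
    network = ipaddress.ip_network(f"{pc_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(cam_ip) in network


print(on_same_subnet("192.168.1.20", "192.168.1.101"))  # True
print(on_same_subnet("192.168.0.20", "192.168.1.101"))  # False
```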
Initialize EasyOCR
reader = easyocr.Reader(['en'], gpu=False)
EasyOCR is initialized with English ('en') as the recognition language.
gpu=False ensures it runs on the CPU (Set gpu=True if using a GPU for faster processing).
Function to Capture Image from ESP32-CAM
def capture_image():
    """ Captures an image from the ESP32-CAM """
    try:
        start_time = time.time()
        response = requests.get(ESP32_CAM_URL, timeout=2)  # Short timeout so a stalled request fails fast
Sends an HTTP GET request to fetch an image.
timeout=2 → Gives up after 2 seconds, so a stalled connection cannot hang the loop.
        if response.status_code == 200:
            img_arr = np.frombuffer(response.content, np.uint8)
            img = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)
            print(f"[INFO] Image received in {time.time() - start_time:.2f} seconds")
            return img
If HTTP response is successful (200 OK):
Convert raw binary data (response.content) into a NumPy array.
Use cv2.imdecode() to convert it into an OpenCV image.
Print how long the image retrieval took.
Return the image.
        else:
            print("[Error] Failed to get image from ESP32-CAM.")
            return None
If the ESP32-CAM answers with any status other than 200 (for example, the 503 it sends when a capture fails), the function prints an error message and returns None.
    except Exception as e:
        print(f"[Error] {e}")
        return None
Handles connection errors (e.g., ESP32-CAM offline, network issues).
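A transfer over Wi-Fi can also arrive truncated even when the status code is 200. As a small optional sketch (looks_like_jpeg is a hypothetical helper, not part of the original script), the payload in response.content could be checked for the standard JPEG start and end markers before it is handed to cv2.imdecode():

```python
JPEG_START = b"\xff\xd8"  # SOI (start of image) marker
JPEG_END = b"\xff\xd9"    # EOI (end of image) marker


def looks_like_jpeg(data):
    """Cheap sanity check: payload starts with SOI and ends with EOI.

    This does not prove the image decodes; it only catches obviously
    truncated or non-JPEG payloads.
    """
    return len(data) >= 4 and data.startswith(JPEG_START) and data.endswith(JPEG_END)


print(looks_like_jpeg(b"\xff\xd8...image data...\xff\xd9"))  # True
print(looks_like_jpeg(b"\xff\xd8...cut off"))                # False
```

This is only a heuristic. Some encoders append bytes after the end marker, so treat a failed check as a reason to refetch rather than as a hard error.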
Start Text Recognition
print("[INFO] Starting text recognition...")
Logs a message when the program starts.
Main Loop: Capturing & Processing Images
while True:
    frame = capture_image()
    if frame is None:
        continue  # Skip this iteration if the image wasn't retrieved
Continuously fetch images from ESP32-CAM.
If None (failed to capture), skip processing and retry.
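Note that the loop retries immediately, so an offline camera produces a rapid stream of error messages. One possible refinement, sketched here under our own assumptions (fetch_with_backoff and its parameters are hypothetical and not part of the original script), is to wait a little longer after each failed attempt:

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=0.2, sleep=time.sleep):
    """Call fetch() until it returns something other than None.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and returns None if every attempt fails.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None
```

In the main loop, this could be used as frame = fetch_with_backoff(capture_image), keeping the existing None check for the case where all attempts fail.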
Resize & Convert the Image to Grayscale
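    # Resize image for faster processing
    frame_resized = cv2.resize(frame, (400, 300))
    # Convert to grayscale (single channel for OCR)
    gray = cv2.cvtColor(frame_resized, cv2.COLOR_BGR2GRAY)
Resizing to (400, 300) → Speeds up OCR processing because there are only a quarter as many pixels to scan. Very small text may become harder to read at this size.
Converting to grayscale → Drops color information that text recognition does not need, which can help OCR on noisy frames.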
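To make the saving concrete, here is the arithmetic for the 800x600 to 400x300 resize followed by the grayscale conversion. The helper scaled_size is our own illustration (not part of the original script); it shows how a target size that keeps the aspect ratio can be computed for cv2.resize, which takes its size as (width, height).

```python
def scaled_size(width, height, target_width):
    """Return (target_width, new_height) preserving the aspect ratio."""
    new_height = round(height * target_width / width)
    return target_width, new_height


src_w, src_h = 800, 600                      # resolution served by the camera
dst_w, dst_h = scaled_size(src_w, src_h, 400)
print(dst_w, dst_h)                          # 400 300

# Values per frame: 3 color channels before, 1 grayscale channel after
before = src_w * src_h * 3                   # 1,440,000 values
after = dst_w * dst_h * 1                    # 120,000 values
print(before // after)                       # 12
```

So the OCR step receives roughly one twelfth of the data in the original color frame.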
Perform OCR (Text Recognition)
    start_time = time.time()
    results = reader.readtext(gray, detail=0, paragraph=True)
    print(f"[INFO] OCR processed in {time.time() - start_time:.2f} seconds")
Calls reader.readtext(gray, detail=0, paragraph=True).
detail=0 → Returns only the recognized text.
paragraph=True → Merges nearby text boxes into paragraph-style blocks.
Logs how long OCR processing takes.
    if results:
        detected_text = " ".join(results)
        print(f"[INFO] Recognized Text: {detected_text}")
If text is detected, print the recognized text.
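With detail=0 the script throws away the confidence scores. If you call readtext() with its default detail=1 (and paragraph=False), each result is a (bounding box, text, confidence) entry, which makes it possible to drop weak detections. The sketch below works on made-up sample data in that shape; filter_by_confidence and the 0.5 threshold are our own assumptions, not part of the original script.

```python
def filter_by_confidence(detections, min_conf=0.5):
    """Keep the text of detections whose confidence is at least min_conf."""
    return [text for _bbox, text, conf in detections if conf >= min_conf]


# Hypothetical sample in the shape readtext() returns with detail=1
sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "EXIT", 0.97),
    ([[15, 60], [90, 60], [90, 85], [15, 85]], "l|1", 0.21),
]
print(filter_by_confidence(sample))  # ['EXIT']
```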
Display the Image (Optional)
    cv2.imshow("ESP32-CAM Feed", frame_resized)
Opens a real-time preview window of the ESP32-CAM feed.
    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
Press 'q' to exit the loop and stop the program.
Cleanup
cv2.destroyAllWindows()
Closes all OpenCV windows when the program exits.
Setting Up Python Environment
Install Dependencies:
Create a virtual environment:
python -m venv ocr_env
source ocr_env/bin/activate # Linux/Mac
ocr_env\Scripts\activate # Windows
Install required libraries:
pip install opencv-python numpy easyocr requests
After setting up the Python environment, run the Python code to capture images from the ESP32-CAM and perform text recognition using EasyOCR.
Let’s test the setup!
Run the Python code and point the camera at some printed text. The text will be detected.

Fig: Sample
You will see the recognized text in the terminal output.

Fig: Detected text shown

Fig: Sample

Fig: Detected text
Wrapping It Up
Congratulations! You've successfully built a real-time OCR system using ESP32-CAM and Python. With this setup, your ESP32-CAM captures images and serves them to your Python script, where OpenCV and EasyOCR extract text from the visuals. Whether you're automating data entry, reading license plates, or enhancing accessibility, this project lays the foundation for countless applications.
Now that you have it running, why not take it a step further? You could improve accuracy with better lighting, add pre-processing filters, or even integrate the results into a database or web dashboard. The possibilities are endless!
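As a starting point for the database idea, here is a minimal sketch that stores recognized text with a timestamp using Python's built-in sqlite3 module. The table name ocr_log, its columns, and the helper names open_log and log_text are all our own assumptions. The example uses an in-memory database; pass a file path to keep the data.

```python
import sqlite3
import time


def open_log(path=":memory:"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_log ("
        "id INTEGER PRIMARY KEY, captured_at REAL NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def log_text(conn, text, captured_at=None):
    """Insert one recognized string with a Unix timestamp."""
    ts = time.time() if captured_at is None else captured_at
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO ocr_log (captured_at, text) VALUES (?, ?)", (ts, text)
        )


conn = open_log()
log_text(conn, "EXIT")
print(conn.execute("SELECT text FROM ocr_log").fetchall())  # [('EXIT',)]
```

In the main loop, log_text(conn, detected_text) could be called right after the recognized text is printed.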

If you run into any issues or have ideas for improvements, feel free to experiment, tweak the code, and keep learning. Happy coding!