Enhancing TrOCR: Fine-Tuning for Curved Text Recognition

Views: 5,527

LearnOpenCV

10 months ago

📚 Blog post Link: learnopencv.com/fine-tuning-t...
📚 Check out our FREE Courses at OpenCV University : opencv.org/university/free-co...
Welcome back to LearnOpenCV! In this session, we are delving into enhancing TrOCR, a top-tier, transformer-based OCR model, to adeptly recognize curved text, a feat it initially struggled with. Make sure to catch up on our last video for a detailed introduction to TrOCR and its functionality!
📌 In today’s tutorial, we fine-tune the TrOCR small model on text images from the wild, specifically those from the SCUT-CTW1500 dataset. Our primary aim is to improve the model's ability to recognize curved and vertical text and to push the boundaries of what TrOCR models can achieve.
🚀 We will walk through the entire process, from preparing and analyzing the curved text images dataset to running inference post-training, using the innovative Hugging Face Trainer API. We will assess the upgraded model's capability and analyze the results to understand the extent of improvement in recognizing curved text.
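As a rough illustration of the inference step, here is a minimal sketch that loads a TrOCR checkpoint and reads the text in one cropped image. It assumes the `transformers` and `Pillow` packages are installed and uses the public `microsoft/trocr-small-printed` checkpoint as a stand-in; the function name `recognize_text` is hypothetical, and a fine-tuned checkpoint directory would be passed as `model_name` instead.

```python
# Stand-in checkpoint; replace with the directory of a fine-tuned model.
MODEL_NAME = "microsoft/trocr-small-printed"


def recognize_text(image_path, model_name=MODEL_NAME):
    """Return the text TrOCR predicts for a single cropped text image."""
    # Imported lazily so the heavy dependencies load only when inference runs.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)

    # TrOCR expects an RGB image containing one word or one short line.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Autoregressively generate token ids, then decode them to a string.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `recognize_text("crop.png")` (a hypothetical file name) downloads the checkpoint on first use and returns the predicted string.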
💡 Stay tuned as we embark on this journey to unravel the possibilities of fine-tuning TrOCR models and explore their enhanced applications in real-world scenarios, contributing to the advancements in the OCR domain!
Steps Covered:
✅ Prepare and analyze the curved text images dataset.
✅ Load the TrOCR Small Printed model from Hugging Face.
✅ Initialize the Hugging Face Sequence to Sequence Trainer API.
✅ Define the evaluation metric.
✅ Train the model and run inference.
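The evaluation-metric step can be sketched as follows. The description above does not name the metric, so this assumes character error rate (CER), a common choice for OCR: the total character-level edit distance divided by the total number of reference characters. The function names are hypothetical, and this pure-Python version is for illustration rather than the code used in the video.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(references, predictions):
    """Total edits over total reference characters, across all pairs."""
    total_edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return total_edits / total_chars


# One substituted character out of five reference characters.
print(character_error_rate(["hello"], ["hallo"]))
```

A lower value is better; a CER of 0 means every prediction matches its reference exactly.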
Resources:
🖥️ On our blog - learnopencv.com we also share tutorials and code on topics like Image Processing, Image Classification, Object Detection, Face Detection, Face Recognition, YOLO, Segmentation, Pose Estimation, and many more using OpenCV(Python/C++), PyTorch, and TensorFlow.
🤖 Learn from the experts on AI: Computer Vision and AI Courses
You have an opportunity to join the more than 5,300 (and counting) researchers, engineers, and students who have benefited from these courses, and to take your knowledge of computer vision, AI, and deep learning to the next level.🤖
opencv.org/university/
#️⃣ Connect with Us #️⃣
📝 Linkedin: / satyamallick
📱 Twitter: / learnopencv
🔊 Facebook: profile.php?...
📸 Instagram: / learnopencv
🔗 Reddit: / spmallick
🔖Hashtags🔖
#OpticalCharacterRecognition #Transformers #NeuralNetworks #HuggingFace #VisionTransformer #CurvedTextRecognition #LearnOpenCV #MachineLearning #ArtificialIntelligence #DeepLearning #AI #opencvuniversity #deeplearning #computervision #learnopencv #opencv

Comments: 18
@user-vh1xh4jq2n 9 months ago
I don't know why there are very low subscribers, but this is the best KZbin channel for Computer Vision I have ever seen, in my opinion.
@LearnOpenCV 9 months ago
Thank you so much for your kind words! Please share our channel and help us grow. :D
@PhamDuc8504 4 months ago
It's great, I hope you have more fine tuning model videos like this. I come from Vietnam.
@LearnOpenCV 4 months ago
Glad you like it. We are working on such content. Please check our channel for updates.
@vbalaji4824 9 months ago
nice one bro
@WesleyAlcoforado 8 months ago
That's amazing, thank you so much for sharing. What do you suggest for training on historical handwritten texts? The handwriting changes from person to person and language to language, so it's hard to find a dataset that is ready to use. What tools can we use to extract the lines from scanned documents and transcribe a training set?
@LearnOpenCV 7 months ago
I would recommend using the pretrained trocr-large-handwritten model for initial experiments. It's a large model and has been trained on a large corpus of handwritten text. It should suit most of your use case.
@anirudhvenkhatesh4985 3 months ago
I'm not getting a V100 GPU on Colab anymore. I'm getting an out-of-memory error. Please post a detailed video on resolving these errors. Also, how do I run inference on custom images outside the test data?
@LearnOpenCV 3 months ago
Hi, I am still getting a V100 GPU. Please check your Colab subscription details.
@mohsenimani6652 7 months ago
Can we run this OCR on a single-board computer like a Raspberry Pi or an Nvidia Jetson Nano?
@LearnOpenCV 7 months ago
It is difficult. Even the smallest TrOCR model is slow on CPU.
@conexcol 5 months ago
How do you use TrOCR on images with multiple lines of text?
@LearnOpenCV 5 months ago
TrOCR cannot directly work on multiple lines. It works either on a single word or on a sentence with a few words. To make it work on multiple lines, we first need to detect the words/sentences using a sentence detector and then use TrOCR for character recognition.
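The detect-then-recognize idea in this reply can be sketched as below. This is an assumption-laden illustration, not the channel's method: it swaps the "sentence detector" for a simple horizontal projection profile, which only separates straight, horizontal lines on a clean binarized page (curved scene text would need a trained text detector). The image is a nested list of 0/1 pixels, and `find_line_bands`, `recognize_lines`, and `recognize_crop` are hypothetical names; `recognize_crop` stands in for a TrOCR call on one cropped line.

```python
def find_line_bands(binary_image, min_ink=1):
    """Return (top, bottom) row ranges that contain text pixels.

    binary_image is a list of rows, each a list of 0 (background) or 1 (ink).
    The bottom index is exclusive, so a band can be sliced directly.
    """
    bands = []
    start = None
    for row_index, row in enumerate(binary_image):
        has_text = sum(row) >= min_ink
        if has_text and start is None:
            start = row_index                  # a text line begins
        elif not has_text and start is not None:
            bands.append((start, row_index))   # the text line ended
            start = None
    if start is not None:                      # text runs to the last row
        bands.append((start, len(binary_image)))
    return bands


def recognize_lines(binary_image, recognize_crop):
    """Crop each detected line and pass it to a single-line recognizer."""
    return [
        recognize_crop(binary_image[top:bottom])
        for top, bottom in find_line_bands(binary_image)
    ]
```

In practice `recognize_crop` would convert each crop to an RGB image and run it through the TrOCR processor and model, one line at a time.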
@user-sy8lw1xk3o 5 months ago
Has anyone trained it on small text data?
@user-fp9om5pz9m 7 months ago
This model is slow on CPU. Can you provide a tutorial on PaddleOCR training?
@LearnOpenCV 7 months ago
You can refer to this article: learnopencv.com/optical-character-recognition-using-paddleocr/. We don't train the model there, but we show how to use the pre-trained model.
@user-fp9om5pz9m 7 months ago
@LearnOpenCV Won't you make a cool video tutorial for us? Please!
@user-fp9om5pz9m 7 months ago
@LearnOpenCV I know how to use the model. I want a tutorial on training it. Please help.