Enhancing TrOCR: Fine-Tuning for Curved Text Recognition

5,526 views

LearnOpenCV

10 months ago

📚 Blog post Link: learnopencv.com/fine-tuning-t...
📚 Check out our FREE Courses at OpenCV University : opencv.org/university/free-co...
Welcome back to LearnOpenCV! In this session, we are delving into enhancing TrOCR, a top-tier, transformer-based OCR model, to adeptly recognize curved text, a feat it initially struggled with. Make sure to catch up on our last video for a detailed introduction to TrOCR and its functionality!
📌 In today’s tutorial, we fine-tune the TrOCR small model on text images from the wild, specifically those from the SCUT-CTW1500 dataset. Our primary aim is to improve the model's ability to recognize curved and vertical text and to push the boundaries of what TrOCR models can achieve.
🚀 We will walk through the entire process, from preparing and analyzing the curved text images dataset to running inference post-training, using the innovative Hugging Face Trainer API. We will assess the upgraded model's capability and analyze the results to understand the extent of improvement in recognizing curved text.
💡 Stay tuned as we embark on this journey to unravel the possibilities of fine-tuning TrOCR models and explore their enhanced applications in real-world scenarios, contributing to the advancements in the OCR domain!
Steps Covered:
✅ Prepare and analyze the curved text images dataset.
✅ Load the TrOCR Small Printed model from Hugging Face.
✅ Initialize the Hugging Face Sequence-to-Sequence Trainer API (Seq2SeqTrainer).
✅ Define the evaluation metric.
✅ Train the model and run inference.
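As an illustration of the "define the evaluation metric" step, here is a minimal sketch of a character error rate (CER) computation. The description above does not say which metric the video uses, so CER is an assumption here (it is a common choice for OCR), and the function names levenshtein and character_error_rate are hypothetical helpers, not part of the tutorial's code.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute all cost 1)."""
    # Keep the shorter string in the inner loop so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Total character edits divided by total reference characters (assumed metric)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars
```

A lower value is better: 0.0 means every predicted string matches its label exactly. In a training loop, a function like this would typically be called on the decoded predictions and decoded labels of each evaluation batch.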
Resources:
🖥️ On our blog - learnopencv.com we also share tutorials and code on topics like Image Processing, Image Classification, Object Detection, Face Detection, Face Recognition, YOLO, Segmentation, Pose Estimation, and many more using OpenCV(Python/C++), PyTorch, and TensorFlow.
🤖 Learn from the experts on AI: Computer Vision and AI Courses
You have an opportunity to join the 5,300+ (and counting) researchers, engineers, and students who have benefited from these courses and take your knowledge of computer vision, AI, and deep learning to the next level. 🤖
opencv.org/university/
#️⃣ Connect with Us #️⃣
📝 Linkedin: / satyamallick
📱 Twitter: / learnopencv
🔊 Facebook: profile.php?...
📸 Instagram: / learnopencv
🔗 Reddit: / spmallick
🔖Hashtags🔖
#OpticalCharacterRecognition #Transformers #NeuralNetworks #HuggingFace #VisionTransformer #CurvedTextRecognition #LearnOpenCV #MachineLearning #ArtificialIntelligence #DeepLearning #AI #opencvuniversity #deeplearning #computervision #learnopencv #opencv

Comments: 18
@user-vh1xh4jq2n 9 months ago
I don't know why there are very low subscribers, but this is the best KZbin channel for Computer Vision I have ever seen, in my opinion.
@LearnOpenCV 9 months ago
Thank you so much for your kind words! Please share our channel and help us grow. :D
@PhamDuc8504 4 months ago
It's great, I hope you have more fine tuning model videos like this. I come from Vietnam.
@LearnOpenCV 4 months ago
Glad you like it. We are working on such content. Please check our channel for updates.
@vbalaji4824 9 months ago
nice one bro
@WesleyAlcoforado 8 months ago
That's amazing, thank you so much for sharing. What do you suggest for training on historical handwritten texts? The handwriting changes from person to person and language to language, so it's hard to find a dataset that is ready to use. What tools can we use to extract the lines from scanned documents and transcribe a training set?
@LearnOpenCV 7 months ago
I would recommend using the pretrained trocr-large-handwritten model for initial experiments. It's a large model and has been trained on a large corpus of handwritten text. It should suit most of your use case.
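For readers who want to try this suggestion, a minimal inference sketch might look like the following. It assumes the checkpoint is available on the Hugging Face Hub as microsoft/trocr-large-handwritten, that the transformers, torch, and Pillow packages are installed, and that the input image is a crop of a single text line. The function name recognize_line and the file name in the usage comment are hypothetical.

```python
def recognize_line(image_path: str,
                   model_name: str = "microsoft/trocr-large-handwritten") -> str:
    """Run a pretrained TrOCR checkpoint on one single-line image crop.

    Assumption: model_name is a TrOCR checkpoint on the Hugging Face Hub.
    The libraries are imported lazily so that defining this function does not
    require them to be installed.
    """
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # The processor resizes/normalizes the image and decodes token ids to text.
    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Autoregressively generate token ids, then decode them into a string.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


# Hypothetical usage (downloads the checkpoint on first call):
# print(recognize_line("line_crop.png"))
```

Loading the processor and model inside the function keeps the sketch short; for more than a handful of images it would be better to load them once and reuse them.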
@anirudhvenkhatesh4985 3 months ago
I'm not getting a V100 GPU on Colab anymore. I'm getting an out-of-memory error. Please post a detailed video on resolving these errors. Also, how do I run inference on custom images outside the test data?
@LearnOpenCV 3 months ago
Hi, I am still getting V100 GPU. Please check your colab subscription details.
@mohsenimani6652 7 months ago
can we run this ocr on a single board computer like raspberry pi or Nvidia Jetson Nano?
@LearnOpenCV 7 months ago
It is difficult. Even the smallest TrOCR model is slow on CPU.
@conexcol 5 months ago
How do you use trOCR on images with multiple lines of text?
@LearnOpenCV 5 months ago
TrOCR cannot directly work on multiple lines. It works either on a single word or on a sentence with a few words. To make it work on multiple lines, we first need to detect the words/sentences using a sentence detector and then use TrOCR for character recognition.
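The detect-then-recognize pipeline described in this reply can be sketched as follows. This is only an illustration under strong assumptions: a simple horizontal projection profile stands in for a real text detector (it only works on clean, horizontally aligned scans, not curved or scene text), the page is a binarized image given as rows of 0/1 values, and recognize_line is a hypothetical callable that wraps a single-line recognizer such as TrOCR.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # binarized page: 1 = ink, 0 = background


def find_text_rows(binary_image: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink."""
    bands: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary_image):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                  # a new text band begins
        elif not has_ink and start is not None:
            bands.append((start, y))   # the current band ends at a blank row
            start = None
    if start is not None:              # band that runs to the bottom edge
        bands.append((start, len(binary_image)))
    return bands


def recognize_page(binary_image: Image,
                   recognize_line: Callable[[List[List[int]]], str]) -> List[str]:
    """Crop each detected band and pass it to a single-line recognizer."""
    results = []
    for top, bottom in find_text_rows(binary_image):
        crop = [list(row) for row in binary_image[top:bottom]]
        results.append(recognize_line(crop))
    return results
```

In practice the projection step would be replaced by a trained text detector that returns word or line boxes, and each crop would be converted to an RGB image before being passed to the recognizer.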
@user-sy8lw1xk3o 5 months ago
Has anyone trained it on small text data?
@user-fp9om5pz9m 7 months ago
this model is slow on cpu. can you provide a tutorial on paddle ocr training?
@LearnOpenCV 7 months ago
You can refer to this article: learnopencv.com/optical-character-recognition-using-paddleocr/. We don't train the model there, but we show how to use the pre-trained model.
@user-fp9om5pz9m 7 months ago
@LearnOpenCV won't you make a cool video tutorial for us? plzzzz
@user-fp9om5pz9m 7 months ago
@LearnOpenCV I know how to use the model. I want a tutorial on training it. Help plz