Enhancing TrOCR: Fine-Tuning for Curved Text Recognition

Views: 5,527

10 months ago

📚 Blog post Link: learnopencv.com/fine-tuning-t...
📚 Check out our FREE Courses at OpenCV University : opencv.org/university/free-co...
Welcome back to LearnOpenCV! In this session, we are delving into enhancing TrOCR, a top-tier, transformer-based OCR model, to adeptly recognize curved text, a feat it initially struggled with. Make sure to catch up on our last video for a detailed introduction to TrOCR and its functionality!
📌 In today’s tutorial, we focus on fine-tuning the TrOCR small model on text images from the wild, specifically those from the SCUT-CTW1500 dataset. Our primary aim is to improve the model's ability to recognize curved and vertical text and to push the boundaries of what TrOCR models can achieve.
🚀 We will walk through the entire process, from preparing and analyzing the curved text images dataset to running inference post-training, using the innovative Hugging Face Trainer API. We will assess the upgraded model's capability and analyze the results to understand the extent of improvement in recognizing curved text.
💡 Stay tuned as we embark on this journey to unravel the possibilities of fine-tuning TrOCR models and explore their enhanced applications in real-world scenarios, contributing to the advancements in the OCR domain!
Steps Covered:
✅ Prepare and analyze the curved text images dataset.
✅ Load the TrOCR Small Printed model from Hugging Face.
✅ Initialize the Hugging Face Sequence to Sequence Trainer API.
✅ Define the evaluation metric.
✅ Train the model and run inference.
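As a rough illustration of the "define the evaluation metric" step, here is a minimal sketch that assumes the metric is character error rate (CER), a common choice for OCR. The description above does not name the metric, and the function names `edit_distance` and `character_error_rate` are hypothetical, written in plain Python rather than taken from any library.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(predictions: Sequence[str],
                         references: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical decoded predictions and ground-truth labels
    print(character_error_rate(["helo", "world"], ["hello", "world"]))
```

In a training loop, a function like this would typically be called on the decoded prediction and label strings inside the metric callback passed to the trainer; lower values mean fewer character-level mistakes.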
Resources:
🖥️ On our blog - learnopencv.com we also share tutorials and code on topics like Image Processing, Image Classification, Object Detection, Face Detection, Face Recognition, YOLO, Segmentation, Pose Estimation, and many more using OpenCV (Python/C++), PyTorch, and TensorFlow.
🤖 Learn from the experts on AI: Computer Vision and AI Courses
You have an opportunity to join the 5,300+ (and counting) researchers, engineers, and students who have benefited from these courses and take your knowledge of computer vision, AI, and deep learning to the next level.🤖
opencv.org/university/
#️⃣ Connect with Us #️⃣
📝 Linkedin: / satyamallick
📱 Twitter: / learnopencv
🔊 Facebook: profile.php?...
📸 Instagram: / learnopencv
🔗 Reddit: / spmallick
🔖Hashtags🔖
#OpticalCharacterRecognition #Transformers #NeuralNetworks #HuggingFace #VisionTransformer #CurvedTextRecognition #LearnOpenCV #MachineLearning #ArtificialIntelligence #DeepLearning #AI #opencvuniversity #deeplearning #computervision #learnopencv #opencv

Comments: 18

@user-vh1xh4jq2n 9 months ago

I don't know why there are so few subscribers, but this is the best YouTube channel for Computer Vision I have ever seen, in my opinion.

@LearnOpenCV 9 months ago

Thank you so much for your kind words! Please share our channel and help us grow. :D

@PhamDuc8504 4 months ago

It's great, I hope you have more fine tuning model videos like this. I come from Vietnam.

@LearnOpenCV 4 months ago

Glad you like it. We are working on such content. Please check our channel for updates.

@vbalaji4824 9 months ago

nice one bro

@WesleyAlcoforado 8 months ago

That's amazing, thank you so much for sharing. What do you suggest for training on historical handwritten texts ? The handwriting changes from person to person and language to language, so it's hard to find a dataset that is ready to use. What tools can we use to extract the lines from scanned documents and transcribe a training set?

@LearnOpenCV 7 months ago

I would recommend using the pretrained trocr-large-handwritten model for initial experiments. It's a large model and has been trained on a large corpus of handwritten text. It should suit most of your use case.

@anirudhvenkhatesh4985 3 months ago

I'm not getting a V100 GPU on Colab anymore. I'm getting an out-of-memory error. Please post a detailed video on resolving these errors, and also on how to run inference on custom images outside the test data.

@LearnOpenCV 3 months ago

Hi, I am still getting a V100 GPU. Please check your Colab subscription details.

@mohsenimani6652 7 months ago

Can we run this OCR on a single-board computer like a Raspberry Pi or NVIDIA Jetson Nano?

@LearnOpenCV 7 months ago

It is difficult. Even the smallest TrOCR model is slow on CPU.

@conexcol 5 months ago

How do you use TrOCR on images with multiple lines of text?

@LearnOpenCV 5 months ago

TrOCR cannot work directly on multiple lines. It works either on a single word or on a sentence with a few words. To make it work on multiple lines, we first need to detect the words/sentences using a sentence detector and then use TrOCR for character recognition.
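The detect-then-recognize approach in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions: the boxes are assumed to come from some separate text detector, `recognize` is a hypothetical callable standing in for cropping a region and running TrOCR on it, and the line-grouping heuristic (`group_into_lines`, `y_tolerance`) is an invented helper, not part of TrOCR or any library.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) from a detector


def group_into_lines(boxes: List[Box], y_tolerance: float = 0.5) -> List[List[Box]]:
    """Group boxes into text lines by vertical center, then sort left to right."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        center = (box[1] + box[3]) / 2
        height = box[3] - box[1]
        if lines:
            last = lines[-1][-1]
            last_center = (last[1] + last[3]) / 2
            # Same line if the centers differ by less than a fraction of the height
            if abs(center - last_center) <= y_tolerance * height:
                lines[-1].append(box)
                continue
        lines.append([box])
    return [sorted(line, key=lambda b: b[0]) for line in lines]


def read_page(boxes: List[Box], recognize: Callable[[Box], str]) -> str:
    """Recognize each detected region separately and join them in reading order."""
    lines = group_into_lines(boxes)
    return "\n".join(" ".join(recognize(b) for b in line) for line in lines)


if __name__ == "__main__":
    # Stub recognizer: a real one would crop the image and run the OCR model
    detected = [(120, 10, 200, 30), (10, 12, 100, 32), (10, 50, 90, 70)]
    fake_text = dict(zip(detected, ["world", "hello", "again"]))
    print(read_page(detected, fake_text.__getitem__))
```

The point of the sketch is the division of labor: the detector supplies regions, the recognizer only ever sees one word or short line at a time, and reading order is restored afterwards.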

@user-sy8lw1xk3o 5 months ago

Has anyone trained it on small text data?

@user-fp9om5pz9m 7 months ago

This model is slow on CPU. Can you provide a tutorial on PaddleOCR training?

@LearnOpenCV 7 months ago

You can refer to this article: learnopencv.com/optical-character-recognition-using-paddleocr/. We don't train the model, but we show how to use the pre-trained model.

@user-fp9om5pz9m 7 months ago

@LearnOpenCV Won't you make a cool video tutorial for us? Plzzzz

@user-fp9om5pz9m 7 months ago

@LearnOpenCV I know how to use the model. I want a tutorial on training it. Help, please.