Building an OCR Model to Crack Captchas: A Neural Network Tutorial with Keras and TensorFlow

  27,049 views

Nicolai Nielsen

1 day ago

Comments: 69
@NicolaiAI
@NicolaiAI 1 year ago
Join My AI Career Program: www.nicolai-nielsen.com/aicareer
Enroll in My School and Technical Courses: www.nicos-school.com
@theuser810
@theuser810 1 year ago
The repository link is not in the description
@axelanderson2030
@axelanderson2030 2 years ago
For anyone who is getting poor results:
1. The small dataset means that a random split might not generalise the problem; for example, the train dataset might contain a much higher percentage of one digit than another.
2. You can use OpenCV to perform preprocessing, which can improve performance. Using morphological transformations to remove noise can improve performance immensely.
3. To avoid overfitting, I found that a Gaussian noise layer can help. It makes the data harder to learn and therefore harder to overfit.
Hope this helps!
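A minimal sketch of points 2 and 3 above (not from the video; the kernel size, noise stddev, and input shape are assumptions to tune):

import cv2
import numpy as np
from tensorflow.keras import layers

def clean_captcha(gray_img):
    # Morphological opening (erosion followed by dilation) removes small
    # speckle noise from a grayscale captcha before it reaches the model.
    kernel = np.ones((3, 3), np.uint8)
    opened = cv2.morphologyEx(gray_img, cv2.MORPH_OPEN, kernel)
    # A light closing afterwards reconnects strokes broken by the opening.
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)

# Inside the model, a GaussianNoise layer is active only during training and
# makes it harder to memorize the small dataset.
input_img = layers.Input(shape=(200, 50, 1), name="image")
x = layers.GaussianNoise(0.1)(input_img)  # stddev 0.1 is an assumption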
@kalifardiansyah5863
@kalifardiansyah5863 1 year ago
I have a question: how do I avoid misdetection of characters, especially between two similar characters? For example, the letter Z gets detected as 2, S as 5, I as 1, etc.
@axelanderson2030
@axelanderson2030 1 year ago
@@kalifardiansyah5863 you may require more training data, or a larger CNN architecture
@HarshpreetSingh-jz2lf
@HarshpreetSingh-jz2lf 1 year ago
I tried it with 60,000 images and used morphological techniques, but it still doesn't give good accuracy; val_loss just doesn't go below 14.
@axelanderson2030
@axelanderson2030 1 year ago
@@HarshpreetSingh-jz2lf do you have a class imbalance in the dataset? Is the model built correctly? Is the data preprocessed correctly? I can't help you if you don't provide any context except for "it no work"
@souhailel-ghayam4714
@souhailel-ghayam4714 3 years ago
Hey, thank you very much for this beautiful explanation of the code and the philosophy behind OCR with LSTM and a CTC layer. Can you please verify that the code still works? I was executing it and it was working, but now it doesn't. I think there is a problem with mapping characters to numbers and mapping numbers back to their original characters via layers.experimental.preprocessing.StringLookup. I tried running it in Google Colab, but when I visualize the data it doesn't give the correct label text. I would be very thankful if you could verify it and suggest a solution for the character-to-number and number-to-character mapping problem.
@NicolaiAI
@NicolaiAI 3 years ago
Thank you very much for watching! The code should not depend on anything and should be working every time, hmm 🤔
@nadyasudusinghe2213
@nadyasudusinghe2213 2 years ago
Hi, I'm getting the same error. Did you find the solution?
@traderdaniel4749
@traderdaniel4749 2 years ago
Same here. I used only digits as labels, so I removed "char_to_num" and "num_to_char".
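A minimal sketch of that digit-only mapping (an illustration, not the video's code): when the vocabulary is just 0-9, each label string can be converted to an integer sequence directly, with no StringLookup layer.

import tensorflow as tf

def encode_label(label_str):
    # "2175" -> [2, 1, 7, 5]
    return tf.constant([int(c) for c in label_str], dtype=tf.int64)

def decode_label(label_tensor):
    # [2, 1, 7, 5] -> "2175"
    return "".join(str(int(i)) for i in label_tensor)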
@benoitd94
@benoitd94 1 year ago
Do you think I can use your code to read the digits on my water meter?
@NicolaiAI
@NicolaiAI 1 year ago
Maybe you can try EasyOCR for that!
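For reference, a minimal EasyOCR sketch for reading digits from a meter photo (the file name and the digits-only allowlist are assumptions, not part of the video):

import easyocr

reader = easyocr.Reader(["en"])  # downloads detection/recognition models on first use
# readtext returns a list of (bounding_box, text, confidence) tuples
results = reader.readtext("water_meter.jpg", allowlist="0123456789")
for bbox, text, confidence in results:
    print(text, confidence)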
@megistone
@megistone 1 month ago
I finally ended up with this working configuration:

images = sorted(map(str, list(data_dir.glob("*.png"))))
labels = [img.split(path.sep)[-1].split(".png")[0] for img in images]
vocab = sorted(set("".join(labels)))
max_length = max(len(label) for label in labels)
char_to_num = StringLookup(vocabulary=vocab, mask_token=None, num_oov_indices=0, oov_token="[UNK]")
num_to_char = StringLookup(vocabulary=char_to_num.get_vocabulary(), invert=True, mask_token=None, num_oov_indices=0, oov_token="[UNK]")

The rest of the code is as in the video.
@mortezarisan3261
@mortezarisan3261 1 month ago
Hello, do you have the captcha code for this clip? Please send it to me.
@user-kw9cu
@user-kw9cu 28 days ago
Thank you
@megistone
@megistone 24 days ago
@@mortezarisan3261 If you mean the model code, yes:

train_model = build_train_model(vocab)
train_model.summary()
early_stopping = EarlyStopping(monitor="val_loss", patience=early_stopping_patience, restore_best_weights=True, min_delta=1e-5)
history = train_model.fit(train_dataset, validation_data=validation_dataset, epochs=epochs, callbacks=[early_stopping], verbose=1)
prediction_model = get_prediction_model(train_model)
compile_prediction_model(prediction_model)
prediction_model.summary()

____

def decode_batch_predictions(pred, num_to_char):
    results = ctc_decode(pred, tf.ones(pred.shape[0]) * pred.shape[1], "greedy")[0][0][:, :]
    return [tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8").replace(num_to_char.oov_token, "") for res in results]

def build_train_model(vocab: list) -> Model:
    input_img = Input(shape=(img_width, img_height, 1), name="image")
    labels = Input(name="label", shape=(None,), dtype="float32")
    x = Conv2D(32, (3, 3), activation="relu", kernel_initializer="he_normal", padding="same", name="Conv1")(input_img)
    x = MaxPooling2D((2, 2), name="pool1")(x)
    x = Conv2D(64, (3, 3), activation="relu", kernel_initializer="he_normal", padding="same", name="Conv2")(x)
    x = MaxPooling2D((2, 2), name="pool2")(x)
    new_shape = ((img_width // 4), (img_height // 4) * 64)
    x = Reshape(target_shape=new_shape, name="reshape")(x)
    x = Dense(64, activation="relu", name="dense1")(x)
    x = Dropout(.2)(x)
    x = Bidirectional(LSTM(128, return_sequences=True, dropout=.25))(x)
    x = Bidirectional(LSTM(64, return_sequences=True, dropout=.25))(x)
    x = Dense(len(vocab) + 1, activation="softmax", name="out2vec")(x)
    output = CTCLayer(name="ctc_loss")(labels, x)

    # Define the model
    model = Model(inputs=[input_img, labels], outputs=output, name="ocr_model")
    model.compile(Adam())
    return model

def get_prediction_model(train_model: Model) -> Model:
    return Model(inputs=train_model.get_layer(name="image").output, outputs=train_model.get_layer(name="out2vec").output)

def compile_prediction_model(prediction_model: Model):
    prediction_model.compile(Adam())
@EnsignerTV
@EnsignerTV 3 years ago
Thanks a lot!
@NicolaiAI
@NicolaiAI 3 years ago
Thanks for watching!
@ehsanroshan7068
@ehsanroshan7068 2 years ago
Hi Nicolai, thanks for the great explanation. Could you please explain how to measure accuracy?
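One way to measure it, as a rough sketch (not from the video): decode the prediction model's output on the validation set and count exact label matches. This assumes prediction_model, validation_dataset, decode_batch_predictions and num_to_char from the tutorial are already defined and that all labels have the same length.

import tensorflow as tf

correct, total = 0, 0
for batch in validation_dataset:
    images, labels = batch["image"], batch["label"]
    preds = prediction_model.predict(images)
    pred_texts = decode_batch_predictions(preds)
    # Rebuild the ground-truth strings from the encoded labels
    true_texts = [
        tf.strings.reduce_join(num_to_char(label)).numpy().decode("utf-8")
        for label in labels
    ]
    correct += sum(p == t for p, t in zip(pred_texts, true_texts))
    total += len(true_texts)

print(f"Sequence accuracy: {correct / total:.3f}")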
@adepusairahul7375
@adepusairahul7375 10 months ago
Where is the repository link? I am not able to find it in the description.
@hsnhsynglk
@hsnhsynglk 3 years ago
## Preprocessing

# Mapping characters to integers
char_to_num = layers.experimental.preprocessing.StringLookup(
    vocabulary=list(characters), mask_token=None
)

# Mapping integers back to original characters
num_to_char = layers.experimental.preprocessing.StringLookup(
    vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
)
@badihaboulhosn8178
@badihaboulhosn8178 2 years ago
Thanks, I thought I was the only one!
@syedmuzammilahmed6872
@syedmuzammilahmed6872 1 year ago
Thanks Man
@UZMAALFATMI
@UZMAALFATMI 10 months ago
thanks so much!
@GuyJustCool
@GuyJustCool 3 years ago
Dear Coding Lib! I'm here with the captcha project! It seems like turning shuffle on messes with the shuffling function and does an incorrect split. I have yet to find a solution, and I would really appreciate it if you looked into it! If shuffle is off, it works well. Another person pointed the bug out, and it's labels ending up on the wrong images.
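A possible fix, as a rough sketch (my own, not the video's code): shuffle a single index array and apply it to both images and labels, so the pairing is preserved whether shuffle is on or off.

import numpy as np

def split_data(images, labels, train_size=0.9, shuffle=True):
    # images and labels are parallel arrays; one permutation is applied to both
    images, labels = np.array(images), np.array(labels)
    indices = np.arange(len(images))
    if shuffle:
        np.random.shuffle(indices)
    n_train = int(len(images) * train_size)
    train_idx, valid_idx = indices[:n_train], indices[n_train:]
    return images[train_idx], images[valid_idx], labels[train_idx], labels[valid_idx]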
@HassanKhan-ei2wh
@HassanKhan-ei2wh 1 year ago
## Preprocessing

# Mapping characters to integers
char_to_num = layers.experimental.preprocessing.StringLookup(
    vocabulary=list(characters), mask_token=None
)

# Mapping integers back to original characters
num_to_char = layers.experimental.preprocessing.StringLookup(
    vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
)
@syedmuzammilahmed6872
@syedmuzammilahmed6872 1 year ago
@@HassanKhan-ei2wh Thanks Man
@syedmuzammilahmed6872
@syedmuzammilahmed6872 1 year ago
@@HassanKhan-ei2wh When I add the num_oov_indices=0 parameter to the StringLookup code, the model training code works, but it puts labels on the wrong images. So I removed num_oov_indices, and now my model training code with early stopping is not working. Any solution for this?
@megistone
@megistone 1 month ago
@@syedmuzammilahmed6872 Just add num_oov_indices=0 to num_to_char as well; it helped me.
@alexmoruz1993
@alexmoruz1993 2 years ago
Hi Nicolai, I was wondering: would there be a way to feed wider images with text into this kind of network, or to have some kind of dynamic input size?
@omkarmestry4117
@omkarmestry4117 1 year ago
I'm trying to run this code but I'm getting an error like InvalidArgumentError: Graph execution error. Can anyone help with this?
@şulemeşe-z7w
@şulemeşe-z7w 9 months ago
Can I extract text from images, by the way? My final project is to extract text from images, but I cannot do the coding. I need help, please.
@syedmuzammilahmed6872
@syedmuzammilahmed6872 1 year ago
Hi Nicolai, when I add the "num_oov_indices" = 0 parameter to the StringLookup code, the model training code works, but it puts labels on the wrong images in the visualization part before training and building the model. So I removed "num_oov_indices", and now my model training code with early stopping is not working; it stops in the very first epoch. Any solution for this?
@coconutnut21
@coconutnut21 1 year ago
Can I use this model for license plates?
@user-kw9cu
@user-kw9cu 2 years ago
Can you provide the library versions you used?
@abhisekseal8044
@abhisekseal8044 2 years ago
Hi, I am a beginner in this field. I've watched your video and implemented this code. It's working fine, but I need to test a single captcha image; how can I do that? I was trying to do that but the prediction was not good. Please help me out if you can. 🥺
@GODS_CODM
@GODS_CODM 2 years ago
Have you found the answer to this?
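A rough sketch of single-image inference (not the video's code): preprocess one file the same way as the training pipeline, add a batch dimension, and decode. It assumes img_width, img_height, prediction_model and decode_batch_predictions from the tutorial are already defined; the file name is hypothetical.

import tensorflow as tf

def predict_single(image_path):
    img = tf.io.read_file(image_path)
    img = tf.io.decode_png(img, channels=1)            # grayscale
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, [img_height, img_width])
    img = tf.transpose(img, perm=[1, 0, 2])            # width becomes the time axis
    img = tf.expand_dims(img, axis=0)                  # batch of one
    preds = prediction_model.predict(img)
    return decode_batch_predictions(preds)[0]

print(predict_single("my_captcha.png"))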
@chelvanchelvam4332
@chelvanchelvam4332 3 years ago
Is it suitable for a text recognition task?
@NicolaiAI
@NicolaiAI 3 years ago
Yes, if you just train it on what you want to recognize.
@chelvanchelvam4332
@chelvanchelvam4332 3 years ago
@@NicolaiAI Thank you, I will try.
@tricialamjingyi
@tricialamjingyi 2 years ago
Hi, how can I make this work for captchas that have 6 digits per picture? Currently it's 5 digits in your example. I know I need to change something in the model but I can't seem to figure it out :( The error I keep getting is: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [5], [batch]: [6]. What should I change, or how do I work out what I need to change?
@arslanmushtaq9774
@arslanmushtaq9774 2 years ago
Did you find the solution?
@kentoky6568
@kentoky6568 1 year ago
Hello, in my case I tried changing the dataset to images with 4 characters and it adapted to all 4, so that would mean you should make a separate model for each different length.
@aryangupta2051
@aryangupta2051 1 year ago
Hey, did you fix it?
@aryangupta2051
@aryangupta2051 1 year ago
@@arslanmushtaq9774 Hey, did you fix it?
@prathamshah5521
@prathamshah5521 8 months ago
Hey, I am not getting accurate results. I checked your GitHub, and for some reason the labels aren't matching the captchas during testing. What would you recommend doing?
@LucasDM4
@LucasDM4 3 months ago
Fix the code / fix the labels
@alokthakur3298
@alokthakur3298 9 months ago
Can anyone provide me with the code?
@Cordic45
@Cordic45 2 years ago
Sir, why can't we use regular object detection to detect the numbers?
@Konnits
@Konnits 1 year ago
Hi! I'm trying the code but I'm getting an error while training: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [5], [batch]: [6]. Can anyone help me fix this?
@arpittalmale6468
@arpittalmale6468 1 year ago
same bro
@lanasillomaster7034
@lanasillomaster7034 10 months ago
I was replicating this project with another dataset I made and got that error because I forgot a letter when labelling one file.
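A quick check for that, as a rough sketch (my own addition; the folder path is hypothetical): since tensors of different lengths cannot be batched, verify that every label derived from the file names has the same length before building the dataset.

from collections import Counter
from pathlib import Path

data_dir = Path("./captcha_images")                        # hypothetical dataset folder
labels = [p.stem for p in sorted(data_dir.glob("*.png"))]  # label = file name without extension

length_counts = Counter(len(label) for label in labels)
print(length_counts)  # e.g. Counter({5: 1039, 6: 1}) exposes one mislabelled file

expected = max(length_counts, key=length_counts.get)
print("Suspect files:", [l for l in labels if len(l) != expected])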
@hendrywijaya1017
@hendrywijaya1017 3 years ago
Excuse me bro, I have an issue when I'm running the build_model() function after the CTC loss. It happens at line 43:

x = layers.Reshape(target_shape=new_shape, name='reshape')(x)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
     73
     74 # Call the function to build the model
---> 75 model = build_model()
     76 model.summary()

in build_model()
     41 # floor division gives the integer result of the division
     42 new_shape = ((img_width // 4), (img_height // 4) * 64)
---> 43 x = layers.Reshape(target_shape=new_shape, name='reshape')(x)
     44 x = layers.Dense(64, activation='relu', name='dense1')(x)
     45 x = layers.Dropout(0.2)(x)

(... internal frames in keras/engine/base_layer.py and keras/layers/core.py ...)

ValueError: total size of new array must be unchanged, input_shape = [50, 50, 64], output_shape = [50, 768]
@bbtvines
@bbtvines 3 years ago
How do you implement it??? You just read all the docs.
@NicolaiAI
@NicolaiAI 3 years ago
Hi, 80% of the video is implementation
@creatur
@creatur 3 years ago
@@NicolaiAI I have a single captcha and I trained my model. So how can I solve that captcha?
@NicolaiAI
@NicolaiAI 3 years ago
What do you mean by a single captcha? In the video they are passed through the model one by one too.
@creatur
@creatur 3 years ago
@@NicolaiAI 😔😔😔 I am a noob with TF. I wanted to make an API that receives a captcha as base64, solves it, and sends back the response.
@GODS_CODM
@GODS_CODM 2 years ago
@@NicolaiAI I want to input a single CAPTCHA and have the model predict it.
@ZainAbdin-e7s
@ZainAbdin-e7s 1 year ago
How do I crack a captcha with 6 digits and characters?
@aryangupta2051
@aryangupta2051 1 year ago
Hey, did you get a method?
@traderdaniel4749
@traderdaniel4749 2 years ago
Does anyone else have the same error?

File "C:\Users\user\PycharmProjects\ocr_gas\ocr.py", line 135, in call *
    label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
ValueError: slice index 1 of dimension 0 out of bounds. for '{{node ocr_model_v1/ctc_loss/strided_slice_2}} = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1](ocr_model_v1/ctc_loss/Shape_2, ocr_model_v1/ctc_loss/strided_slice_2/stack, ocr_model_v1/ctc_loss/strided_slice_2/stack_1, ocr_model_v1/ctc_loss/strided_slice_2/stack_2)' with input shapes: [1], [1], [1], [1] and with computed input tensors: input[1] = , input[2] = , input[3] = .

Call arguments received by layer "ctc_loss" (type CTCLayer):
  • y_true=tf.Tensor(shape=(None,), dtype=float32)
  • y_pred=tf.Tensor(shape=(None, 50, 12), dtype=float32)