Image Caption Generator using Flickr Dataset | Deep Learning | Python

  88,411 views

Hackers Realm

1 day ago

⭐️ Content Description ⭐️
In this video, I explain how to develop an image caption generator using the Flickr dataset in Python. The project is implemented with the Keras and TensorFlow frameworks. It uses both image features and text features to build the model. This gives a better understanding of how model architectures from different domains can be combined for a specific application.
Text-based Tutorial: www.hackersrealm.net/post/ima...
GitHub Code Repo: bit.ly/dlcoderepo
Dataset link: www.kaggle.com/adityajn105/fl...
🌐 Website: www.hackersrealm.net
🔔 Subscribe: bit.ly/hackersrealm
🗓️ 1:1 Consultation with Me: calendly.com/hackersrealm/con...
📷 Instagram: / aswintechguy
🔣 Linkedin: / aswintechguy
🎯 GitHub: github.com/aswintechguy
🎬 Share: • Image Caption Generato...
⚡️ Data Structures & Algorithms tutorial playlist: bit.ly/dsatutorial
😎 Hackerrank problem solving solutions playlist: bit.ly/hackerrankplaylist
🤖 ML projects tutorial playlist: bit.ly/mlprojectsplaylist
🐍 Python tutorial playlist: bit.ly/python3playlist
💻 Machine learning concepts playlist: bit.ly/mlconcepts
✍🏼 NLP concepts playlist: bit.ly/nlpconcepts
🕸️ Web scraping tutorial playlist: bit.ly/webscrapingplaylist
Make a small donation to support the channel 🙏🙏🙏:-
🆙 UPI ID: hackersrealm@apl
💲 PayPal: paypal.me/hackersrealm
🕒 Timeline
00:00 Introduction to Image Caption Generator
01:00 Import Modules
07:26 Extract Image Features using VGG16
18:38 Load Captions Data
26:43 Preprocess the Caption Data
38:15 Train Test Split
40:28 Create Data Generator Function
53:34 Model Creation - CNN LSTM
01:06:10 Generate Captions for Images
01:19:46 Visualize the Results of Image Caption
01:27:36 Improving the Results
#imagecaptioning #deeplearning #hackersrealm #imagecaptiongenerator #flickr #imagecaption #machinelearning #datascience #model #project #artificialintelligence #beginner #analysis #python #tutorial #aswin #ai #dataanalytics #data #bigdata #programming #datascientist #technology #coding #datavisualization #computerscience #pythonprogramming #analytics #tech #dataanalysis #iot #programmer #statistics #developer #ml #business #innovation #coder #dataanalyst

Comments: 487
@HackersRealm 1 year ago
Hey Hackers, I have updated the code to test the model with a real image. You can find the latest code on my website or GitHub. For users getting the following error: `output_signature` must contain objects that are subclass of `tf.TypeSpec`, please update the code snippets in data_generator and model creation as I have on my website. It works with the latest version of TensorFlow without issues. Happy Learning!!!
@petenallan24 1 year ago
Sir, what will be the base dir and work dir when working in a Jupyter notebook?
@HackersRealm 1 year ago
@petenallan24 You can set the base dir to your dataset directory and use some new folder as the working directory!!!
@rohith646 1 year ago
@HackersRealm Sir, I have uploaded the dataset and captions file to my Google Drive and started working in Google Colab. What should I use as my base dir and working dir?
@HackersRealm 1 year ago
@rohith646 The base dir will be the dataset folder... Check whether that works, or change the code accordingly.
@beatx2173 8 months ago
thanks
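For the base dir / working dir questions in this thread, here is a minimal sketch of one way to derive the paths. The helper name `resolve_dirs`, the `captions.txt` and `features.pkl` file names, and the layout (an `Images` folder inside the base dir) are assumptions for illustration, not code taken from the video.

```python
import os

def resolve_dirs(base_dir, working_dir):
    """Return (images_dir, captions_file, features_file).

    Assumes base_dir holds an Images/ folder and a captions.txt file,
    and that working_dir is a writable folder for generated files.
    """
    # Create the working directory if it does not exist yet.
    os.makedirs(working_dir, exist_ok=True)
    images_dir = os.path.join(base_dir, "Images")
    captions_file = os.path.join(base_dir, "captions.txt")
    # Hypothetical file name for the stored image features.
    features_file = os.path.join(working_dir, "features.pkl")
    return images_dir, captions_file, features_file
```

On Kaggle the base dir would be something like `/kaggle/input/flickr8k` (the path that appears in an error message further down this thread), with `/kaggle/working` as the writable directory. In Colab, after mounting Drive, files normally appear under `/content/drive/MyDrive`, so the base dir would be whichever folder there holds the dataset.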
@plabmadeeasy 1 year ago
Beautiful explanation! Thanks for this!
@HackersRealm 1 year ago
Glad you like it!!!😃
@wildshore8580 2 years ago
Loved the implementation and the explanation. Could you please do an end-to-end chatbot implementation like this, using the Cornell movie dataset?
@HackersRealm 2 years ago
A chatbot application is already done for generic messages; check the Python projects playlist.
@prodevmahi4901 1 year ago
Kaggle's "Accelerator" tab now offers TPUs as well. Of the 4 options shown in the drop-down, which one should I choose?
@JannatulFerdous-ew5ko 2 years ago
I appreciate the project details. It took me almost 3 weeks to reproduce similar results.
@HackersRealm 2 years ago
Glad it helped you!!!
@keerthyk2284 5 months ago
My final year project is also on this topic.
@amineakkati7501 3 months ago
Can I please contact you? I need some ideas about the project.
@riyanagar2619 9 months ago
Thank you for the great explanation. I have a question: what's the accuracy of this project?
@mayur7452 6 months ago
Hello sir. I am doing this project but using EfficientNetV2B0 and GRU. My BLEU-1 score does not get above 0.22. What needs to be changed? Is it possible to get a BLEU-1 score above 0.5? Also, how can we load this model so that retraining is not required, and how can it be used in a GUI?
@mohamedsahli9935 10 months ago
Thank you sir, best-explained IC video so far.
@HackersRealm 10 months ago
Glad to hear that!!!
@MsBothyna 5 months ago
By the way, I forgot to thank you for all this excellent explanation. You, sir, are a truly great person. I am very grateful to you.
@HackersRealm 5 months ago
Thanks for your kind words!! Happy to help!!!
@MsBothyna 5 months ago
@HackersRealm Yes, indeed your explanation and video have helped me understand machine learning and models much better than my professor's explanation. 😅😅 However, I have a simple question. When I try the "Test with Real Image" part, I get an incorrect prediction. Could you please explain what I should do? Keep in mind that all the results in the code are correct and all the steps match your explanation exactly.
@HackersRealm 5 months ago
@MsBothyna Currently we are using a smaller dataset; if you train with the Flickr30k dataset, you might see better results.
@tazarhussain22 2 years ago
Thank you for the nice implementation. I have a question: can I use the same approach to generate text from numbers (like tabular data) instead of image features?
@HackersRealm 2 years ago
Yes, it may be possible, but you have to adjust the layers and features accordingly.
@hariom6910 2 years ago
Bro, have you got the output of the code?
@HackersRealm 2 years ago
@hariom6910 You can see it at the end of the video.
@sreelakshminarayanan.m6609 4 months ago
Thanks for the wonderful video, code and explanation.
@HackersRealm 4 months ago
Glad you liked it!!!
@trangle1506 5 months ago
Well explained. Thank you so much bro
@HackersRealm 5 months ago
Glad you like it!!!
@SaiKumar-mf3pw 1 year ago
Can we use Jupyter Notebook for this project?
@rappaivo5779 2 years ago
May I know why 'return_sequences' and 'return_state' of the LSTM are set to False (the default) in a text prediction network?
@akshayhasabe8766 4 months ago
Because there is only one LSTM layer... We don't need to pass the output of every time step to a next layer here. If you are stacking multiple LSTM or GRU layers, you will need the output from every time step.
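The difference described in the reply above can be illustrated with a toy recurrent cell. This is a pure-Python sketch, not Keras code: the single-unit update rule and the function name `toy_rnn` are made up for illustration, and only the `return_sequences` flag mirrors the Keras argument.

```python
import math

def toy_rnn(sequence, return_sequences=False):
    """Run a one-unit recurrent cell over a non-empty list of numbers.

    Toy stand-in for an LSTM layer: the state update is
    h = tanh(0.5 * x + 0.5 * h), starting from h = 0.
    """
    h = 0.0
    outputs = []
    for x in sequence:
        h = math.tanh(0.5 * x + 0.5 * h)
        outputs.append(h)
    # return_sequences=True keeps one output per time step (what a stacked
    # recurrent layer would consume); False keeps only the final output.
    return outputs if return_sequences else outputs[-1]
```

With a single recurrent layer feeding a Dense layer, only the final output is needed, which is why the default of False is enough in that setup.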
@sreelakshmi7932 5 months ago
Hello Sir, when I try to extract image features it shows gaierrors, URL errors, an exception in model = VGG16, etc. How can I fix it? Please help me.
@manishakumari4501 5 months ago
I really liked this video, great!!!
@HackersRealm 5 months ago
Glad you liked it!!!
@tanviladdha4120 2 years ago
This is the best video and so perfectly explained. Sir, can you please make a video on video captioning using the MSVD dataset? Thank you 👍🏼
@HackersRealm 2 years ago
Planning to do it as an upcoming project. Glad you liked this video!!!
@tanviladdha4120 2 years ago
@HackersRealm That's great! 😊 I will be waiting for it and hope to see it soon.
@subhayanbhattacharya2674 2 years ago
Wonderful video. Very insightful. Can you please mention which versions of TensorFlow and Keras you are working with? Thanks in advance!
@HackersRealm 2 years ago
I am using the modules available in Kaggle; you can check the module versions there.
@hamnamatloob8231 2 years ago
When I load the VGG16 model, it throws an error.
@user-ho6bv7kr9b 4 months ago
In the original step that extracts features from images, I followed your steps and got 'Error displaying widget: model not found'. How can I solve it? I've been looking for a solution for a long time without success.
@soumyasingh8500 9 months ago
Is the predicted output not wrong in every case?
@free-Palestine11 1 year ago
Thank you for this video! I had a question: how can I pickle the implemented model to use it in an app? I am having trouble exporting models in .h5 or .pkl formats in general. Can anyone help with that?
@HackersRealm 1 year ago
Usually we store the model in h5 format, and it reloads without any issues!!! What error are you facing?
@percyjackson583 4 months ago
The predicted caption is empty; only startseq and endseq are there. I am also trying to resolve this. Any suggestion as to which part I should check?
@user-sz7ip3tn9r 4 months ago
I am getting a ZeroDivision error when computing the BLEU score. Can you please tell me what to do?
@akulasaimanasa3344 1 year ago
I am getting a gaierror when running the cell that creates the model. Could you please help me?
@manojshendre-np9nw 1 month ago
Can anyone please reply? What if I also have the bounding boxes for each image along with the captions? How should I map them?
@aditya95775 5 months ago
Could you please create a video captioning model using the MSR-VTT dataset? It would be very helpful for my major project, which is due in 2 weeks. Thank you sir.
@meriemsabour2830 1 year ago
When I try model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1) I get this error: KeyError: '1000268201_693b08cb0e'. And when I run len(features) I get 80, while len(image_names) = 8101. Why did it not process all the images?
@shubhamjaware7718 3 months ago
Have you solved this error?
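A KeyError like the one reported above is what you would expect when the training loop asks for an image id that has no entry in the features dictionary, for example because feature extraction stopped early. A minimal sketch for checking this, assuming `features` is a dict keyed by image id without the file extension; both helper names are hypothetical.

```python
import os

def image_id_from_filename(filename):
    """Strip folder and extension: 'Images/abc.jpg' -> 'abc'."""
    return os.path.splitext(os.path.basename(filename))[0]

def split_by_features(image_ids, features):
    """Separate ids that have extracted features from those that do not."""
    usable = [i for i in image_ids if i in features]
    missing = [i for i in image_ids if i not in features]
    return usable, missing
```

If `missing` is large, rerunning the feature extraction step until `len(features)` matches the number of images is probably the real fix; training only on `usable` is a workaround that shrinks the dataset.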
@akulasaimanasa3344 1 year ago
Does BASE_DIR contain only the images, or the folder containing both images and captions? And what does working_dir hold? Is it an empty folder?
@HackersRealm 1 year ago
We store the extracted features in the working directory.
@pankajghaywat 1 year ago
Hello, what changes do I need to make to implement video captioning, i.e. generating captions for short video clips?
@HackersRealm 1 year ago
The whole structure has to change... from the features to the model. It will be a big task for sure.
@karimbaig8573 1 year ago
How can this be done for the dense captioning task?
@user-jo7pq2ti7r 2 years ago
The video was great, so much love. Can you tell me how I can apply the same code to Bengali caption generation? Where would the changes be?
@HackersRealm 2 years ago
If you have a dataset similar to this one, you can proceed with the same workflow.
@Mehedi.25 1 year ago
Brother, have you done any further work on this project?
@rishabhvyas4969 1 year ago
Why didn't you use the Keras ImageDataGenerator to extract the features from the model? It creates the whole image preprocessing pipeline so you don't have to do it manually. By the way, great tutorial!
@HackersRealm 1 year ago
That would run the feature extraction step again on every rerun. Extracting the features separately and storing them avoids rerunning from scratch.
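The "extract once, store, reuse" idea from the reply above can be sketched as a small caching helper. The function name `load_or_compute` is hypothetical, and `compute_fn` stands in for whatever slow step builds the features; pickle is used here because a later comment mentions the features being saved with pickle.

```python
import os
import pickle

def load_or_compute(cache_path, compute_fn):
    """Load a pickled object from cache_path, or compute and store it."""
    if os.path.exists(cache_path):
        # Cached result found: skip the expensive step entirely.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

The first call runs `compute_fn` and writes the file; later calls in a new session only read it back, so the slow pass over all images is not repeated.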
@charltondsouza9140 1 year ago
Hi, do you have attention mechanism code applied to this same code? I am not quite sure how to go about it. If not, can you please explain briefly how it can be done?
@HackersRealm 1 year ago
You just have to add the corresponding layers to the text model here; the flow remains the same.
@litalshytrit1490 2 years ago
I have a few questions, please. How did you choose the hyperparameters of the model? Why is the decoder after the encoder? Why use a dropout layer for the images if they've already gone through VGG16? How can I add validation_data in the fit function? It shows a compatibility error. Thanks!
@HackersRealm 2 years ago
You can change the model parameters or layers for experimentation too, but make sure the sequence of the flow does not break.
@daniasalameh8579 11 months ago
Really appreciate your videos! I want to ask: what if we want the system to answer the user's query about the text file ID, and then generate the picture file that represents that ID? How can we change the code?
@HackersRealm 11 months ago
I didn't get the full context here; could you describe it in full?
@rakshitashetty7461 2 years ago
When I do it in Colab, how do I set the working directory? I understood the path for the base directory, but I'm unable to set the working directory... please help.
@HackersRealm 2 years ago
For Colab, you can mount the drive and give the dataset path directly.
@mytv2362 1 year ago
I made a GUI for this. Thanks for the code, by the way.
@HackersRealm 1 year ago
That's great
@winwiths.g6155 10 months ago
Hey, greatly explained. Can you please tell me how I can reduce the model complexity to run it on a Raspberry Pi 4, and how to run this image captioning through my webcam?
@biancaar8032 8 months ago
Yeah, you can run it on an RPi by converting this model.hdf5 file into a TFLite file... For the webcam, you have to capture each frame using cv2 and pass it as input to the model.
@bhushanambhore8378 1 year ago
Hi, I wanted to ask: do we need to train the model here (1:04:00) every time after opening Kaggle? Isn't there a way to save it?
@HackersRealm 1 year ago
You can save the model using the model.save method.
@koushikguptabonthala2429 2 years ago
If the BLEU score is above 0.5, then what is the accuracy in percent? Can you please tell me?
@HackersRealm 2 years ago
Accuracy is not a meaningful metric for this problem.
@syedhussainshah3766 2 years ago
Hello sir, image captioning has been done before, and my project is on video captioning. Can you please make a video or guide with the same procedure but for video captioning?
@HackersRealm 2 years ago
Sure, I will add that to the list.
@syedhussainshah3766 2 years ago
@HackersRealm Sir, can you please make it quickly? I have little time remaining and I'm really worried about my project. It would be really great and I would be really thankful.
@muhammedmehdi8893 6 months ago
I have a question: why are we using both image features and sequences from the captions? We could use just the image features to produce captions; after VGG16 we could use a Bi-LSTM and get our output.
@HackersRealm 6 months ago
Could you explain the last few lines in detail?
@mahidiwijayantha2722 1 year ago
Thank you for the nice explanation. I have a few questions. 1. Can we use this flow with a larger dataset? 2. Can we use this flow for an image caption generator for fashion product images?
@HackersRealm 1 year ago
Yes, you can use the same flow!!!
@mahidiwijayantha2722 1 year ago
@HackersRealm Thank you for your response. I have two more questions. 1. Can we use this flow to generate a caption for a new image that is not in the training dataset? 2. I want to create an image caption generator for fashion products. I created a dataset with images and captions for training. Can I use this flow to generate captions by extracting features (attributes and categories) of the fashion products?
@HackersRealm 1 year ago
@mahidiwijayantha2722 Yes, it's possible for both scenarios.
@tanviladdha4120 1 year ago
FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/flickr8k/Images'. I am getting this error even though the dataset folder and the project file are in the same place. I am trying this in a Jupyter notebook. Can you please tell me what I am doing wrong?
@HackersRealm 1 year ago
It seems correct. If you're using a local machine, check with a different folder structure.
@dotnet8925 1 year ago
How did you add Kaggle data to the Jupyter notebook? Which notebook version is this?
@HackersRealm 1 year ago
If you go to the dataset and click New Notebook in Kaggle, it will automatically add the dataset to that notebook.
@aniketpatra4474 7 months ago
Hi, thanks a lot for this awesome tutorial. Can you please make a tutorial on how to deploy this model in the cloud, e.g. AWS?
@HackersRealm 7 months ago
I have already made a local deployment video for a basic ML model... I will try to make a video on cloud deployment soon.
@user-fl6ry7le1f 7 months ago
Can you provide me the link to the research paper?
@YCSKeerthana 2 years ago
Hello sir, I'm getting a zero division error while calculating the corpus BLEU scores. How can I correct it?
@HackersRealm 2 years ago
It may happen if the actual or predicted caption is empty; check for that.
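Following the suggestion in the reply above, one way to check for empty captions before scoring is to filter them out and see how many samples remain. A minimal sketch, assuming each sample has a list of reference captions and one predicted caption, all as token lists; the helper name `drop_empty_pairs` is hypothetical.

```python
def drop_empty_pairs(actual, predicted):
    """Keep samples whose prediction and at least one reference are non-empty.

    actual: list of samples, each a list of reference captions (token lists).
    predicted: list of predicted captions (token lists), same length as actual.
    """
    kept_actual, kept_predicted = [], []
    for refs, hyp in zip(actual, predicted):
        refs = [r for r in refs if r]  # drop empty reference captions
        if refs and hyp:
            kept_actual.append(refs)
            kept_predicted.append(hyp)
    return kept_actual, kept_predicted
```

If many samples are dropped, the empty predictions themselves are the problem to investigate (for example the model emitting the end token immediately), and the BLEU score on the remaining samples would overstate quality.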
@sudeshnakundu3909 1 month ago
Thanks for this video, explained well! Can the model predict on monuments and historical structures? I mean, can the model predict on totally unseen data? And can you please make a video on how to put entity awareness on top of it?
@HackersRealm 1 month ago
Yes, but you have to train with more data for better results; I used a smaller dataset for the demo.
@I.II..III...IIIII..... 1 year ago
Hello. I've followed your video and tried to train a model on Flickr30k. My problem is that the captions I generate are repetitive: whatever is in the image, my captions are always something like "A man in a black shirt is walking down the street". How can I make the model's output more diverse?
@HackersRealm 1 year ago
Is this happening for every image you try? That shouldn't happen, as there should be at least a slight difference in the output when the input changes.
@HackersRealm 9 months ago
@user-px8qq6on1p It's very unlikely to happen if you follow the same steps; as you can see in the video, it generates different results for each image... We need to find out where it's going wrong, as there are so many moving parts.
@noorulameenasm 1 year ago
When I run the epochs, it shows a ValueError.
@bhaskarmondal7461 4 months ago
Hey, does anybody here know how I might be able to turn this project into a Streamlit app?
@MarehAboGhanem 3 months ago
Hello! I tried with 70 epochs and the result does not improve beyond a BLEU score of 52. I want to try hyperparameter tuning using grid search, but it does not work without "y-train". Could you tell me how to get y and how to apply this technique?
@Esraaalsaede 3 months ago
I have the same issue.
@harshith24 9 months ago
It's a wonderful project, and I could easily get the output by following your instructions. But after completing everything, if I try predicting the output for a new image, the output is not relevant. How can I correct this? It would be very helpful if you could help us with this. Thank you.
@HackersRealm 8 months ago
You could use the Flickr30k dataset, which has much more variety, so that new images work better.
@Waliul_The_Wall-E 9 months ago
Thanks for the implementation. But I have a question: what is the LSTM layer doing (1:00:12)? What's the use of this layer? All the papers use the LSTM for word generation, but you are using a Dense layer for word generation, not the LSTM layer. Then why are you using the LSTM layer? And also, how is the Embedding layer learning here? TIA.
@HackersRealm 9 months ago
All the mentioned layers together form the LSTM model, which generates one new word at a time.
@pratyushpandey6139 1 year ago
nice
@NareshBalla7 7 months ago
Hi, thanks for the tutorial. I used your code without modifications and it generates the same caption for every image: "startseq two people are sitting on the street endseq". I didn't change anything in the code; I imported the dataset and am using Kaggle. What should I change for the model to predict correctly?
@HackersRealm 7 months ago
Is this occurring for all the images, including the ones I tested in the video?
@prajaktadhamanskar21 7 months ago
I am getting an error at this line: "yhat = model.predict([image, sequence], verbose=0)" ValueError: Layer "model" expects 1 input(s), but it received 2 input tensors. Inputs received: [, ]
@HackersRealm 7 months ago
Have you used the same notebook to train and test?
@keerthanarajendran5791 1 month ago
How do I fix the "gaierror" when extracting image features? Please help.
@IfrahRaoofdcs 2 years ago
Hello! Thanks for your video. I am trying your code, but while extracting features I am getting this error: "cannot identify image file ". Can you please help me fix this?
@HackersRealm 2 years ago
I think the image may be corrupted. Try removing the corrupted image and run the process again.
@Manojkumar-vh4tc 2 years ago
Here you have used an 8K dataset along with the captions, but if I give a new image, why is the model not working? If it should work for any image, what should the approach be? Can you outline the flow?
@HackersRealm 2 years ago
If you want to test with a new image, you can try the same flow with the Flickr30k dataset; that will improve your results.
@Manojkumar-vh4tc 2 years ago
@HackersRealm Thanks. And should the epochs and batch size be higher with a GPU?
@HackersRealm 2 years ago
@Manojkumar-vh4tc For bigger networks, 16 or 32 is the optimal batch size.
@mdmujeeb3670 1 year ago
I am getting a URL fetch failure while loading the VGG16 model... please tell me what to do.
@HackersRealm 1 year ago
Please enable internet in the Kaggle settings. It's in the right pane of the notebook.
@harshith24 4 months ago
Can you please give the versions of the packages you installed? I am trying to make a user interface using Streamlit in PyCharm, and the versions should match.
@HackersRealm 4 months ago
Sorry, I didn't note the package versions for this.
@patilsanket644 1 year ago
Hey, I am getting "ModuleNotFoundError: No module named 'tensorflow.security'" while importing. I have installed TensorFlow. Please show me a way!!
@HackersRealm 1 year ago
If you're running locally, you can uninstall and reinstall the module, or create a new environment and install the module!!!!
@andrzeypl 1 year ago
Hi, when I try to run the testing, I get a KeyError. I'm running this with my own dataset. Any ideas? Thanks
@HackersRealm 1 year ago
Please check whether the dataset is in the same format as in the video. If it's different, please change the code accordingly.
@taruntammana7960 1 year ago
Sir, at the 17th cell (after clean(mapping)) the output contains only the start and end tokens; the caption in between is missing. Please tell me how to solve this.
@HackersRealm 1 year ago
Are you using the same notebook and dataset?
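For readers debugging the caption cleaning step, here is a sketch of a typical cleaning function that can be run on a single caption to see what survives. It is an assumption about what such a step usually does (lowercasing, keeping letters only, dropping one-letter words, adding start and end tags); the tutorial's actual `clean` function may differ.

```python
import re

def clean_caption(caption):
    """Lowercase, keep letters only, drop one-letter words, add start/end tags."""
    caption = caption.lower()
    # re.sub takes a regex; str.replace would treat the pattern literally.
    caption = re.sub(r"[^a-z]", " ", caption)
    words = [w for w in caption.split() if len(w) > 1]
    return " ".join(["startseq", *words, "endseq"])
```

If a caption comes back as just "startseq endseq", the input string was already empty or contained no letters, which points to a problem earlier, such as how the captions file was split into image id and text.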
@pradnyeshdoshi348 2 years ago
Thanks for the wonderful implementation 😊. I ran it successfully, but can you tell me how the context.txt file is created? I saw that our input image should be in a particular format, and we get correct results only for the 8000 images. Is it possible for other images? I think it's not extracting text from the image; it's extracting it from the context.txt file. If I am wrong, please correct me. Thank you 😊
@HackersRealm 2 years ago
I am able to predict for new images that are not in the dataset as well; for better predictions, train with the Flickr30k dataset.
@pradnyeshdoshi348 2 years ago
@HackersRealm Thanks for replying. Can you tell me how you supply a new input image? Images need a proper name. How do you set that name?
@pradnyeshdoshi348 2 years ago
@HackersRealm Can you make a short video? Then everyone will get the idea. We are not looking for accuracy; we are just excited to know how image processing is done by a CNN. And I don't have enough resources to train a model with 30,000 images. 8000 images are sufficient for me 😅
@HackersRealm 2 years ago
@pradnyeshdoshi348 Then you can try to predict with a new image and check the results; the process is the same for prediction.
@mr.anonymous8410 2 years ago
@HackersRealm Hi, how do I predict captions for new images (which are not present in the Flickr dataset)?
@rancoraider9702 10 months ago
Thanks bro. I get an error when I run this part of the code:
# load vgg16 model
model = VGG16()
# restructure the model
model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
# summarize
print(model.summary())
@HackersRealm 10 months ago
If you're running in Kaggle, please enable the internet connection in the settings in the right pane.
@soumyasingh8500 9 months ago
Hi, whenever the session ends, on restarting or resuming it, it loads all the data and trains again, so it takes 3 hours again. Even after saving the model, it does the same. What should I do?
@HackersRealm 9 months ago
If you save the model, you can skip some of the steps used for training. Otherwise saving the model is of no use.
@bhushanambhore8378 1 year ago
Does your model train on all 8000 images in the dataset? When I tried a different model, it could use at most 1600 images from the dataset for training due to memory issues.
@HackersRealm 1 year ago
The memory issue won't happen, thanks to the custom data generator function.
@bhushanambhore8378 1 year ago
@HackersRealm Okay, but approximately how many images does your model use for training? Is it using all 8000 images from the dataset?
@HackersRealm 1 year ago
@bhushanambhore8378 I think around 6.5k or so; you can check the video again, as I split the data into train and test sets.
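The memory saving mentioned above comes from building training samples a batch at a time instead of all at once. The core of such a generator can be sketched in plain Python: each tokenized caption is expanded into (padded prefix, next word) pairs, and items are handed out in small batches. Both function names are hypothetical, and the left-padding with zeros mirrors the default behaviour of Keras `pad_sequences`.

```python
def caption_to_pairs(token_ids, max_length):
    """Expand one tokenized caption into (padded input prefix, next token) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i][-max_length:]
        padded = [0] * (max_length - len(prefix)) + prefix  # left-pad with zeros
        pairs.append((padded, token_ids[i]))
    return pairs

def batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

A caption of n tokens produces n - 1 training pairs, so materialising every pair for every caption up front is what tends to exhaust memory; yielding them batch by batch keeps only a small slice in memory at any time.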
@shouryatyagi8947 1 year ago
Hey, in model training I am getting an error saying it failed to convert a NumPy array to a tensor.
@shouryatyagi8947 1 year ago
No worry, it's good now 😂😂
@LK-cp5ow 9 months ago
It would be great if you added the pickled model to the Git repo, as it's going to take my PC about 4 hours to train the model... :(. Other than that, fantastic video!
@HackersRealm 9 months ago
I will try to upload that if possible.
@user-kv8oh8lx7y 1 year ago
you are amazing
@HackersRealm 1 year ago
thanks for your kind words!!!😄
@user-vt6yw2td3v 6 months ago
Can you please tell me which algorithms are used in image captioning?
@HackersRealm 6 months ago
I have used VGG and LSTM models for the neural network.
@VinoconTino 1 year ago
Hey, is it also possible to generate a longer description than only one sentence?
@HackersRealm 1 year ago
Yes, if you train the whole model with longer descriptions, the model can predict longer descriptions.
@pranavip999 1 year ago
Hi, please tell me how to reload the model from the .h5 file without running all the epochs. I tried it, but it gives me an error.
@HackersRealm 1 year ago
You can use the Keras load_model() function to load the model. What error are you facing?
@garvitgupta792 9 months ago
Could you please tell me which application you are using to code? I am new to this and only know about Colab and Notebook.
@HackersRealm 9 months ago
This is Kaggle.
@stzgaming2430 2 years ago
Can you please help me? After training, the next time I open this project I need to run the whole code again. Can you tell me how to run just the testing part and use the previously saved trained model for it?
@HackersRealm 2 years ago
You can look up how to save and load a Keras model; it will help you.
@stzgaming2430 2 years ago
@HackersRealm I tried so many things but can't figure it out. Your code has code to save the features with pickle and to load them. Can you please provide code to save and load the trained model?
@HackersRealm 2 years ago
@stzgaming2430 www.tensorflow.org/guide/keras/save_and_serialize
@stzgaming2430 2 years ago
@HackersRealm Hi bro, sorry to disturb you again. I now have the code to load my saved model, but after that I don't understand what to do for prediction. I found code for prediction but don't know which function to call or which parameters to pass 😞
@lalitagarwal9155 3 months ago
Good evening sir. I am building a food website where I want to implement a feature that takes a food image from the user, generates a caption for that image, and then searches the database using that caption. So my question is: can I use the same code to generate a name for the user's food image using the Food-101 dataset?
@HackersRealm 3 months ago
If you have a similar dataset, you can train the model.
@TheYashO 1 year ago
Thank you for such a nice implementation and explanation. I have one doubt: can you please guide me on the changes needed to get captions for random internet images? Thank you
@HackersRealm 1 year ago
The code snippet is already available on my website; the link is in the description. For better results, you have to train with more images.
@bhushanambhore8378 1 year ago
I am getting an error at model = VGG16. I think it requires enabling internet in Kaggle, but there is no internet option there. Please help, bhai..
@HackersRealm 1 year ago
You can enable internet in the right pane of the Kaggle notebook; the settings there have the option.
@Suchithrads2003-rb5sm
@Suchithrads2003-rb5sm 2 ай бұрын
Can u make a vedio of software installation n setting environments for image captions generating..
@HackersRealm
@HackersRealm 2 ай бұрын
You can use kaggle notebook which is a online IDE, it's simple to use like I showed in the video
@Mehedi.25
@Mehedi.25 Жыл бұрын
sir this project ,i can be use in cse final year project?..please help me sir
@HackersRealm
@HackersRealm Жыл бұрын
yeah you could use this as base paper
@satyamrawat4079
@satyamrawat4079 3 ай бұрын
Sir i m getting this error --> TypeError: `output_signature` must contain objects that are subclass of `tf.TypeSpec` but found which is not. And "2.6.2" version of tensorflow is not available. Is anyone else facing this same issue? How to solve this?
@HackersRealm
@HackersRealm 3 ай бұрын
it's resolved, please check the github code for latest update
@kannan1427
@kannan1427 Жыл бұрын
Is it necessary to run the code everytime when we open or can we save the trained model
@HackersRealm
@HackersRealm Жыл бұрын
yes you can save the trained model
@user-qv1wv2vy6y 7 months ago
Sir, after successfully importing the dataset using a Kaggle token, all steps proceeded smoothly until the 'summarize' stage. However, an error emerged during the 'extracting' step: "No such file or directory: '/kaggle/input/flicr8k/Images'". Can you please guide me?
@HackersRealm 7 months ago
Are you using the notebook in the Kaggle environment, as shown in the video?
@user-qv1wv2vy6y 7 months ago
@@HackersRealm I am using Google Colab.
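The path in the error above only exists inside a Kaggle notebook, so on Google Colab the dataset directory has to point at wherever the files were actually downloaded. A minimal sketch of one way to handle this, assuming a hypothetical helper `find_dataset_dir` and placeholder directory names that are not taken from the video:

```python
import os


def find_dataset_dir(candidates, required_subdir="Images"):
    """Return the first candidate directory that contains `required_subdir`.

    `candidates` is a list of base directories to try, in order.
    """
    for base in candidates:
        if os.path.isdir(os.path.join(base, required_subdir)):
            return base
    raise FileNotFoundError(
        f"None of {candidates} contains a '{required_subdir}' folder"
    )


# Example usage (both paths are placeholders, adjust them to your setup):
# BASE_DIR = find_dataset_dir(["/kaggle/input/my-dataset", "/content/my-dataset"])
```

Failing early with a clear message makes it easier to spot a misspelled or environment-specific path before the feature extraction loop starts.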
@amalajohn4053 1 year ago
Can we use image caption generation to identify leaf structures?
@HackersRealm 1 year ago
Why do we need captions for leaves?
@amalajohn4053 1 year ago
@@HackersRealm Can we convert the image caption to audio format, and how would that work?
@manmaaze 2 years ago
What if I give it an image that is not in the dataset? Will it predict the caption?
@HackersRealm 2 years ago
It will try to predict the caption in a general manner. You can train with more images for better predictions.
@FatimaYousif 2 years ago
@@HackersRealm How can we generate captions for unseen images in the code? I would be grateful for your help, since I have a demo to present on new/unseen images next week.
@HackersRealm 2 years ago
@@FatimaYousif You can train the model with the Flickr 32k dataset; that will give good predictions on new image data.
@dataS-lr2nx 1 year ago
Hi, nice tutorial. I ran the same code in Jupyter Notebook (in the browser), opened using "python -m notebook". It was not progressing after line 4 (# extract features from image); it only shows 0%| | 0/8091 [00:00
@HackersRealm 1 year ago
If you want it processed more quickly on a GPU, that's a separate setup that needs to be done.
@dataS-lr2nx 1 year ago
@@HackersRealm What separate GPU setup is needed?
@HackersRealm 1 year ago
@@dataS-lr2nx If you have an NVIDIA GPU, you need to install the CUDA Toolkit and related libraries.
@nishantchauhan9862 2 years ago
Sir, I tried changing VGG16 to ResNet, but I am facing an error while training the model. I just imported ResNet and changed the part where VGG16 was called to ResNet50. Do I need to change something else as well? Please help.
@HackersRealm 2 years ago
What error is it showing?
@nishantchauhan9862 2 years ago
@@HackersRealm Invalid argument: In[0] mismatch In[1] shape: 2048 vs. 4096: [1749,2048] [4096,256] 0 0 [[node model_2/dense/MatMul (defined at tmp/ipykernel_38/2162571047.py:10) ]] [[categorical_crossentropy/softmax_cross_entropy_with_logits/Shape_2/_5]]
@Itsdeyasini 1 year ago
@@nishantchauhan9862 Were you able to solve it? I am getting the same error.
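The `2048 vs. 4096` in the error above suggests that the extracted features are 2048 values long while the caption model's image input is still sized for 4096-dimensional features. A minimal sketch of reading the size from the saved features instead of hard-coding it; `infer_feature_dim` is a hypothetical helper, and the features are modelled here as plain lists rather than NumPy arrays:

```python
def infer_feature_dim(features):
    """Return the common length of all feature vectors in `features`.

    `features` maps image_id -> flat feature vector (a list of floats here).
    """
    dims = {len(vec) for vec in features.values()}
    if len(dims) != 1:
        raise ValueError(f"Expected one feature size, found {sorted(dims)}")
    return dims.pop()


# Toy stand-ins for features extracted with two different backbones.
vgg_like = {"img1": [0.0] * 4096, "img2": [0.0] * 4096}
resnet_like = {"img1": [0.0] * 2048, "img2": [0.0] * 2048}

print(infer_feature_dim(vgg_like))     # 4096
print(infer_feature_dim(resnet_like))  # 2048
```

The returned value would then be used as the shape of the image input layer (for example `Input(shape=(feature_dim,))` in Keras), so that it matches whichever backbone produced the features.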
@IndrajeetKumar-dm1xp 25 days ago
While training the model on Kaggle, the training step does not finish; it takes a lot of time and returns a RAM error.
@Robin_Tdhr 1 year ago
Hey bro, I tried to train this on the 30k dataset but it crashed. I reduced the training data to 50%, with batch size 155 and 20 epochs, and it still crashed. Could you help here?
@HackersRealm 1 year ago
Reduce the batch size and run it; use the same code for that.
@sid8777 1 year ago
What's your IDE? Looks pretty cool.
@HackersRealm 1 year ago
It's a Kaggle notebook, an online IDE.
@SaiKumar-mf3pw 1 year ago
Can we use Jupyter Notebook for this?
@HackersRealm 1 year ago
Yes, I am using a Jupyter notebook in Kaggle; that's the only difference.
@percyjackson583 5 months ago
Great video, but can anyone tell me which application is being used, i.e. the name of the IDE? Anyone, please, quick.
@HackersRealm 5 months ago
It's a Kaggle notebook.
@hariom6910 1 year ago
Is a GUI for the above project available? It would be helpful if you could provide one, sir.
@HackersRealm 1 year ago
No, a GUI is not available for this.
@utkarshmarathe7028 1 year ago
My Colab environment is crashing because of RAM errors. I have used the same code for batch generation, but Keras is starting all the batches in parallel until the RAM runs out. How can I solve this problem or limit the number of batches executing in parallel?
@HackersRealm 1 year ago
This is not parallel processing. If you use the same code, you won't get a memory issue; set the batch size below 128.
@atulsingh2813 11 months ago
Hi, did you solve the issue in Colab? I am getting the same problem.
@atulsingh2813 11 months ago
My Colab session is crashing each time. I am using 20 epochs and a batch size of 64 on a T4 GPU.
@HackersRealm 11 months ago
Check how much memory is being used, or reduce the batch size and try again.
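To see why a smaller batch size helps with these memory errors, it can be useful to estimate how large one batch of training targets gets. The sketch below assumes the data generator turns every caption into one (prefix, next word) pair per token and one-hot encodes each target over the vocabulary as 4-byte floats; the function name and all numbers are illustrative, not measurements from the video:

```python
def estimate_target_bytes(batch_size, captions_per_image, avg_caption_len,
                          vocab_size, bytes_per_value=4):
    """Rough size in bytes of the one-hot targets produced for one batch of images."""
    # A caption with n tokens yields n - 1 (prefix, next-word) training pairs.
    pairs = batch_size * captions_per_image * (avg_caption_len - 1)
    # Each pair has a one-hot target with `vocab_size` entries.
    return pairs * vocab_size * bytes_per_value


# Illustrative numbers only: 5 captions per image, 12 tokens per caption,
# and an 8000-word vocabulary.
for batch_size in (32, 64, 128):
    mb = estimate_target_bytes(batch_size, 5, 12, 8000) / 1e6
    print(f"batch_size={batch_size}: about {mb:.0f} MB of targets per batch")
```

Under these assumptions the estimate grows linearly with the batch size, so halving the batch size roughly halves the memory needed for each batch.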
@NoobIGL-Yt 6 months ago
So the methodology used is CNN-LSTM?
@HackersRealm 6 months ago
Yes
@usmanyousaaf 1 year ago
1:26:12 Error while calling the generate_caption function: [Errno 2] No such file or directory:
@HackersRealm 1 year ago
Are you running it in a Kaggle notebook?
@jodgamezoo2076 4 months ago
Getting this error while training the model. Code:
# train the model
epochs = 20
batch_size = 32
steps = len(train) // batch_size
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
for i in range(epochs):
    # create data generator
    generator = data_generator(train, mapping, features, tokenizer, max_length, vocab_size, batch_size)
    # fit for one epoch
    model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1)
Error: TypeError: `output_signature` must contain objects that are subclass of `tf.TypeSpec` but found which is not.
@jodgamezoo2076 4 months ago
I also tried the actual code:
# train the model
epochs = 20
batch_size = 32
steps = len(train) // batch_size
for i in range(epochs):
    # create data generator
    generator = data_generator(train, mapping, features, tokenizer, max_length, vocab_size, batch_size)
    # fit for one epoch
    model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1)
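One commonly reported cause of this `output_signature` TypeError on newer TensorFlow/Keras versions is a generator that yields its two inputs as a Python list; yielding them as a tuple is a frequently suggested fix. That is an assumption here, since the thread does not say what was changed in the updated GitHub code. A simplified sketch of the yield shape, using a hypothetical `batch_generator` on toy data instead of the tutorial's `data_generator`:

```python
def batch_generator(samples, batch_size):
    """Yield ((image_features, input_sequences), targets) batches.

    `samples` is an iterable of (image_vector, input_sequence, target) triples.
    """
    X1, X2, y = [], [], []
    for image_vec, seq, target in samples:
        X1.append(image_vec)
        X2.append(seq)
        y.append(target)
        if len(y) == batch_size:
            # Inputs are grouped in a tuple, (X1, X2), rather than a list, [X1, X2].
            yield (X1, X2), y
            X1, X2, y = [], [], []


# Toy data: 4 samples, each with a 3-value image vector, a 2-token sequence,
# and an integer target id.
toy = [([0.1, 0.2, 0.3], [1, 2], 3) for _ in range(4)]
inputs, targets = next(batch_generator(toy, batch_size=2))
print(type(inputs).__name__, len(inputs), len(targets))  # tuple 2 2
```

In a real pipeline the lists would be converted to NumPy arrays before being yielded; only the tuple grouping of the two inputs is the point of this sketch.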