AI Blog Post Summarization with Hugging Face Transformers & Beautiful Soup Web Scraping

Views: 16,899

Nicholas Renotte

A day ago

Comments: 77

@alokkumar8793 11 months ago

I can't import pipeline from transformer. What should I do?
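A common cause of this error is that the package is not installed in the active environment, or that the import uses `transformer` (singular); the package name is `transformers`. Below is a minimal sketch, assuming a standard pip-based setup, that checks whether the package is visible to the current interpreter. The helper name `has_package` is hypothetical and not part of any library.

```python
import importlib.util


def has_package(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if has_package("transformers"):
    # Note the plural: the module is `transformers`, not `transformer`.
    print("installed: use `from transformers import pipeline`")
else:
    print("missing: run `pip install transformers` in this environment")
```

If the check passes but the import still fails, the installed version may be too old to ship `pipeline`, in which case upgrading the package is worth trying.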

@d3v487 3 years ago

Very nice explanation. How should I use this for a whole dataset? Please share a link if you have one.

@NicholasRenotte 3 years ago

You can use this for a whole dataset, it chunks it up :)

@upalkundu2872 3 years ago

Your videos are always useful. The explanation along with the work just makes it WoW. Again, really useful for beginners like me who want to get into data science.

@toriqhasmen9129 1 year ago

Nick, the line you have, results = soup.find_all('h1','p'), only gets the headline of the article when I tried it. Are you sure it gets all the text from 'soup'? It doesn't seem to work for me. Also, can you not use some Python library that converts HTML to text directly, instead of removing the HTML tags this way?
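One likely explanation: in Beautiful Soup, the second positional argument of `find_all` is an attribute filter, and a plain string there is treated as a CSS class. So `find_all('h1','p')` can match `<h1>` elements at most and skips every `<p>`. Passing a list of tag names matches any of them. A minimal sketch on a made-up HTML snippet (the markup is an assumption for illustration only):

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Title</h1>
  <p>First paragraph.</p>
  <p>Second <b>bold</b> paragraph.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The second positional argument is a class filter, so this searches
# for <h1 class="p"> and finds nothing in this snippet.
by_class = soup.find_all("h1", "p")

# A list of names matches any of the listed tags, in document order.
by_tags = soup.find_all(["h1", "p"])

# get_text() already converts HTML to text and drops nested tags like <b>.
text = " ".join(tag.get_text(" ", strip=True) for tag in by_tags)
print(text)
```

For a direct HTML-to-text conversion of the whole page, `soup.get_text()` does the job, although it also returns navigation and footer text that the tag filter would leave out.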

@abhishekshandilya5644 7 months ago

made this last weekend for a hackathon, good little project to add to the arsenal. I’m still concerned about inference time. Is there an algorithmic way we can accelerate it?

@ingoampt 2 months ago

Can you or someone here tell me how I can turn this into an API and use it from a Swift app in Xcode?

@ThemanB1997 11 months ago

Is this technique viable for hardcopy files, if it's not a blog post online?

@testingemailstestingemails4245 2 years ago

How do I train a Hugging Face model on my own dataset? How can I start? I don't know the structure of the dataset. Please help: how do I store voice recordings, how do I link them with their text, and how do I organize that? I am looking for anyone on this planet to help me. Should I look for the answer on Mars?

@rachelroselinarul8055 3 years ago

Great job. I am doing research in abstractive text summarization, so kindly upload more videos on abstractive text summarization, from basics to advanced. Thank you.

@haardrao4387 1 year ago

I am trying to use this model, but I am not able to extract the whole blog. Can you please help me out with it?

@Venkatesh-vm4ll 1 year ago

Can we ask a question and have it return a summary of what we taught it?

@muhmmedmomen8948 3 years ago

Here is a like from my side before the video ends 👍 The intro tells a lot. Appreciated effort, bro.

@NicholasRenotte 3 years ago

Thanks a ton @Muhmmed!!

@peshangjaafar8469 3 years ago

Thank you so much, from Iraq. Peace...

@sameerpatel3201 3 years ago

Me: Likes the video even before watching it.

@NicholasRenotte 3 years ago

YEAHHHYAAA! Thanks so much @Sameer!

@islamrighi8395 3 years ago

Thank you very much for this tutorial. For my work I want to summarize pdf text in French, is it possible?

@NicholasRenotte 3 years ago

Hmmm, doesn't look like there's an explicit French model. You could use translation to convert to English, then summarize and translate back to French though!

@hambaba2 3 years ago

Hi Nicholas, thank you for your wonderful videos. I have a question on this one: is it possible to add a feature that not only summarizes each chunk, but also provides a number for the same chunk (maybe between -1 and 1) that reflects its sentiment, positive or negative, in the form of a dataframe with two columns, "Summary" and "Sentiment"? Thanks again for the awesome work you are doing.

@NicholasRenotte 3 years ago

Could definitely do that, try passing the chunk to something like this: kzbin.info/www/bejne/qavGq6OdhKqXjtU
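The idea in this exchange can be sketched as a loop that sends each chunk to both a summarizer and a sentiment scorer and collects one row per chunk. Everything below is an assumption for illustration: `summarize_with_sentiment` is a hypothetical helper, and `fake_summarize` and `fake_sentiment` are stand-ins for real models so the sketch runs without downloading anything.

```python
from typing import Callable, Dict, List


def summarize_with_sentiment(
    chunks: List[str],
    summarize: Callable[[str], str],
    score_sentiment: Callable[[str], float],
) -> List[Dict[str, object]]:
    """Build one {"Summary", "Sentiment"} row per chunk."""
    rows = []
    for chunk in chunks:
        rows.append(
            {"Summary": summarize(chunk), "Sentiment": score_sentiment(chunk)}
        )
    return rows


def fake_summarize(text: str) -> str:
    # Stand-in: keep only the first sentence.
    return text.split(".")[0] + "."


def fake_sentiment(text: str) -> float:
    # Stand-in: crude keyword score in the range -1 to 1.
    return 1.0 if "great" in text else -1.0


rows = summarize_with_sentiment(
    ["This is great. More detail here.", "This is bad. More detail here."],
    fake_summarize,
    fake_sentiment,
)
print(rows)
```

The resulting list of dicts can be handed to `pandas.DataFrame(rows)` to get the two-column dataframe described in the question.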

@sarahelizabethnajeraespino7464 2 years ago

This is so amazing! I am a complete beginner with data science, but it seems so useful! Would it be possible to do the same for a list of URLs exported to a CSV file?

@NicholasRenotte 2 years ago

Sure could! Scrape them first using BeautifulSoup then run the summarizer over it!
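A rough sketch of that workflow, assuming the CSV has a header row with a column named `url` (the column name and both helper names are hypothetical). The scraping and summarizing steps are passed in as functions, so the real BeautifulSoup and summarizer code can be plugged in.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def read_urls(csv_file: Iterable[str], column: str = "url") -> List[str]:
    """Collect the non-empty values of one column from a CSV with a header."""
    reader = csv.DictReader(csv_file)
    return [
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    ]


def summarize_urls(
    urls: List[str],
    fetch_text: Callable[[str], str],
    summarize: Callable[[str], str],
) -> Dict[str, str]:
    """Scrape each URL first, then summarize the scraped text."""
    return {url: summarize(fetch_text(url)) for url in urls}


# In practice this would be open("urls.csv", newline="") instead.
sample = io.StringIO("url\nhttps://example.com/a\n\nhttps://example.com/b\n")
urls = read_urls(sample)
print(urls)
```

Here `fetch_text` would wrap the request plus BeautifulSoup parsing, and `summarize` would wrap the chunking and the summarization pipeline.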

@the_python_guide 1 year ago

Hey nick, here is another way to extract chunks. for i in range(len(res_arr)): length=len(res_arr[i].split(' ')) if(length+count

@vendroid6193 3 years ago

I can't thank you enough for all these videos. Also, as a suggestion for the next video, I would like to suggest building a chatbot from scratch. Keep up the good work, Sir.

@NicholasRenotte 3 years ago

I think I've got a walk through using Watson Assistant as one of my earlier videos!

@CODTALES-KILLSTREAKS 3 years ago

Can you do this for tree care blog posts? I’m interested in seeing a summary of tree care posts

@NicholasRenotte 3 years ago

Definitely, grab some tree care blog posts and give it a crack!

@Maicolacola 3 years ago

Hi Nicholas, thanks for putting together an incredible tutorial. I was able to get this going in no time. I managed to use your script (with slight modifications) to summarize an approximately 10.5k-word scientific article down to 1.4k words. Which brings me to my question: I set the max length to 300, but it returned a summary of 1.4k words. Do you know what might be going on here? I'm going to make a loop that repeatedly runs the summarize code until the length of the text is below 300 words. I'll report back!

@Maicolacola 3 years ago

Update: I made a while loop that would keep going until the length of the summary was below the max_length I specified. It took two runs instead of one to achieve that. Despite it being a summary of a summary, it reads really close to the actual abstract. I think with some fine tuning, it could get most of the way there.

@NicholasRenotte 3 years ago

THIS IS AWESOME, yeah I've had mixed results with setting max_length and even min_length. Need to do some more digging into it. Would love to hear more about your use case!
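A likely explanation for the 1.4k-word result: the pipeline's max_length limits the generated tokens for each input, so when an article is passed as many chunks, the joined output can be several times longer than max_length. The while-loop approach described in this thread can be sketched as follows. This is a minimal sketch, not the commenter's actual code; `summarize_until` is a hypothetical helper and `halve` is a stand-in for one full chunk-and-summarize pass.

```python
from typing import Callable


def summarize_until(
    text: str,
    summarize_once: Callable[[str], str],
    max_words: int,
    max_rounds: int = 5,
) -> str:
    """Re-summarize until the text has at most `max_words` words."""
    rounds = 0
    while len(text.split()) > max_words and rounds < max_rounds:
        shorter = summarize_once(text)
        if len(shorter.split()) >= len(text.split()):
            break  # no progress, stop instead of looping forever
        text = shorter
        rounds += 1
    return text


def halve(text: str) -> str:
    # Stand-in for a real summarization pass: keep the first half.
    words = text.split()
    return " ".join(words[: max(1, len(words) // 2)])


article = " ".join(["word"] * 1000)
summary = summarize_until(article, halve, max_words=300)
print(len(summary.split()))
```

The `max_rounds` cap and the no-progress check are there because each extra pass is a summary of a summary, and detail is lost every round.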

@anirbanpatra3017 2 years ago

Thanks for the tutorial. I am really struggling with the chunking part. Is there a way I can understand it better? Is it possible to deploy this on Streamlit?
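On the chunking question, the core idea is to add whole sentences to the current chunk until the next sentence would push it past a word budget, then start a new chunk. Below is a simplified sketch of that idea, not necessarily the exact code from the video; `chunk_sentences` and the tiny budget of 5 words are assumptions chosen to keep the example readable.

```python
from typing import List


def chunk_sentences(sentences: List[str], max_words: int = 500) -> List[str]:
    """Group whole sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` becomes its own oversized
    chunk, because sentences are never cut in the middle.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if n == 0:
            continue  # skip empty strings
        if current and count + n > max_words:
            # Budget would be exceeded: close this chunk, start the next.
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sentences = ["one two three.", "four five.", "six seven eight nine.", "ten."]
chunks = chunk_sentences(sentences, max_words=5)
print(chunks)
```

Word counts are only a proxy for what the model sees. Models count tokens, and a chunk under the word budget can still exceed the model's token limit, so a lower word budget leaves a safer margin.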

@sebastianmayer5418 3 years ago

Thank you very much for this tutorial. I tried it with a very long text. I chunked the text (length of chunks < 500 words) and passed the chunks to the Transformer, like you did. My text has more than 20 chunks. After the input of the 16th chunk to the transformer I get an index-out-of-range error (IndexError: index out of range in self). Does this transformer have a limit there? Do you have a solution for this problem? Thank you for your response.

@NicholasRenotte 3 years ago

Weird, does it work with 15 chunks?

@sebastianmayer5418 3 years ago

@@NicholasRenotte yes

@NicholasRenotte 3 years ago

@@sebastianmayer5418 Would it work if you break it up and do it in two runs?

@jorgerios4091 1 year ago

Hi Nicho, I learned a lot from your vid, I don't know if the YT algo takes this in consideration but I wanted to say it anyway: Thank you.

@dab0927 3 years ago

Great tutorial and excellent explanations from beginning to end. My question for you: can you take several articles and produce one summary? In other words, it would be great to have a single summary of several related articles. Is that possible? If so, how is it different from the process you walked through in this video?

@NicholasRenotte 3 years ago

You could generate a summary for each. If you were looking at extracting key topics from each, topic discovery might be a better technique.

@upalkundu2872 3 years ago

Maybe append the result texts and run the summarizer on the extended one?

@syedalinaqi6274 3 years ago

Thanks for the great content. Channel subscribed!!! Can you please answer a question? Is it possible to create your own language model from web-scraped data, and then later do transfer learning with Hugging Face Transformers?

@NicholasRenotte 3 years ago

Sure can, you can fine tune the underlying language models!

@syedalinaqi6274 3 years ago

@@NicholasRenotte Can you please make a video on how to do it, i.e. fine-tune a language model using GPT2? Thanks in advance.

@NicholasRenotte 3 years ago

@@syedalinaqi6274 definitely, I've got it planned!

@slowedReverbJunction 3 years ago

I don't code in Python, just JavaScript, but this one seems interesting. Can a noob in NLP like me try this?

@NicholasRenotte 3 years ago

If you can code in JS, you can probably code in anything 😅. Definitely give it a crack, you'll love it. Python ML and JS are the perfect combo!

@slowedReverbJunction 3 years ago

@@NicholasRenotte that's gr8 to know I will definitely give it a try now

@NicholasRenotte 3 years ago

@@slowedReverbJunction awesome stuff!

@captainng97 3 years ago

Hi, is this Abstractive or Extractive? 😅

@NicholasRenotte 3 years ago

Heya @Ng, it's using the same model as before which I believe is Extractive!

@muditrustagi5775 3 years ago

this was much needed!! Thank you !!!!!

@NicholasRenotte 3 years ago

Thanks for checking it out!!

@MyChris128 3 years ago

Great video, very well explained 👍

@NicholasRenotte 3 years ago

Thanks so much @Chris D!

@rokeyasiddiqua9375 3 years ago

awesome...! thanks a lot

@NicholasRenotte 3 years ago

Anytime @Rokeya, thanks for checking it out!

@thepythonprogrammer4338 2 years ago

Hats off, brother. Love your videos.

@ruchisehgal893 2 years ago

Hi Nick. This was a really great blog, but what if I have to write the data to a Word file (docx file), put the sentences in bullets, and add margins?

@ruchisehgal893 2 years ago

Also, there is an issue when a number such as 18.8 or 9.1 appears: the number gets split across different lines when the text is broken into bullet points. Can you let me know how to solve this?
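This kind of split usually happens when sentences are separated on every full stop, which also cuts decimals such as 18.8. One possible fix, offered as a sketch on a made-up sample sentence, is to split only where sentence punctuation is followed by whitespace:

```python
import re

text = "Revenue grew 18.8 percent. Costs fell 9.1 percent! Is that good? Yes."

# Naive approach: every "." is a boundary, so "18.8" is cut in two.
naive = [part for part in text.split(".") if part.strip()]

# Split only after ., ! or ? when whitespace follows. The dot inside
# "18.8" is followed by a digit, so the number stays intact.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

print(naive)
print(sentences)
```

This rule is still a heuristic: an abbreviation followed by a space, such as "e.g. this", would be split as well, so a dedicated sentence tokenizer may be worth it for messier text.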

@ElTallerDeTD 3 years ago

Amazing video! 🤩

@NicholasRenotte 3 years ago

Thanks so much, glad you enjoyed it!

@henkhbit5748 3 years ago

As always, great intro. But as you know, not all viewers have a kangaroo 😎 in their backyard and have English as their default language. It might be helpful, if you're doing NLP, to give some side notes about other languages. BTW: I did a small test and provided Dutch text without translation. I also translated the same text into English as input, summarized it, and translated it back to Dutch. I compared both summaries and both versions are almost the same. In the non-translated version, a few words have been chopped.

@NicholasRenotte 3 years ago

😂😂 I had a good laugh at the kangaroo reference, was going to say even I don't have kangaroos in my yard. But tbh, I've got some living 20 minutes away from me so that argument was null and void. Wait so in the non-translated version it didn't really summarize?

@diegocaumont5677 3 years ago

dope dope dope

@NicholasRenotte 3 years ago

Yeahyyaa, thanks @Diego!

@Jandoesrun 3 years ago

I'm really thankful for your videos. You're my lifesaver for my thesis research. Can you make a video on how to use MediaPipe by Google for hand gesture recognition? Also, I'm really at a loss for how I can do my thesis. Can I consult with you? I'm trying to make use of the hand landmark values to classify certain words for sign language. I'm confused about whether to use LSTMs, Transformers, BERT, or GPT2. This is honestly overwhelming.

@NicholasRenotte 3 years ago

Yup, definitely! It'll be coming soon!

@Jandoesrun 3 years ago

@@NicholasRenotte Thank you very much! Sir Nicholas, I'm trying to research a TensorFlow implementation of GPT models. Do you have any ideas?

@NicholasRenotte 3 years ago

@@Jandoesrun hot off the press: kzbin.info/www/bejne/mXncnoCqZriEpJo it uses PyTorch but you can change the backend to TF as well