How to Web Scrape Multiple Pages with ONE Function with Python

Views: 45,322

John Watson Rooney

A day ago

Comments: 70

@jagen_happy 3 years ago

This is the most incredible tutorial on scraping that I have ever seen. Thank you, Dr. Mister John Watson Rooney.

@niklasklotz8456 3 years ago

I had been searching the internet for a whole day and didn't find a solution. This is hands down the best tutorial on scraping I've seen. Subscribed for more tutorials like this one :)

@saniabbas1982 1 year ago

Literally all my questions were answered by an amazing instructor. Really appreciate it.

@abdel-karimosmanuazumah499 2 years ago

You are more than a legend. This is the best web scraping channel ever.

@JohnWatsonRooney 2 years ago

Thanks 🙏

@Gaz86JPN 2 years ago

This was incredible and a really great explanation. As someone new to Python it was really easy to follow and created a lot of questions for me to dig into to find the answers to (this is the best kind of teacher!)

@JohnWatsonRooney 2 years ago

Thanks! Very kind, I’m glad it helped.

@augustinfrancotte3163 3 years ago

Nice, thanks for this video! It helped me understand quickly and clearly how BeautifulSoup and scraping work. Exactly what I needed.

@arif-fadhillah 3 years ago

Thank you John, very helpful in solving my problem finding tags on all pages

@kecvu 1 year ago

Wow, pandas to Excel is amazing. I didn't know it existed, thanks.

@edoardopontecorvi5743 2 years ago

Best video on the topic so far. Thanks!

@JagjeetSingh-rw3rf 3 years ago

This really makes the process clear. Thanks a lot, John!!

@jerrychoi3714 4 years ago

Thanks for the step-by-step tutorial. I now have a clearer understanding of the concept.

@AliHussein-pb6cz 2 years ago

Great Video John. Thanks for sharing your knowledge :)

@JohnWatsonRooney 2 years ago

Glad you enjoyed it

@rileyhansen2426 2 years ago

Hey John, thank you for the top-tier tutorial! It is everything I was hoping to find. I am attempting to replicate some of your strategies, and I can't seem to get the "print(len(questions))" part of your code to return anything. How do I troubleshoot something like that when all I am getting back is zero?

@nathan9771 3 years ago

I can't even begin to describe... just take my subscription.

@JohnWatsonRooney 3 years ago

Haha thanks Nathan

@alpacino5025 3 years ago

Really helpful, what would you do if the URL of the page does not change? (using javascript to change page)

@martyrd0m 3 years ago

Can you do the same with a dynamic website? I'm working on one, but I failed.

@eldadimatteo7409 3 years ago

Great tutorial! One question: how can I edit the range if my URLs look like these?
c=burgers&find_loc=V7L%20Vancouver&start=0
c=burgers&find_loc=V7L%20Vancouver&start=10
c=burgers&find_loc=V7L%20Vancouver&start=20
Thanks!

@Shadden 2 years ago

Have you discovered how to fix this problem?
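
For URLs like the ones in the question above, where `start` advances by 10 per page, one possible approach is to give `range` a step instead of counting pages 1, 2, 3. This is only a sketch: the comment shows just the query string, so `BASE_URL` is a made-up placeholder, and `build_page_urls` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; the comment above only shows the query string.
BASE_URL = "https://example.com/search"


def build_page_urls(pages, step=10):
    """Return one URL per results page, advancing `start` by `step` each time."""
    urls = []
    # range(0, pages * step, step) yields 0, 10, 20, ... when step is 10
    for offset in range(0, pages * step, step):
        params = {"c": "burgers", "find_loc": "V7L Vancouver", "start": offset}
        # quote_via=quote encodes the space as %20, matching the URLs in the question
        urls.append(f"{BASE_URL}?{urlencode(params, quote_via=quote)}")
    return urls


for url in build_page_urls(3):
    print(url)
```

Each URL in the returned list can then be passed to the same scraping function used for a single page.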

@lakchchayamdivyakhare2163 3 years ago

Awesome!!!! Good explanation.

@k-melj2118 3 years ago

You are just the best, man. Thank you so much 🙏🏾 🙏🏾 🙏🏾

@forceman1982 2 years ago

Hi John, congratulations, the video is amazing. When reproducing the code I get the following error:
'votes': str(item.find('span', {'class': 'vote-count-post'}).text),
AttributeError: 'NoneType' object has no attribute 'text'
Do you know what could be causing it? Thanks in advance.

@vy-canis4957 1 year ago

Hello @JohnWatsonRooney, thanks for the help. I'm currently working on some large data that takes a long time to scrape. However, Python sometimes stops and I get nothing in the Excel sheet. Is there any way to scrape and save to Excel at the same time, so that if Python stops we still have the saved data? Please.

@JohnWatsonRooney 1 year ago

Sure, after each successful page or item scraped you could open and append to a CSV file, meaning that any time it fails or stops you’d still have the data up to that point.

@vy-canis4957 1 year ago

@JohnWatsonRooney Thank you so much, it worked.
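
The append-as-you-go idea from the reply above could look roughly like this. It is a minimal sketch using the standard `csv` module; the column names in `FIELDNAMES`, the `append_rows` helper, and the stand-in `scrape_page` function are assumptions for illustration, not code from the video.

```python
import csv
import os

FIELDNAMES = ["title", "votes", "date"]  # hypothetical column names


def append_rows(path, rows):
    """Append a list of dicts to a CSV file, writing the header only once."""
    # The header is needed only if the file is missing or still empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def scrape_page(page):
    """Stand-in for the real scraper; returns the rows found on one page."""
    return [{"title": f"question {page}", "votes": "0", "date": "n/a"}]
```

Calling `append_rows("questions.csv", scrape_page(page))` inside the page loop writes each page to disk as soon as it is scraped, so a crash on a later page leaves the earlier rows in the file.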

@techtbe 2 years ago

So amazing and to the point. Great tutorial.

@serageibraheem2386 3 years ago

super super awesome. thank you

@oussmayo 3 years ago

How do you scrape an item that can hold multiple values? I'm trying to scrape a food blog webpage that has an element that can hold multiple values such as vegan, gluten free, dairy free, etc. My code only prints the first value of this element and skips the rest. Also, how would I append the values to the same row of a list once I'm able to scrape all the values of this element?

@sarahshah3172 3 years ago

Thanks John. This was a great video. I am not a programmer and I am looking for real-time news headline scraping software. Is there such software I could purchase online?

@raniasaleh3999 2 years ago

Hello John, thank you for your great content. I'm a beginner in Python and I'd appreciate your help. I copied your code and just added my user agent, but the Excel file it produces is empty. What could be the reason?

@Vetixpr 1 year ago

@rania have you ever figured it out? I came across the same issue. Previously I was attempting my own approach to include CSV/Excel export, and while the file was generated, it would either be empty or contain only a couple of numbers. Using the exact same Python file, both the Excel and CSV files are created empty.

@teknologiinformasi4686 3 years ago

Thank you for the tutorials.

@liamalam 3 years ago

Thanks for the great tutorial.

@JohnWatsonRooney 3 years ago

Glad it was helpful!

@rameshks5281 4 years ago

Hi sir, I have multiple URLs in my Excel file (for example in column 'A'), some of which are invalid, and I need to extract the desired values from each URL into other cells (columns B, C, D, etc.). How do I scrape the URLs in column A one by one and extract the data?

@JohnWatsonRooney 4 years ago

Sure, that’s very possible: build the scraper for one site first, then loop through the Excel file for each one. I’d create a function for the scraper to make life easier.

@rameshks5281 4 years ago

@JohnWatsonRooney Thank you ☺️
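
The "function for one URL, then loop over the Excel file" suggestion above might be sketched as follows. The helper names `load_urls` and `scrape_all` are hypothetical, and the sketch assumes the URLs sit in column A of the first sheet with no header row. Reading the file needs pandas (plus an Excel engine such as openpyxl), so that import is kept inside `load_urls`.

```python
def load_urls(path):
    """Read column A of an Excel sheet and return its values as URL strings."""
    import pandas as pd  # imported here so scrape_all works without pandas

    df = pd.read_excel(path, usecols="A", header=None)
    return df.iloc[:, 0].dropna().astype(str).tolist()


def scrape_all(urls, scrape_one):
    """Call scrape_one(url) for each URL; record failures instead of crashing."""
    results = []
    for url in urls:
        try:
            row = scrape_one(url)  # expected to return a dict of extracted values
        except Exception as exc:  # invalid URL, timeout, missing tag, ...
            row = {"error": str(exc)}
        results.append({"url": url, **row})
    return results
```

Here `scrape_one` is whatever single-page scraper has already been built and tested. The list returned by `scrape_all` can be turned into a sheet again with `pd.DataFrame(results).to_excel("output.xlsx", index=False)`, which puts each extracted value in its own column next to the URL.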

@EUU100 3 years ago

Thanks, really clear and helpful

@bidhanry9740 3 years ago

Hello sir, I have a query. Just as you are extracting date, votes, and question, I am scraping LinkedIn profile details from the experience section: company name, duration, job title. Most of the time not all of these details are filled in, so the code throws an error whenever it finds nothing. Sir, how do I change the code so that if nothing is found it just keeps the field blank? Here is part of my code:

exp_section = soup.find('section', {'id': 'experience-section'})
# print(exp_section)
exp_section = exp_section.find('ul')
div_tag = exp_section.find('div')
a_tag = div_tag.find('a')
job_title = a_tag.find('h3').get_text().strip()
company_name = a_tag.find_all('p')[1].get_text().strip()
joining_date = a_tag.find_all('h4')[0].find_all('span')[1].get_text().strip()
exp = a_tag.find_all('h4')[1].find_all('span')[1].get_text().strip()
info.append(company_name)
info.append(job_title)
info.append(joining_date)
info.append(exp)
info

Please help me get my desired output.

@erenhan 2 years ago

I wish I had the chance to give multiple likes.

@ebohnenb 3 years ago

What if you want all pages at once?

@virendram1744 2 years ago

How can I print all the questions related to Python? Please help me.

@wesleybaird2752 3 years ago

What if you didn't have the class to index the results and sort through them?

@TheAlexander775 2 years ago

I'm getting duplicates with the "for x in range" loop; it's not changing pages.

@JohnWatsonRooney 2 years ago

Yeah, to be honest I think I messed it up. My newer videos are much better, I promise!

@tabmax22 2 years ago

How do you then get the data to the frontend of a web app to do something with it?

@ALANAMUL 4 years ago

Thanks

@PrincePrincess13 2 years ago

What do I do when the extracted data just repeats page 1? What should I do to fix the error?

@SunDevilThor 3 years ago

I could not get the votes section to work properly. No matter what I tried, I kept getting None types returned.

@jibran738 3 years ago

What should be selected from the following when we use the code questions = soup.find_all('what to', {'insert': 'here'})? … #this is where the main body content is. Please help, thank you.

@maynafred7522 3 years ago

Hmm, fire.

@kashyapkumbhani3457 2 years ago

I want to scrape the h1 of every page on a whole website! That website doesn't have a pagination system and has more than 1 million pages. What should I do?

@ManishKumar-br5sf 4 years ago

Sir, please do an Amazon web scraping video with multiple-page scraping, please.

@JohnWatsonRooney 4 years ago

Sure I can do that

@eligr8523 1 year ago

How do I insert this data into a database with SQLite?

@SanjayFuloria 3 years ago

How do I scrape tables from different pages of a website using BeautifulSoup?

@ollie5845 2 years ago

I have been creating an Amazon Web scraper. This video may be helpful: kzbin.info/www/bejne/aofJoommid9nh5Y

@darrinreed9675 2 years ago

At kzbin.info/www/bejne/o17OoHyPjKiUf9E you mention needing to introduce wait times in order not to get blocked. How exactly could we do that? Is it possible to do maybe 5 at a time, then wait however long is needed to avoid getting blocked? I haven't been able to find a video explaining the proper way of adding delays when looping through URLs.
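
One way to get the "5 at a time, then wait" behaviour asked about above is a small generator that pauses after each batch. This is a sketch only: `throttled` is a hypothetical helper, and the batch size and the 2 to 5 second pause are arbitrary assumptions, since how long a given site tolerates is not stated anywhere in this thread.

```python
import random
import time


def throttled(urls, batch_size=5, pause=(2.0, 5.0), sleep=time.sleep):
    """Yield URLs one by one, pausing for a random interval after each batch."""
    for i, url in enumerate(urls, start=1):
        yield url
        # After every `batch_size` URLs (but not after the last one), wait
        # a random number of seconds between pause[0] and pause[1].
        if i % batch_size == 0 and i < len(urls):
            sleep(random.uniform(*pause))
```

It can be dropped into an existing loop as `for url in throttled(list_of_urls): ...`, with `list_of_urls` being a plain list. The `sleep` parameter exists only so the waiting can be replaced in tests.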

@nassimbouhaouita1697 2 years ago

That's just one page. Thanks for wasting my time.

@bradygovender9500 2 years ago

It's clear that you didn't bother watching the entire video, because he clearly shows you how to get the data from other pages. Looks like you're wasting your own time, buddy.

@srenlindbo4523 2 years ago

Hello, any idea why I cannot extract the .text from this website element: $40,000 - $100,000 a year

@JohnWatsonRooney 2 years ago

Try printing the element without the text and see if you get “None”, also try filling in the space in the class name with a “.” Hope that helps!

@srenlindbo4523 2 years ago

@JohnWatsonRooney Thanks for the reply. I do indeed get "None" if I print without .text. And for some of the elements I also get a print of the entire element, like so: $40,000 - $100,000 a year. They use two different classes for salary, 'jobposting-salary SerpJob-salary' and 'jobposting-salary SerpJob-salary SerpJob-salary--is-estimate', and some of the values are empty because there is no salary listed. I am only interested in printing the text from the element, but it only works if I print without .text, and then I receive the whole element. I will try and play around with it. Thanks a lot for the video :)
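
Several comments in this thread hit the same `NoneType` error, which happens when `find` returns `None` for an item that lacks the tag and `.text` is then called on it. A small guard like the one below is one possible fix; `safe_text` is a hypothetical helper, not something from the video.

```python
def safe_text(element, default=""):
    """Return the stripped text of a BeautifulSoup element, or `default` if it is None."""
    if element is None:
        return default
    return element.get_text(strip=True)


# Typical use inside the loop (selectors taken from the comments above):
#   votes = safe_text(item.find('span', {'class': 'vote-count-post'}))
#   salary = safe_text(item.select_one('.jobposting-salary'), default='n/a')
```

Selecting on the single shared class `jobposting-salary` should match both salary variants described above, and listings without a salary fall back to the default instead of raising an error.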