How I Scrape Amazon Product Reviews with Python

17,192 views

John Watson Rooney


Comments: 61
@SunDevilThor 3 years ago
I'm finally all caught up on your videos. I have watched all of them (except for the livestreams) and want to say thank you, since you have increased my programming skills 10x since I started. Going forward, I would love to see the following from your channel (these are just suggestions):
1. Tutorials on building shopping bots (GPUs, CPUs, PS5, Xbox, etc., or even everyday items that are sold out due to supply shortages).
2. Freelancing content with Python (Upwork, Fiverr, Freelancer, etc.) and projects that people might pay money for.
3. Projects combining web scraping and OSINT.
4. A video on how you got so good at coding and which resources or projects got you there.
@JohnWatsonRooney 3 years ago
Hey, thanks for the feedback. I really appreciate it, and I'm really pleased that you feel you've learnt a lot from watching my videos! Some cool video ideas there; I will keep them in mind going forward with my channel!
@jude3046 2 years ago
Hey John, earned a new sub with this video for sure. I'm new(er) to Python and enjoyed watching how you logically used classes, definitions and your main loop. And the difficulty of the project is spot on for a YouTube vid. Appreciate it!
3 years ago
Followed since the very early days, thanks for great content as always. I am also happy to see that your channel is gaining popularity and that your delivery is more and more confident and inspired. (y)
@JohnWatsonRooney 3 years ago
Thank you, I really appreciate it!
@ShenderRamos 2 years ago
Great video, love your content. As a software engineer working on web scraping for a personal fun project, I love it. Always learning something new.
@stewart5136 2 years ago
Another great video, John. I note that you now have to render the page, or you get "TypeError: 'bool' object is not iterable" because the page wasn't rendered. Adding one line of code fixed the issue: after r = self.session.get(self.url + str(page)), insert r.html.render(sleep=1)
@HappyMrGhost 1 year ago
After adding that in I get the following error: pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded. Edit: I've found a fix for the error I got, just change it to: r.html.render(timeout=20)
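The two fixes in this thread can be combined into one call. The following is only a minimal sketch, assuming the scraper uses requests-html's HTMLSession as in the video; fetch_rendered is a hypothetical helper name, and the sleep and timeout values are simply the ones suggested in the comments above.

```python
def fetch_rendered(session, url, sleep=1, timeout=20):
    """Fetch a page and render its JavaScript before parsing.

    `session` is expected to be a requests_html.HTMLSession (or any object
    with the same .get() / .html.render() interface).
    """
    r = session.get(url)
    # render() loads the page in headless Chromium. Its default timeout is
    # 8 seconds, which is where "8000 ms exceeded" comes from, so a larger
    # timeout is passed here along with a short sleep after the page loads.
    r.html.render(sleep=sleep, timeout=timeout)
    return r


# Usage (needs `pip install requests-html`; the first render downloads Chromium):
# from requests_html import HTMLSession
# r = fetch_rendered(HTMLSession(), url)
```

Passing both arguments in a single render() call avoids the situation where adding timeout=20 silently drops the earlier sleep=1.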
@bruce_its_me4695 6 months ago
Straightforward and amazing. Subbed.
@ReverendZen 2 years ago
Hi John, I am not seeing the code in your description and I am having an error with "TypeError: 'bool' object is not iterable". Can you provide a link to the code? Thank you, great video by the way...
@hrithiksharma9355 2 years ago
Hey, did you find a solution?
@Cre8Babies 2 years ago
Posting for anyone else stuck: there's something wrong with the link you've selected. I spent an hour troubleshooting this before realizing it was an issue with my specific Amazon link. The pagination was just returning False every time, regardless of what I changed. I copied John's link word for word and I didn't get the error any more.
@eduardotejeda 3 years ago
Man, you make it look soo easy. 😂 Great content.
@JohnWatsonRooney 3 years ago
Thank you, very kind!
@krikztv1952 1 year ago
I love you mate! It works
@juanignaciolopezlopez45 2 years ago
Hi John. Happy 2022. Lately I've been scraping with puppeteer + selectors. A good scraper has to have a lot of resources.
@JohnWatsonRooney 2 years ago
Absolutely, puppeteer is a great choice.
@SizzleSan 2 years ago
Hey, great tutorial. Why do you extract the ASIN and put it back into the URL when it is already in there? Couldn't you give the URL directly to the class instead of the ASIN?
@Valentin439 3 years ago
Great video! Thank you
@Manveer_Dhindsa 1 year ago
The code is not in the description
@seungho1001 2 years ago
SOS! How can we solve the error: 'NoneType' object has no attribute 'text'? Also, I noticed that a whole page of reviews only returns the first 2.
@abhishekdoke6102 2 years ago
Facing the same problem.
@hollow6831 3 years ago
I like your content, man!! Any way you could do more advanced projects using concurrency modules (threading, multiprocessing, asyncio)?
@JohnWatsonRooney 3 years ago
Thanks! Yeah sure, I can look into that.
@fubarace1027 2 years ago
So I played around with this recently, and in my experience the pagination didn't work. I didn't really understand why, because you're right, an out-of-bounds page of reviews DOES return False. What I did was grab a line of text off the review page that said "{x} total ratings, {y} with reviews", did the math, and formed my pagination that way. I also had issues with NoneTypes being returned from time to time, so I added exception handling to each line of the parse function (title, rating, and body): wrap the lookup in try, and on except AttributeError set body = None, title = None, or rating = None. This did the trick for me. Good video, I learned a bunch implementing it and updating it to work with the new reality of Amazon!
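The exception handling described above could look roughly like this. It is a minimal sketch, not the commenter's actual code: safe_text and parse_review are hypothetical names, and the CSS selectors are passed in by the caller because the real ones depend on Amazon's current markup.

```python
def safe_text(element):
    """Return element.text, or None when the selector matched nothing."""
    try:
        return element.text
    except AttributeError:
        # requests-html's find(..., first=True) returns None on no match,
        # and None has no .text attribute.
        return None


def parse_review(review, selectors):
    """Build one review dict; a missing field becomes None instead of raising.

    `selectors` maps field names to CSS selectors (supplied by the caller).
    """
    return {
        field: safe_text(review.find(css, first=True))
        for field, css in selectors.items()
    }
```

With this shape, a review that lacks a title still yields its rating and body, so one odd review does not abort the whole page.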
@itsnotberto 2 years ago
I had the same issue with NoneTypes. Going to try a similar solution. Do you mind sharing how you handled the pagination? Always curious to see others' solutions.
@fubarace1027 2 years ago
@itsnotberto I couldn't get it to work via any of the Python solutions I thought should work, so I went another direction. If you look at the all-reviews page for an Amazon product, you'll see a line near the top left that says something like "3,512 total ratings, 446 with reviews". I captured that line of text, split it (it's always in that format), and used isdigit to strip the comma out of larger numbers like 3,200. I took the review count and did the math: 10 reviews per page, with a modulus to catch the extras when the last page has fewer than 10.
Once I had the exception handling and the pagination set up, I still had two glitches:
1) Items that include international reviews towards the end (this happens most often with PC hardware): you'll grab those reviews as well. The problem is that only two of the three tags are the same for those reviews. The title comes back empty because the title selectors are different there, but the stars and review text use the same CSS selectors as the rest of the all-reviews page.
2) For particularly long ones with over 5k reviews, I would cap out at around 5,040 reviews. I haven't looked too hard into why yet.
You get about 40 reviews a second, which seems fast but really isn't. The only reason I kept the code I put together for this was to see if I could thread it. I know threading works when you're doing multiple pages; I wasn't sure whether I could run multiple threads while pulling from the same product. Pretty sure it'll work, I just want to give it a try one of these days when I get a sec. Hope that was helpful.
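The page-count arithmetic described above can be sketched as follows. This is not the commenter's code: it swaps the split/isdigit/modulus approach for a regular expression and ceiling division, which give the same result. review_page_count is a hypothetical name, and the 10-reviews-per-page figure is taken from the comment.

```python
import math
import re

REVIEWS_PER_PAGE = 10  # figure quoted in the comment above


def review_page_count(summary, per_page=REVIEWS_PER_PAGE):
    """Pages implied by a line like '3,512 total ratings, 446 with reviews'."""
    # Pull out every number in the line, allowing thousands separators.
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", summary)]
    if len(numbers) < 2:
        raise ValueError(f"unexpected summary format: {summary!r}")
    with_reviews = numbers[1]  # the second number counts written reviews
    # Ceiling division: a partly filled last page still counts as a page.
    return math.ceil(with_reviews / per_page)


print(review_page_count("3,512 total ratings, 446 with reviews"))  # 45
```

The scraper can then loop over range(1, pages + 1) instead of relying on an out-of-bounds page returning False.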
@KawalpreetKauur 1 year ago
@fubarace1027 Hey, my code is traversing the pages but can't extract reviews from any page except the first. Only 10 or 11 reviews end up in the CSV. Would you mind sharing your code?
@itsnotberto 2 years ago
Hey John, really appreciate you walking us through how to do this. I was trying to add in review-date but can't get it to work properly
@Pitta905 3 years ago
Great tutorial!
@jobinnelson 3 years ago
Do we need to close the session manually, or is it taken care of automatically? I've always wondered about this, because I see some people closing it and some not.
@wicked9299 1 year ago
I'm not getting results for page 2; it only works for page 1.
@tyricshuck3355 2 years ago
Really enjoying requests-html, thanks for introducing this. For this tutorial I can’t seem to get it to return the elements. It returns False or nothing at all. Does Amazon block scrapers? I have headers in place and have tried a basic script without a class, using the full URL. I’ve gotten it to return the element once, and then it won’t again. My only thought is that Amazon is blocking it. The same script works on a WooCommerce site. Thanks for any insight.
@JohnWatsonRooney 2 years ago
Hey, are you adding a real user agent to the headers? That usually unblocks Amazon.
@VideoTechDude 2 years ago
@JohnWatsonRooney I added my user agent and am still getting blocked.
@davidbautis 2 years ago
@JohnWatsonRooney In the pagination function, if you print r.text, this is what I see: "Enter the characters you see below. Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies."
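A scraper can check for this page explicitly rather than failing later with a confusing TypeError. The following is a minimal sketch: looks_like_captcha is a hypothetical helper, and the marker strings are just the phrases quoted in the comment above, so they are an assumption about what Amazon serves and may change.

```python
# Phrases quoted in the comment above; treated as an assumption, not a spec.
CAPTCHA_MARKERS = (
    "enter the characters you see below",
    "make sure you're not a robot",
)


def looks_like_captcha(page_text):
    """Return True when the response body looks like the robot-check page."""
    # Normalise case and curly apostrophes before matching.
    text = page_text.lower().replace("\u2019", "'")
    return any(marker in text for marker in CAPTCHA_MARKERS)
```

In the pagination function this could be called on r.text right after the request, so the script can stop or back off instead of parsing a page that contains no reviews.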
@boomboom-pro 3 years ago
Thanks. Would you be interested in doing a video on posting from a CSV to WordPress with Python?
@sejalgarg6764 2 years ago
Hey John, thank you, the video is very helpful. Could you do a video on how to use WordCloud with this scraped data?
@adityapathak9442 2 years ago
Hey John, I really like your videos. I am stuck scraping a website that has multiple links, where clicking each link creates a new network request. How do I scrape data from all the links at once?
@fischl_main5631 2 years ago
Keep getting when using the pagination method. Should I change to scrapy?
@JohnWatsonRooney 2 years ago
A 200 response is good! Try printing r.text
@Alex-dd3oy 2 years ago
Have you been able to scrape all the reviews made by a list of reviewers?
@htcsaj7876 2 years ago
Great content. How can I scrape Amazon products with Scrapy? It doesn't work with anything else. Any help?
@jobinnelson 3 years ago
Also, does requests_html only work on Python 3.6? I'm using Python 3.10.
@JohnWatsonRooney 3 years ago
3.6 and above. I've been using it on 3.9 and 3.10.
@ahmedwaseem2806 8 months ago
It's really amazing and it works. Thanks a lot, man 😍
@velorexvelorex4605 2 years ago
Can this work for reviews and ratings on Amazon Video?
@ThespecialOtaku 3 years ago
If you could do a video about how to scrape private APIs / Cloudflare-protected APIs, that would be great!! Love your content btw.
@JesFinkJensen 1 year ago
Unfortunately, it doesn't work for me. I can see in the response that Amazon wants me to solve a Captcha to prove that my script is not a bot...
@JohnWatsonRooney 1 year ago
Unfortunately Amazon updated some of their protection. My latest video has a new solution, but it does involve a headless browser. I am also updating the review scraper script.
@GhazKhan99 1 year ago
Where's the code in the link below?
@timorider6575 2 years ago
Please, I'd like a video on proxy rotation for offerwall.
@kasperborgbjerg 28 days ago
Does this work in November 2024?
@interestingplanet1046 2 years ago
Trying to scrape 5k ASINs, and the robot test keeps blocking me after 1.2k requests.
@JohnWatsonRooney 2 years ago
You’ll either have to slow it down or look at using proxies. Some free ones work, but be careful and don’t send any sensitive data.
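Slowing a scraper down can be as simple as pausing between requests. The sketch below is one way to do that with the standard library only; throttled is a hypothetical helper, and the 2 to 5 second delay range is an arbitrary assumption, not a figure known to satisfy Amazon.

```python
import random
import time


def throttled(items, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them."""
    for index, item in enumerate(items):
        if index:  # no pause before the very first item
            # A random delay avoids a perfectly regular request rhythm.
            sleep(random.uniform(min_delay, max_delay))
        yield item


# Usage sketch: wrap the list of ASINs so each request is spaced out.
# for asin in throttled(asins):
#     scrape(asin)  # scrape() stands in for the existing per-ASIN function
```

At these delays 5k ASINs would take several hours, which is the trade-off against using proxies.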
@ShivKatira 1 year ago
@JohnWatsonRooney Hi John, I tried to run your code very recently and found that Amazon has added some kind of authentication page to block bots. Is there any way to bypass this? Looking forward to your reply.
@rifatmahmud5266 3 months ago
Can you please share the code?
@ShivamPatel-yg3kd 2 years ago
Can I get the code, please?
@JohnRoodAMZ 1 year ago
"Rewiews" lol, you must be lysdexic like me