I'm finally all caught up on your videos. I have watched all of them (except for the livestreams) and want to say thank you, since you have increased my programming skills 10x since I started. Going forward, I would love to see the following from your channel (these are just suggestions):
1. Tutorials on building shopping bots (GPUs, CPUs, PS5, Xbox, etc., or even everyday items that are sold out due to supply shortages).
2. Freelancing content with Python (Upwork, Fiverr, Freelancer, etc.) and projects that people might pay money for.
3. Projects combining web scraping and OSINT.
4. A video on how you got so good at coding and which resources or projects got you there.
@JohnWatsonRooney 3 years ago
Hey, thanks for the feedback. I really appreciate it, and I'm really pleased that you feel like you've learnt a lot from watching my videos! Some cool video ideas; I will keep them in mind going forward with my channel!
@jude3046 2 years ago
Hey John, earned a new sub with this video for sure. I'm new(er) to Python and enjoyed watching how you logically used classes, definitions and your main loop. And the difficulty of the project is spot on for a YouTube vid. Appreciate it!
3 years ago
Followed since the very early days, thanks for great content as always. I am also happy to see that your channel is gaining popularity and your delivery is more and more confident and inspired. (y)
@JohnWatsonRooney 3 years ago
Thank you I really appreciate it!
@ShenderRamos 2 years ago
Great video, love your content. As a software engineer working on web scraping for a fun personal project, I love it. Always learning something new.
@stewart5136 2 years ago
Another great video, John. I note that you now have to render the page or you get "TypeError: 'bool' object is not iterable", because the page wasn't rendered. Adding one line of code fixed the issue.
After: r = self.session.get(self.url + str(page))
Insert: r.html.render(sleep=1)
@HappyMrGhost 1 year ago
After adding that in I get the following error: pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded
Edit: I've found a fix for the error I got, just change it to: r.html.render(timeout=20)
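Editor's sketch of the two fixes above combined. It assumes the requests-html session used in the video; `fetch_rendered` is a hypothetical helper name, and the `sleep` and `timeout` values are simply the ones the commenters report, not tuned recommendations.

```python
def fetch_rendered(session, url, sleep=1, timeout=20):
    """Fetch a page and render its JavaScript before any parsing happens."""
    r = session.get(url)
    # render() loads the page in headless Chromium. timeout is in seconds;
    # the library default of 8 is what produced the "8000 ms exceeded" error.
    r.html.render(sleep=sleep, timeout=timeout)
    return r

# Usage (needs requests-html installed and network access):
# from requests_html import HTMLSession
# r = fetch_rendered(HTMLSession(), "https://example.com/reviews?page=1")
```

Passing the session in keeps the helper independent of how the scraper class is built.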
@bruce_its_me4695 6 months ago
Straightforward and amazing. Subbed.
@ReverendZen 2 years ago
Hi John, I am not seeing the code in your description and I am having an error with "TypeError: 'bool' object is not iterable". Can you provide a link to the code? Thank you, great video by the way...
@hrithiksharma9355 2 years ago
Hey, did you find a solution?
@Cre8Babies 2 years ago
Posting for anyone else stuck: There's something wrong with the link you've selected. I spent an hour troubleshooting this before realizing it was an issue with my specific amazon link. The pagination was just returning false every time regardless of what I changed. I copied John's link word for word and I didn't get the error any more.
@eduardotejeda 3 years ago
Man, you make it look soo easy. 😂 Great content.
@JohnWatsonRooney 3 years ago
Thank you, very kind!
@krikztv1952 1 year ago
I love you mate! It works
@juanignaciolopezlopez45 2 years ago
Hi John. Happy year 2022. Lately I'm scraping with puppeteer + selectors. A good scraper has to have a lot of resources
@JohnWatsonRooney 2 years ago
Absolutely, puppeteer is a great choice.
@SizzleSan 2 years ago
Hey, great tutorial. Why do you extract the ASIN and put it back into the URL when it is already in there? Couldn't you give the URL directly to the class instead of the ASIN?
@Valentin439 3 years ago
Great video! Thank you
@Manveer_Dhindsa 1 year ago
The code is not in the description
@seungho1001 2 years ago
SOS! How can we solve the error "'NoneType' object has no attribute 'text'"? Also, I noticed that a whole page of reviews only returns the first 2.
@abhishekdoke6102 2 years ago
Facing the same problem.
@hollow6831 3 years ago
I like your content man!! Any way you can do more advanced projects using concurrency modules (threading, multiprocessing, asyncio)?
@JohnWatsonRooney 3 years ago
Thanks! Yeah sure I can look into that
@fubarace1027 2 years ago
So I played around with this recently, and it was my experience that the pagination didn't work. I didn't really understand it, as you're right, an out-of-bounds page of reviews DOES return a false... What I did was grab a line of text off the review page that said, "{x} total ratings, {y} with reviews", did the math, and formed my pagination that way. Also, I had issues with NoneTypes being returned from time to time, so I added some exception handling to each line of the parse function, for title, rating, and body: a try block with except AttributeError that sets body = None, title = None, or rating = None. This did the trick for me. Good video, I learned a bunch implementing it and updating it to work with the new reality of Amazon!
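Editor's sketch of the exception handling described above, assuming elements that behave like requests-html ones, where `find(selector, first=True)` returns None when nothing matches. `safe_text` is a hypothetical helper name, not something from the video.

```python
def safe_text(parent, selector):
    """Return the stripped text of the first match, or None if it is missing."""
    try:
        return parent.find(selector, first=True).text.strip()
    except AttributeError:
        # find(..., first=True) gave back None, so there is no .text to read
        return None
```

In the parse function, each of title, rating, and body could then be read with one call, and reviews with a missing field would carry None instead of raising "'NoneType' object has no attribute 'text'".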
@itsnotberto 2 years ago
I had the same issue with NoneTypes. Going to try a similar solution. Do you mind sharing how you handled the pagination? Always curious to see others' solutions.
@fubarace1027 2 years ago
@itsnotberto I couldn't get it to work via any of the Python solutions I thought should work, so I went another direction. If you look at the all-reviews page of an Amazon product, you'll see to the left near the top there's a line that says something like, "3,512 total ratings, 446 with reviews". I captured that line of text, did a split on it since it's always in that format, and did an isdigit to get the comma out of larger numbers like 3,200. I took that number and did the math: 10 reviews per page, with a modulus to get the extras when the last page has fewer than 10.
I found that once I set up the exception handling and the pagination I still had 2 glitches.
1) Items that include international reviews towards the end (this happens most often with PC hardware): you'll grab those reviews as well. The problem is that for those reviews only two of the three tags are the same. The title will come back empty because the selectors for the titles there are different, but for the stars and review text, international reviews following the all-reviews page use the same CSS selectors.
2) For particularly long ones with over 5k reviews, I would cap out at 5040 or so reviews. I haven't looked too hard into why yet.
You get about 40 reviews a second, which seems fast, but really isn't. The only reason I kept the code I put together for this was to see if I could thread it. I know threading works when you're doing multiple pages; I wasn't sure if I could run multiple threads while pulling from the same product. Pretty sure it'll work, I just wanted to give it a try one of these days when I get a sec. Hope that was helpful.
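Editor's sketch of the page-count arithmetic described above. It swaps the split/isdigit/modulus approach for a regular expression and a ceiling division, which give the same result; `review_page_count` is a hypothetical name, and the summary format and the 10 reviews per page are taken from the comment.

```python
import math
import re

REVIEWS_PER_PAGE = 10  # figure quoted in the comment above

def review_page_count(summary, per_page=REVIEWS_PER_PAGE):
    """Work out how many review pages to request from the summary line."""
    # Pull out every number, allowing thousands separators such as "3,512"
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", summary)]
    if len(numbers) < 2:
        raise ValueError(f"unexpected summary format: {summary!r}")
    with_reviews = numbers[1]  # second number is the "with reviews" count
    # Round up so a partly filled last page is still fetched
    return math.ceil(with_reviews / per_page)

print(review_page_count("3,512 total ratings, 446 with reviews"))  # 45
```

The scraper can then loop over `range(1, pages + 1)` instead of waiting for a page that comes back empty.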
@KawalpreetKauur 1 year ago
@fubarace1027 Hey, my code is traversing the pages but can't extract reviews from any page except the first. As a result, just 10 or 11 reviews are written to the CSV. Would you mind sharing your code?
@itsnotberto 2 years ago
Hey John, really appreciate you walking us through how to do this. I was trying to add in review-date but can't get it to work properly
@Pitta905 3 years ago
Great tutorial!
@jobinnelson 3 years ago
Do we need to manually close the session, or is it taken care of automatically? I've always wondered this because I see some people closing it and some not.
@wicked9299 1 year ago
I'm not getting results for page 2; it only works for page 1.
@tyricshuck3355 2 years ago
Really enjoying requests-html, thanks for introducing this. For this tutorial I can’t seem to get it to return the elements. It will return false or nothing at all. Does Amazon block scrapers? I have headers in place and have tried a basic script not using a class and the full URL. I’ve gotten it to return the element once and then it won’t again. My only thought is Amazon is blocking it. Same script works on the woo-commerce site. Thanks for any insight.
@JohnWatsonRooney 2 years ago
Hey, are you adding in a real user agent to the header? That usually unblocks Amazon
@VideoTechDude 2 years ago
@JohnWatsonRooney I added my user agent and am still getting blocked.
@davidbautis 2 years ago
@JohnWatsonRooney In the pagination function, if you print r.text, this is what I see: "Enter the characters you see below. Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies."
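Editor's sketch of a check for the block page quoted above, so the scraper can stop early instead of failing later with a 'bool' or 'NoneType' error. The marker strings are copied from that comment and may not match what Amazon serves today; `looks_like_captcha` is a hypothetical name.

```python
# Phrases taken from the block page quoted in the comment above (assumed stable)
CAPTCHA_MARKERS = (
    "Enter the characters you see below",
    "make sure you're not a robot",
)

def looks_like_captcha(html):
    """Return True if the response body looks like the robot-check page."""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in CAPTCHA_MARKERS)
```

Calling `looks_like_captcha(r.text)` right after the request makes the failure explicit: a 200 status alone does not mean the reviews page was returned.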
@boomboom-pro 3 years ago
Thanks. Would you be interested to do python csv to wordpress posting?
@sejalgarg6764 2 years ago
Hey John, thank you, the video is very helpful. Could you do a video on how to work with WordCloud for this scraped data?
@adityapathak9442 2 years ago
Hey John, I really like your videos. I am stuck at scraping a website which has multiple links, where clicking on each link creates a new network request. How do I scrape data from all the links at once?
@fischl_main5631 2 years ago
Keep getting <Response [200]> when using the pagination method. Should I change to Scrapy?
@JohnWatsonRooney 2 years ago
A 200 response is good! Try printing r.text
@Alex-dd3oy 2 years ago
Have you been able to scrape all the reviews made by a list of reviewers?
@htcsaj7876 2 years ago
Great content. How can I scrape Amazon products with Scrapy? It doesn't work with anything else. Any help?
@jobinnelson 3 years ago
Also, does requests_html only work on Python 3.6? I'm using Python 3.10.
@JohnWatsonRooney 3 years ago
3.6 and above, I've been using it on 3.9 and 3.10
@ahmedwaseem2806 8 months ago
It's really amazing and it works. Thanks a lot, man 😍
@velorexvelorex4605 2 years ago
Can this work for reviews and ratings on Amazon Video?
@ThespecialOtaku 3 years ago
If you could do a video about how to scrape private APIs / Cloudflare-protected APIs, that would be great!! Love your content btw.
@JesFinkJensen 1 year ago
Unfortunately, it doesn't work for me. I can see in the response that Amazon wants me to solve a Captcha to prove that my script is not a bot...
@JohnWatsonRooney 1 year ago
Unfortunately Amazon updated some of their protection. My latest video has a new solution but it does involve a headless browser. I am also updating the review scraper script too
@GhazKhan99 1 year ago
Where's the code in the link below?
@timorider6575 2 years ago
Please, I want proxy rotation for offerwalls.
@kasperborgbjerg 28 days ago
Does this work November 2024?
@interestingplanet1046 2 years ago
Trying to scrape 5k ASINs and the robot test keeps blocking me after 1.2k requests.
@JohnWatsonRooney 2 years ago
You’ll either have to slow it down or look at using proxies - some free ones work but be careful and don’t send any sensitive data
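Editor's sketch of the "slow it down" option: a small generator that pauses a random interval between items. The name `throttled` and the 2 to 5 second range are illustrative assumptions, not values known to avoid Amazon's robot check.

```python
import random
import time

def throttled(items, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them."""
    for index, item in enumerate(items):
        if index:
            # No pause before the first item; a randomized wait before the rest
            sleep(random.uniform(min_delay, max_delay))
        yield item

# Usage: for asin in throttled(asin_list): scrape(asin)
```

The `sleep` parameter exists only so the delay can be swapped out in tests; in normal use the default `time.sleep` applies.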
@ShivKatira 1 year ago
@JohnWatsonRooney Hi John, I tried to run your code very recently but found that Amazon has put up some kind of authentication page to block bots. Is there any way to bypass this? Looking forward to your reply.