I love this guy.. It's incredible how he always uploads videos with the most generic titles and casually drops the most specific and incredibly useful piece of information I need but cannot find anywhere else.. Thanks man
@bakasenpaidesu 2 months ago
We need advanced concepts related to scraping.
@databasemadness 2 months ago
the best channel for scraping on youtube period
@Cheenaah-tw8xx 2 months ago
you are actually the best, you check comments and reply and respond
@JohnWatsonRooney 2 months ago
:) do my best
@alexeykazmin7539 2 months ago
Many thanks! You are the god of scraping! Seriously!
@divyanshugogna6152 2 months ago
Love it thanks so much John !!! much requested video finally
@elu1 2 months ago
Fantastic video. Looking forward to seeing more especially json to dataframe.
@christcombiccombichrist2651 2 months ago
I think I have a concept for a solution. Why not create an algorithm that uses screen capture to get what the user is seeing, then stores it so it can be edited later and saved in any structure you design: change contrast, edit text and headers, change the background, using a prompt like ChatGPT to state what you want. I know it sounds complex, but that tool would make web scraping easy, because what the user sees on their screen is exactly what you want, in both text and image, with a way to alter it.
@notjpengineer 2 months ago
I'm a data engineer who works with data daily. I know that scraping is getting harder than before, but that opens opportunities for those who can crack these sites' walls to extract data. Remember that data is vital to all businesses, and there are a lot of not-yet-created products out there that need scraped data (I hope the one I'm building is one of those that succeed). Good coding, everyone.
@hamzahalli3500 2 months ago
I hope you succeed, from Morocco 🇲🇦
@notjpengineer 2 months ago
@@hamzahalli3500 Thanks, my friend!
@werthersoriginal 2 months ago
I have to do something similar with KZbin. What KZbin does is send you the {key:values} in the HTML, which you then have to extract to make any additional ajax calls. I would imagine this site does this as well when you want to look up an item by ID.
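A rough sketch of the extraction step described in the comment above, assuming the page assigns a JSON object to a JavaScript variable inside a `<script>` tag. The sample HTML, the variable name `pageConfig`, its keys, and the header names are all invented for illustration; a real site will use its own names and may need a different pattern.

```python
import json
import re

# Hypothetical page: `pageConfig` and its keys are made up for this example.
SAMPLE_HTML = """
<html><body>
<script>var pageConfig = {"apiKey": "abc123", "clientVersion": "2.0", "itemId": 42};</script>
</body></html>
"""


def extract_embedded_json(html, var_name):
    """Return the JSON object assigned to `var_name` somewhere in `html`."""
    # Find "var_name =" and stop right where the JSON value begins.
    match = re.search(rf"\b{re.escape(var_name)}\s*=\s*", html)
    if match is None:
        raise ValueError(f"{var_name!r} not found in HTML")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here: ";</script>...").
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


config = extract_embedded_json(SAMPLE_HTML, "pageConfig")

# The extracted values would then be sent along with the follow-up ajax call.
# These header names are placeholders, not any site's real API.
headers = {
    "X-Api-Key": config["apiKey"],
    "X-Client-Version": config["clientVersion"],
}
print(headers)
```

Using `raw_decode` instead of a regex for the whole object avoids having to guess where nested braces end.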
@NewEra-us9gd 2 months ago
thank you, but please can you make a full playlist tutorial of Playwright and Python? like beginner from zero to hero
@wiresploit 2 months ago
Don't those cookies have an expiration time attached to them? You could create a video about gaining valid cookies from a website dynamically.
@aarontalua9456 2 months ago
Have you found a way to do that? That's been my problem lots of times.
@wiresploit 2 months ago
First of all, you need to send a GET request to that site. After you've got the response data, set your HTTP session cookie attribute as response.cookies. And you're done.
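A minimal sketch of that idea, swapping in Python's standard library (`urllib` plus `http.cookiejar`) for a requests-style session so it has no third-party dependencies. The helper names `open_with_cookies` and `cookies_expired` are hypothetical, and the URL you pass in is up to you. It also covers the expiration question raised earlier in the thread.

```python
import http.cookiejar
import time
import urllib.request


def open_with_cookies(url, jar=None, timeout=10.0):
    """Send a GET to `url` and collect any cookies the server sets.

    Returns (opener, jar). Reuse the opener for later requests and the
    stored cookies are sent back automatically.
    """
    # Compare with None explicitly: an empty CookieJar is falsy.
    if jar is None:
        jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    with opener.open(url, timeout=timeout) as response:
        response.read()  # cookies were already captured from the headers
    return opener, jar


def cookies_expired(jar, now=None):
    """Return True if any cookie in `jar` has passed its expiry time."""
    if now is None:
        now = time.time()
    # Session cookies have no expiry (expires is None) and never count here.
    return any(cookie.is_expired(now) for cookie in jar)
```

One way to use it: call `open_with_cookies` once up front, then check `cookies_expired(jar)` before each batch of requests and repeat the initial GET whenever it returns True. This only helps when the site hands out its cookies on a plain GET; cookies set by JavaScript would not be captured this way.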
@manic1414 2 months ago
Can you do an overall flow-chart-approach video? Helping us understand the fundamentals and when/where to apply them would be gold! Maybe take 5 common e-commerce sites, and try to extract the same info between all? I feel like we miss nuggets when a site you scrape only requires a certain approach, as we miss potential logic to be applied in other circumstances. However, I much appreciate your explanations on the situations you encounter. Perhaps I'm just ignorant of your content; only 3 videos in, still lots to learn. Thanks for the great content!
@kexec. 2 months ago
seems the current curl_cffi version (0.7.1) doesn’t include the recent PR that supports requests-style exceptions. I think it’s better to stay with other solutions until a new version is released.
@maxeratorr 1 month ago
I'm playing around with this, and noticed that after a while the backend API link no longer shows up in inspect element, so I can no longer copy the curl for my code, when it was possible before. Any workaround for this?
@CrazyFanaticMan 2 months ago
Another blessing of a video
@SiyaMedia 2 months ago
excellent video as usual, the question is how this information applies to the biggest anti-bot sites, social media sites especially those optimized for cellphones
@JohnWatsonRooney 2 months ago
Those are always going to have the best and most updated protection and are always a challenge to scrape, but essentially they work the same way, it’s just much harder to find and manipulate
@hamzabendi9751 2 months ago
What if you need to automate the process of finding the id of the product? How would you approach that?
@tsp8695 2 months ago
Good video! Can you make a video about capturing and reusing cookies and/or browser fingerprinting?
@kexec. 2 months ago
yeah I want this ❤ launching browser to get cookie seems too much for me (and binary size is gonna be crazy)
@JohnWatsonRooney 2 months ago
Yes, I'm working on a video along those lines already, shouldn't be too much longer!
@mdsegara101 2 months ago
Forgive my foolishness, does this mean scraping manually, in this case copying the product codes one by one?
@EmanueleCannizzaro 2 months ago
Thank you John. Can this approach be used within scrapy?
@JohnWatsonRooney 2 months ago
Yes it can
@Cheenaah-tw8xx 2 months ago
hey, can you make a new video about scraping Amazon, because your old one is kinda old and not really meaningful anymore, u inspired me to learn scraping :D
@JohnWatsonRooney 2 months ago
I haven’t done much Amazon in a while. Sure, I’ll take a look
@Cheenaah-tw8xx 2 months ago
@@JohnWatsonRooney thank you
@Ice.wallowcam101 2 months ago
Thank you so much!!
@rexsybimatrimawahyu3292 2 months ago
So is it possible to use curl_cffi with scrapy?
@JohnWatsonRooney 2 months ago
Yes, it works well. I don’t think I have a video with it but I’ve definitely done it on my own projects
@rexsybimatrimawahyu3292 2 months ago
@@JohnWatsonRooney I tried to look around but can't find a solution to it. I think it's a good video idea: how to use another HTTP package to override scrapy's own HTTP package, especially given how strong curl_cffi is for scraping. I've already used curl_cffi before, but I used it with bs4 to do the scraping
@mml135-25 2 months ago
Backend API scraping is definitely a good approach but it doesn't work everywhere. Sites like Amazon or LinkedIn can be scraped only from HTML.
@luisechevarria186 1 month ago
Is this because they are SSR? Like when you go to inspect them, there are really no requests to inspect?
@SigKappel 2 months ago
yes please!
@RaffayDoesTech 2 months ago
What do you think about all the guys using AI stuff for scraping?
@JohnWatsonRooney 2 months ago
IMO AI can help a lot with parsing and organising the data but it won’t help much with actually extracting the data, which is the hard part
@alexdin1565 2 months ago
but if the website uses something like Next.js it's very hard because all the content is server rendered
@kexec. 2 months ago
in my experience, most websites expose their backend api
@kexec. 2 months ago
true, these days we can’t just use httpx or playwright without impersonation or fingerprinting 😢
@hamed6899 2 months ago
Please scrape more advanced and difficult sites
@nageshnaik5343 2 months ago
chrom120 to chrome120😄😄
@JohnWatsonRooney 2 months ago
Haha thought no one would notice
@khazaddum6599 2 months ago
I'm sorry, this video is way too superficial. Too many commands are thrown at the viewer without a proper explanation. Overall not an insightful or helpful video, other than "indeed, scraping is somewhat harder now, I guess". I have seen many nice videos of yours. This one missed the mark for me. Maybe make it clearer from the beginning what the purpose of the video is (general idea vs. teaching in more depth). Help me manage my expectations so I know whether or not it's right for me. Thank you!
@JohnWatsonRooney 2 months ago
Fair enough, thanks for the feedback. I always appreciate constructive criticism