Learning Scraping is MUCH harder now.

Views: 7,893

A day ago

Comments: 54

@adamriha4502 2 months ago

I love this guy.. It's incredible how he always uploads videos with the most generic titles and casually drops the most specific and incredibly useful piece of information I need but cannot find anywhere else.. Thanks man

@bakasenpaidesu 2 months ago

We need advanced concepts related to scraping.

@databasemadness 2 months ago

the best channel for scraping on youtube period

@Cheenaah-tw8xx 2 months ago

you are actually the best, you check comments and reply and respond

@JohnWatsonRooney 2 months ago

:) do my best

@alexeykazmin7539 2 months ago

Many thanks! You are the god of scraping! Seriously!

@divyanshugogna6152 2 months ago

Love it thanks so much John !!! much requested video finally

@elu1 2 months ago

Fantastic video. Looking forward to seeing more, especially JSON to DataFrame.

@christcombiccombichrist2651 2 months ago

I think I have a concept for a solution. Why not create an algorithm that uses screen capture to grab what the user is seeing, then stores it so it can be edited later and saved in any structure you design: change the contrast, edit the text and headers, change the background, all by using a prompt, like with ChatGPT, to state what you want. I know it sounds complex, but that tool would make web scraping easy, because what the user sees on their screen is exactly what you want, in both text and images, with the editing step on top.

@notjpengineer 2 months ago

I'm a data engineer who works with data daily. I know that scraping is getting harder than before, but that opens up opportunity for those who can crack these sites' walls to extract data. Remember that data is vital to all businesses, and there are a lot of not-yet-created products out there that need scraped data (I hope the one I'm building is one of the products that succeeds). Good coding, everyone.

@hamzahalli3500 2 months ago

I hope you succeed, from Morocco 🇲🇦

@notjpengineer 2 months ago

@@hamzahalli3500 Thanks, my friend!

@werthersoriginal 2 months ago

I have to do something similar with YouTube. What YouTube does is send you the {key: value} pairs in the HTML, which you then have to extract to make any additional AJAX calls. I would imagine this site does this as well when you want to look up an item by ID.
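A rough sketch of the idea in that comment, for anyone who wants to see it in code: pull a JSON object out of a script tag in the page source, then reuse its values as parameters for a follow-up request. The variable name `pageConfig`, its keys, and the sample HTML are all made up for illustration; real sites use their own names and layouts, so treat this as a starting point only.

```python
import json
import re

# Hypothetical page source. `pageConfig` and its keys are invented for this
# example; inspect the real page to find the actual variable name.
SAMPLE_HTML = """
<html><head>
<script>var pageConfig = {"apiKey": "abc123", "clientVersion": "2.0", "itemId": 42};</script>
</head><body></body></html>
"""


def extract_embedded_json(html: str, var_name: str) -> dict:
    """Find `<var_name> = {...}` in the HTML and decode the JSON object."""
    match = re.search(rf"{re.escape(var_name)}\s*=\s*", html)
    if match is None:
        raise ValueError(f"{var_name!r} not found in page source")
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever comes after it (the trailing `;</script>` here).
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


config = extract_embedded_json(SAMPLE_HTML, "pageConfig")

# These values would then go into the follow-up AJAX/API request.
followup_params = {"key": config["apiKey"], "id": config["itemId"]}
print(followup_params)
```

Using `raw_decode` instead of a regex for the whole object avoids having to guess where nested braces end. It only works when the embedded value is strict JSON; a JavaScript object literal with unquoted keys would need a different parser.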

@NewEra-us9gd 2 months ago

thank you, but please can you make a full playlist tutorial of Playwright and Python? like beginner from zero to hero

@wiresploit 2 months ago

Don't those cookies have an expiration time attached to them? You could create a video about getting valid cookies from a website dynamically.

@aarontalua9456 2 months ago

Have you found a way to do that? That's been my problem lots of times.

@wiresploit 2 months ago

First of all, you need to send a GET request to that site. Once you've got the response, set the cookies attribute of your HTTP session to response.cookies. And you're done.
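Here is a minimal sketch of that flow using only the Python standard library, so it runs without any network access. The local `DemoHandler` server, its paths, and the `session=demo-token` cookie are stand-ins invented for this example, not anything from the video. With the `requests` library the same thing happens automatically if you reuse one `requests.Session` for both calls.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in site: `/` hands out a cookie, any other path echoes the Cookie header."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/":
            self.send_header("Set-Cookie", "session=demo-token; Path=/")
            body = b"home"
        else:
            body = self.headers.get("Cookie", "").encode()
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


# Port 0 lets the OS pick a free port for the demo server.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

# The cookie jar plays the role of the "session": it stores cookies from
# responses and attaches them to later requests made through this opener.
jar = CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore any proxy settings for localhost
    urllib.request.HTTPCookieProcessor(jar),
)

# Step 1: plain GET to the site so it can set its cookies.
opener.open(base_url + "/").read()
cookie_names = [cookie.name for cookie in jar]

# Step 2: the follow-up request now carries those cookies.
echoed = opener.open(base_url + "/api").read().decode()
print(cookie_names, echoed)

server.shutdown()
```

Each stored cookie also has an `expires` attribute, so a long-running scraper could check it and repeat step 1 when the cookie is about to lapse. Sites that set cookies through JavaScript or anti-bot challenges will not hand them over on a plain GET, which is where this simple approach stops working.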

@manic1414 2 months ago

Can you do an overall flow-chart-approach video? Helping us understand the fundamentals and when/where to apply them would be gold! Maybe take 5 common e-commerce sites, and try to extract the same info between all? I feel like we miss nuggets when a site you scrape only requires a certain approach, as we miss potential logic to be applied in other circumstances. However, I much appreciate your explanations on the situations you encounter. Perhaps I'm just ignorant of your content; only 3 videos in, still lots to learn. Thanks for the great content!

@kexec. 2 months ago

seems the current curl_cffi version (0.7.1) doesn’t include the recent PR that adds requests-style exceptions. I think it’s better to stay with other solutions until a new version is released.

@maxeratorr 1 month ago

I'm playing around with this, and noticed that after a while the backend API link no longer shows up in inspect element, so I can no longer copy the cURL command for my code, when it was possible before. Any workaround for this?

@CrazyFanaticMan 2 months ago

Another blessing of a video

@SiyaMedia 2 months ago

excellent video as usual, the question is how this information applies to the biggest anti-bot sites, social media sites, especially those optimized for cellphones

@JohnWatsonRooney 2 months ago

Those are always going to have the best and most up-to-date protection and are always a challenge to scrape, but essentially they work the same way, it’s just much harder to find and manipulate the requests

@hamzabendi9751 2 months ago

What if you need to automate the process of finding the ID of the product? How would you approach that?

@tsp8695 2 months ago

Good video! Can you make a video about capturing and reusing cookies and/or browser fingerprinting?

@kexec. 2 months ago

yeah I want this ❤ launching a browser just to get a cookie seems too much for me (and the binary size is gonna be crazy)

@JohnWatsonRooney 2 months ago

Yes, I'm working on a video along those lines already, shouldn't be too much longer!

@mdsegara101 2 months ago

Forgive my foolishness, does this mean doing the scraping manually, in this case copying the product codes one by one?

@EmanueleCannizzaro 2 months ago

Thank you John. Can this approach be used within Scrapy?

@JohnWatsonRooney 2 months ago

Yes it can

@Cheenaah-tw8xx 2 months ago

hey, can you make a new video about scraping Amazon, because your old one is kinda old and not really meaningful anymore, u inspired me to learn scraping :D

@JohnWatsonRooney 2 months ago

I haven’t done much Amazon in a while, sure, I’ll take a look

@Cheenaah-tw8xx 2 months ago

@@JohnWatsonRooney thank you

@Ice.wallowcam101 2 months ago

Thank you so much!!

@rexsybimatrimawahyu3292 2 months ago

So is it possible to use curl_cffi with Scrapy?

@JohnWatsonRooney 2 months ago

Yes, it works well. I don’t think I have a video with it but I’ve definitely done it on my own projects

@rexsybimatrimawahyu3292 2 months ago

@@JohnWatsonRooney I tried to look around but can't find a solution for it. I think a good video idea would be how to use another HTTP package to override Scrapy's own HTTP handling, especially given how strong curl_cffi is for scraping. I've already used curl_cffi before, but with bs4 to do the scraping.

@mml135-25 2 months ago

Backend API scraping is definitely a good approach but it doesn't work everywhere. Sites like Amazon or LinkedIn can be scraped only from HTML.

@luisechevarria186 1 month ago

Is this because they are SSR? Like when you go to inspect them, there are really no requests to inspect?

@SigKappel 2 months ago

yes please!

@RaffayDoesTech 2 months ago

What do you think about all the guys using AI stuff for scraping?

@JohnWatsonRooney 2 months ago

IMO AI can help a lot with parsing and organising the data, but it won’t help much with actually extracting the data, which is the hard part

@alexdin1565 2 months ago

but if the website uses something like Next.js it's very hard, because all the content is server rendered

@kexec. 2 months ago

in my experience, most websites expose their backend API

@kexec. 2 months ago

true, these days we can’t just use httpx or Playwright without impersonation or fingerprinting 😢

@hamed6899 2 months ago

Please do scraping videos with more advanced and difficult problems

@nageshnaik5343 2 months ago

chrom120 to chrome120😄😄

@JohnWatsonRooney 2 months ago

Haha thought no one would notice

@khazaddum6599 2 months ago

I'm sorry, this video is way too superficial. Too many commands are thrown at the viewer without a proper explanation. Overall not an insightful or helpful video other than "indeed, scraping is somewhat harder now, I guess". I have seen many nice videos of yours. This one missed the mark for me. Maybe make it clearer from the beginning what the purpose of the video is (general idea vs. teaching in more depth). Help me manage my expectations so I know whether or not it's right for me. Thank you!

@JohnWatsonRooney 2 months ago

Fair enough, thanks for the feedback, I always appreciate constructive criticism