The Biggest Issues I've Faced Web Scraping (and how to fix them)

64,479 views

ForrestKnight

Days ago

Comments: 104
@yafethtb 8 months ago
Yeah. Scraping a dynamic website really makes me want to scream like Linus Torvalds to NVIDIA. And I also hate CloudFlare 😂
@gamecast4432 1 month ago
You can start a new browser or a new context for every goto() with a different user agent; that's how I deal with Cloudflare.
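A minimal sketch of the per-context idea above, assuming Playwright's Python sync API (sync_playwright, browser.new_context(user_agent=...), page.goto, page.content). The user-agent strings and the helper names user_agent_for and fetch_pages are illustrative placeholders, not anything from the video. Each URL gets its own context, so cookies and storage start clean and the user agent changes per request.

```python
from typing import Dict, List

# Placeholder user-agent strings (assumed examples); rotate through your own list.
USER_AGENTS: List[str] = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


def user_agent_for(index: int) -> str:
    """Round-robin through USER_AGENTS."""
    return USER_AGENTS[index % len(USER_AGENTS)]


def fetch_pages(urls: List[str]) -> Dict[str, str]:
    """Load each URL in a brand-new browser context with its own user agent."""
    # Imported lazily so the helper above is usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    html_by_url: Dict[str, str] = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            for i, url in enumerate(urls):
                # A fresh context has its own cookies, storage, and user agent.
                context = browser.new_context(user_agent=user_agent_for(i))
                try:
                    page = context.new_page()
                    page.goto(url)
                    html_by_url[url] = page.content()
                finally:
                    context.close()
        finally:
            browser.close()
    return html_by_url
```

Reusing one browser and only swapping contexts is cheaper than relaunching Chromium per URL, while still giving each request a clean session.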
@PaoloAnzani_1 6 months ago
In my opinion, having developed multiple web scraping applications, half of the time is not spent coding but trying to reverse engineer the web application. Simple ones are just a matter of looking at the requests in dev tools and manually making the API calls, while the more complicated ones involve backtracing how content is loaded on the page to find the JS code responsible for it. Basically it's 70% reverse engineering and 30% coding, if you do things the smart way.
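To illustrate the "look at the requests in dev tools and make the API calls yourself" approach, here is a stdlib-only sketch. The endpoint https://example.com/api/items, its page/per_page parameters, the headers, and the {"items": [...]} response shape are all hypothetical stand-ins for whatever the Network tab actually shows on a given site.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and headers copied from the browser's Network tab.
API_URL = "https://example.com/api/items"
HEADERS = {"User-Agent": "Mozilla/5.0", "Accept": "application/json"}


def build_request(page: int, per_page: int = 50) -> urllib.request.Request:
    """Rebuild the XHR the page makes, with the same query params and headers."""
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    return urllib.request.Request(f"{API_URL}?{query}", headers=HEADERS)


def get_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def fetch_all_items(get=get_json, max_pages: int = 100) -> list:
    """Walk the pages until the API returns an empty batch."""
    items = []
    for page in range(1, max_pages + 1):
        batch = get(build_request(page)).get("items", [])  # assumed response shape
        if not batch:
            break
        items.extend(batch)
    return items
```

The get parameter exists so the pagination logic can be exercised without a network; in real use the default get_json does the HTTP call.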
@pranitmane 5 months ago
Yep!
@mateusb09 3 months ago
What's the benefit of manually making API calls instead of just letting Selenium click the buttons, which does the exact same thing?
@kaj1543 3 months ago
@@mateusb09 Selenium has overhead.
@Anthony-qg5hj 3 months ago
@@mateusb09 Because it's faster, less code, lower cost, and easier to maintain.
@mateusb09 3 months ago
@@Anthony-qg5hj I had a Selenium project in which I tried the approach you're talking about. Not only did I need to attach the login cookies (which expire) to the request anyway, but I also had to construct the request skeleton manually. So in the end it took about as much effort as just forcing Selenium to click the buttons.
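If the cookies are the sticking point, one common middle ground is to log in once with Selenium and reuse its session cookies for plain HTTP requests. This sketch assumes the cookie dicts returned by Selenium's driver.get_cookies() (keys name and value, plus an optional expiry in epoch seconds); live_cookies and cookie_header are hypothetical helpers that drop expired cookies and build a Cookie header.

```python
import time
from typing import Optional


def live_cookies(cookies: list, now: Optional[float] = None) -> list:
    """Drop cookies whose 'expiry' (epoch seconds) has already passed."""
    now = time.time() if now is None else now
    return [c for c in cookies if "expiry" not in c or c["expiry"] > now]


def cookie_header(cookies: list, now: Optional[float] = None) -> str:
    """Turn Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in live_cookies(cookies, now))


# Usage (assumed Selenium session that has already logged in):
#   header = cookie_header(driver.get_cookies())
#   req = urllib.request.Request(url, headers={"Cookie": header})
```

When the server starts rejecting the header, that is the signal to log in again with Selenium and rebuild it; the request skeleton itself still has to be copied from dev tools.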
@delsix1222 8 months ago
Interesting timing to see this video, literally the day after I completed my first full-stack application, which revolves around web scraping :D
@flipygmd 8 months ago
You're the next Mark Zuckerberg
@Noumaan_Ahamed 8 months ago
How do you web scrape a secure website?
@IshaqKhan010 5 months ago
Share the website URL.
@delsix1222 5 months ago
@@IshaqKhan010 Can't share URLs in YT comments, they get auto-filtered.
@pablom8854 3 months ago
And I'm starting a web scraping project
@rikawrites7104 12 days ago
I started learning about web scraping YESTERDAY and stumbled upon your video today. GODDAMN, the way you explain stuff and speak really stuck with me! Thank you for providing such value and motivating me to improve my communication skills as well :D
@Dalamain 8 months ago
I used to web scrape all the time, but the obfuscated CSS class names from stupid JS frameworks have made it very difficult.
@gamecast4432 1 month ago
I use the [data-something="foo"] selector; luckily most of the sites I need to scrape use this attribute.
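With BeautifulSoup or Playwright you would pass that [data-something="foo"] CSS selector straight to select() or locator(). As a dependency-free sketch, the hypothetical select_by_data_attr below uses the stdlib html.parser to collect the text of every element whose data attribute matches, ignoring the obfuscated class names entirely.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DataAttrExtractor(HTMLParser):
    """Collects the text inside every element with attr="value"."""

    def __init__(self, attr: str, value: str):
        super().__init__()
        self.attr, self.value = attr, value
        self.depth = 0  # > 0 while inside a matching element
        self.buffer: List[str] = []
        self.results: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif (self.attr, self.value) in attrs:
            self.depth = 1  # entering a matching element
            self.buffer = []

    def handle_endtag(self, tag: str) -> None:
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # just closed the matching element
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data: str) -> None:
        if self.depth:
            self.buffer.append(data)


def select_by_data_attr(html: str, attr: str, value: str) -> List[str]:
    parser = DataAttrExtractor(attr, value)
    parser.feed(html)
    parser.close()
    return parser.results
```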
@JefCollier 3 months ago
I saw this video recommended to me about two days after I had to scrape a ton of images and convert them to a PDF. The images are loaded dynamically, and I will confess with shame that my script would scroll slowly down the entire page until it couldn't go any further. Then it would queue up all the appropriate image files and collect them into a local directory before turning them into a single PDF file.
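Scrolling until the page stops growing is in fact a common way to handle infinite scroll. A minimal sketch, written against two callbacks so it is not tied to one tool; scroll_until_stable is a hypothetical name, and the Selenium calls in the trailing comment (driver.execute_script with document.body.scrollHeight) are one assumed way to supply the callbacks.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_down: Callable[[], None],
    pause: float = 1.0,
    max_rounds: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = get_height()
    for round_no in range(1, max_rounds + 1):
        scroll_down()
        sleep(pause)  # give lazy-loaded content time to arrive
        new_height = get_height()
        if new_height == last_height:
            return round_no  # nothing new appeared: assume we reached the bottom
        last_height = new_height
    return max_rounds  # safety cap for truly endless feeds


# With Selenium (assumed `driver` already on the page):
# scroll_until_stable(
#     get_height=lambda: driver.execute_script("return document.body.scrollHeight"),
#     scroll_down=lambda: driver.execute_script(
#         "window.scrollTo(0, document.body.scrollHeight);"),
# )
```

Jumping straight to the bottom each round is usually faster than scrolling slowly, though some lazy loaders only fire for images that actually pass through the viewport.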
@xlafxx 8 months ago
I remember starting to watch your videos when I was entering my computer science BA, and now, as a 28-year-old with one semester left before graduating, you're still uploading good, unique content. Never get tired of your vids, keep it up brother. I'm also concerned about the job market; can you make a vid about new-grad CS students? For example, it seems almost every job wants front end or something, and my school never taught any of it.
@mrrobot-mn6re 8 months ago
You want to get a job from what your school taught you? You are in for a ride, brother. Tech is about your own research and self-learning, every fucking day. I pity people who majored in CS because they heard about a programmer earning 6 figs.
@Hshjshshjsj72727 6 months ago
Unless you went to an Ivy League school and wanna be a quant, you gotta do front end; JS, React, and SQL are key for the majority. School is dumb unless it's Ivy League, except for the piece of paper.
@danielabraham3022 8 months ago
To be honest, I subscribed because the button lit up. Also, I love your content.
@redbill5197 8 months ago
Thank you for the amazing video! Much appreciated as a young web developer. By the way, none of the buttons lit up or did any animations... I am a subscriber, so I don't know if that's why. Peace!!!
@beaconxy 7 months ago
It actually didn't.
@xdcountry 8 months ago
This guy gets it, I've been there. I can't wait to make this all an easy-ass Python plugin.
@v1d300 8 months ago
I am building a project that heavily relies on scraping, so I've been doing a lot of research. It's really hard to find anything good that is not sponsored by Bright Data. I get it, their marketing team has done a great job tapping a perfect niche of creators who provide valuable information, but it also means that almost every good resource ends up being tied to Bright Data, and it's not something I want to pay for when starting a hobby project. Anyway, this is a great video either way. I learned a lot of things I hadn't considered in my planning, like ETL (that's a new rabbit hole I need to dive into) or adaptive content extraction to account for layout changes. I was just assuming I would set up reporting to notify me when I start getting no content, and then I would fix it. So thank you for that. Do you set up Redis or something so that some requests are served from a cache of recently requested data instead of scraping again or hitting the DB? Is that necessary? And at what point should a webhook be set up, and for what purpose exactly? Thank you
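On the caching question: a short-lived cache keyed by URL lets repeat requests skip both the scrape and the database. The sketch below is an in-process stand-in for Redis, assuming a fixed TTL; with redis-py the same idea is r.set(url, body, ex=ttl) followed by r.get(url). TTLCache and cached_fetch are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a Redis key with an expiry (SET ... EX ttl)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # stale: force a fresh scrape
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl)


def cached_fetch(cache: TTLCache, url: str, fetch: Callable[[str], Any]) -> Any:
    """Serve recent results from the cache; scrape only on a miss."""
    hit = cache.get(url)
    if hit is not None:
        return hit
    value = fetch(url)
    cache.set(url, value)
    return value
```

Whether this is necessary depends on how often the same pages are requested; for a hobby project with few repeats, the database alone may be enough.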
@V4rrow 8 months ago
Dude is literally Gilfoyle from Silicon Valley (love your vids).
@theparten 8 months ago
I wasn't looking for a web scraping video, but his face drew my attention. I was like, wait, this is Gilfoyle, right? 😂❤...
@FFl1s 8 months ago
Fr
@EduardoEscarez 8 months ago
AFAIK the button highlighting is a feature based on video subtitles, including auto-generated ones, but it's still somewhat random. I didn't catch it because I was already subscribed and had liked the video a moment before you said it.
@v1d300 8 months ago
I don't think it's a video subtitles feature. In my experience it just happens randomly: the thumbs-up button shakes and the subscribe button highlights. Didn't happen for me on this video though :(
@Smallbusiness0007 8 months ago
The JD bottle in the background 😉
@obiwanfisher537 1 month ago
The cigars on the shelf ;)
@robinbreed2439 1 month ago
Great video and really nice energy, and I think you answered my question by using Scraping Browser to render JavaScript headlessly. Thank you
@olhodetamarutaca 5 months ago
I really like the way you explain things and also the pronunciation issues
@LM-ty8xg 1 month ago
Amazing content, brother. Please make a video explaining how to scrape dynamically loading Power BI tables on a website. There is simply no change in the HTML/CSS structure when you interact with them 😅
@doublesushi5990 8 months ago
Such a chill vid.
@nrgstudios612 3 months ago
The subscribe button didn't light up because I was already subscribed 👍
@tomasemilio 8 months ago
Boom. Thanks
@ramelox 8 months ago
When I see a Bright Data sponsorship, I instantly stop watching. Paying Bright Data is not a web scraping skill.
@zeddscarlxrd4331 8 months ago
Do you know how to bypass Cloudflare or CAPTCHAs without Bright Data?
@ZacMagee 8 months ago
Some people 😂 That's like saying, "Oh well, these stupid people who drive cars, why would they do that when we still have horses?"
@vasyavasin7364 7 months ago
@@ZacMagee Why should I pay for it if I can do it for free? 😂
@vasyavasin7364 7 months ago
@@zeddscarlxrd4331 You can easily find how to bypass Cloudflare.
@Ohiostategenerationx 7 months ago
@@vasyavasin7364 Don't you still need to scrape a bunch of proxies to use?
@olasunkanmioyetunji9254 7 months ago
Can you recommend a course to learn web scraping? One that teaches the tools and techniques you mentioned, plus other concepts.
@ravimahto3606 26 days ago
I am searching for one too, beginner in web scraping.
@manumartinezkcxu 5 months ago
What are the best AI scraping apps? Any suggestions/recommendations? I'm just looking into how our nonprofit organization aligns with other organizations within a county in California so that we can partner with them.
@brianmorin5547 7 months ago
Is there a reason/advantage to using Bright Data's Scraping Browser product instead of integrating their proxy and IP rotation services into a script I'm running on my own server?
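For comparison, wiring rotating proxies into your own script can be as simple as cycling through proxy URLs. A stdlib sketch, assuming placeholder proxy endpoints (proxy1.example.net and so on) and credentials supplied by whichever provider you use; proxy_openers is a hypothetical helper, and the script would still have to handle JavaScript rendering on its own.

```python
import itertools
import urllib.request
from typing import Iterator, List

# Placeholder proxy endpoints (assumed); use the ones your provider issues.
PROXIES: List[str] = [
    "http://user:pass@proxy1.example.net:8000",
    "http://user:pass@proxy2.example.net:8000",
]


def proxy_openers(proxies: List[str]) -> Iterator[urllib.request.OpenerDirector]:
    """Yield urllib openers that route through each proxy in turn, forever."""
    for proxy in itertools.cycle(proxies):
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        yield urllib.request.build_opener(handler)


# Usage: one proxy per request.
#   openers = proxy_openers(PROXIES)
#   html = next(openers).open("https://example.com", timeout=30).read()
```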
@phethindabamkhwanazi3546 8 months ago
Hey man, do you have another channel where you teach live?????
@phethindabamkhwanazi3546 8 months ago
If you have one, please provide the link so I can start learning more.
@johnknox4293 8 months ago
Interesting... thanks man.
@dmytro-skh 7 months ago
This video is what I need. But whoa, the screens with code change so fast... At 35 I'm too old to hit the pause button that quickly 😅 Do you have links to those hacks?
@Cryogenics12 8 months ago
Hi Forrest. I was wondering how you feel now about AI and the future of software engineering. With ChatGPT out for over a year, have your views changed much? Maybe a good topic for another vid.
@VishalJangid1 8 months ago
Hopefully Bright Data ain't a snitch 🫠
@storymode9085 8 months ago
Wow... I've got a long way to go.
@realshiiiiiit8349 8 months ago
Damn, this guy is cool.
@javancheongyujing2531 8 months ago
Does web scraping fall under data science or software engineering?
@dedswift 2 months ago
Depends on the purpose of the data you’re scraping and how it’s used, but it can be both.
@consolemodding1015 3 months ago
The funny thing is when they block the IP ranges used by Bright Data xD
@sakibullah3577 1 month ago
Can anyone help me? I can't seem to get past the Cloudflare loading page with the headless Bright Data web scraper.
@JoaquimDornelles95 8 months ago
My fucking hero
@einekleineente1 8 months ago
Are there vids of that???
@carsonjamesiv2512 8 months ago
GOOD VIDEO🎉👍
@botobeni 6 months ago
12:30 nuh uh 🗿🗿
@juan7114 4 months ago
I hate 502 errors, I don't know how to solve them.
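A 502 usually means a gateway or proxy got no valid answer from the server behind it, which is often transient, so one common mitigation is retrying with exponential backoff. A minimal sketch, assuming a hypothetical fetch callable that returns (status, body); in practice you might add random jitter to the delays so parallel workers do not retry in lockstep.

```python
import time
from typing import Callable, Tuple

# Statuses that are usually transient and worth retrying.
RETRYABLE = {429, 502, 503, 504}


def fetch_with_retry(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url), waiting base_delay * 2**attempt between retryable failures."""
    for attempt in range(retries + 1):
        status, body = fetch(url)
        if status not in RETRYABLE or attempt == retries:
            return status, body  # success, a non-retryable error, or out of retries
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
```

If the 502s never clear up, the backoff only delays the failure; at that point the cause is more likely a blocked IP or an overloaded proxy than a flaky server.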
@paulshorey7528 3 months ago
I like your mustache
@oeerturk 1 month ago
You said you prepared the video without needing Bright Data, but for every issue except data storage you propose using Bright Data for the most important and challenging parts...? :/
@OnlyUseMeEquip 5 months ago
If you are using Selenium, Puppeteer, or any other browser automation, you will never be a good web scraper; they are just too damn slow. If you are relying on them to get you past the WAF's JavaScript function and generate your cookies so you can then go scrape, others will beat you to the punch with pure code.
@consolemodding1015 3 months ago
Define slow?
@OnlyUseMeEquip 3 months ago
@@consolemodding1015 If you have to log in repeatedly and solve CAPTCHAs, pure-code bots almost eliminate that delay: they just generate new valid cookies. Once you hit your 403 Forbidden or 401 CAPTCHA, new tokens are loaded and you carry on, not to mention you can use threads instead of browser instances. Reversing the WAF's JS function is the key. A good pure-code bot is likely to be around 100x more efficient than a good browser bot.
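The refresh-on-403/401 loop and "threads instead of instances" can be sketched like this. The refresh callable is a hypothetical stand-in for however fresh cookies or tokens get generated (the sketch does not reverse any WAF logic), and fetch is assumed to take (url, token) and return (status, body). A lock ensures only the first thread that hits a stale token refreshes it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Fetch = Callable[[str, str], Tuple[int, str]]  # (url, token) -> (status, body)


class TokenManager:
    """Shares one token across threads and refreshes it once when it goes stale."""

    def __init__(self, refresh: Callable[[], str]):
        self._refresh = refresh  # hypothetical: returns fresh cookies/token
        self._lock = threading.Lock()
        self._token = refresh()
        self.refreshes = 0

    @property
    def token(self) -> str:
        with self._lock:
            return self._token

    def invalidate(self, stale: str) -> str:
        with self._lock:
            # Only the first thread to report this token refreshes it;
            # the others just pick up the new one.
            if self._token == stale:
                self._token = self._refresh()
                self.refreshes += 1
            return self._token


def fetch_one(manager: TokenManager, fetch: Fetch, url: str,
              max_refreshes: int = 2) -> Tuple[int, str]:
    token = manager.token
    for attempt in range(max_refreshes + 1):
        status, body = fetch(url, token)
        if status not in (401, 403) or attempt == max_refreshes:
            return status, body
        token = manager.invalidate(token)  # stale token: get a new one and retry


def fetch_all(manager: TokenManager, fetch: Fetch, urls: List[str],
              workers: int = 8) -> List[Tuple[int, str]]:
    # Worker threads share one manager instead of each running its own browser.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: fetch_one(manager, fetch, u), urls))
```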
@mianashhad9802 3 months ago
How can you scrape dynamic content without these tools? Is there anything besides trying to find the API endpoint? I'm a beginner who knows how to scrape simple pages, and I want to learn how to scrape dynamic content. Would love to hear your thoughts.
@heritage1834 2 months ago
@@mianashhad9802 A method that works is to clone the API calls that fetch the data from the backend server. You can find them in the Network tab (Fetch/XHR) of your browser's developer tools.
@gdolphy 1 month ago
@mianashhad9802 If the attribute data changes, target the tag. If the tag changes, target the Ajax calls.
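That fallback order (data attribute, then tag, then the Ajax endpoint) maps naturally onto a chain of extraction strategies tried in turn, with a warning when all of them fail so a layout change gets noticed. A sketch with two hypothetical regex-based strategies for a price field (the data-price attribute and the span-with-a-price pattern are assumptions); regexes keep it short, but a real HTML parser is sturdier, and a third strategy could call the page's JSON endpoint directly.

```python
import logging
import re
from typing import Callable, List, Optional, Tuple

Strategy = Tuple[str, Callable[[str], Optional[str]]]


def price_from_data_attr(html: str) -> Optional[str]:
    # First choice: a stable data-* attribute (hypothetical name data-price).
    m = re.search(r'data-price="([^"]+)"', html)
    return m.group(1) if m else None


def price_from_tag(html: str) -> Optional[str]:
    # Fallback: any <span> whose text looks like a price, ignoring attributes.
    m = re.search(r"<span[^>]*>\s*(\$\d+(?:\.\d{2})?)\s*</span>", html)
    return m.group(1) if m else None


def extract_with_fallbacks(html: str, strategies: List[Strategy]) -> Tuple[Optional[str], Optional[str]]:
    """Return (strategy_name, value) from the first strategy that finds something."""
    for name, strategy in strategies:
        value = strategy(html)
        if value:
            return name, value
    # Every strategy came up empty: the layout probably changed, so say so loudly.
    logging.warning("all %d extraction strategies failed", len(strategies))
    return None, None


# A third entry could fetch the page's JSON (Ajax) endpoint instead of parsing HTML.
STRATEGIES: List[Strategy] = [("data-attr", price_from_data_attr), ("tag", price_from_tag)]
```

Logging which strategy succeeded also gives early warning: if the first one stops matching, the site has changed even though data is still coming through.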
@GEMSofGOD_com 1 month ago
Thank you Jesus
@justcode_99 8 months ago
Your mustache looks like a hedgehog 😂
@YouStillNeedToSleep 7 months ago
Examples. Are you a Leo? he he
@francishubertovasquez2139 8 months ago
Speaking of Females, if Hitler's fuhrer have Magog carrier of motorized machine monsters then the Northern Magog have ice snow predominant in their place near Arctic circle, and ice surface can better conduct gases and science elements and compounds interaction which can attract those science things from everywhere, who between them is stronger except for the Super Magog Dark Matter? Will they suffice at full force during the final battle end times?
@abe_is_live 8 months ago
stop web scraping