Stop Using Selenium or Playwright for Web Scraping

  10,624 views

John Watson Rooney

Comments: 51
@andydataguy 23 days ago
Your timing could not have been more perfect on this video, brother. Thank you!! 🙌🏾💜 I don't know if you have a community or program, but as soon as I get hired I'm buying it to help support you. As a self-taught dev (formerly a digital marketer), your videos were vital in finally wrapping my head around data extraction.
@hydrosis-v3k 25 days ago
holy shit, literally the first proper guide to a decent alternative to web automation.
@SigKappel 25 days ago
Thanks, John, for this update. I have some mission-critical scrapes for inventory, and this will come in handy when I need to be more stealthy.
@zvnman 23 days ago
Thanks for the kind advice! In gratitude to all my clients, I give your referral link)) Keep going bro!!!
@EnglishRain 19 days ago
Thank you, I didn't know Playwright isn't really recommended for scraping.
@frankcasanova2132 25 days ago
JUST WHAT I NEEDED WHEN I NEEDED IT
@realitywords-17398 18 days ago
Great Man.......... You are struggling a lot...... Keep your morale high!
@gamehubler 18 days ago
I learn a lot from your video explanations and like your style of sharing them with us; it's a breath of fresh air because it is up to date. Could you maybe cover some more complex web scraping examples? Most of the pages in your showcases were viewable directly or with a login. Would it be possible to cover some GQL sites in the future? Those are more difficult to handle.
@mvace 24 days ago
Thanks, John. I had been struggling to log in to one website for a few days; it was detecting my automation script. I used selenium-driverless and it works great now, so perfect timing with this video! Do you have any tips on how to handle uploads with selenium-driverless? After I log in to the website I need to upload a file from my local machine, but uploads seem to work differently in selenium-driverless compared to Selenium. There is not much in the documentation regarding uploads in selenium-driverless.
@kinuthiamatata6040 25 days ago
I'm working with a website that uses WebSocket connections instead of traditional XHR/fetch requests. What's the best way to intercept traffic for scraping?
@JohnWatsonRooney 25 days ago
Honestly, I haven't got a lot of experience with websockets. I know you can connect to them via requests/httpx, but I don't have any practical experience, sorry.
@munchcup 25 days ago
Search on YouTube for "scraping Ohio State student emails". It's in JS, but the same approach works for Python.
@mehdikaraouet7980 25 days ago
@JohnWatsonRooney all the respect for honesty and humbleness ❤
@eyayawb 25 days ago
Reverse engineer the data transfer process. I successfully reverse-engineered an app that pulls data from a Firebase database using a WebSocket connection. I used the websockets package to replicate the communication pattern.
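A rough sketch of what replicating a WebSocket exchange with the `websockets` package might look like. The URL, the `path` value, and the message shape (`id`, `action`, `path`) are placeholders invented for illustration; the real frames have to be copied from the Network tab in DevTools for the site in question.

```python
import asyncio
import json
from typing import Optional


def build_subscribe_message(path: str, request_id: int = 1) -> str:
    """Serialize a HYPOTHETICAL subscribe request; copy the real shape from DevTools."""
    return json.dumps({"id": request_id, "action": "subscribe", "path": path})


def parse_frame(raw) -> Optional[dict]:
    """Decode one frame (str or bytes); return None unless it is a JSON object."""
    try:
        data = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    return data if isinstance(data, dict) else None


async def collect_frames(url: str, path: str, limit: int = 5) -> list:
    """Connect, send the subscribe message, and gather up to `limit` JSON frames."""
    import websockets  # third-party: pip install websockets

    frames = []
    async with websockets.connect(url) as ws:
        await ws.send(build_subscribe_message(path))
        async for raw in ws:
            frame = parse_frame(raw)
            if frame is not None:
                frames.append(frame)
            if len(frames) >= limit:
                break
    return frames


# Example call (placeholder URL and path, not a real endpoint):
# asyncio.run(collect_frames("wss://example.com/socket", "/items"))
```

The message building and frame parsing are kept as plain functions so they can be checked without a live connection.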
@abdelkoddouslaarif1295 23 days ago
Okay! This is all well and good, impressive even, but I want to ask a slightly different question: how do you stay updated on the new alternatives that come out? You always seem to know when something better is out, or when something old isn't as reliable anymore or is no longer maintained. What tech news forums or community chats do you follow?
@EmanueleCannizzaro 25 days ago
As usual, great content explained in plain English.
@JohnWatsonRooney 25 days ago
Many thanks!
@utsavgoswami5263 14 days ago
Thank you so much for this. If it is okay to ask, are there any better (i.e. cheaper) alternatives to proxyscrape?
@dragon3602010 24 days ago
So when should you choose "nodriver" instead of "SeleniumBase"? Thanks
@JohnWatsonRooney 24 days ago
Try both and see which works best for what you are trying to do
@return_1101 23 days ago
The best one! Thank you very much, mr. Rooney!
@graczew 25 days ago
Thanks, mate. This is really helpful.
@JohnWatsonRooney 25 days ago
thanks mate appreciate it
@Analyse_US 25 days ago
This is gold! Thanks.
@yafethtb 23 days ago
At last, a web scraping library that does not need another Chromium and uses only my Chrome! I'm waiting for something like nodriver.
@Optimusjf 6 days ago
How can I send a PFX client certificate in Selenium-driverless?
@nickwoodward819 25 days ago
I'm a bit confused by "using the chrome already installed on your machine". Do you mean the server? Or is this a Python thing? (I'm using JS and would be scraping either in a serverless function or in Node.)
@kinuthiamatata6040 25 days ago
I agree this might not be easily "automatable". You may consider Playwright's connectOverCDP() with a containerized Chrome. Just make sure to expose Chrome's debug port via socat (socat TCP-LISTEN:9222,fork TCP:localhost:9222) and connect to that. This works great with Node/serverless and gives you full browser automation capabilities.
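For anyone on Python rather than Node, the same idea can be sketched with Playwright's `connect_over_cdp()`, the Python counterpart of `connectOverCDP()`. This assumes a Chrome instance is already running with `--remote-debugging-port=9222` and is reachable at the given host; the helper names and the example URL are made up for illustration.

```python
def cdp_endpoint(host: str = "localhost", port: int = 9222) -> str:
    """Build the HTTP endpoint that Chrome's remote-debugging port listens on."""
    return f"http://{host}:{port}"


def fetch_title_over_cdp(url: str, endpoint: str) -> str:
    """Attach to an already-running Chrome and return the title of `url`."""
    from playwright.sync_api import sync_playwright  # third-party: pip install playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint)
        # Reuse the browser's existing default context when there is one.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        try:
            page.goto(url)
            return page.title()
        finally:
            page.close()


# Example call (assumes Chrome is listening on localhost:9222):
# print(fetch_title_over_cdp("https://example.com", cdp_endpoint()))
```

Because the browser is only attached to, not launched, Playwright does not need its own bundled Chromium for this to work.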
@JohnWatsonRooney 25 days ago
Sorry, to clarify: when you install Selenium or Playwright, they install their own version of Chrome. These two use the existing install already on your PC (if you have it), so you install Chrome as you normally would, with no extra install steps.
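To make "the existing install" concrete, here is a small standard-library sketch that looks for a Chrome or Chromium executable on the current machine. The candidate names and paths are an illustrative guess at common defaults, and this is not how nodriver or selenium-driverless actually locate the browser.

```python
import os
import shutil
from typing import Iterable, Optional

# Common executable names and install paths; an illustrative list, not exhaustive.
CHROME_NAMES = ("google-chrome", "google-chrome-stable", "chromium",
                "chromium-browser", "chrome")
CHROME_PATHS = (
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
)


def find_chrome(names: Iterable[str] = CHROME_NAMES,
                paths: Iterable[str] = CHROME_PATHS) -> Optional[str]:
    """Return the first Chrome/Chromium executable found, or None."""
    # First try the names on PATH, then fall back to well-known locations.
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


print(find_chrome() or "No Chrome or Chromium install found")
```

On a server or in a container, a `None` result here means Chrome has to be installed separately before either library can drive it.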
@cstanleyhns 8 days ago
Hi, is it possible to run this kind of thing in docker and then ultimately in a lambda?
@hamzahalli3500 25 days ago
Thank you
@rewazilol 24 days ago
Do you have experience using Playwright and running into issues? On one hand I know it's a test lib, but on the other hand it's extremely well funded and well maintained. The nodriver lib looks like it was built by one guy. I'm still not sure which way to go.
@JohnWatsonRooney 24 days ago
Playwright is an amazing library; it's easy to use and works extremely well. But for web scraping you have to monkey around with it, as even the most basic WAF can detect it and throw a captcha. Nodriver is built by one guy, but it's open source and removes a lot of the obvious flags that the browser is being controlled automatically, which is why I'm talking about it here.
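A minimal sketch of what a nodriver fetch might look like, assuming Chrome is installed locally and `nodriver` is available. The `extract_title` helper and the example URL are illustrative additions, and nothing here guarantees a given site will not still detect the browser.

```python
import re
from typing import Optional


def extract_title(html: str) -> Optional[str]:
    """Pull the <title> text out of an HTML string, or None if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


async def fetch_html(url: str) -> str:
    """Open `url` in the locally installed Chrome via nodriver and return the HTML."""
    import nodriver as uc  # third-party: pip install nodriver

    browser = await uc.start()
    try:
        page = await browser.get(url)
        return await page.get_content()
    finally:
        browser.stop()


# Example call (nodriver's docs run coroutines on its own loop helper):
# import nodriver as uc
# html = uc.loop().run_until_complete(fetch_html("https://example.com"))
# print(extract_title(html))
```

Parsing is kept separate from fetching so the same `extract_title` works on HTML from any source.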
@wisjnujudho3152 15 days ago
Wow. I think I need to modify all of my scrapers.
@saadkhan883 20 days ago
Sir, I am using the Playwright library to scrape Google Maps. Please tell me, is it a good choice for scraping data from Google Maps? I am still worried about blocking or reCAPTCHA...
@hw5622 20 days ago
great video! thx!!!!!!!!
@amrogendiah198 22 days ago
Could you make a video about scraping using AI with Python?
@green-forest-23 25 days ago
Is there something like this for TypeScript?
@JohnWatsonRooney 25 days ago
Yes, but I forget what it's called. Hopefully someone else will know.
@boniazdaniel 24 days ago
Puppeteer extra plugin? The stealth one
@serkhetreo2489 23 days ago
Hi, what if I want to build an unofficial API for a site? Is there a better way?
@phantazzor 24 days ago
How do you scrape an app if there is no web version?
@DavidChavez-z2v 18 days ago
How do you know these things? Excellent.
@MohanadSaid-u8x 25 days ago
Liiiiiit 🔥🔥🔥🔥
@luisechevarria186 23 days ago
Where does one begin with web scraping? I am trying to scrape JS-heavy websites, and all I see are recommendations to use Selenium/Playwright. This video is not too clear about what to use instead.
@JohnWatsonRooney 22 days ago
Try using selenium-driverless. It is similar to Selenium but less detectable, so you run into fewer issues when scraping.
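As a starting point, a selenium-driverless script might look roughly like this. It follows the async pattern from the project's README; the `clean_title` helper, the one-second wait, and the example URL are assumptions added for illustration.

```python
import asyncio


def clean_title(title: str) -> str:
    """Collapse runs of whitespace in a page title into single spaces."""
    return " ".join(title.split())


async def fetch_title(url: str) -> str:
    """Load `url` in local Chrome through selenium-driverless and return its title."""
    from selenium_driverless import webdriver  # third-party: pip install selenium-driverless

    options = webdriver.ChromeOptions()
    async with webdriver.Chrome(options=options) as driver:
        await driver.get(url, wait_load=True)
        # Give client-side JavaScript a moment to render; the delay is arbitrary.
        await driver.sleep(1.0)
        return clean_title(await driver.title)


# Example call:
# print(asyncio.run(fetch_title("https://example.com")))
```

The main difference from classic Selenium is that every driver call is awaited, so existing scripts need to be moved into an `async` function.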
@Canda-fh4xc 22 days ago
Thank you