This script I threw together saves me hours.

21,370 views

John Watson Rooney

1 day ago

Finding the best way to scrape data from a site is time-consuming. This script uses selenium-wire to watch the network requests a site makes and gives you back a list of URLs and JSON responses.
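A minimal sketch of the kind of script described above, assuming selenium-wire is installed and a Chrome driver is available. The function names, the default keywords, and the overall layout are illustrative and are not taken from the video.

```python
import json


def filter_urls(urls, keywords):
    """Keep only URLs that contain at least one of the keywords."""
    return [u for u in urls if any(kw in u for kw in keywords)]


def parse_json_bodies(bodies):
    """Return the decoded JSON for every body that parses; skip the rest."""
    parsed = []
    for body in bodies:
        try:
            parsed.append(json.loads(body))
        except (ValueError, TypeError):
            # Not JSON (HTML, images, empty bodies, ...), so ignore it.
            continue
    return parsed


def capture(target_url, keywords=("api", "json")):
    """Load a page and return (matching URLs, JSON responses)."""
    # Imported here so the helpers above work without a browser installed.
    from seleniumwire import webdriver
    from seleniumwire.utils import decode

    driver = webdriver.Chrome()
    try:
        driver.get(target_url)
        urls, bodies = [], []
        for request in driver.requests:
            urls.append(request.url)
            if request.response is None:
                continue  # the request never received a response
            encoding = request.response.headers.get("Content-Encoding", "identity")
            bodies.append(decode(request.response.body, encoding))
    finally:
        driver.quit()
    return filter_urls(urls, keywords), parse_json_bodies(bodies)
```

Calling `capture("https://example.com")` would open a browser, load the page, and return the two lists. The two helper functions can be tried on their own without a browser.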
Proxies: nodemaven.com/...
Patreon: / johnwatsonrooney (NEW free tier)
Scraper API www.scrapingbe...
Donations: www.paypal.com...
Hosting: Digital Ocean: m.do.co/c/c7c9...
Gear I use: www.amazon.co....

Comments: 73
@liketheduck 8 months ago
Fantastic “apprentice” content. This assumes a basic understanding but also pushes the novice forward. I really appreciate it!
@jessejames3169 1 year ago
Love your thought process behind writing this! It makes it easy to follow why you do a certain step, and if it’s necessary for others! Great vids keep it up!
@JohnWatsonRooney 1 year ago
Glad it was helpful!
@Extrey 1 year ago
I didn't even know that selenium can be used like this, thank you very much, great work as always))
@jagdish1o1 1 year ago
I used seleniumwire to create a scraping bot. It's a very good package for grabbing the backend requests. What I did was log in using Selenium, then grab the cookies and the backend API ;) Then I simply closed the browser and used the Python requests lib to make the requests, to make things a little bit faster. Eventually I dockerized everything, and then I had a container image which I pushed to AWS ECR and ran in parallel on AWS ECS. Pretty amazing.
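A rough sketch of the browser-to-requests hand-off described in this comment, assuming the `requests` package is installed and `driver` is a Selenium (or selenium-wire) driver that has already logged in. The helper names are hypothetical and this is not the commenter's actual code.

```python
def cookies_to_dict(selenium_cookies):
    """Flatten Selenium's list of cookie dicts into {name: value}."""
    return {c["name"]: c["value"] for c in selenium_cookies}


def session_from_driver(driver):
    """Build a requests.Session that carries the browser's cookies."""
    import requests  # third-party; imported lazily so the helper above is standalone

    session = requests.Session()
    # driver.get_cookies() returns a list of dicts with "name" and "value" keys.
    session.cookies.update(cookies_to_dict(driver.get_cookies()))
    # Reuse the browser's user agent so both clients present the same one.
    session.headers["User-Agent"] = driver.execute_script("return navigator.userAgent")
    return session
```

After `session = session_from_driver(driver)` the browser can be closed with `driver.quit()`, and later calls go through `session.get(...)` or `session.post(...)`. Whether the site accepts the copied cookies outside the browser depends on the site.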
@datacleaningchallenge2029 1 year ago
Impressive. What's your email? I need to ask you a question related to your code.
@DerekMurawsky 7 months ago
This is really great, and a great foundation, too. I can see this being extended to support so many things, too.
@VenitaSamaniego-c7g 1 month ago
This tutorial just saved me hours of confusion!
@sandunwijethunga6787 1 year ago
great video. thank you john❤
@pldvs 1 year ago
"Because. I. Don't. Care..." 😂😂
@JohnWatsonRooney 1 year ago
haha
@TimoTalksTech 1 year ago
Amazing, just what I was looking for. Need to look more into whether I could fetch all the IPs too.
@dubey_ji 2 months ago
This is really good thank you so much for this tutorial
@kocahmet1 1 year ago
golden content here
@kite759 1 year ago
that's very useful, thank you
@darylhunt9070 1 year ago
Good video. Do you capture API keys in selenium-wire as well? Some APIs use session keys.
@JohnWatsonRooney 1 year ago
you can grab any headers and cookies yeah
@zakariaboulouarde4591 6 months ago
Hello, thank you for the amazing video. I want to ask, please: how can I bypass a 403 Forbidden from Cloudflare when I am requesting an API? Thank you for all your efforts 🙏🏽
@ivanowdenis 1 year ago
Hello John, could you make a video on how to scrape data that a server sends through a websocket connection in live mode?
@andyscott710 2 months ago
I know this is an old post but it seems it could solve a few issues for me. However, it's not working. Within def main() is
    resps = show_response(driver, target_url)
If I do
    print(f'resps is {resps}')
I get [] and this is the same in the json file. If I do
    for url in urls:
        for kw in keywords:
            if kw in url["url"]:
                print(url)
that works fine. What is going wrong?
@KishanParmar-x4u 1 year ago
Are you using the JetBrains Mono font? If yes, how does it look so thin?
@JohnWatsonRooney 1 year ago
It is, yeah. I don't know, I didn't do anything other than select that font, sorry.
@AleksT28 1 year ago
I was working with selenium / selenium-wire until I hit an issue I could not debug: when dockerised, selenium-wire was not listening on the right port where selenium was running.
@JohnWatsonRooney 1 year ago
that's interesting, i haven't tried dockerising it but i will keep an eye open for issues
@iamshiva003 1 year ago
What is the vscode theme and the font used in this video?
@JohnWatsonRooney 1 year ago
GitHub Dark theme and JetBrains Mono!
@iamshiva003 1 year ago
@JohnWatsonRooney thank you
@satyajeetkumar3993 1 year ago
Hi John!! I really appreciate this new content. I have a query to ask. I was using selenium webdriver in chrome to fetch data from a website. The script is working just fine but after certain iterations, the driver is not working properly or the way it should. I am getting a NoneType error. I tried clearing the cookie and starting a new session and then continue from where I left off but it is still not working. Any suggestions on this?? I really appreciate it!! Thanks!!
@JohnWatsonRooney 1 year ago
hard to say but when i get problems like this i always check to see what the direct output from loading the page is, you could be hitting a captcha
@satyajeetkumar3993 1 year ago
Actually that new page is loading properly. I didn't check for terminal output but the page is loading. After that when I am looking for an element on the same page which I know is available there, I am getting an error.
@tizianonakamader8177 1 year ago
Amazing content thank you
@JohnWatsonRooney 1 year ago
Very welcome
@AllifIzzuddin 1 year ago
So this is kinda like playwright network events right?
@JohnWatsonRooney 1 year ago
Yes same thing but I found it better to use
@TheCulpritgamer 8 months ago
can you please share the script that you created for my future reference ??
@Garycarlyle 2 months ago
How did this work without importing 'requests'?
@StonedApe420 1 year ago
Can it make a complete copy of requests with URL, headers and payload?
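One way this could look, as a sketch only: selenium-wire's captured request objects expose `method`, `url`, `headers` and `body`, so a small helper can copy those into a plain dict. The helper name `snapshot` and the stand-in request below are hypothetical, used so the example runs without a browser.

```python
from types import SimpleNamespace


def snapshot(request):
    """Copy the parts of a captured request needed to replay it later."""
    return {
        "method": request.method,
        "url": request.url,
        # Repeated header names collapse to a single entry in a plain dict.
        "headers": dict(request.headers.items()),
        # body is bytes; decode leniently so the snapshot can be saved as JSON.
        "payload": (request.body or b"").decode("utf-8", errors="replace"),
    }


# Stand-in for a captured request (in real use, an item from driver.requests).
fake = SimpleNamespace(
    method="POST",
    url="https://example.com/api/search",
    headers={"Content-Type": "application/json"},
    body=b'{"q": "shoes"}',
)
record = snapshot(fake)
```

With a real driver the same helper would be applied as `[snapshot(r) for r in driver.requests]`.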
@mitvpankaj2454 1 year ago
Great work bro!! I also have one question: whenever I try to scrape Walmart, the "robot or human" pop-up comes up. Can you please guide me on how to bypass this type of bot detection system? Thanks, and love your content, because of you I learned Python!! 👍
@JohnWatsonRooney 1 year ago
Check out undetected chrome driver - there’s some good information for it that might help
@mitvpankaj2454 1 year ago
I tried, bro, but it's still showing the same issue. If you have any reference or video, can you please suggest it? It'll be very helpful for me and others also :)
@AndyTutify 1 year ago
Are you no longer using neovim?
@JohnWatsonRooney 1 year ago
I still use neovim, i decided to use VS Code for video demos as i thought it would include more people
@satwikawasthi2002 1 year ago
What if the API is only called when a user action occurs?
@JohnWatsonRooney 1 year ago
the next step to upgrade this would be to run the same but insert clicks on various page links first and check each one
@satwikawasthi2002 1 year ago
@JohnWatsonRooney thanks for the reply 🙏 Also, most important: a POST API that expects custom keys in its headers or payload will not give the expected response. Please make a video on handling this.
@maloukemallouke9735 1 year ago
Thank you. I am wondering if you make money with these tools?
@linuxkerem 1 year ago
Are you using Arch Linux, sir? And thanks for the content! 🥰
@JohnWatsonRooney 1 year ago
thanks! it's actually just Ubuntu + i3
@linuxkerem 1 year ago
@JohnWatsonRooney Wow, I guess my mind went straight to Arch when I saw a Hyprland-style window manager 😁
@user-tk5ir1hg7l 1 year ago
Is this better than Puppeteer network events?
@JohnWatsonRooney 1 year ago
I have limited experience with Puppeteer; I expect it to be the same, although I prefer selenium-wire to Playwright for network events.
@user-tk5ir1hg7l 1 year ago
@JohnWatsonRooney ok, how about Playwright network events? Does it have similar functionality, or would you still recommend going with seleniumwire?
@ХайлайтыДлиннойВоли 1 year ago
Can I bypass hqq.tv devtool blocking using this?
@Niuroteya 1 year ago
I don't really get it.. I mean you can filter the Network tab by link or the word "api" too if you want to. Plus this solution will not work for everything, but the Network tab will. Other than filtering only the needed requests, this solution doesn't seem to do anything. And yeah, you can do a bit more advanced filtering here, but.. does this really save a lot of time for some kind of task? It's just hard for me to see how. Did I miss something? I've been making AJAX scripts dealing with forms for the past year+ and for me it would be absolutely useless.
@JohnWatsonRooney 1 year ago
I use it when I am given a URL and want to do some quick checks - saving any JSON output so I can search inside all from my terminal. I chose to semi automate something I was doing regularly is all.
@markbennett5626 1 year ago
Maybe not for everyone, but once scripted, including a user prompt for the URL, it'll be quicker than using the Network tab and give a much nicer response. Plus I can see adding the ability for the additional steps of recording session keys and further calls. Thanks John
@AhmedThahir2002 1 year ago
Hi John! Love your work. Could you share the code from your videos?
@markbennett5626 1 year ago
Maybe John has the code available to Patreon members ;)
@AhmedThahir2002 1 year ago
@markbennett5626 Ohhhhh okay no issues hehe :)
@abdelrahmankhaled8239 7 months ago
Complete noob here, just started web scraping. For some reason the seleniumwire import is giving me this error:
    import blinker._saferef
    ModuleNotFoundError: No module named 'blinker._saferef'
I've been searching online for help for hours. Changed Python versions (currently using the same one you're using in the video); nothing seems to work. Please help, thank you in advance.
@DudethatGross 7 months ago
pip install blinker ?
@valoclips2896 1 year ago
Nice idea. But I will still prefer to log the requests via Network tab or Burp suite. The chromedriver detection will also kick in for some sites.
@JohnWatsonRooney 1 year ago
fair enough, it does have some uses but also limitations as you say.
@twelfth4927 8 months ago
Guys, I'm watching with passion, but what would this be helpful for? What are web scrapers actually doing?
@DudethatGross 7 months ago
Gathering data that would otherwise be difficult to get without a proper API
@Septumsempra8818 1 year ago
Anyone else update chrome on their pc and had all their scrapers break?😅
@bakasenpaidesu 1 year ago
.
@spab87 10 months ago
Hi, thanks a lot, this was very helpful to learn. I use contextlib.suppress, it's actually faster than try/except and it looks better I think. Your function would look like this:
    import contextlib
    for request in driver.requests:
        with contextlib.suppress(Exception):
            data = decodesw(
                request.response.body,
                request.response.headers.get("Content-Encoding", "identity")
            )
            resp = json.loads(data.decode("UTF-16"))
            resps.append(resp)
    return resps
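For readers who want to try the `contextlib.suppress` pattern on its own, here is a small standard-library sketch. It assumes the response bodies are already collected in a list, and it suppresses only `ValueError` and `TypeError` rather than every `Exception`, which is a deliberate change from the comment. The function name `parse_all` is made up for illustration.

```python
import contextlib
import json


def parse_all(bodies):
    """Collect every body that parses as JSON, silently skipping the rest."""
    parsed = []
    for body in bodies:
        # Only parsing and decoding failures are swallowed here; other
        # errors still surface, which makes real bugs easier to spot.
        with contextlib.suppress(ValueError, TypeError):
            parsed.append(json.loads(body))
    return parsed
```

For example, `parse_all(['{"ok": true}', "not json", None])` keeps only the first item. The behaviour matches a try/except that does nothing in the except branch; which form reads better is a matter of taste.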
@MasoomNini 1 year ago
Hi John, big fan. Thanks for the tutorials ❤ I need to contact you on any social media; I need help scraping one site, kindly.