Web Scraping with AIOHTTP and Python

25,016 views

John Watson Rooney

1 day ago

Comments: 94
@Rasstag 3 years ago
John, great tutorial, many thanks. Now I know how to juggle ;) I wanted to pass on an observation: apparently Windows can be cranky with asyncio/aiohttp. Your example program throws a "RuntimeError: Event loop is closed" error. However, adding asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy()) towards the bottom, just before pages = asyncio.run(main(urls)), solves whatever 'Event loop' issues were present.
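A minimal sketch of where that line goes, assuming a script shaped like the one in the video. The `main` coroutine here is a stand-in that only sleeps (it makes no aiohttp calls), and the `run` helper and the platform check are additions so the same file also runs on Linux and macOS, where `WindowsSelectorEventLoopPolicy` does not exist.

```python
import asyncio
import sys


async def main(urls):
    # Stand-in for the real aiohttp fetching code from the video.
    await asyncio.sleep(0)
    return [f"fetched {url}" for url in urls]


def run(urls):
    # The selector policy class only exists on Windows, so guard the call.
    if sys.platform.startswith("win"):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    return asyncio.run(main(urls))


if __name__ == "__main__":
    pages = run(["https://example.com/1", "https://example.com/2"])
    print(pages)
```

Newer Python releases appear to be phasing out the event loop policy API, so this is probably best treated as a workaround for the setups where the error shows up.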
@rtxmax8223 3 years ago
Yes, I was working on async requests and got a similar error. The only way it works for me is if my program keeps on running: since I make a new request as soon as the previous one ends, I don't get this error. But when I run my program to scrape only one page, I do get it.
@luxeave 2 years ago
@Rasstag, dude, you're a life saver. Well done!
@Valentin439 1 year ago
Thanks, man!
@efferington 2 years ago
Hi John, thank you very much for this. I found this video while trying to figure out how to include an async aiohttp loop in an API processing script I'm writing, and it was invaluable for figuring out how to structure the code.
@andres777video 2 years ago
No requests from me, I just love your videos, John! Thanks for spending the time. I need this code to pull data for 188,600 items (each one is a web page with 3 tables). UiPath would take about 32 days to complete; asyncio + aiohttp should be much, much faster! Thanks for the tip, Rasstag (I had the same issue on Windows).
@JohnWatsonRooney 2 years ago
Thank you very much!
@a_private_handle 1 year ago
Great wallpaper from Firewatch
@yacinehechmi6012 1 year ago
Thank you sir!!
@symbolminded5167 1 year ago
Your video helped a lot, thank you
@nsnilesh604 3 years ago
Awesome way to get a list of urls 👌😎
@JohnWatsonRooney 3 years ago
Thanks!
@tubelessHuma 3 years ago
Always trying new ways of scraping. Great 👍🌹
@JohnWatsonRooney 3 years ago
Thanks!
@fuad471 3 years ago
Thank you, it was really helpful for grasping the asyncio concept
@rrahll 3 years ago
Thanks for the tutorial!
@МаксимМельников-ж4у 2 years ago
Really good explanation! Thanks a lot!
@АнтонНазарук-щ5с 2 years ago
Thank you, John! Really nice tutorial, helped a lot.
@dobcs3236 1 year ago
Thank you very much 🥰
@coala6019 1 year ago
@John Watson Rooney Good tutorial, thanks! But lines 23 to 26 are synchronous, no?
@DerekMurawsky 5 months ago
Subscribed. What are your thoughts on going about it this way vs. something like Scrapy?
@JohnWatsonRooney 5 months ago
Honestly, I use Scrapy almost all the time now, unless it's more of a throwaway script
@phudinhtruong 2 years ago
Great video
@serge2033 2 years ago
Cool. Thanks!!
@DarkPsychologyIQ 3 years ago
Hi John, great videos by the way! I was wondering how I can scrape a website for the ASINs, product title, stock levels and price?
@GelsYT 2 years ago
So basically, what's happening in the whole program is that, on the event loop, while one task's request is in flight and it's waiting, it passes control to the other tasks, and so on and so forth until we get the responses? I'm sorry if that's not clear.
@JohnWatsonRooney 2 years ago
Yeah, so basically, instead of the whole program waiting for a server response, it gets on with doing other things, i.e. sending more requests out. It means there is minimal time spent waiting and maximum time spent doing.
@GelsYT 2 years ago
@JohnWatsonRooney THANK YOU! To anyone who watches his videos and does not subscribe: PLEASE DO IT NOW! Thank you man! Also, please smile more! I like your beard! I wish I had one as well.
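The exchange above about tasks handing control back while they wait can be seen in a small sketch. It assumes `asyncio.sleep` as a stand-in for a server response (the `fake_request` name is made up for illustration); three 0.2 second waits overlap on a single event loop instead of running back to back.

```python
import asyncio
import time


async def fake_request(name, delay):
    # asyncio.sleep stands in for waiting on a server response.
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # All three "requests" wait at the same time on the one event loop.
    results = await asyncio.gather(
        fake_request("a", 0.2),
        fake_request("b", 0.2),
        fake_request("c", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    # Total time should be close to one wait, not the sum of all three.
    print(results, f"{elapsed:.2f}s")
```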
@GelsYT 2 years ago
HEEY MAAAN, RECENTLY WATCHED YOUR VIDS ON SCRAPY! YOU'RE SAVING MY LIFE AGAIN! THANKS!
@Lorant1984 2 years ago
Where is the link to the video about "session objects", please?
@nachoeigu 2 years ago
How could we use a proxy dictionary with aiohttp?
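One possible answer, as a sketch: aiohttp takes a single proxy URL per request through the `proxy=` keyword rather than a requests-style dictionary, so a small helper can pick the right entry first. The `pick_proxy` and `fetch` names and the proxy address are hypothetical, and `session` is assumed to be an `aiohttp.ClientSession`.

```python
def pick_proxy(url, proxies):
    # Choose the proxy URL whose key matches the target URL's scheme.
    scheme = url.split("://", 1)[0].lower()
    return proxies.get(scheme)


async def fetch(session, url, proxies):
    # aiohttp expects one proxy URL per request, passed as `proxy=`.
    async with session.get(url, proxy=pick_proxy(url, proxies)) as resp:
        return await resp.text()


# Usage with aiohttp (the proxy address is a placeholder):
#   proxies = {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}
#   async with aiohttp.ClientSession() as session:
#       html = await fetch(session, "https://example.com", proxies)
```

If no entry matches the scheme, `pick_proxy` returns `None` and the request goes out without a proxy.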
@peterpann7778 2 years ago
Hi John, thanks a lot for your wonderful videos. I was wondering which is faster for web scraping: async or multithreading?
@JohnWatsonRooney 2 years ago
Thanks! Async is faster
@rangabharath4253 3 years ago
Awesome 👍😎
@JohnWatsonRooney 3 years ago
Thanks!
@JesusTorres-bt2eb 3 years ago
Thank you so much, I have learned a lot from your videos. I have a question: is there a similar option for pages behind Cloudflare? I currently use cloudscraper, but it has bugs. Do you recommend something?
@bighneswar98 2 years ago
The webpage that I'm trying to scrape using asyncio and aiohttp throws an error: "return self._body.decode( # type: ignore[no-any-return,union-attr] UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 25051-25052: invalid continuation byte". It was working fine in requests and html. What's the solution?
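A hedged sketch of one way around that error: read the raw bytes with `resp.read()` and decode them yourself, so a page that is not valid UTF-8 does not raise. The candidate encodings below are assumptions, since the page's real encoding is not known from the comment, and `decode_body` and `fetch_text` are illustrative names. `session` is assumed to be an `aiohttp.ClientSession`.

```python
def decode_body(raw, encodings=("utf-8", "cp1252")):
    # Try each candidate encoding in order and return the first that works.
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the text and mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


async def fetch_text(session, url):
    async with session.get(url) as resp:
        raw = await resp.read()  # raw bytes, no decoding attempted
        return decode_body(raw)
```

aiohttp's `resp.text()` also accepts `encoding` and `errors` arguments, so `await resp.text(errors="replace")` may be enough when losing a few characters is acceptable.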
@nachoeigu 2 years ago
If you had to pick the best module for web scraping in terms of efficiency and robustness, what would it be? I know selenium, requests, HTMLSessions, aiohttp, AsyncHTMLSession, scrapy, among others. What do you recommend focusing on specifically for its completeness? Thank you for your content.
@JohnWatsonRooney 2 years ago
Hi, requests-html is my go-to for most smaller projects. Otherwise Scrapy
@anto3617 3 years ago
Can you please explain how to scrape product prices from a web store and send a Telegram alert when a price drops? Thanks for your video
@nachoeigu 2 years ago
Wow, this is amazing content. Thank you. I would like to know more about asyncio in Python. How did you learn it? Do you recommend a lecture or article?
@Suresh_Solution 3 years ago
Bro, how to scrape the ScienceDirect website
@7488nishant 3 years ago
Your videos are very informative, bro. Can you make a video on web scraping where cookies expire after 30 minutes? Example: websites like NSE, etc.
@rotatingmind 1 year ago
Cool tutorial. Just one question: what could we do if we want to add new urls to the task list from the parsed results?
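One common pattern for that question is a queue with a few worker tasks: each worker pulls a URL, parses it, and pushes any newly discovered URLs back onto the queue. The sketch below is not the video's code; `FAKE_SITE` and `fetch_links` are made-up stand-ins for real pages and an aiohttp download plus parse step.

```python
import asyncio

# Hypothetical link graph standing in for real pages and their parsed links.
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": [],
    "/c": ["/"],
}


async def fetch_links(url):
    # Stand-in for: download the page with aiohttp, parse it, return its links.
    await asyncio.sleep(0.01)
    return FAKE_SITE.get(url, [])


async def worker(queue, seen):
    while True:
        url = await queue.get()
        try:
            for link in await fetch_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.put_nowait(link)  # new work discovered while running
        finally:
            queue.task_done()


async def crawl(start, n_workers=3):
    queue = asyncio.Queue()
    seen = {start}
    queue.put_nowait(start)
    workers = [asyncio.create_task(worker(queue, seen)) for _ in range(n_workers)]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen
```

The `seen` set keeps a URL from being queued twice, which matters as soon as pages link back to each other.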
@KirillBezzubkine 2 years ago
I would really appreciate an answer: I have to make about 10,000 API requests, but the API supports only about 50 calls at a time, so I need to split the whole 10k range into chunks. How do I make async requests in batches?
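Instead of fixed chunks, one option is an `asyncio.Semaphore` that lets at most 50 requests run at once and starts the next one as soon as a slot frees up. This is a sketch under assumptions: `call_api` is a stand-in for the real aiohttp request, and the `in_flight` and `peak` counters exist only to show that the limit holds.

```python
import asyncio


async def call_api(i):
    # Stand-in for a real aiohttp request.
    await asyncio.sleep(0.001)
    return i * 2


async def run_limited(n_requests, limit=50):
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def limited(i):
        nonlocal in_flight, peak
        async with semaphore:  # at most `limit` coroutines get past this line
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_api(i)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(limited(i) for i in range(n_requests)))
    return results, peak
```

With aiohttp specifically, `aiohttp.TCPConnector(limit=50)` passed to the session caps open connections in a similar way, though it limits connections rather than requests.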
@mohfatkurrozi4069 3 years ago
Awesome... :-)
@alexsorrow6133 3 years ago
Where can I copy this code? 🤷🏻‍♂️
@sadam32 2 years ago
Hi John, great tutorial again. What VS Code theme is this? Best regards
@JohnWatsonRooney 2 years ago
Thanks, I believe that is Gruvbox Material. I change a lot!
@hamidrezarahimi6651 3 years ago
Link to the quickstart page: docs.aiohttp.org/en/stable/client_quickstart.html
@powergaming-tu6wj 2 years ago
OK, so I have a question for you: I have 6000+ links, and all I want to do is read each URL, get one thing off the page and add it to the corresponding link in Google Sheets. Any tips?
@powergaming-tu6wj 2 years ago
AKA most of the items from the market in a game called RuneScape
@terrascape 2 years ago
Hi John, it would be great if you could reply. How do I go about returning the url for each iteration of the event loop?
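A simple way to keep each URL with its result is to return both from the fetching coroutine as a tuple. In this sketch `fetch` is a stand-in that sleeps instead of making an aiohttp request, and the returned HTML string is invented for illustration.

```python
import asyncio


async def fetch(url):
    # Stand-in for the aiohttp request; returns the URL alongside the body.
    await asyncio.sleep(0.01)
    return url, f"<html>page for {url}</html>"


async def main(urls):
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(fetch(url) for url in urls))


if __name__ == "__main__":
    for url, html in asyncio.run(main(["https://example.com/1", "https://example.com/2"])):
        print(url, len(html))
```

Because `asyncio.gather` preserves input order, `zip(urls, results)` would also work when the coroutine returns only the body.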
@mlgen7 3 years ago
Thanks for these videos. I am starting my journey into web scraping because I want to understand my spend on Deliveroo (a food delivery app, like Uber Eats). A question, and cheekily a tutorial request if possible: how do I manage social login (I use Google login for Deliveroo) and then go down a level to a page and gather the order history?
@akalamian 3 years ago
Nice one, the third lesson learned here! Can you show some database lessons? For example, we scrape 2 Shopify sites which have the same products, then do some price matching to find the lowest price. Thanks
@JohnWatsonRooney 3 years ago
Good stuff! Yes, I am planning some more database-related things to come
@akalamian 3 years ago
@JohnWatsonRooney Absolutely great guide, just scraped 36,000 products in 300s, just like a movie :)
@artabra1019 3 years ago
Thanks.
@kimkelleher6255 3 years ago
Hey John!
@kimkelleher6255 3 years ago
Also, my next attempt will be grabbing one image from each of the 120k products. Is there a video for that? :)
@spicer41282 3 years ago
Nice! More tools in the chest! Thanks John. Any chance you can demo an if-then-else scraping example? For example: if a set price or a price range is found, then scrape that object; else, move on to the next page. Thanks for considering this request. Or, if you've done something similar in your previous vids, can you mention which ones? I might've missed it... Thanks again!
@JohnWatsonRooney 3 years ago
Sure, sounds like a good idea. I'll have a look and see what I can come up with!
@KhalilYasser 3 years ago
Thank you very much. I got this error `Traceback (most recent call last): File "aiohttp.py", line 34, in <module> results = asyncio.run(main(urls)) File "C:\Users\Future\AppData\Local\Programs\Python\Python37\lib\asyncio\runners.py", line 43, in run return loop.run_until_complete(main) File "C:\Users\Future\AppData\Local\Programs\Python\Python37\lib\asyncio\base_events.py", line 584, in run_until_complete return future.result() File "aiohttp.py", line 18, in main async with aiohttp.ClientSession() as session: AttributeError: module 'aiohttp' has no attribute 'ClientSession'`. Any ideas?
@KhalilYasser 3 years ago
The problem occurred because the file was named aiohttp.py, which conflicts with (shadows) the aiohttp module. I'm posting the solution as a reference for others who may run into this problem.
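A quick way to check for this kind of name clash, sketched with the standard library only: ask Python which file it would load for a given import name. The `module_origin` helper is a made-up name for illustration.

```python
import importlib.util


def module_origin(name):
    # Report which file Python would load for `import name`, without importing it.
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # If this prints a path inside your project instead of site-packages,
    # a local file is shadowing the installed package.
    print(module_origin("aiohttp"))
```

Renaming the script (and deleting any leftover `__pycache__` entry for it) is then usually enough to make the real package importable again.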
@JohnWatsonRooney 3 years ago
Great, glad you worked it out!
@yuvalbra 3 years ago
Can you publish the source code?
@mohfatkurrozi4069 3 years ago
How to fix error 403 Forbidden, sir?
@JohnWatsonRooney 3 years ago
Try different headers (user agents and cookies) to see if that works
@mohfatkurrozi4069 3 years ago
@JohnWatsonRooney Yes sir, but I am very confused about how to fix it. Could you, some day or maybe in a later video, show an easy way to solve my 403 Forbidden problem? Thank you in advance, sir, and thanks for replying to my comment.
@karthik-ex4dm 3 years ago
One question: since this is async, will it wait for JavaScript to load fully before fetching the data from a page? E.g. if I apply a brand filter on an e-commerce site which updates the page without reloading it, will aiohttp be able to fetch the updated data?
@manikandanmanickam9433 3 years ago
Can you show how to scrape from the Delta Airlines site, delta.com?
@JohnWatsonRooney 3 years ago
Sure, I can have a look
@manikandanmanickam9433 3 years ago
@JohnWatsonRooney Thanks a lot, John
@manikandanmanickam9433 3 years ago
Hi John, can you do an example of scraping Google search results, with 15 results (heading, URL, content)?
@artabra1019 3 years ago
grequests is not working?? I tried it and it returns None.
@artabra1019 3 years ago
Where's the source code?
@thekarthik 3 years ago
You're the best
@JohnWatsonRooney 3 years ago
Thank you!
@MEZHGANO 1 year ago
This library is great. But how can we pass rich cookies with "path, expires, secure, etc.", not just simple "key: value" pairs? I found the documentation extremely poor on cookies.
@celerystalk390 3 years ago
Your videos and topics just keep getting better. Great job!
@jithin.johnson 2 years ago
Great tutorial! How do we get around IP bans? Bombarding the server with async requests often gets me banned.
@JohnWatsonRooney 2 years ago
Thanks! Yes, you will need to use some good proxies to avoid the bans. Alternatively, I find async is great for getting data from multiple different sites at the same time
@OprichnikStyle 2 years ago
Why did you put the await in return await r.text? I had to take it out in order for my code to work
@justindi6061 3 years ago
I've learned a lot about scraping through your videos! What are the chances of getting blocked if I host my script on a server and scrape a website multiple times a day? Should I implement proxy rotation, or can I get away without it?
@JohnWatsonRooney 3 years ago
Thanks Justin! It depends on how often "multiple times" is and how many requests you are making. I'd say try it first and see if you start to run into issues
@Rani-wm1qq 3 years ago
I've been trying to use Selenium to parse the name and price of some coins on a crypto exchange I use very often, but I keep getting an error on this line: driver.find_element_by_xpath("//*[@id="root"]/div/div/div/div[3]/div[2]/div/form/input").click() What do I do?
@Антмара 1 year ago
Hi John. Thank you for your video. Just one question: is it possible to make def Pars asynchronous too? I need two functions: the first gets links to product card pages from 30 pages of the site, and the second uses those links to get the price and other data from every card page. I have made the first function asynchronous and it works really quickly. I want the second function to be asynchronous too, because there are more than 1000 cards. But I still don't know how to combine these two functions, i.e. how to create tasks for them both. Could you please advise me how to make it work?
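One way to chain the two stages, as a sketch: gather all listing pages first, flatten the links they return, then gather the product pages. Both coroutines below are stand-ins that sleep instead of making aiohttp requests, and the link format and `price` value are invented for illustration.

```python
import asyncio


async def get_card_links(page):
    # Stage 1 stand-in: fetch one listing page and return its product links.
    await asyncio.sleep(0.01)
    return [f"/product/{page}-{i}" for i in range(3)]


async def get_card_data(link):
    # Stage 2 stand-in: fetch one product page and parse price and other data.
    await asyncio.sleep(0.01)
    return {"link": link, "price": len(link)}


async def main(n_pages):
    # Run all listing pages concurrently, then flatten the lists of links.
    link_lists = await asyncio.gather(*(get_card_links(p) for p in range(n_pages)))
    links = [link for sub in link_lists for link in sub]
    # Feed the collected links into a second concurrent batch.
    return await asyncio.gather(*(get_card_data(link) for link in links))


if __name__ == "__main__":
    cards = asyncio.run(main(30))
    print(len(cards))
```

With more than 1000 product pages, a concurrency cap on the second stage is probably worth adding so the site is not hit with every request at once.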
@nahomtsegaye 3 years ago
You are a wizard, my friend
@tnssajivasudevan1601 3 years ago
Great tutorial, sir
@galichandreyschool 2 years ago
Very good! :) Thank you!
@JohnWatsonRooney 2 years ago
Thanks, glad you enjoyed it
@DM-py7pj 3 years ago
Will you do a trio video and cover nurseries please?
@mayurahir9340 3 years ago
Hello, I am doing some web scraping on one website, but I am getting a problem: the website has a button that gives me a name when I click it, but when I close the new tab and reopen the page, the name the button gives changes every time. How can I scrape that? Thanks in advance, and please help me 🙏