Don't think I ever did this so it's well overdue... You helped me get a job as a software engineer. I used things I learned from your vids to make a project that was instrumental in getting a job offer. Thank you so much, you changed the financial trajectory of my whole family! (for others looking for the same, a major contributor to standing out is having an AWS cert)
@JohnWatsonRooney 1 year ago
thank you that's amazing, the reason I do this is to help people and it's great to hear! congratulations on your job!
@IwoGda 1 year ago
What AWS cert is the best?
@sdriding 1 year ago
@@IwoGda probably the Developer Associate
@damilolaowolabi9148 4 months ago
what projects did you build?
@sdriding 4 months ago
@@damilolaowolabi9148 if you want to stand out build a project using the tools listed in a job posting you're really interested in. I had some random personal projects but what got their attention was that one of my projects listed many of the tools they were looking for. It was a ridiculously basic project and practically a laughing point during my interviews but it got me in the door
@ManuelGonzales-ni9sh 1 year ago
Great tutorial John! Would you please consider doing a full tutorial on your nvim theme & config?
@JohnWatsonRooney 1 year ago
Thanks! Yes I will do a video on my nvim, I’ve been configuring it a little more recently and will share soon
@TheJFMR 1 year ago
John, it would be nice if you made a video on how to apply unit testing or Test-Driven Development to a web scraping project 😉 You're a good teacher for that
@JohnWatsonRooney 1 year ago
Interesting idea, I’ll add it to my list thanks!
@Kicsa 1 year ago
I have been enjoying your good videos, thank you for everything. I hope in a couple of weeks, I can start making my own programs.
@runnrnr 1 year ago
Thank you for your videos! I now link them to people who ask me questions about selectolax. I'm the author of selectolax.
@JohnWatsonRooney 1 year ago
Oh cool thank you! Selectolax is great I use it all the time - appreciate your work!
@Алексей-й4з6ш 6 months ago
You should write a better manual, it's very poorly documented
@adarshjamwal3448 1 year ago
Awesome 👍👍 tutorial. I learned a lot from your scraping series. Keep it up.
@JohnWatsonRooney 1 year ago
Thank you, glad I can help
@JuanPerez-iu9vk 3 months ago
I love your VIM workflow. Could you make a video some day about VIM and your config file and plugins?
@samoylov1973 1 year ago
The set comprehension is a nice touch in this video. While watching I thought of converting to a set afterwards, but doing it in one easy go, as you did, is better. One wish: when you explain parts like "when you want to grab all this table information..." (20:19 in the video), please show at least one piece of it through to the end. I'll figure out how to do the others :)
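A minimal sketch of that kind of set comprehension with selectolax (the HTML and selector below are made-up placeholders, not the exact code from the video):

from selectolax.parser import HTMLParser

html = HTMLParser("<div><a href='/p/1'>A</a><a href='/p/1'>A</a><a href='/p/2'>B</a></div>")

# build a de-duplicated set of links in a single pass instead of building a list and calling set() afterwards
links = {node.attributes["href"] for node in html.css("a")}
print(links)  # {'/p/1', '/p/2'} (set order is not guaranteed)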
@mchahal22 2 months ago
Great content, thanks!
@yacinehechmi6012 1 year ago
Greetings from Tunisia, thanks John!! Waiting for that nvim video, I would really love to know what you configured in nvim for Python development.
@valuetraveler2026 1 year ago
Good to see alternatives for parsing (selectolax). Will use rich from now on. I don't personally like to use dataclasses/pydantic for most of my work as it has hundreds of fields, but this is cleaner code than an imperative style down the page
@JohnWatsonRooney 1 year ago
I really like selectolax. And fair enough regarding dataclasses - for me at the moment the benefits outweigh the downsides
@flashwade888 1 year ago
Thank you so much for the detailed tutorial, John! I have a quick question - would it be possible to use dataclasses with Scrapy, please?
@JohnWatsonRooney 1 year ago
Thanks, glad you liked it! Yes, you can use dataclasses with Scrapy since 2.2
@flashwade888 1 year ago
@@JohnWatsonRooney Cheeeeers!! I cannot wait to give it a go!
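For anyone curious about the dataclasses-with-Scrapy point above, a minimal sketch (assuming Scrapy >= 2.2; the spider name, URL and selectors are made-up placeholders):

from dataclasses import dataclass
import scrapy

@dataclass
class Product:
    name: str
    price: str

class ProductSpider(scrapy.Spider):
    name = "products"
    start_urls = ["https://example.com/products"]  # placeholder URL

    def parse(self, response):
        # Scrapy 2.2+ accepts dataclass instances as items (via itemadapter)
        for card in response.css("div.product"):
            yield Product(
                name=card.css("h2::text").get(default=""),
                price=card.css("span.price::text").get(default=""),
            )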
@malwaredev33 1 year ago
Excellent video content, all videos are understandable for anyone. Can you tell me what font/theme you're using in VS Code in this video? Thanks
@JohnWatsonRooney 1 year ago
Thanks! Editor is Neovim and colour scheme is called oxocarbon
@amarAK47khan 1 year ago
You are a life saver!
@AhmedAl-Yousofi 1 year ago
What editor are you using?
@JohnWatsonRooney 1 year ago
Neovim
@DrChrisCopeland 1 year ago
I have learned a lot from your videos. Can you do any type of tutorial on report generation for the scrapes? My main use case is: once I identify a page that meets my requirements, I generate a PDF (or something) that would show the page as it was. I've had terrible luck with htmltopdf and similar libraries (or point me in the right direction). Thanks for what you do!
@JohnWatsonRooney 1 year ago
Are you after just a visual representation of the page? Playwright can do that very easily. Or are you grabbing data and want that in a PDF? Sorry, not quite sure what you mean!
@DrChrisCopeland 1 year ago
@@JohnWatsonRooney visual representation as far as I can tell (use case is still in the works/fluid). Once an item/listing on the page meets a requirement, save that individual info to a pdf, run some more stuff, then on to the next item/listing. Due to the subject matter, I don't want to put more in the comments, but yeah I'm learning a lot here and it's all going to work on a non-profit I run in the US.
@DrChrisCopeland 1 year ago
@@JohnWatsonRooney I will look at playwright as well!
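A minimal sketch of the Playwright approach mentioned above for saving a visual copy of a page (the URL and output paths are placeholders; page.pdf only works in headless Chromium):

from playwright.sync_api import sync_playwright

URL = "https://example.com/listing/123"  # placeholder

with sync_playwright() as p:
    browser = p.chromium.launch()  # headless by default
    page = browser.new_page()
    page.goto(URL, wait_until="networkidle")
    page.screenshot(path="listing.png", full_page=True)  # image of the whole page
    page.pdf(path="listing.pdf")  # PDF snapshot (Chromium, headless only)
    browser.close()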
@DucNguyen-in1xd 1 year ago
can you give an example of selecting by class?
@JohnWatsonRooney 1 year ago
Class is separated by a dot “div.class”
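A small selectolax example of selecting by class (the HTML and class names are made up):

from selectolax.parser import HTMLParser

html = HTMLParser('<div class="card"><p class="price">$10</p><p class="title">Bag</p></div>')

# tag name followed by a dot and the class name
for node in html.css("div.card p.price"):
    print(node.text())  # $10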
@rz84vlog78 1 year ago
The tutorial really helped me. Is it possible to scrape a website like College Board, since the basic authentication with username and password doesn't seem to work? Would love to at least get some tips so that I can scrape the more complex websites.
@JohnWatsonRooney 1 year ago
Hey thanks, glad it helped. For websites that need a login I generally lean towards browser automation (playwright) simply because it is much quicker and easier to get something working. I'd suggest looking into it if you haven't already - there are a few videos on my channel that could help
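A rough sketch of that browser-automation approach to a login page with Playwright (every URL and selector here is a placeholder to adapt to the real site):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://example.com/login")
    page.fill("input[name='username']", "my_user")      # placeholder selector/value
    page.fill("input[name='password']", "my_password")  # placeholder selector/value
    page.click("button[type='submit']")
    page.wait_for_load_state("networkidle")
    page.goto("https://example.com/account")  # now authenticated
    html = page.content()  # hand this off to selectolax or similar for parsing
    browser.close()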
@michakuczma4076 1 year ago
Is this the M+ 1M font you use in your IDE? Very nice and readable
@JohnWatsonRooney 1 year ago
Yes it is - although I think it's M+ 2M. It's great, I've been using it for a while now
@codetechpro 1 year ago
Hey John, I was wondering, is it possible to fill out a dynamic Visa card form with Selenium or Playwright?
@JohnWatsonRooney 1 year ago
I don't know that one specifically, but I've filled out loads of forms with Playwright and Selenium before - if it loads the page fine you'll have access to the forms to fill out data
@charlescharles4279 1 year ago
Awesome tutorial, do you notice any performance drop when using dataclass to save data during web scraping compared to using dicts?
@JohnWatsonRooney 1 year ago
Thanks! Generally no, the time lost in scraping is in the network connections so I’ve never worried about it much
@anthonymunnelly20 1 year ago
Excellent. Really, really well-done tutorial on a subject that seems straightforward, but isn't.
@tm_Panda... 1 year ago
Hey, I was wondering why you stopped using Scrapy? Was it too big of a framework for the scraping projects you do? Great video as always!
@JohnWatsonRooney 1 year ago
I found that I preferred to write my own solutions from the ground up for what I was trying to do; Scrapy is still a great framework though. I have a video on my channel about it if you are interested in more details
@mxdigitalmediamarketplace 11 months ago
Hello, I'm a newbie at scraping. When I wrote @dataclass it did not let me do it, it says it is not an integer. I'm using Python 3.12, httpx, selectolax and rich, as you mentioned in the tutorial
@mxdigitalmediamarketplace 11 months ago
Hello, following your tutorial, I am getting an error on line 26:
resp = client.get(url, headers=headers)
Traceback (most recent call last):
File "", line 1, in
resp = client.get(url, headers=headers)
NameError: name 'client' is not defined
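That NameError usually just means the httpx client was never created before being used; a minimal sketch of the usual pattern (the URL and header value are placeholders):

import httpx

url = "https://example.com"  # placeholder
headers = {"User-Agent": "Mozilla/5.0"}  # placeholder

# create the client first, typically as a context manager so it gets closed automatically
with httpx.Client() as client:
    resp = client.get(url, headers=headers)
    print(resp.status_code)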
@coyoteden8111 1 year ago
Early morning web scraping lesgo
@ZhCrypto 1 year ago
U are innocent programmer ❤
@atatekeli9295 1 year ago
Hi John, I tried turning your header code into this for macOS:
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.9999.99 Safari/537.36"
}
I use Google Chrome for web scraping, an M1 chip and macOS Ventura 13.4. How can I make it compatible for my scraping?
@JohnWatsonRooney 1 year ago
Hi - the user agent header is what we send with the request to the website - it can be anything, you can use the same one I do or any that you can find on google. It doesn’t need to match your system
@atatekeli9295 1 year ago
@@JohnWatsonRooney Would it cause an error if I use the same code even though it's not configured to my system?
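A minimal sketch of passing that header with httpx - the string is only text sent with the request, so it does not need to match the machine running the script (the URL is a placeholder):

import httpx

# any common browser user agent string works; it is just sent as-is with the request
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.9999.99 Safari/537.36"
}

resp = httpx.get("https://example.com", headers=headers)  # placeholder URL
print(resp.status_code)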
@keifer7813 1 year ago
What do you do when the elements you want have dynamically changing classes like class="xJdnxidXjejns xIdhdn39db xzIJhdidmn8"?
@JohnWatsonRooney 1 year ago
go back up the element tree until you find one that is constant, then reference off of that. I use css selectors so something like "div.constantclass li a" for all the a tags within li tags in divs with class "constantclass"
@ankylosis751 1 year ago
@@JohnWatsonRooney would really love a tutorial on this... and if you've made something similar on these dynamically changing classes can you link me? I'm at my wits' end. Btw superb content man, it's helping me learn Python deeply too
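A small sketch of that "anchor off a stable ancestor" idea with selectolax (the class names and HTML are made up):

from selectolax.parser import HTMLParser

html = HTMLParser("""
<div class="constantclass">
  <ul>
    <li><a class="xJdnxidXjejns" href="/item/1">Item 1</a></li>
    <li><a class="xIdhdn39db" href="/item/2">Item 2</a></li>
  </ul>
</div>
""")

# skip the generated class names entirely and select relative to the stable parent
for a in html.css("div.constantclass li a"):
    print(a.attributes.get("href"), a.text())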
@malwaredev33 1 year ago
Hi bro, how are you?
@richardboreiko 1 year ago
I'm getting an error on page 20 and it's consistent, but the products seem to vary each time the page appears, so they must be getting unordered data from their SQL statement.
File "C:\Users\richa\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1132, in read
    return self._sslobj.read(len)
TimeoutError: The read operation timed out
It looks like the last line from your code to be executed was this:
File "C:\Users\richa\PycharmProjects\webScraping\JohnWatsonRooney\ModernScrapingBestTools.py", line 28, in get_page
    resp = client.get(url, headers=headers)
File "C:\Users\richa\PycharmProjects\webScraping\venv\lib\site-packages\httpx\_transports\default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout: The read operation timed out
It happens consistently on www.rei.com/c/backpacks?page=20 but the number of products printed seems to vary before the error occurs. Do you have any debugging suggestions?
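For anyone hitting the same thing: a ReadTimeout means the site took longer to respond than httpx's default 5 second timeout, so one reasonable first step is to raise the timeout and catch the exception so a single slow page doesn't stop the whole run - a minimal sketch (the header value is a placeholder):

import httpx

url = "https://www.rei.com/c/backpacks?page=20"
headers = {"User-Agent": "Mozilla/5.0"}  # placeholder

with httpx.Client(timeout=httpx.Timeout(30.0)) as client:
    try:
        resp = client.get(url, headers=headers)
        print(resp.status_code, len(resp.text))
    except httpx.ReadTimeout:
        print("read timed out - retry later or skip this page")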