Crawl and Follow links with SCRAPY - Web Scraping with Python Project

  39,744 views

John Watson Rooney

Days ago

Comments: 61
@hmak5423 3 years ago
The guy who left a dislike thinks you misspelled "scrape" as "scrapy"! :) Otherwise, who would be dumb enough to dislike such informative content? Thanks for all your help, Mr Rooney. You are the reason I passed an interview just a few days ago!
@JohnWatsonRooney 3 years ago
Haha thanks, well done on the interview!!
@hmak5423 3 years ago
@@JohnWatsonRooney Thank you, means a lot! By the way, I know you have a busy schedule, but I have a request you can look into when convenient. The following might be long, but I thought I'd put it all in one message... www.bearspace.co.uk/purchase?page=1 The interview consisted of a task based on the link above. When you open it you'll find art pieces, which are essentially links to each piece's page showing the price, title, dimensions and an "Add to Cart" option. The task was to go into all of the art piece links and extract the price, title (name of the painting) and the dimensions. I ran into a problem with loading more pages (Load more): there is no limit on the page numbers that can be queried, meaning that even typing 'page=1234' in the URL will load a page, even though the art elements themselves stop at 200 total (essentially at page 10 or 11). I want to see how you would do the extraction/scraping so that it results in a dataframe of values including title, price and dimensions. Respectfully yours, Mak
@JohnWatsonRooney 3 years ago
Hi Mak. I think generally for that site I would keep a list of URLs and check the length of the list after each page, and if it is the same as on the previous page, break out of the loop.
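A minimal sketch of that idea, assuming hypothetical selectors (a.product-link on the listing pages, h1/.price/.dimensions on the detail pages) since the real Bearspace markup will differ:

```python
import scrapy


class BearspaceSpider(scrapy.Spider):
    name = "bearspace"
    start_urls = ["https://www.bearspace.co.uk/purchase?page=1"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen_urls = set()  # product links collected so far
        self.page = 1

    def parse(self, response):
        links = response.css("a.product-link::attr(href)").getall()  # hypothetical selector
        new_links = [url for url in links if url not in self.seen_urls]

        # If this page added nothing new, we have gone past the last real page - stop.
        if not new_links:
            return

        self.seen_urls.update(new_links)
        for url in new_links:
            yield response.follow(url, callback=self.parse_detail)

        self.page += 1
        yield response.follow(
            f"https://www.bearspace.co.uk/purchase?page={self.page}",
            callback=self.parse,
        )

    def parse_detail(self, response):
        # Hypothetical selectors for the three fields the task asks for.
        yield {
            "title": response.css("h1::text").get(),
            "price": response.css(".price::text").get(),
            "dimensions": response.css(".dimensions::text").get(),
        }
```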
@tubelessHuma 3 years ago
Well done John. Need more Scrapy tutorials. 🌹
@mokolokoa3288 3 years ago
The best Scrapy thing I've ever seen. Great work!
@JohnWatsonRooney 3 years ago
Thanks 👍
@vickysharma9227 2 years ago
I was looking for this kind of video; doing this task with Selenium took more time. Thanks for the video, man.
@dickyindra4923 2 years ago
WELL DONE!!!! GREAT VIDEOS SIR JOHN!!!!
@JohnWatsonRooney 2 years ago
Thanks very much!
@omidasadi2264 3 years ago
Great, thanks for sharing it... John, how could you deal with pages without pagination? I mean pages that keep loading more products continuously as you scroll down.
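One common way to handle infinite-scroll pages like that is to find the JSON endpoint the page calls as you scroll (visible in the browser's Network tab) and page through it directly. A rough sketch, where the endpoint, parameters and JSON keys are all hypothetical:

```python
import scrapy


class ScrollSpider(scrapy.Spider):
    name = "scroll"
    # Hypothetical API discovered in the Network tab while scrolling.
    api_url = "https://example.com/api/products?offset={offset}&limit=24"
    start_urls = [api_url.format(offset=0)]

    def parse(self, response):
        data = response.json()
        items = data.get("items", [])
        for product in items:
            yield {"name": product.get("name"), "price": product.get("price")}

        # Keep paging the API until it returns an empty batch.
        if items:
            next_offset = data.get("offset", 0) + len(items)
            yield response.follow(
                self.api_url.format(offset=next_offset), callback=self.parse
            )
```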
@vishalverma5280 3 years ago
Hey John! I guess you missed something here: what about pagination? How should I apply it in code like this?
@bitarddrag60 3 years ago
Hi John, thanks, cool video! I have a question: is it possible to make Scrapy check whether a category has subcategories and, if so, go through them and parse them, and if not, just parse what is there? If you know how, please explain; I will be very grateful. Apologies for the language barrier.
@umarsaid6340 2 years ago
What if the categories span multiple pages? I want to follow the next button and, for each page found, follow the links to each detail item. I could use CrawlSpider, but I wonder how to do it with a plain Spider only.
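A rough sketch of doing this with a plain scrapy.Spider: the category callback follows both the detail links and the next-page button back into itself. The selectors are placeholders for whatever the real site uses.

```python
import scrapy


class CategorySpider(scrapy.Spider):
    name = "categories"
    start_urls = ["https://example.com/categories"]

    def parse(self, response):
        # Level 1: follow each category page.
        for href in response.css("a.category::attr(href)").getall():
            yield response.follow(href, callback=self.parse_category)

    def parse_category(self, response):
        # Level 2: follow every item on this category page...
        for href in response.css("a.product::attr(href)").getall():
            yield response.follow(href, callback=self.parse_detail)

        # ...and the next button, re-entering this same callback so every
        # page of the category gets the same treatment.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse_category)

    def parse_detail(self, response):
        # Level 3: scrape the item itself (placeholder field).
        yield {"title": response.css("h1::text").get()}
```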
@mushinart 3 years ago
Thank you bro, cool stuff as always.
@JohnWatsonRooney 3 years ago
Thanks!
@GelsYT 2 years ago
Literally answered every question I have.
@GelsYT 2 years ago
THANKS
@samjane6187 3 years ago
This is insane work man, keep it up, and smile more please lol :D
@JohnWatsonRooney 3 years ago
Haha
@CenTexCollector 1 year ago
The site I'm looking at doesn't seem to have all the CSS, just HTML by the looks of it. Do you have a tutorial video on getting responses for that?
@ansuapillai2785 3 years ago
Exactly the video I wanted... thanks.
@JohnWatsonRooney 3 years ago
Thanks, glad you enjoyed it!
3 years ago
Awesome John! I have a big question for you: is it possible to get data out in parse (not just the links), join it with what you get in parse_categories, and yield everything for each item? I managed to do it by getting data in parse, passing the parsed data as arguments to parse_detail, calling the Item loader, and yielding there. Not the best or most efficient solution, but it worked! Hope you can do a video on CrawlSpiders.
@AwB 3 years ago
I second this question
@GelsYT 2 years ago
It sounds like a great solution.
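A minimal sketch of the approach described above, using cb_kwargs to carry fields scraped on the listing page into the detail callback. The field names and selectors are invented for illustration:

```python
import scrapy


class ListingSpider(scrapy.Spider):
    name = "listing"
    start_urls = ["https://example.com/products"]

    def parse(self, response):
        for row in response.css("div.product"):
            link = row.css("a::attr(href)").get()
            listing_data = {
                "name": row.css("a::text").get(),
                "list_price": row.css(".price::text").get(),
            }
            # cb_kwargs hands these values to parse_detail as keyword arguments.
            yield response.follow(
                link,
                callback=self.parse_detail,
                cb_kwargs={"listing_data": listing_data},
            )

    def parse_detail(self, response, listing_data):
        # Merge what came from the listing page with the detail-page fields.
        yield {
            **listing_data,
            "description": response.css("#description::text").get(),
        }
```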
@hayathbasha4519 3 years ago
Hi, please advise me on how to improve/speed up the Scrapy process.
@janyawn6923 3 years ago
Hi John, looking at the logs and output, do I understand correctly that Scrapy did this asynchronously?
@JohnWatsonRooney 3 years ago
Yes, it's asynchronous; Scrapy is built on the Twisted library. Although it's async, it's not the async/await style that most people associate with async.
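That async engine is also what the earlier question about speeding Scrapy up taps into: how many requests Scrapy keeps in flight is controlled by a few standard settings in settings.py. The values below are only a starting point, not a recommendation for any particular site:

```python
# settings.py - standard Scrapy settings that govern concurrency.
CONCURRENT_REQUESTS = 32             # total requests in flight (default is 16)
CONCURRENT_REQUESTS_PER_DOMAIN = 8   # cap per domain (default is 8)
DOWNLOAD_DELAY = 0.25                # pause between requests to the same site
AUTOTHROTTLE_ENABLED = True          # let Scrapy adapt the delay to server latency
AUTOTHROTTLE_TARGET_CONCURRENCY = 4.0
```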
@ulfgj 2 years ago
So, when we want to follow links on ALL the underlying pages that have a specific URL pattern, not just one level down, this isn't it, right?
@tnssajivasudevan1601 3 years ago
Thank you sir, great video.
@JohnWatsonRooney 3 years ago
Glad you liked it
@RenatoEsquarcit 3 years ago
Hey guys... I'm wondering one thing: is there any market for web scraping services? Every time I check Fiverr I get frustrated big time seeing people doing this for $5... I mean, mastering Scrapy alone takes a decent amount of programming skill...
@Kyosika 3 years ago
Hmmm, in SEO this is sometimes wanted. I have been playing around with Python scripts and it is slowly making my life a lot easier!
@RenatoEsquarcit 3 years ago
@@Kyosika I love Python too, but very complex stuff sold for a few dollars is strange :-/
@artabra1019 3 years ago
Me too. I submitted a proposal on Upwork where the client had a $60 budget for extracting data from a website; a few days later I looked at the project again and it had been awarded for only $10. That kind of bidding makes technical skills look cheap, even though programming is really hard to master.
3 years ago
Actually it is $5 per unit. If you need a bunch, it'll cost you some dough!
@RenatoEsquarcit 3 years ago
@@artabra1019 absolutely agree...!
@amineboutaghou4714 3 years ago
Thank you John!
@reddatosteam3721 3 years ago
Great content! Following your video, what if the categories also have pages? A kind of multi-level page: for example, follow the category, then the products, then the pagination, and once all products have been scraped, move on to the next category and so on, instead of having a bunch of start URLs. Is that possible? I tried to manage it and was only able to get the first page from every category 🤔
@shalinisrivastava7250 2 years ago
Yes, I also have the same problem.
@amritanshusharma950 3 years ago
Hi, I watched your video and it's awesome. I am doing a project like the one in this video but in a different way, and I'm stuck. Here is what I want: first I want the data from each column, then to open the link and extract some extra data from that link. I know it sounds crazy, but that's the project I am doing right now using Scrapy. I checked the Scrapy documentation and didn't find anything similar to what I want.
@palashpathak21 2 years ago
Hi John, I am getting a 503 error when I do scrapy shell 'url'. I have tried setting the user agent, but it's still the same. Can you help please?
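A 503 from scrapy shell often means the site wants more browser-like headers than a user agent alone, or sits behind anti-bot protection that headers won't solve. One thing worth trying, sketched here with placeholder values, is re-fetching inside the shell with a fuller set of headers; fetch() and view() are the shell's built-in helpers:

```python
# Inside `scrapy shell` (fetch and view are shell helpers, not importable names):
from scrapy import Request

fetch(Request(
    "https://example.com/page",  # placeholder URL
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    },
))
view(response)  # opens the fetched page in a browser to see what actually came back
```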
@vladimirantonov4506 2 years ago
Thanks a lot! Very interesting and informative!! :))
@youvanced6593 3 years ago
What are the benefits of using this instead of the Requests library?
@tomstalley3179 2 years ago
Thank you!
@rafewm151 3 years ago
Thank you, that was really useful.
@rostranj2504 3 years ago
How do you follow links in a website with nested pages? I'd like to grab the links in a map, follow them, scrape that page, look for another map with links, then follow and scrape until I reach the end of the nested pages.
@victormaia4192 3 years ago
Thanks! Very easy to follow.
@sixstringscoder8412 3 years ago
I don't understand why a parse function with a callback would be needed instead of just a parse function without the callback. You show the code in the video, but it's unclear why you couldn't just take the code in parse_categories, rename it 'parse' and run it like you've done in past videos about Scrapy. Can you explain the advantage of using the callback? Thanks for your videos and your instruction! I highly appreciate the contribution you make!
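A tiny sketch of why the callback matters, with placeholder selectors: responses for the start_urls go to parse() by default, but the links parse() discovers point at pages with a different layout, and callback= is how those responses get routed to their own handler instead.

```python
import scrapy


class TwoLevelSpider(scrapy.Spider):
    name = "two_level"
    start_urls = ["https://example.com/"]  # the hub page that lists the categories

    def parse(self, response):
        # Runs on the start_urls response only. Its job is to find the links,
        # not the data - the data lives on pages with a different layout.
        for href in response.css("a.category::attr(href)").getall():
            yield response.follow(href, callback=self.parse_categories)

    def parse_categories(self, response):
        # Runs once per category page. Different page shape, different
        # selectors - which is why it can't simply share parse().
        for product in response.css("div.product"):
            yield {"name": product.css("h2::text").get()}
```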
@melih.a 3 years ago
Is there a way to have the default website be a link from Playwright after logging in?
@RicardoPorteladaSilva 2 years ago
Great tutorial! Any tips for getting remote freelance jobs? I'm from Sao Paulo, Brazil. I've already tried platforms like Upwork and Workana, but they haven't worked for me. Anyway, thanks in advance!
@ggboostedtop483 3 years ago
Thanks bro!!
@azizaalkuatova9527 3 years ago
Thank you for the video! How do you select a class with spaces in it using CSS? Dots don't work, and while in your example the class name was unique, on many websites the classes are parent-child style and the beginnings are the same. Can anyone help with that?
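A short illustration, with made-up class names: a class attribute containing spaces is really several classes, so you chain them with dots (no spaces), or match only the stable prefix with an attribute selector or XPath contains().

```python
# Scrapy's selectors are built on parsel, so the same syntax works in a spider.
from parsel import Selector

html = '<div class="product-card featured large"><a href="/item/1">Item</a></div>'
sel = Selector(text=html)

# Chain the classes with dots - the spaces in the attribute become dots here:
print(sel.css("div.product-card.featured.large a::attr(href)").get())

# Or match just the stable prefix with an attribute selector:
print(sel.css('div[class^="product-card"] a::attr(href)').get())

# XPath contains() also works when only part of the class name is predictable:
print(sel.xpath('//div[contains(@class, "product-card")]/a/@href').get())
```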
@ssshwanth 2 years ago
What if there is no JSON? How do you extract the data then?
@مستقبل_مشرق 10 months ago
Thank you, you are the best.
@Iglum 3 years ago
Great stuff! Gonna check out all the videos :) I want to learn how to scrape sites like texture.ninja etc. to get myself some public domain textures. 👍❤
@artabra1019 3 years ago
I can't get a project in web scraping.
@களவையும்கற்றுமற 3 years ago
I need a video about scraping Google search results without getting banned. Please do it, and suggest some affordable API or proxy service for that. Thank you.
@Don_ron666 3 years ago
Why does he use a virtual environment?
@Scuurpro 2 years ago
How would I do this if I were to combine it with parsing JSON like in this video: kzbin.info/www/bejne/rpvMloWMo9qDmbM. Basically I'm trying to create a crawl spider to scrape products, then retrieve the Request URL to get the JSON data. I don't really know where to start. I was able to build my first two bots just using Playwright and CSS selectors.
This is How I Scrape 99% of Sites
18:27
John Watson Rooney
225K views
Scrapy Crawl Spider - A Complete Guide
19:11
Code [RE] Code
17K views
Hidden APIs with Scrapy - easy JSON data extraction
9:59
John Watson Rooney
44K views
Scraping Data from a Real Website | Web Scraping in Python
25:23
Alex The Analyst
538K views
Scrapy Basics - How to Get Started with Python's Web Scraping Framework
20:30
Intro To Web Crawlers & Scraping With Scrapy
28:56
Traversy Media
276K views
Beautiful Soup 4 Tutorial #1 - Web Scraping With Python
17:01
Tech With Tim
494K views
Scrapy-Playwright: How To Scrape Dynamic JS Websites (2022)
20:54
Python Scrapy Tutorial - 9 - Extracting data w/ CSS Selectors
12:05
buildwithpython
86K views
Rotating Proxies For Web Requests in Python
11:31
NeuralNine
88K views
Want To Learn Web Scraping? Start HERE
10:54
John Watson Rooney
28K views