My long-awaited community is now live! Apply fast: makemoneywithmake.com 🙏😤 Limited to 400. The price increases every 40 members.
@Txjxsxa · 3 months ago
After building the OpenAI module I'm facing a rate limit error. Even after upgrading to GPT-4o I'm facing the same issue. Any idea how I can fix this?
@atbapp · 4 months ago
Awesome tutorial, Nick... I can't emphasise enough not only how helpful this tutorial was, but also the number of ideas it has given me. A top-5 channel for me!
@thibaultmouillefarine795 · 4 months ago
Really? Who else is in the top 4?
@michellelandon8780 · 6 months ago
Hi, I want to say thank you for being a great teacher. I appreciate you taking the time to explain things. You are very easy to follow. I always look forward to your next video.
@nicksaraev · 6 months ago
You're very welcome Michelle!
@johnringo6155 · 5 months ago
@@nicksaraev How can one get into your mentorship/course, please?
@senpow · 4 months ago
I have the same question. It seems it's still under construction, because we only get the curriculum in the video description. @@johnringo6155
@sketchingbyyash6358 · 4 months ago
What did he write in the User role? "Tell me about this website in JSON format." And what did he write after that?
@agirlnamedsew · 6 months ago
3 minutes in and I know how to scrape a webpage and parse it to text. THANK YOU!!!!
@nicksaraev · 6 months ago
Glad I could help!
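For readers following along outside Make, those first few minutes (fetch a page, then strip the HTML down to readable text) can be sketched in plain Python with only the standard library. This is a rough analogue of the video's HTTP Request and HTML-to-Text modules, not Nick's exact setup; the sample HTML stands in for a live fetch.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, dropping all tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(html: str) -> str:
    """Rough equivalent of the HTML-to-Text module in the video."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(piece.strip() for piece in parser.parts if piece.strip())

# Stand-in for a scraped listing page
sample = "<html><body><h1>3 Beds, 2 Baths</h1><p>Great location.</p></body></html>"
print(html_to_text(sample))  # → 3 Beds, 2 Baths Great location.
```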
@yuryhorulko3834 · 5 months ago
Thank you so much, Nick! Every video is brilliant!
@alexf7414 · 3 months ago
That's amazing; how did I miss this company? You've got a new customer. Great job!
@robertjett_ · 6 months ago
Dude, I cannot overstate how mind-blowing this series is. There were so many things in this video that I had absolutely no idea were possible. Also, 1 bed 4 bath is crazy.
@nicksaraev · 6 months ago
Hell ya man! Glad I could help. And SF real estate, smh.
@MyFukinBass · 5 months ago
Another brilliant video, Nick! It would be awesome to get a more in-depth tutorial on regex, or on what to ask ChatGPT for (what are we looking for, specifically) in order to scrape. Were you a developer before? You seem to know a lot about web dev. Thanks again!
@saeedsm57 · 5 months ago
One of the best videos I've come across this year so far. Thanks!
@swoollard · 5 months ago
Unfortunately I couldn't get this to work. The parsed HTML seemed to have different data than your example, and I couldn't figure out the regex. You mentioned it could be done with ChatGPT; it would be helpful to see that approach as well.
@amirsohail855 · A month ago
First of all, thank you @Nick Saraev for such useful knowledge. You only scraped one record, but there are a lot of records; how do we scrape them all? Please give me an answer; I am working on such a project just for learning purposes.
@esprit4432 · 5 months ago
Sometimes the regex matches on regex101 and then in Integromat it doesn't...
@j3ffw1n5 · 6 months ago
Very appreciative of what you're doing with this series 🙏🏽 It's becoming clear that having a solid understanding of JSON and regex is a must if you intend to build anything decently complex for clients. Any resources, courses, or forums you can point us towards? Thanks again!
@nico.m527 · 6 months ago
You can always ask ChatGPT for help with this kind of stuff. It explains it to you in plain English.
@nicksaraev · 6 months ago
Thank you Jeffrey 🙏 Agreed that it's important. Luckily AI is making it less so: if regex is currently the bottleneck in your flows, you can usually "cheat" by passing the input into GPT-4 with a prompt like "extract X". To answer your question, though, my education was basically: I watched a few YouTube videos, same as you, and now I just use regex101 for fine-tuning. Most of my parsers are extremely simple, and real regex pros would laugh at them (but they work for me and my wallet!). .* is your friend. Hope this helps man.
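The ".* is your friend" tip translates directly into a text-parser pattern. Here is a minimal Python sketch; the HTML line is a made-up listing link, and the lazy form .*? is used since a bare greedy .* would overrun the closing quote.

```python
import re

# Hypothetical scraped line from a listing page
html = '<a class="listing" href="/homedetails/123-Main-St">1 bed 4 bath</a>'

# Lazy variant of ".* is your friend": capture everything between
# href=" and the NEXT quote, instead of running to the last quote.
match = re.search(r'href="(.*?)"', html)
listing_url = match.group(1) if match else None
print(listing_url)  # → /homedetails/123-Main-St
```

The same pattern pasted into Make's Text Parser "Match Pattern" module behaves identically, which is why regex101 is handy for dry runs.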
@LeximoAI · 4 months ago
Hey Nick, great video!! I just have a doubt: if you run this module once for one URL and then put it to sleep, how do you scrape the other URLs? I didn't quite get the hang of how that happens, so it would be nice if you could explain it briefly. Thanks in advance!!
@sunshinemodels1 · 6 months ago
came for the web scraping insights, stayed for the pearly white teeth
@nicksaraev · 6 months ago
Brb getting a Colgate sponsorship
@RandyRakhman · 4 months ago
Thanks for teaching us, sir. Appreciate it!
@automate_all_the_things · 6 months ago
Super insightful videos, much appreciated! Just FYI, at timestamp 28:20 you're trying to expand the window size. You can do this by clicking the little symbol with the 4 arrows.
@Bassdag1 · 6 months ago
That was fascinating to watch, with a very clear explanation. Thank you for sharing. I am definitely subscribing!
@nicksaraev · 6 months ago
Welcome aboard!
@great_live_music · 6 months ago
Really great content, thank you for this video! If I wanted to optimize your flow, I would check whether the URL is already in the Google Sheets document before calling the parsed URL and extracting the data on the page.
@nicksaraev · 6 months ago
Good thinking!
@xvivaan7422 · 2 months ago
Hey Nick, love the videos!! Just had a few questions; would love it if you could help us out. What is your business model like? Do you offer clients a subscription model or a one-off payment? And what do you think we should apply to our business model, considering we're looking to rope in new clients and remain profitable over time? I ask this since the websites we will be using have a monthly subscription fee and a limit on API/operation requests; if the requests exceed the limit of the plan purchased, how do you tackle that? It would be of great help if you could make a short 10-minute video on this, or maybe reply to this comment. Love the series!! Keep up the good work!!
@highnitin · 6 months ago
this is pure gold :)
@ricardofernandes161 · 2 months ago
Masterclass, Nick! Thanks a lot for this video.
@conglife · 4 months ago
Thank you for sharing; it has truly benefited me a lot.
@DidierWiot · 3 months ago
Fantastic stuff, thank you so much Nick!
@stephena8965 · 5 months ago
Hey Nick, amazing tutorial as always; you've massively helped me on so many flows, thank you! I actually managed to build a similar flow, but instead of regex I used an anchor-tag text parser with a filter that checked for the presence of a "page__link" class on the element type, since all page links had that. Would you say there's anything wrong with this if it works for the use case?
@elibessudo · 4 months ago
Super helpful, thank you. Any chance you could do a tutorial on how to scrape sites that require logging in?
@EasGuardians · 6 months ago
Thanks Nick, super helpful. Will set this up right away :D
@nicksaraev · 6 months ago
Hell ya man! Let me know how it goes.
@axellang2132 · 5 months ago
Thank you very much, Nick, for your amazing videos! I'm a beginner and this question may sound dumb, but I'm running a scenario with 2 text parsers following each other. The first one runs 1 operation, but the one following it, which uses the same input data, runs way more operations. Do you know where that could be coming from? No hard feelings if you don't have time to answer ;)
@EliColussi · 6 months ago
I am curious how you would tackle getting around a "click to reveal" phone number. It requires 2 clicks to find the phone number.
@Storworx · 6 months ago
Again, your instructional videos are so informative. Very much appreciated! Could you post how I can visit multiple websites from a sheet? Would I add a sheet module at the front and another at the end to access the next row?
@nicksaraev · 6 months ago
Appreciate the support! Absolutely, here are the steps for plugging multiple sites in:
1. Create a Google Sheet (sheets.new) with a column labelled "URL".
2. In the Make scenario builder, search for the Google Sheets connectors. You're looking specifically for "Search Rows", which has a built-in iterator. Make this the trigger of your flow.
3. Authorize your account, select the sheet from earlier in the modal, etc. Set "maximum number of returned rows" to however many you need.
4. Use the output from the "URL" column as the input to the rest of the flow you see in the video.
Remember that since "Search Rows" is now a trigger, if you turn this scenario on it'll run every X minutes. So if you don't have a designated flow you might want to make it "on demand" and just run it whenever you need to process sites. You can then make another Google Sheet to collect the output and use the "Add Rows" module to fill it up. Hope this helps!
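For anyone mapping those steps to code, here is a rough plain-Python equivalent of the per-row loop. The `fetch` callable and the title parsing are hypothetical stand-ins for the HTTP Request and parser modules; Make handles the iteration for you via "Search Rows".

```python
def process_sheet(urls, fetch):
    """Run the per-row part of the flow for every URL from the sheet.

    `urls` stands in for the "URL" column; `fetch` stands in for the
    HTTP "Make a Request" module; the title extraction stands in for
    the parsing steps shown in the video.
    """
    rows = []
    for url in urls:                       # the "Search Rows" iterator
        html = fetch(url)                  # "Make a Request"
        title = ""
        if "<title>" in html:
            title = html.split("<title>")[1].split("</title>")[0]
        rows.append({"url": url, "title": title})   # "Add Rows" payload
    return rows

# Usage with a fake fetcher (no network needed):
fake_fetch = lambda url: f"<title>Page for {url}</title>"
print(process_sheet(["https://example.com"], fake_fetch))
```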
@tobigbemisola · 6 months ago
This is great and well explained. After watching the full tutorial, I'd rather opt for a web scraper tool until I'm good with regex. BTW, any resources for learning regex?
@nicksaraev · 6 months ago
Thx Tobi! Frankly I just use Regex101 for everything (regex101.com); the highlighting as you set your search up is extremely helpful. If you were to quiz me on tokens/selectors without a tool like this, I'd probably know fewer than 50% of them 😂
@kerimallami · 6 months ago
BRO YOU ARE ROCKING IT!!!!!
@PazLeBon · 6 months ago
So it scrapes, but you have to sign up... hardly feels private, does it?
@hammadyounas2688 · A month ago
Great work. Can you make a tutorial on how to scrape data from LinkedIn?
@alderdj.froolik · 6 months ago
Nicely presented!
@nicksaraev · 6 months ago
So happy you found value in this man.
@tachfineamnay398 · 4 months ago
Great job! Thank you!
@yuryhorulko3834 · 3 months ago
Hi Nick! Thank you for the education! But... how do I solve the issue with status code 403?
@aymscores · 3 months ago
I think adding " " to the value section of the headers fixed this for me!
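For context, a blanket 403 often just means the site rejects requests that don't look like a browser, and the fix is filling in browser-like header key/value pairs in the HTTP module. The same idea sketched in plain Python (the User-Agent string is only an example value; some sites need more headers or a real browser):

```python
import urllib.request

def browser_like_request(url: str) -> urllib.request.Request:
    """Build a request with browser-like headers, which is often enough
    to get past a blanket 403 on sites without real bot detection."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Accept": "text/html,application/xhtml+xml",
    }
    return urllib.request.Request(url, headers=headers)

req = browser_like_request("https://example.com")
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```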
@MrRichBravo · 5 months ago
Great info!
@KenshiDigital · 5 months ago
Does OpenAI need payment (credits) to generate an output, or do you just need to get the API key and that's it?
@fontwellestate3544 · 26 days ago
HELP! Hey guys, I couldn't get past the 403 error you came across when you got to the individual listings! What do I do?
@Deborah-iz1wi · 4 months ago
Hi Nick, thanks for the video. I'm having a problem with the parser: it's not parsing down the text for me like it shows in the video. Any suggestions on this?
@SaadBelcaid · 4 months ago
Hey Nick, what would be a GPT-4 prompt to extract those URLs and build the regex?
@lukeshieldsnature · 4 months ago
I don't understand why you moved the last Sleep before the Sheets module, but otherwise a great explanation.
@ArtemSFG · 3 months ago
Thanks so much for the tutorial! Just a question: how do you deal with pagination when scraping data?
@nicksaraev · 3 months ago
Thanks Artem 🙏 you'd create a separate route for the scraper so it can iterate over each page, then add each page's data to an array (using the add() function or similar). On your main route you'd then add a Get Variable module and pull the array contents. Hope this helps.
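The pagination route described there can be sketched in plain Python as well. The `fetch` callable, the `?page=` parameter, and the `/listing/` link shape are all hypothetical stand-ins for the Make modules and whatever site you're scraping:

```python
import re

def scrape_all_pages(base_url, fetch, max_pages=10):
    """Iterate over pages, collecting every listing link into one array,
    like the add()-to-array route described above."""
    results = []
    for page in range(1, max_pages + 1):
        html = fetch(f"{base_url}?page={page}")
        if not html:                      # stop once a page comes back empty
            break
        results.extend(re.findall(r'href="(/listing/[^"]+)"', html))
    return results

# Usage with a fake two-page site (no network):
pages = {1: '<a href="/listing/a1">A</a>', 2: '<a href="/listing/b2">B</a>'}
fake_fetch = lambda url: pages.get(int(url.split("page=")[1]), "")
print(scrape_all_pages("https://example.com", fake_fetch))  # → ['/listing/a1', '/listing/b2']
```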
@ArtemSFG · 3 months ago
@@nicksaraev Thanks so much for sharing, Nick! Hopefully I'll be able to help you somehow one day :)
@jga13775 · 3 months ago
Great video! What if the page you're trying to scrape requires authentication? Like the "my profile" section of Uber or any other company.
@LuxGolfAlgarve · 28 days ago
How do you scrape the images in the same process?
@snappyinsight · 6 months ago
Thanks for the Tutorial. Does this also work on Amazon listings?
@nicksaraev · 6 months ago
Glad I could help. Yes it works on Amazon, though be wary that their bot detection is much more sophisticated (see another comment where I discuss how to scrape reviews).
@m4RIK · 3 months ago
Just to get it right... you feed the whole HTML content to GPT, so you pay the input tokens for all of it. Isn't it possible to feed just the body, or a single container ID or class?
@TesteAutomacao · 3 months ago
Hi Nick, how are you doing? First of all I want to thank you for everything you are doing for us. I tried to use this automation on different websites, but on a lot of websites the code contains the same link for the same house/product 2 or 3 times almost in a row, so when you use regex you get repeated results. How can I filter this, or add some kind of condition that avoids duplicate results and saves me operations? Thank you.
@woundedhealer8575 · 4 months ago
Is there a way to use proxies for this? I just feel like it'd be pointless to get so deep into this without one.
@karamjittech · 6 months ago
Great stuff. The shared Hidden_API_Masterclass.json seems incomplete; it would be great if the complete JSON could be shared.
@bsandmg · 6 months ago
Gonna check it out. Wondering if it could be used for comments on a post, or Twitter; for example, someone says they want something, then boom, you can respond.
@nicksaraev · 6 months ago
Thanks Raiheen! You could, although there are probably better solutions to this. Facebook/Twitter/etc often hide comments behind a "reveal" mechanism like scrolling or a "Read More" button which makes scraping them difficult (in addition to their security and tight rate limits). That said, anything is possible in 2024! You could run a search every X minutes using a search bar and scrape the top X comments. You'd use an intermediary DB of some kind to store the comment text, and then for every comment in your scrape, if that comment doesn't already exist, you could fire up a browser automation tool like Apify and log on to the platform in question. You'd then have GPT-4 or similar dream up a response and post it using JavaScript. Hope this helps man 🙏
@channel83932 · 6 months ago
Can we add proxies to these flows?
@nicksaraev · 6 months ago
Yes, definitely. You'd just replace the URL in the HTTP Request module with whatever your proxy is and then add the proxy-specific data (most proxies will require you to send credentials, the URL you want to pass through, etc.).
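As a rough plain-Python analogue of that setup (the proxy address and credentials below are placeholders, and real providers each document their own endpoint format):

```python
import urllib.request

def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Route all HTTP/HTTPS traffic through the given proxy, the way the
    HTTP module does once you point it at your proxy endpoint."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

# Placeholder endpoint; swap in your provider's host and credentials.
opener = build_proxy_opener("http://user:pass@proxy.example.com:8080")
# opener.open("https://example.com")  # requests would now go through the proxy
```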
@channel83932 · 6 months ago
@@nicksaraev can you show us an example of this?
@FYWinBangkok · 4 months ago
Hey, amazing work! Just one note: you should cut out in post what didn't work. I got so lost, and trying to do it at home I can't make it happen :(
@terrycarson · 6 months ago
Great Job!
@nicksaraev · 6 months ago
Thank you Terry!
@lc285 · 6 months ago
First, you should explain what scraping a website is. 🤔
@AjarnSpencer · 5 months ago
There are other videos for that; this one is for those taking the next step.
@elie2222 · 5 months ago
Curious why you decided to watch the video if you didn't know what it was.
@craigsandeman3865 · 5 months ago
Managed to get a 200 response on the first step, but it appears that some of the HTML is hidden. Seems like there is a delay before all the data is populated. I added all the header info. Thanks for the tutorial.
@stevearodgers · 5 months ago
I can't get past the HTML to Text module. It keeps giving me an error message: BundleValidationError. Maybe poor HTML on the website I'm scraping? Anyway, thanks for the information. So much to learn!
@hitmusicworldwide · 6 months ago
How do you get past authentication to scrape resources that require a sign-in?
@GarthB-uf6dr · 2 months ago
Hi Nick, is it possible to scrape a page that does not have an API and that you have to be logged into, please?
@ivansmiljkovic9097 · 6 months ago
What camera are you using, is it Lumia by any chance? Thanks!
@nicksaraev · 6 months ago
Because of this comment & a few others, I just published a full gear list in the description! Including camera, lens, lighting, etc :-) all the best
@BassTi2k · 5 months ago
How can I code the headers for scraping data from TikTok? Is a specific type of header required to imitate a legitimate user or device?
@dandyddz · 5 months ago
Doesn't Make support CSS selectors?
@littlehorn941 · 6 months ago
Thanks for making this video; very helpful with a few automation projects that I have. I'd never heard of Make before. I've spent the last two years building a local webhook application as a side project that basically does the same thing as Make, but this site is so much better.
@nicksaraev · 6 months ago
You're very welcome! I'm a dev as well and find Make better for 99% of business use cases. The only time I build something out in code these days is when a flow is extremely operationally heavy. Keep me posted 🦾
@DIPU1036 · 4 months ago
How would you address the legality of scraping?
@obvp · 2 months ago
Is it possible to scrape Wikipedia? It's not working when I follow your steps.
@dfreshness2006 · 6 months ago
You only logged the first listing in your Redfin search. How does it loop to the second and so on?
@nicksaraev · 6 months ago
Great q. The flow automatically loops because the "Match Pattern" module outputs multiple bundles. When multiple bundles are output by a module, every module after that module runs anew for each respective bundle. Hope this helps 🙏
@JMasalle · 4 months ago
Skip to 2:47
@d3.finance · 4 months ago
Great project to learn from. Thank you Nick.
@BashkimUkshini · A month ago
The HTML objects I get have >50,000 characters, and when I try to paste this back into a Google Sheets cell I get an error. Any tips on how to reduce/clean up the HTML object I get back? For example, the ScrapeNinja module offers a JavaScript field you can use to filter this out on the go, but their APIs are paid :/
@SaidThaher · A month ago
Try the split function to divide the data into pieces and then send them to separate cells in Google Sheets.
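That split idea in plain Python; 50,000 characters is Google Sheets' documented per-cell limit, and the helper name here is made up:

```python
def chunk_for_cells(text: str, size: int = 50000):
    """Split a long string into pieces that each fit in one Sheets cell."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Usage: write each chunk to its own cell, then re-join when reading back.
chunks = chunk_for_cells("abcdef", size=4)
print(chunks)  # → ['abcd', 'ef']
```

In Make you'd map the resulting array onto consecutive columns (or rows) in the "Add Rows" module.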
@hypnoticblaze4323 · 6 months ago
How do you bypass a robots.txt file blocking the scraper?
@aiforbuiness · 6 months ago
Same; I would like to know the answer to this.
@ryanangel3355 · 6 months ago
You can't with this, I'm pretty sure.
@marvinschulz2480 · 6 months ago
Golden content
@nicksaraev · 6 months ago
So glad you find it valuable man
@DanielAuriemmaOfficial · 6 months ago
How would I use this if I have to log in to a site in order to scrape it? Is there a login prompt to add before the site prompt? Thanks for all the info!!!
@nicksaraev · 6 months ago
Happy you found this valuable, Daniel! It depends on the site: sometimes you can just pass a username/password in the HTTP request module to get the cookie, other times you need to use browser automation tools like Apify. I recorded an in-depth video on authentication here if you're interested: kzbin.info/www/bejne/iGnSaIlpbrOGibs Hope this helps 🙏
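The first approach (POST credentials, keep the cookie, reuse it) can be sketched in plain Python like this. The form field names and login URL are hypothetical and vary per site, so check the real login request in your browser's network tab; the `open()` calls are commented out so nothing hits the network here.

```python
import http.cookiejar
import urllib.parse
import urllib.request

def make_logged_in_opener(login_url: str, username: str, password: str):
    """Build an opener with a cookie jar; after POSTing the login form,
    the session cookie is stored and reused on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    form = urllib.parse.urlencode({"username": username, "password": password}).encode()
    # opener.open(login_url, form)  # uncomment to actually log in
    return opener, jar

opener, jar = make_logged_in_opener("https://example.com/login", "me", "secret")
# opener.open("https://example.com/my-profile")  # would now send the session cookie
```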
@untetheredproperty · 4 months ago
Thank you for the information. BTW, your copyright has not been updated. :)
@overtheedge23 · 6 months ago
What about content behind a paywall?
@nicksaraev · 6 months ago
Just recorded a video to answer this (hidden APIs)! Hope it helps you.
@cam6996 · 4 months ago
bro.. that drink was empty
@amitjangra6454 · 5 months ago
I do it with a simple Python script.
@user-jg5dx4pk8x · 6 months ago
Thank you. Can I use this to scrape all the reviews of a product on Amazon?
@nicksaraev · 6 months ago
Absolutely, just checked for you. You have to do it in two parts:
1. Feed the Amazon product URL to a Request module like I show in the video, then scrape the HTML and parse it as text.
2. Somewhere in the resulting scrape will be a URL with a string like /product-reviews/. You need to match this (you can use regex), then make another request to that URL for the product reviews.
Amazon's bot detection is very good, so be careful you don't get rate limited 🙏
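Step 2's match can be sketched like this in Python; the HTML line is a made-up stand-in for a real Amazon product page, and the pattern is one reasonable way to match the reviews link, not the only one:

```python
import re

# Hypothetical fragment from a scraped Amazon product page
html = '<a href="/Some-Product/product-reviews/B00EXAMPLE/ref=cr_dp">See all reviews</a>'

# Capture the whole href that contains /product-reviews/
match = re.search(r'href="([^"]*/product-reviews/[^"]*)"', html)
reviews_path = match.group(1) if match else None
print(reviews_path)  # → /Some-Product/product-reviews/B00EXAMPLE/ref=cr_dp
```

You'd then feed that path (prefixed with the domain) into a second Request module.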
@jtisaks8569 · 5 months ago
This is very well explained!!!!!!!!
@purvenproducts2463 · 6 months ago
My friend, thank you so much for your videos, I really appreciate it. Again, any chance of a GoHighLevel platform review?
@nicksaraev · 6 months ago
I will absolutely do one on GHL; I used to sell their platform as an affiliate, actually. TBH I don't like their "automations" one bit, but it's important enough to go through. Probably next month, as I finish the course and the rest of my videos. Thank you for the idea!
@purvenproducts2463 · 6 months ago
@@nicksaraev thanks buddy, I tried it but it was a bit overwhelming for a beginner.
@sm0k3ahontas · 4 months ago
I don't understand how to find a regex
@brianaragon1641 · 3 months ago
But this only works if the content you want to grab is present as plain text on the web page. If it's dynamically created, say by a JS script, the module wouldn't be able to grab the desired data. E.g., if I want to grab price data from a web page, the content grabbed by the Make a Request module gets something like PRICE: $ 0.00, but on the actual page it shows PRICE: $ 3.70; that value is dynamically created and doesn't show up that way in the Make module...
@nicksaraev · A month ago
Thanks for bringing this up. Will cover this in an updated video 🙏
@byokey · 3 months ago
Can you scrape a banking account?
@nicksaraev · 3 months ago
Only my own 😫
@my.johnnylavene · 6 months ago
I need to scrape a website for an AI web app, to let me put Q&A, company info, etc. into fields on the web app. Is that possible?
@nicksaraev · 6 months ago
Absolutely. I did something similar for a data viz SaaS a while back. You'd have to find a way to parse each of those strings (Q&A, Company Name, Company Description, etc.) and then pass them to your app DB. You can use AI for this if there's no consistent pattern: something like "Categorize the following text into XYZ using the following JSON format". Hope this helps man 🙏
@Oscar-kg5eo · 4 months ago
This does not seem to work with LinkedIn.
@CrossTalksTV · A month ago
True; any suggestions?
@champagnebulge1 · 5 months ago
It appears the free version of ChatGPT doesn't work with this. Still, interesting.
@IwonaRepetowska-ij7so · 7 days ago
I don't get it... You said "by the end of it you'll know everything that you need to know about how to scrape; you'll be better than 99% of the rest of the world at scraping sites, and you don't even really need to know HTML or anything like that, because we're going to use AI to help"... But what if you don't know how to create a key to connect to OpenAI, or have no idea what JSON is? C'mon! HTML is the easiest part of that... :( I was hopeful and eager to follow you; now I'm down a rabbit hole.
@earn_cash_with_G · 6 months ago
Brother, can I scrape translators' details from translation websites?
@nicksaraev · 6 months ago
For sure man. If you have a specific site in mind just drop it below and I'll take a peek 🙏
@antoniosales3059 · 5 months ago
Firstly, thank you; but the title should be: Any Website Without Cloudflare.
@MichaelWilliams-lo3ix · 3 months ago
Awesome
@RSAGENCY_0 · 6 months ago
AWESOOOOME🔥🔥🔥🔥
@RiversideInsight · 2 months ago
Does anyone know how to get past a login/password page?
@SaidThaher · A month ago
That's illegal 😂😂😂😂
@jimlynch9390 · 5 months ago
You could have changed the greedy regex to lazy by adding a question mark to the quantifier, e.g. .*? instead of .*
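A quick demonstration of that difference, on a made-up two-link line:

```python
import re

html = '<a href="/first">one</a> <a href="/second">two</a>'

greedy = re.search(r'href="(.*)"', html).group(1)   # .* runs to the LAST quote
lazy = re.search(r'href="(.*?)"', html).group(1)    # .*? stops at the first quote

print(greedy)  # → /first">one</a> <a href="/second
print(lazy)    # → /first
```

The lazy form is almost always what you want when extracting attribute values like hrefs.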
@hishamazmy8189 · 5 months ago
amazing
@DrDonBoo815 · 6 months ago
At the 13:31 mark, could you have used ChatGPT with custom instructions entered, versus writing JSON, to get a better email intro?
@nicksaraev · 6 months ago
Yes, definitely! PS: the quality usually goes up if you let it output plaintext. This isn't as relevant for my purposes, but it's something to keep in mind if you're generating content (say, blogs etc.).
@abdhealth-reviews · 6 months ago
Where do I get the regex app?
@nicksaraev · 6 months ago
Head over to Text Parser (the orange box under Tools) and select "Match". Hope this helps man.