Build undetectable Amazon scraper with n8n, Puppeteer and Scraping Browser

5,738 views

Oskar

A day ago

In this tutorial I explain how to automatically scrape Amazon search results using Puppeteer, Scraping Browser and n8n.
It covers:
1. Creating a base Puppeteer script to retrieve data from Amazon search results (product names, prices, ratings etc.)
2. Key issues while scraping Amazon
3. Implementing Scraping Browser into the code
4. Adjusting the script and deploying the function to Google Cloud Functions
5. Automating the scraping using n8n (transferring results to Baserow)
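For readers who want a feel for Steps 1-2 before opening the gists, here is a minimal sketch in JavaScript (the language of the tutorial's Node.js scripts). It is not the author's script: the CSS selectors are assumptions based on Amazon's markup at the time of writing and will likely need updating, and `scrapeSearchResults`, `parsePrice` and `parseRating` are hypothetical helper names.

```javascript
// Sketch of Steps 1-2: pull name, price and rating from an Amazon search page.
// ASSUMPTION: the CSS selectors reflect Amazon's markup at the time of writing
// and may change; `page` is a Puppeteer Page created elsewhere.

// "$1,299.99" -> 1299.99; assumes US-style formatting ("," thousands, "." decimals).
function parsePrice(text) {
  if (!text) return null;
  const value = parseFloat(text.replace(/[^0-9.]/g, ""));
  return Number.isNaN(value) ? null : value;
}

// "4.5 out of 5 stars" -> 4.5
function parseRating(text) {
  const match = text ? text.match(/^\s*([\d.]+)/) : null;
  return match ? parseFloat(match[1]) : null;
}

async function scrapeSearchResults(page, keyword) {
  const url = `https://www.amazon.com/s?k=${encodeURIComponent(keyword)}`;
  await page.goto(url, { waitUntil: "domcontentloaded" });

  // This callback runs inside the browser, so it can only return plain data.
  const raw = await page.$$eval('div[data-component-type="s-search-result"]', (cards) =>
    cards.map((card) => ({
      name: card.querySelector("h2")?.textContent?.trim() ?? null,
      price: card.querySelector(".a-price .a-offscreen")?.textContent ?? null,
      rating: card.querySelector(".a-icon-alt")?.textContent ?? null,
    }))
  );

  // Parsing happens back in Node.js, where the helpers above are defined.
  return raw.map((r) => ({
    name: r.name,
    price: parsePrice(r.price),
    rating: parseRating(r.rating),
  }));
}
```

Collecting raw text in the browser and parsing it in Node.js keeps the `$$eval` callback free of references to Node-side helpers, which would not exist in the page context.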
Resources:
1. Puppeteer script (for local use): gist.github.com/workfloows/d7...
2. Puppeteer script (for local use with Scraping Browser): gist.github.com/workfloows/1d...
3. Puppeteer script (for use with Scraping Browser on Google Cloud Functions): gist.github.com/workfloows/5a...
4. Scraping Browser API: brightdata.com/products/scrap...
5. Docs for Scraping Browser troubleshooting: help.brightdata.com/hc/en-us/...
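Conceptually, connecting Scraping Browser means replacing `puppeteer.launch()` with `puppeteer.connect()` pointed at a remote WebSocket endpoint. A minimal sketch, assuming the endpoint format from Bright Data's documentation at the time of writing (copy the exact value from your own zone settings) and the hypothetical environment variable names `BRD_USERNAME` and `BRD_PASSWORD`:

```javascript
// Sketch: attach to a remote browser over WebSocket instead of launching Chromium locally.
// ASSUMPTIONS: host "brd.superproxy.io:9222" follows Bright Data's docs at the time
// of writing; BRD_USERNAME / BRD_PASSWORD are hypothetical env variable names.

function buildBrowserWSEndpoint(username, password, host = "brd.superproxy.io:9222") {
  if (!username || !password) {
    throw new Error("Scraping Browser credentials are missing");
  }
  // Percent-encode credentials in case they contain URL-reserved characters.
  return `wss://${encodeURIComponent(username)}:${encodeURIComponent(password)}@${host}`;
}

// `puppeteer` is passed in (e.g. require("puppeteer-core")) to keep the sketch dependency-free.
async function connectToScrapingBrowser(puppeteer) {
  const browserWSEndpoint = buildBrowserWSEndpoint(
    process.env.BRD_USERNAME,
    process.env.BRD_PASSWORD
  );
  // connect() attaches to an already running browser; launch() would start a local one.
  return puppeteer.connect({ browserWSEndpoint });
}
```

Once connected, `browser.newPage()` returns a page you can drive like a locally launched one, so the extraction code itself can usually stay the same.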
My other tutorials:
1. How to build voice Telegram AI bot with n8n, Whisper and ElevenLabs: workfloows.gumroad.com/l/tele...
2. How to build Telegram AI bot with n8n and 🦜🔗 LangChain (FlowiseAI): • How to build Telegram ...
3. Web scraping data with n8n and Puppeteer: • Web scraping data with...
4. How to automate Notion databases using n8n: • How to automate Notion...
5. Using webhooks in n8n (parameters, responses and triggers): • Using webhooks in n8n ...
Subscribe to my newsletter: workfloows.com/
Visit my Gumroad profile: workfloows.gumroad.com/
Follow me on Twitter/X: / workfloows
Follow Workfloows on LinkedIn: / workfloows
Disclaimer: I cannot be held responsible for any consequences resulting from the use of the information provided in this tutorial. Make sure to obtain proper authorization before engaging in web scraping activities, or consider using proxies to protect your online presence and ensure ethical scraping practices.
Create your Bright Data account and get $10 credit: brdta.com/workfloows
Create your n8n cloud account here (affiliate): n8ngmbh.partnerlinks.io/6hvl7...
Screen recording software that I use (affiliate): www.screen.studio/@jmMwX
0:00 Final result
1:05 Step 1: Install Puppeteer
1:57 Step 2: Add scraping script
6:48 Step 3: Connect Scraping Browser API
9:14 Step 4: Adjust code to Google Cloud Functions
9:59 Step 5: Deploy function to Google Cloud Functions
12:03 Step 6: Automate scraping with n8n
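Steps 4-5 wrap the scraper in an HTTP function so that n8n can call it. The sketch below is a hypothetical wrapper, not the author's code: it assumes the keyword arrives as a `keyword` query parameter or JSON body field, and `createScrapeHandler` is an illustrative name. Google Cloud Functions passes Node.js HTTP functions Express-style `req`/`res` objects, which is all the handler relies on.

```javascript
// Sketch of an HTTP Cloud Function wrapper for the scraper (Steps 4-5).
// ASSUMPTIONS: keyword comes from ?keyword=... or a JSON body field;
// createScrapeHandler and the export name below are hypothetical.
function createScrapeHandler(scrape) {
  // Cloud Functions hands HTTP functions Express-style (req, res) objects.
  return async (req, res) => {
    const keyword = req.query?.keyword ?? req.body?.keyword;
    if (!keyword) {
      res.status(400).json({ error: "Missing 'keyword' parameter" });
      return;
    }
    try {
      const results = await scrape(keyword);
      res.status(200).json({ keyword, count: results.length, results });
    } catch (err) {
      // Surface failures (e.g. a blocked session) so the n8n workflow can react.
      res.status(500).json({ error: err.message });
    }
  };
}

// In the deployed index.js, something like:
// exports.scrapeAmazon = createScrapeHandler(yourScrapeFunction);
```

In n8n, an HTTP Request node pointed at the deployed function's URL (with `keyword` set as a query parameter) would then receive the `results` array, ready to be mapped to Baserow fields.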

Comments: 31
@workfloows 9 months ago
Hello, thanks for watching my video! If you want to play a bit more with scraping using n8n and Puppeteer, here is my previous tutorial: kzbin.info/www/bejne/j6DRf32ndqarmsk
@jinqimao4781 3 months ago
Thanks for starting the lesson from installing the package. It's so important!!! Keep doing it.
@workfloows 3 months ago
Thank you very much - I’m glad you find my video helpful!
@nestiq 6 months ago
Congrats bro! 😃😃😃😃😃😃
@workfloows 6 months ago
Thanks a lot!
@SUDHANSHU934 9 months ago
I didn't like this video... I loved it ❤ I have learnt so many new things from this video today. And the video editing is awesome! 😍
@workfloows 9 months ago
Thank you very much! I'm happy that you like the video and I'm very grateful for your support. All the best to you!
@RolandoLopezNieto 6 months ago
Great video
@workfloows 6 months ago
Thank you very much!
@pjones6749 5 months ago
Great video. Question: I am trying to reduce cloud costs for all these tools. How do you host Baserow, and do you host it on the same server as n8n? I now use Hetzner but am not sure it can handle both.
@workfloows 5 months ago
Hey, thank you very much for your comment and kind words about my work - I really appreciate it! Yes, it is possible. Although I haven't self-hosted Baserow yet (I use the cloud option for now), I'm pretty sure it should not be a problem to run both n8n and other apps on the same VPS. The key here is to run each app on a different port. I suppose a basic machine with ~2 GB of memory should handle both.
@pjones6749 5 months ago
@@workfloows Thank you....Subscribed
@ablae1234 8 months ago
Is it possible to take this code and modify it to scrape Reddit or any other website, or does each site need a different approach?
@workfloows 8 months ago
Hi, thank you for your comment! This code was created exclusively for Amazon, so it can't really be used to scrape and retrieve data from Reddit. Of course, you can create your own scraping script using Puppeteer, and I strongly encourage you to do so - it's a lot of fun!
@Falalfel a month ago
Why do you upload the Node.js code to Google Cloud instead of running it directly as Node.js code in the n8n workflow? Is it because you can't import modules there? I get a "module not found" error. Is there any other way?
@GehirnGoldmine 7 months ago
What is the purpose of this scraped data? Obviously, there seem to be some use-cases. Would you please share some?
@workfloows 7 months ago
Hello, thanks for your comment. Absolutely - mostly analytical ones, for example: monitoring price changes, checking product fit and competitiveness, exploring customers' preferences (based on the number of products sold), SEO analysis (e.g. which keywords competitors use) and many more. Basically, the key here is not only the type of data scraped, but also the automation around it. In this example I scraped search results for only one keyword, but imagine you'd like to perform the analysis for dozens or hundreds of product types - that's also possible with this workflow. Scraped, structured data is also much easier to read and transform, which simplifies analytical tasks.
@GehirnGoldmine 7 months ago
@@workfloows Hey, thank you very much for your great answer! Makes a lot of sense to me.
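The price-monitoring use case mentioned in the reply above boils down to comparing consecutive scraping runs. A minimal sketch, assuming each run yields objects with a `name` and a numeric `price`; `diffPrices` is a hypothetical helper, and matching on the product name is a simplification.

```javascript
// Sketch: compare two scraping runs and report price changes.
// ASSUMPTION: products are matched by name for simplicity; a stable identifier
// (such as the ASIN, if your script collects it) would be more reliable.
function diffPrices(previous, current, minChangePct = 0) {
  const oldPrices = new Map(previous.map((p) => [p.name, p.price]));
  const changes = [];
  for (const { name, price } of current) {
    const oldPrice = oldPrices.get(name);
    // Skip new products, missing prices, zero baselines and unchanged prices.
    if (!oldPrice || price == null || price === oldPrice) continue;
    const changePct = ((price - oldPrice) / oldPrice) * 100;
    if (Math.abs(changePct) >= minChangePct) {
      changes.push({
        name,
        oldPrice,
        newPrice: price,
        changePct: Math.round(changePct * 100) / 100, // two decimal places
      });
    }
  }
  return changes;
}
```

Running a comparison like this per keyword on each scheduled workflow execution is one way to turn raw snapshots into a list of price movements worth reviewing.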
@chowadagod 7 months ago
Lovely tutorial. I tried this but I'm getting a CORS error... Any idea how to resolve this? Thanks
@workfloows 7 months ago
Hi, thanks for your comment and apologies for the late feedback. Could you please let me know at which point you get this error (while running the script locally or on GCF)? Do you get any other information from the console? Thanks in advance for your kind reply.
@chowadagod 7 months ago
@@workfloows I later figured out the error. Thanks 🙏
@nocodecreative 6 months ago
What about the Puppeteer community node?
@workfloows 6 months ago
Hello, thanks a lot for your comment and sorry for my late feedback. I've had a chance to use the Puppeteer community node, and unfortunately I find it a bit buggy. Since Puppeteer is also fairly memory-hungry, it's much more convenient for me to host it on GCF and call it when needed. But that's just my experience - if the Puppeteer community node works well for you, I don't see a reason not to use it 😃
@omare5383 5 months ago
What do you mean by undetectable? Shouldn't you get blocked after a number of requests?
@workfloows 5 months ago
Hello, thank you for your comment. As long as you use IP rotation, the chances that the script will be permanently blocked are rather low (when one IP address gets blocked, another one is used in the next round).
@omare5383 5 months ago
@@workfloows How can I apply IP rotation to the method shown in the video?
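The round-robin idea from the reply above ("when one IP address gets blocked, another one is used in the next round") can be sketched as follows, for anyone managing their own proxy list rather than a managed service. `fetchVia` is a hypothetical function you supply (for example, one that opens a Puppeteer page through the given proxy and returns the response status), and treating HTTP 403, 429 and 503 as "blocked" is an assumption.

```javascript
// Sketch: round-robin proxy rotation with retry on blocked responses.
// ASSUMPTIONS: fetchVia(proxy, url) is a hypothetical async function resolving
// to { status, body }; 403/429/503 are treated as "blocked".
class ProxyRotator {
  constructor(proxies) {
    if (!proxies.length) throw new Error("At least one proxy is required");
    this.proxies = proxies;
    this.index = 0;
  }

  // Returns the next proxy and advances the cursor, wrapping around at the end.
  next() {
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}

const BLOCKED_STATUSES = new Set([403, 429, 503]);

async function fetchWithRotation(rotator, url, fetchVia, maxAttempts = rotator.proxies.length) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const proxy = rotator.next();
    const response = await fetchVia(proxy, url);
    if (!BLOCKED_STATUSES.has(response.status)) return { proxy, ...response };
    // Blocked: fall through and retry with the next proxy in the list.
  }
  throw new Error(`All ${maxAttempts} attempts were blocked`);
}
```

Because the rotator keeps its position between calls, consecutive requests are spread across the list even when nothing is blocked.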
@wallpp 8 months ago
Long live mate!
@workfloows 8 months ago
Long live it!
@MK-jn9uu 7 months ago
What’s the purpose of alllll this, if you’re just going to use brightdata
@workfloows 7 months ago
Hello, thank you for your comment. In the video description, you can find links to the Puppeteer code without Bright Data implemented. If you prefer not to use BD, please feel free to use these resources and adapt them to the requirements of any other proxy provider of your choice.
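Adapting the proxy-free scripts to another provider usually comes down to launching Chromium with the `--proxy-server` flag and supplying credentials with Puppeteer's `page.authenticate()`. A minimal sketch, assuming the provider gives you a URL of the form `http://user:pass@host:port`; `proxyLaunchConfig` and `openPageThroughProxy` are hypothetical names.

```javascript
// Sketch: route Puppeteer through an arbitrary HTTP proxy provider.
// ASSUMPTION: the provider URL looks like "http://user:pass@host:port".
function proxyLaunchConfig(proxyUrl) {
  const u = new URL(proxyUrl);
  return {
    // Chromium's --proxy-server flag does not accept credentials, so pass only scheme, host and port.
    launchOptions: { args: [`--proxy-server=${u.protocol}//${u.host}`] },
    // Credentials are supplied per page with page.authenticate() instead.
    credentials: u.username
      ? { username: decodeURIComponent(u.username), password: decodeURIComponent(u.password) }
      : null,
  };
}

// `puppeteer` is passed in (e.g. require("puppeteer")) to keep the sketch dependency-free.
async function openPageThroughProxy(puppeteer, proxyUrl) {
  const { launchOptions, credentials } = proxyLaunchConfig(proxyUrl);
  const browser = await puppeteer.launch(launchOptions);
  const page = await browser.newPage();
  if (credentials) await page.authenticate(credentials);
  return { browser, page };
}
```

Calling this once per round with a different proxy URL from your provider's list is one way to get the IP rotation discussed earlier in the comments.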
@pratikguptaji 4 months ago