I built my own Reddit API to beat Inflation. Web Scraping for data collection.

179,463 views

Dreams of Code


The only way for us cash-strapped developers to make it in this economy!
In this video, I decide to create my own version of the Reddit API for as cheap as possible (whilst still remaining cloud hosted). We look at how I gathered the data, how I built a simple yet affordable data pipeline, and finally, a usage-based API which costs me pennies rather than hundreds of dollars.
This video was sponsored by BrightData. To sign up for BrightData and get $15 credit to build your own web scrapers, use the following link: brdta.com/dreamsofcode
You can find the source code for this project on GitHub at the link below
github.com/dreamsofcode-io/re...
Become a better developer in 4 minutes: bit.ly/45C7a29 👈
Join this channel to get access to perks:
/ @dreamsofcode
My socials:
Discord: / discord
Twitter: / dreamsofcode_io
00:00 Intro
01:42 Web Scraping
06:58 Message Queue
10:19 BrightData
13:23 Deploy to AWS Lambda
14:18 DynamoDB
15:32 API
18:05 Final Cost

Comments: 286
@dreamsofcode 8 months ago
To get $15 credit for use with Brightdata to scrape your own APIs, visit: brdta.com/dreamsofcode
@meinkanal13378 7 months ago
Just an info: Not working anymore, only $5
@dreamsofcode 7 months ago
@@meinkanal13378 inflation strikes again 😭 Let me reach out. Thank you for letting me know
@PaulSebastianM 6 months ago
Be careful, web scraping is illegal in some countries.
@foobars3816 7 months ago
This was never a technical limitation, it was a legal one.
@jgould30 6 months ago
uh, no. It's a financial one. The idea that companies are going to offer network and compute resources for the sheer amount of API calls made for free was always comical. It's sad that so many programmers and general public think this stuff is just free or a charity. No matter what you do, eventually these costs will catch up to the business and HAVE to be charged to people or else the service will just die.
@fizzcochito 6 months ago
@@jgould30 I am going to touch you without your consent
@Homiloko2 5 months ago
@@jgould30 Yep. People pretend webscraping is 'free', but it still costs the companies. The companies are willing to bear the cost of regular users browsing through pages, but a scraper browsing through the entire catalog is even more expensive for the company than if they just used the API. Scraping is definitely malicious.
@tabbytobias2167 a month ago
@@jgould30 it costs a server less than a penny to serve 1000 requests.
@jameskim7565 27 days ago
@@tabbytobias2167 yes, but for a service the size of reddit, it can lead to hundreds of thousands of dollars in losses, due to the sheer volume of those requests.
@sivuyilemagutywa5286 8 months ago
The video was enjoyable, but it's important to acknowledge that sponsored content can introduce bias. One approach could be to make the entire video centered around the sponsor, or if you choose to feature the sponsor as you did, consider presenting alternative services similar to them. Your videos are consistently excellent, boasting high-quality production, a well-maintained pace, and crystal-clear explanations.
@aliengarden 7 months ago
that was my exact thought, thanks for pointing it out.
@seanthesheep 7 months ago
when ChatGPT focuses more on the sponsor of the video than the video itself
@jaumsilveira 7 months ago
Yeah, bro was talking about making everything as free as possible and then presents a service which is very expensive
@hqcart1 7 months ago
what about captcha?????? he didnt mention that his sponsor can go around it, and even his code did not handle captcha.
@TheMacWindows 7 months ago
@@hqcart1 Death by captcha and related services exist for that
@shishsquared 8 months ago
Crowdsourcing idea for this to prevent IPs getting blocked: a browser that pays its users for using it. Developers write scripts to scrape data, and pay to use the network of users. Users then get paid for using the web browser, which will create a private session, encrypted away from the user, run the web scraping tasks, and send the data back to the developer. Build it all on top of chromium, and if done correctly, websites would have a very difficult time blocking based on IP addresses, activity , or fingerprinting because it would be distributed across actual user IPs, and actual user login times (browser only runs when open). My only concern would be how to protect the users when malicious devs start doing illegal activities. You'd have to have very strong terms and conditions, have logging, and be able to trace back requests to devs. But then that opens a dev privacy can of worms. Still, interesting concept
@phoneywheeze9959 8 months ago
Botnet as a Service
@levifig 8 months ago
You just described 99% of the “VPN” apps available for your mobile device… ;)
@MuhsinunChowdhury 8 months ago
Wouldn't residential sneaker botting proxies be able to accomplish the same thing?
@mathisd 8 months ago
@@MuhsinunChowdhury These costs..
@ajnart_ 8 months ago
ahahahah you're not wrong, especially the free ones @@levifig
@shadez221 8 months ago
For anyone planning to try this, use Puppeteer's headless mode so that it doesn't open multiple browser windows (which improves performance), and route it via a VPN set up on AWS to obfuscate your IP. And be ready to have your IP blocked 😊
@__sassan__ 8 months ago
Even when using the VPN?
@tacokoneko 8 months ago
vpns also have an ip so when doing this if they block you you need an endless revolving door of new VPNs or proxys @@__sassan__
@tacokoneko 8 months ago
which is not that hard because if you port scan the entire internet with some strategic guessing (downloading public datacenter IP ranges, scan port 1080 for SOCKS5 proxys) you can find unsecured proxys for free, even some rare ones that work with SSL over SOCKS5
@tacokoneko 8 months ago
i asked someone if port scanning the internet to find proxys is illegal and they said no so i think it's completely legal, they didnt put a password or any authentication so they are allowing people to use it
@Dot_UwU 8 months ago
@@__sassan__ if you send a ton of requests with the same IP, you'll get rate limited. Also most VPN ips are datacenter IPs which are almost always blocked.
@conaticus 8 months ago
Really cool project idea! Loved it
@nocluebruh3792 8 months ago
yooo
@aa898246 8 months ago
It's the Rust guy
@dreamsofcode 8 months ago
Yoo thank you! Love your videos as well.
@outofrange7156 8 months ago
rusty boi
@forresthopkinsa 8 months ago
This is an interesting idea but a really impractical approach. New Reddit is an SPA and you can just use the XHR endpoints to fetch the data raw. Don't bother with browser emulation and HTML parsing. Besides, the closure of the APIs was never about restricting access to a user like you're circumventing here. As you've acknowledged, that wouldn't really make sense on the Web. The API pricing is about charging for data farming and large-scale user interception. You can't accomplish either of these use cases by scraping; you'll get rate-limited very quickly. The only way around this is using Bright Data's borderline-illegal botnet, which seems like a pretty shady way to do business.
@tatianatub 7 months ago
its called hostile interoperability and its the consequence to fucking over developers, its time we remind platform hosts why APIs were created in the first place
@mathgeniuszach 7 months ago
People will use their own embedded browsers and similar scraping methods will occur locally. It's basically the same as an extension modification of the site. People just browsing normally don't need botnets and access to all of reddit, they just want a better stinking interface.
@ArizeOW 7 months ago
@@tatianatub It's time to remind you, that Reddit doesn't belong to "us". It belongs to Reddit. And they can do whatever they want with it. If they don't want large applications like Apollo to scrape EVERY post, comment, upvote, downvote, user karma and such, there is nothing you can do about it. That's it. It's not that deep.
@DathCoco 7 months ago
also if using oldreddit you can simply use jsdom to parse the data without needing to spin up a chromium
@x--. 7 months ago
The internet is meant to be and should be open. That doesn't mean everything has to be free at-scale but fighting hostility to the _idea of an open internet_ is a good thing. You're free to put your content behind a paywall for everyone.
@Jana-se4kv 7 months ago
THANK YOU! Very helpful!
@IannoOfAlgodoo 8 months ago
Curious how much you spend on bright data as their product is like 20$ / GB and 0.1/hour
@GoldenretriverYT 7 months ago
Yeah, its expensive as heck. Also I am wondering how they claim they have 72 million residential ips? I can only imagine them having spread malware which then gave them a botnet to work with, or, less likely, they offer people money in exchange for them running a proxy. Edit: I looked it up, apparently they have an SDK which app developers can integrate which gives the users a choice between ads or allowing their connection to be used by BrightData as a proxy, thats where they (at least claim to) have the proxies from.
@tardistrailers 7 months ago
@@GoldenretriverYT It'd be insane to run a resold proxy on your personal IP, just to see no ads somewhere. Worst case you get your home raided by law enforcement, because someone did something highly illegal with it. But I wouldn't be surprised if less educated people still do this.
@OrangeYTT 6 months ago
@@GoldenretriverYT 99% of "residential proxies" are just computers under a botnet. Hola (that free VPN) got in trouble a while back for making people who used their VPN join their botnet for this very reason!
@wierdnes 7 months ago
Great video. I liked the step-by-step thought process of getting the scraper to collect data. One major flaw in the cost analysis you presented was the absence of any cost for BrightData. Checking the pricing myself, it looks like 20€ per GB of data?
@FunctionGermany 8 months ago
new reddit probably uses an internal API you can pull from by fetching from the browser window. also note another user's comment about old reddit + cheerio (no browser needed).
@eoussama 3 months ago
He probably used Playwright just to have an excuse to shove the Bright Data sponsorship in the video, which I understand.
@takennmc 8 months ago
8 cents for 3 weeks damn this really makes reddit unreasonable
@rockshankar 8 months ago
That does come with significant management overhead. The project is a simple way to get it working, but once you dig deeper there are lots of problems. Lambda and DynamoDB are priced by the number of requests. If you post your API endpoint in public, 1 million requests will be gone in seconds, and then Lambda becomes more expensive than running your own server. If it were cheaper, someone else would have done it already.
@moldeecheese 7 months ago
Why use an SQS queue to abstract the db writing interface? The solution that immediately comes to mind is to just make an abstract class. The point of SQS is to be able to handle crazy amounts of throughput (like, up to 30,000 messages per second), which isn't really what you're doing.
@WarlordEnthusiast 6 months ago
I actually did something similar, we needed financial data for a project we were working on and the APIs we found were very limiting and some were very expensive. We tried using one of the cheaper ones and it straight up did not work, it had downtime of sometimes hours and when we contacted the company they basically told us it wasn't their problem. So I built a web scraper, hosted it on my server at home and scraped all the forex data I needed from their website for free.
@scaffus 8 months ago
Great vid! Love your work
@sumirandahal76 8 months ago
Quality project ❤ content worth watching, hooks through the time. 🎉
@jerryaugusto95 8 months ago
Is it just me or are the icons for the Go files different? How do you change these icons please?
@teamredstudio7012 7 months ago
I would do this in a different way. I would simply write a script in whatever language, that has a get and post function so you can call the main page first, then parse the data, often websites use apis already to fetch the content, use Fiddler Classic or some other proxy server to inspect what api the website uses. When the website loads more content after scrolling, it needs to fetch the data from somewhere. Simply reproduce this api by copying the authentication tokens from the headers and providing the required headers in the requests, then parse the response body and add it to some database. I would make it store everything so if it needs to be fetched repeatedly it simply gets from offline copy instead of wasting resources fetching and parsing. I never automate browsers, if your browser can fetch the data, you can fetch it too without front end. You can also get the url to load more content from fetching the raw main page because the browser needs to know where to fetch this anyways so it's definitely defined somewhere. It's super simple to scrape websites, you only need to know how to do requests and parse json and xml in your preferred language! Don't automate browsers but just fetch it directly!
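A minimal sketch of the "fetch directly and keep an offline copy" idea from the comment above, in Python. The class name `CachedFetcher`, the `http_fetch` helper and the User-Agent string are hypothetical, not taken from the video's repository; the cache here is only an in-memory dictionary, whereas the comment suggests a database.

```python
from typing import Callable, Dict
from urllib.request import Request, urlopen


class CachedFetcher:
    """Fetch each URL at most once; later requests are served from the local copy."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch               # injected so it can be swapped or faked
        self._store: Dict[str, str] = {}  # url -> response body
        self.network_calls = 0

    def get(self, url: str) -> str:
        if url not in self._store:
            self.network_calls += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


def http_fetch(url: str) -> str:
    """Plain HTTP GET with the standard library (no browser involved)."""
    # Hypothetical User-Agent; a real scraper would also send whatever
    # headers the target endpoint requires.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

Usage would look like `CachedFetcher(http_fetch).get(url)`; passing the fetch function in makes it easy to replace the network call in tests.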
@unforgettable31 7 months ago
I come from a cracking background, and back in the day this is exactly what we would do. We would write GET/POST requests with token grabbing methods and get the job done. We’d launch hundreds of threads all connected to different proxies, instead of a single web browser. Sometimes it was challenging for particular platforms because of cookies but at the end of the day it was doable.
@rossimac 7 months ago
Websites that use recaptcha2 are ones that I've found that I need a browser to interact with. Ones that don't then yes, totally, inspect the network traffic and understand how your browser is creating the requests and then replicate them.
@S0L4RE 7 months ago
+1 it’s such a massive pet peeve of mine seeing people use selenium when it could just be achieved with requests.
@cheemzboi 7 months ago
@@unforgettable31 what about captchas then
@unforgettable31 7 months ago
@@cheemzboi Most platforms use captchas when they detect ongoing suspicious activity, which is omitted when using proxies.
@EarlZMoade 8 months ago
Unrelated to this video - would you show how you version your dotfiles (if you do)? It would make for a good video.
@socks5proxy 7 months ago
absolutely brilliant video. so very well done.
@dreamsofcode 7 months ago
Thank you! I'm glad you enjoyed it!
@kale_bhai 7 months ago
Learned about the queuing system utilization. But that's pretty much the only thing new to me.
@nigerianprince5389 5 months ago
1st off, thanks for this buddy, you're a godsend. it does feel a bit over-engineered but i guess you've gone this route because you want to build your own Reddit API. for folks like me who have only been coding everyday for 1 month using GPT - knowing how to pull the data from reddit and store in a database is the main thing i need (i think most people as well but i could be wrong). keep up the good work still and thank you again !
@-Siknakaliux-II 7 months ago
So this vid popped up in my recs. Unrelated off-topic comment, but I remember getting into a programming phase in grade 6-7. I pretty much obsessed over the thought of doing something great with it. Got myself to do a few courses but it never really stuck, as I've moved on to finance. Now I kinda wanna get into it again as I did in the past...
@zack_beard 6 months ago
Great content! Quick question. Did you do this after logging in to Reddit with your userid/pwd or without? IIRC Reddit does not show new content if you are not logged in. Thanks!
@dreamsofcode 6 months ago
Thank you! Logged out, which causes it to fall under publicly accessible. Reddit still shows content on the old reddit website under /new when you're not logged in.
@poggybitz513 7 months ago
I did the same thing for my app using selenium bindings in rust and used vagrant to manage instances. You can use docker if you want. Please mark this video as ad, because none in their right mind would do it this way. I am so tired of people shoving ads down my throat and claiming its a good education.
@antonjoacir 8 months ago
Man, could you make a video about the configurations of your terminal?
@primo_geniture 8 months ago
I'm curious as to what the total time for the project was.
@dancinglazer1628 7 months ago
Honestly, I think this infrastructure is too complicated for what it is doing. I don't really care about the sponsored bit, but I think it would have been better to simply create a lambda that directly writes to a database (assume a cacheFactory -> RedisCache | MongoCache | JsonCache) along with a "freshness" param due to the relative simplicity of the data I think redis would be a good candidate; Then all you would need to do in the API is simply fetch the data based on the query param, something which can probably be achieved in a single file.
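The "JsonCache with a freshness param" variant mentioned above could be sketched roughly as follows. `JsonCache`, `max_age_seconds` and the injected `clock` are hypothetical names chosen for illustration; a Redis- or Mongo-backed cache would expose the same two methods.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class JsonCache:
    """File-backed cache whose entries expire after max_age_seconds."""

    def __init__(self, path: str, max_age_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._path = Path(path)
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def put(self, key: str, value: Any) -> None:
        data = self._load()
        data[key] = {"stored_at": self._clock(), "value": value}
        self._path.write_text(json.dumps(data))

    def get(self, key: str) -> Optional[Any]:
        entry = self._load().get(key)
        if entry is None:
            return None
        if self._clock() - entry["stored_at"] > self._max_age:
            return None  # stale: the caller should scrape again
        return entry["value"]

    def _load(self) -> Dict[str, Any]:
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text())
```

The API handler would then call `get` first and only trigger a scrape when it returns `None`.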
@jp46614 7 months ago
Yeah I feel it's been quite overengineered with all this message queue and database/service stuff, this could be done fully locally realistically and at not much of a bigger cost since nowadays OSS databases and caching solutions are really efficient
@hqcart1 7 months ago
he will need a 2-4GB ram VM to do that. AWS is expensive
@dancinglazer1628 7 months ago
@@hqcart1 he is deferring the scraping to the sponsored service anyway, but I think we can just fetch the html instead of running a headless browser
@dancinglazer1628 7 months ago
@@jp46614 This could be a single service on a docker image, run a cron scheduler that fetches and writes to a json file and have a server running that uses the json as a database
@hqcart1 7 months ago
@@dancinglazer1628 even if he uses a sponsored service, at some point you will get a captcha, and my point was that his code does not handle that. And about fetching HTML: no, it does not work for complex sites where the HTML or classes get rewritten by JS. I tried that and failed, and ended up using a headless browser.
@jondoe79 8 months ago
Great content, real examples of use case for different tools for a simple but useful project.
@DodaGarcia 7 months ago
Decoupling the data persistence from the business logic is always a good idea, but using a queue service for that is bonkers. It removes none of the existing complexity, since you still eventually have to map the message payload to the database schema, and then introduces more complexity because you now have to keep track of one more service, the publishing code, the consuming code and the asynchronicity itself. Just use the repository pattern with an adapter for the chosen database, or an ORM like Prisma if you really don't expect the app to scale much.
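For readers unfamiliar with the repository pattern suggested above, here is a rough sketch. `Post`, `PostRepository`, `InMemoryPostRepository` and `ingest` are hypothetical names; a DynamoDB or SQL adapter would implement the same two methods, and the scraping code would not need to change.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Post:
    id: str
    title: str
    subreddit: str


class PostRepository(Protocol):
    """The only persistence interface the business logic depends on."""

    def save(self, post: Post) -> None: ...

    def list_by_subreddit(self, subreddit: str) -> List[Post]: ...


class InMemoryPostRepository:
    """Adapter used here for illustration; swap in a database-backed one later."""

    def __init__(self) -> None:
        self._posts: Dict[str, Post] = {}

    def save(self, post: Post) -> None:
        self._posts[post.id] = post  # saving the same id again overwrites it

    def list_by_subreddit(self, subreddit: str) -> List[Post]:
        return [p for p in self._posts.values() if p.subreddit == subreddit]


def ingest(posts: List[Post], repo: PostRepository) -> int:
    """Business logic: knows nothing about which database sits behind repo."""
    for post in posts:
        repo.save(post)
    return len(posts)
```

Compared with a queue, the write stays synchronous and in-process, which is the trade-off the comment argues for.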
@goofynose2520 6 months ago
Agreed. I swear 90% of queues I encounter are needless overcomplications
@ShaneZarechian 3 months ago
Someone fork this and make it non-ridiculous
@louishuort7969 8 months ago
What about the cost of bright data ?
@glitchy_weasel 7 months ago
Fantastic! Very informative, always nice to stick it to big tech lol
@5criptcom 7 months ago
Good one sir!
@christianjedro6206 7 months ago
How do you avoid vendor/database lock in by using AWS SQS?!
@TheHotMrDuck 7 months ago
i hope this doesnt kill old reddit, if they remove it im gone
@veshal.s3690 8 months ago
Would love a post on your powerlevel10k config and your terminal config
@rando521 8 months ago
hi dreams i love your vids on vim and tried it on my own due to them while trying c++ i want to know if there is a better option than cmake? i come from python so i plan on rpc-ing the python part and move to mostly c++ or golang any ideas on how to do this?
@FaZekiller-qe3uf 8 months ago
The better option is to use a language with good tooling. Zig, Rust, Go, etc. cmake L, Make L.
@jacksonsmith4648 8 months ago
Meson! It's basically CMake, but with syntax similar to python, and a lot less stupid design decisions. Definitely worth a look.
@S0L4RE 7 months ago
@@jacksonsmith4648 why are we hating on cmake?
@xXtim128Xx 7 months ago
Using a full webbrowser when a simple HTTP request and HTML parser would suffice...
@dreamsofcode 7 months ago
You're correct. It would have. However a browser is a more versatile option for other use cases.
@ahwx 8 months ago
I see you're using a Mac now, what terminal is that? How are your rounded window corners so much less rounded than mine? Have you changed anything?
@sworatex1683 7 months ago
Why didn't you use curl? It would be way more lightweight than using a browser. Most programming languages will let you manage DOM objects with built-in libraries
@grif5307 8 months ago
One of my favourite videos in a while, great job!!!!
@iamrafiqulislam 7 months ago
what is the Font you are using for Nvim and tmux status bar, please?
@dreamsofcode 7 months ago
I am using JetBrainsMono Nerd Font! I have a video on both of my Nvim and tmux configs on my channel :)
@ltecheroffical 24 days ago
You can remove the browser part by using a web scraping framework that works without a browser instance.
@vekoze9872 7 months ago
what is the tmux font ?
@cooperqmarshall 8 months ago
The quality of this project is supreme their. Love the detail and consideration for the infrastructure
@chofmann 8 months ago
you are aware of the json api that things like rif is using? basically, for every link, there is also a json file you can just access
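A rough sketch of consuming the JSON variant of a listing page, as the comment above describes. The URL shape built by `listing_url` and the `data.children[].data` layout with `id`, `title` and `permalink` fields are assumptions about the response format, and both function names are hypothetical; check a real response before relying on them.

```python
import json
from typing import Dict, List


def listing_url(subreddit: str, sort: str = "new") -> str:
    """Build the assumed JSON URL for a subreddit listing (append .json to the page path)."""
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json"


def parse_listing(raw: str) -> List[Dict[str, str]]:
    """Extract a few fields per post; missing fields become empty strings."""
    payload = json.loads(raw)
    posts = []
    for child in payload.get("data", {}).get("children", []):
        data = child.get("data", {})
        posts.append({
            "id": data.get("id", ""),
            "title": data.get("title", ""),
            "permalink": data.get("permalink", ""),
        })
    return posts
```

No browser or HTML parser is needed on this path; the response body is passed straight to `parse_listing`.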
@TrueDetectivePikachu 7 months ago
Genuine question, why use puppeteer that relies on an active browser and not something like cheerio?
@dreamsofcode 7 months ago
It's a great question. Cheerio would work really well in this case as there was little to no javascript for the old version of reddit. Initially I wanted to go with the new reddit so had scoped out using an active browser (which I think has more application beyond reddit). Cheerio is always preferable in a case with no javascript, but it's not as applicable as puppeteer is. TLDR is that I wanted to showcase active browser scraping in the video.
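To illustrate the "no JavaScript, so no browser" case from this reply, here is a sketch that pulls post links out of static HTML with Python's standard `html.parser` instead of Cheerio. The assumption that post links are `<a>` elements carrying a `title` class is a guess at the old Reddit markup, and `TitleLinkParser` and `extract_titles` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TitleLinkParser(HTMLParser):
    """Collect (href, text) for every <a> whose class list contains 'title'."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href = ""
        self._text: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "title" in classes:  # assumed marker for a post link
            self._in_title = True
            self._href = attr.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_title = False


def extract_titles(html: str) -> List[Tuple[str, str]]:
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

This only works when the server returns the content in the initial HTML; pages that render posts client-side still need a browser or the underlying data endpoint.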
@jasontruter7239 7 months ago
Good job, one improvement would be to go with a single table design with DynamoDb
@edanbigw 7 months ago
sorry oot, did you use mac sir?
@shadyworld1 7 months ago
If you could use RSS to pull the data and store them in a proper format to be used for API you’ll be able to save 40% at least of your current approach time and effort!
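A small sketch of the feed-based approach suggested above, assuming the feed is served in Atom format (an assumption; a plain RSS 2.0 feed would use un-namespaced `item`, `title` and `link` elements instead). The function name `parse_atom` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def parse_atom(xml_text: str) -> List[Dict[str, str]]:
    """Return the title and link of every <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        link = entry.find(f"{ATOM}link")
        href = link.get("href", "") if link is not None else ""
        entries.append({"title": title, "link": href})
    return entries
```

The resulting dictionaries can be stored as-is and served from the API without any HTML scraping step.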
@techwithjoe8636 7 months ago
Which Editor is he using? Vim?
@EarlZMoade 8 months ago
Are there any issues with legality when using the data you extract? I.e. could you use the data for commercial purposes, or research?
@ristekostadinov2820 8 months ago
Microsoft i think have taken someone to court for web scraping and won, i think it was a company that were scraping linkedin public data from users and were building their own app for recruiting people and microsoft were arguing that the users didn't consent to that (which is true, but then again data is public). So it's a very tricky problem, and is best to read websites terms & service.
@k98killer 8 months ago
Would it have cost more without the brightdata sponsorship?
@louishuort7969 8 months ago
Ohh yes, a lot, bright data is very expensive
@Meleeman011 5 months ago
why do you use playwright and not just puppeteer?
@dandandev 8 months ago
Heya! I'd recommend Railway to host your apps, its usage based and pretty cheap!
@pelic9608 7 months ago
Every modern website has an API. Most just aren't documented. 🤷‍♂️ Copy their own website's auth flow and use those tokens to drive your app. What are they gonna do? Paywall their entire site? (Ok, ok; SSR is a thing, but there's still almost always some pure-data endpoint around)
@sheldonsays9922 6 months ago
How long did it actually take for you to complete this project.
@jakestrouse12 7 months ago
You can also reverse engineer their private api by looking at the browser network requests. The scraping will be much faster
@S0L4RE 7 months ago
Although Cloudflare IUAM makes it an immense pain in the ass
@batmanatkinson1188 7 months ago
And keep in mind that private APIs are susceptible to change, so today it’s gonna work, tomorrow you have to start over
@unaif.2171 7 months ago
@@batmanatkinson1188 less often than the html
@TheSaintsVEVO 6 months ago
@@S0L4RE what’s that? Does Reddit use it?
@S0L4RE 6 months ago
@@TheSaintsVEVO I’m not sure if Reddit uses it, but IUAM detects very low-level characteristics about the request (i.e cipher mode, SSL configuration) to determine whether it looks automated.
@heckerhecker8246 7 months ago
How to get four hitmen at your door:
@betapacket 7 months ago
2:02 isn't Playwright yet another ECM and not a web scraper?
@houstonbova3136 8 months ago
DataStore and FireStore work roughly the same as Dynamo, no?
@pchris 7 months ago
Would something like this work for third-party applications like Reddit Apollo?
@CrazyWinner357 7 months ago
It can work... until you get a captcha
@JoshIbbotson- 6 months ago
How long have you been programming? Loved this video btw!
@dreamsofcode 6 months ago
Thank you! I've been writing code since 2008.
@stylrart 7 months ago
Nice you are using JB Mono, like me. what theme are you using, the colors are handsome ;)
@qCJLbggG4IWAY9nTH6o 8 months ago
why not use their rss feed?
@willmil1199 5 months ago
How do we use your api then ?
@creeperlolthetrouble 7 months ago
xD i've seen this coming for months but why not keep AWS and tunnel the requests through a proxy
@Shudshudu 7 months ago
Sir am learning c and am new to programming. Currently am learning control structure. But when i look into real world projects I don’t understand anything why
@user-hy6cp6xp9f 7 months ago
It takes time! Also C is a VERY different level of abstraction than Javascript / Go like he used here.
@metalspoon69 8 months ago
"Just build your own API" *builds own API* "NOO NOT LIKE THAT!!!!"
@juanmacias5922 8 months ago
Bahahaha...
@siniarskimar 8 months ago
How about developing a browser extension for "enhancing" reddit that would additionally scrape any post that the user sees 🤔
@hemant_san 7 months ago
how to bypass captcha?
@mx338 6 months ago
DynamoDB isn't really low cost, so I would definitely look into switching to ScyllaDB which offers a DynamoDB compatible API.
@user-nr1qk6oi7g 7 months ago
if you used python you could easily bypass ip blocking with torpy
@filiprandom 2 months ago
I watched this video for 4 hours because it was on repeat and I fell asleep
@mr.togrul--9383 8 months ago
Great video btw! In the future I also want to make my own web scraper project and this just simplified everything I need to do. Is there any reason why you didn't just use Golang for the whole thing, for the scraper as well? Just curious, since as you said writing Golang would be faster than Node.js
@JeanHirtz-ms3bf 8 months ago
Curious about Golang - any repo / vids ?
@navaneeth6157 8 months ago
chromedp for golang is also an option
@Dev-Siri 8 months ago
tip: bun 1.0 has been released just last day, and you can use it as a drop-in-replacement for node. it executes js much faster, without breaking anything so it can magically make your api faster. for deployment, you need to use a docker image because its still very early and not supported by any platforms (yet)
@ac130kz 8 months ago
it just gets stuck if I try to run puppeteer with whatsapp-web.js, yeah, fast and cool, but too early
@Puwunda 7 months ago
Intercontinental Lawsuit Inbound!!!
@dimagass7801 7 months ago
I have no clue how to use apis I still don't completely understand but data is the new oil😅
@_Mackan 8 months ago
virgin api consumer vs chad scraper
@robinbinder8658 7 months ago
boi do i smell a cease and desist
@reihanboo 7 months ago
didn't understand anything but great video
@Rundik 8 months ago
You don't need any browser to scrape html from reddit. How did you even managed to configure vim with that kind of skills?
@night23412 8 months ago
what about pressing the next button, don't you need a browser emulator for that?
@Rundik 8 months ago
@@night23412 unless you need to take a screenshot or you don't have much experience/time, using puppeteer-like tools is extremely wasteful. And for simple text scraping you don't even need that much experience at all
@TheArchimede2000 7 months ago
he never disappoints
@guillemgarcia3630 8 months ago
jesus there's more terraform configuration than code
@flor.7797 5 months ago
There’s no AI without API
@hqcart1 7 months ago
what about captcha??
@VRGamerBoi 7 months ago
Chatgpt told me about this
@ultimatetoast2739 6 months ago
Apicels be seething over scrapechads
@mikaay4269 7 months ago
Application Paying Interface
@mayar2047 8 months ago
I'm thinking of just scrape reddit directly from a mobile device, and maybe save the data to the device for caching. I don't need to pay for anything
@xybersurfer 8 months ago
i was with you until you started putting things in a database and the cloud. was it because your video was sponsored by a cloud provider? (i really can't tell) it would be more interesting to see you justifying decisions. seeing all the code is really not that interesting. the overall idea of creating your own reddit API is interesting though, so i will give this a like
@_soundwave_ 5 months ago
A very interesting comment section.
@pixel690 7 months ago
$20 per GB is something different jesus
@bieggerm 8 months ago
This video shows the only way an arms race should be visualized
@lowlevell0ser25 8 months ago
They will block things like this with Web Environment Integrity
@itsjohannawren 7 months ago
Application Profit Initiative
@earu_arcana 6 months ago
Nice video, but your setup is a lot more complex than it needs to be IMO.
@DaMu24 5 months ago
Ok, give it to me
@lollermann 6 months ago
Don't let pyrocynical see this video he'll become a web dev