I Made a FAST Search Engine

160,704 views

conaticus

Days ago

Comments: 186
@conaticus 9 months ago
Start building awesome projects with $15 free credits using BrightData today: brdta.com/conaticus1
@AWIRE_onpc 8 months ago
no
@xulaxwtf 8 months ago
no
@aryanszone4963 8 months ago
no
@noviui 7 months ago
no thanks
@user-uv3nv2bc6v 6 months ago
no
@jaymarksum6542 9 months ago
I’m impressed, can’t wait to see you build a multithreaded web server in assembly
@da40au40 9 months ago
Why do I find it super funny 😅😅😅.
@ArthursHD 9 months ago
@da40au40 Me too :D
@DanskeCrimeRiderTV 8 months ago
It's not impressive. Of course querying a few hundred or even a hundred thousand web pages isn't as complicated or slow a task as querying trillions of web pages.
@KibitoAkuya 8 months ago
@DanskeCrimeRiderTV Google also wastes time deciding whether or not you are allowed to see certain sites
@DanskeCrimeRiderTV 8 months ago
@KibitoAkuya What does that have to do with anything? Google is still faster at querying trillions of results than this.
@lifeofme702 9 months ago
I don't know what this guy said, but I was still mind-blown by all the effort he puts in
@conaticus 9 months ago
Thanks so much 🙏 It would not be possible without your support
@ccost 8 months ago
7:40 flashing those questionable websites in a sponsored video is quite the move
@twitchizle 8 months ago
You scared of porn?
@coderx8634 9 months ago
Love your content. You and your quality have really improved. Keep it up ❤
@conaticus 9 months ago
Thanks so much, your support means a lot ♥
@coderan5029 8 months ago
This is basically what we learned in my big data class, but we used map-reduce to do the TF-IDF calculations, so it's impressive you figured this out on your own
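The map-reduce approach mentioned above can be illustrated in a single process. The map step turns each document into the set of distinct terms it contains. The reduce step sums those per-document counts into the document-frequency table that IDF needs. This is a minimal sketch that assumes lowercase whitespace tokenization. The function names are hypothetical, and a real map-reduce job would spread the map step across machines.

```python
from collections import Counter
from functools import reduce


def map_document(text: str) -> Counter:
    # Map step: emit each distinct term once per document (count = 1).
    return Counter(set(text.lower().split()))


def reduce_counts(a: Counter, b: Counter) -> Counter:
    # Reduce step: merge two partial document-frequency tables by summing.
    return a + b


def document_frequencies(docs: list[str]) -> Counter:
    # Number of documents that contain each term.
    return reduce(reduce_counts, map(map_document, docs), Counter())


df = document_frequencies(["the cat sat", "the dog sat", "a cat ran"])
print(df["the"], df["cat"], df["ran"])  # 2 2 1
```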
@JoshuaLonsako 8 months ago
W ad plug, it's 100% relevant and actually necessary to fulfill the premise of this vid.
@6IGNITION9 8 months ago
Filter out JS for another 10x bandwidth savings, or alternatively use an adblocker. (Can Puppeteer do that? It's just Chromium, right?)
@SG-kn2jl 8 months ago
Why did you choose TF-IDF instead of word2vec or any context-aware model?
@skorp5677 8 months ago
+1, would like to know
@devinlauderdale9635 9 months ago
The problem is that this approach is susceptible to SEO spamming/invisible SEO keywords
@conaticus 9 months ago
Yeah for sure, realistically it should be moderated based on user interaction as well
@jjwe2002 2 months ago
@conaticus How would you do that?
@greensporevalley 9 months ago
SERBIA MENTIONED 🎉🎉🎉
@RealMephres 9 months ago
@europa_the_last_battle >goes to comments >sees meme comment >looks at replies >only a LARPer replied lol
@jawadmansoor6064 8 months ago
that name rings a bell, maybe from some kind of Serbian movie?
@RealMephres 8 months ago
@MAXHASS-ph5ib tell that to the LARPer dawg
@slimeyar 8 months ago
@RealMephres tell that to yourself 😊
@RealMephres 8 months ago
@slimeyar you first
@80sVectorz 8 months ago
3:07 Best pronunciation of Euclidean I have ever heard :P
@CrazyDiamondo 8 months ago
Where?
@80sVectorz 8 months ago
@CrazyDiamondo I added a timestamp
@anitaweasel 8 months ago
Nice, you re-invented the Lucene library
@aryakvn6051 6 months ago
You could calculate and cache TF values on the fly so you don’t fill up your RAM as quickly but still get a decent response time.
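The suggestion above is to compute term frequencies on demand and cache them. That can be sketched with Python's `functools.lru_cache`. This sketch assumes documents live in a hypothetical in-memory `DOCS` dict keyed by integer id. The `maxsize` values are arbitrary: they cap memory use, at the cost of recomputing entries that get evicted.

```python
from functools import lru_cache

# Hypothetical document store: id -> raw text.
DOCS = {
    1: "rust is fast and rust is safe",
    2: "javascript runs in the browser",
}


@lru_cache(maxsize=10_000)
def tokens(doc_id: int) -> tuple[str, ...]:
    # Tokenize each document at most once while it stays in the cache.
    return tuple(DOCS[doc_id].lower().split())


@lru_cache(maxsize=100_000)
def term_frequency(doc_id: int, term: str) -> float:
    # Relative frequency: occurrences of the term divided by document length.
    toks = tokens(doc_id)
    return toks.count(term) / len(toks) if toks else 0.0


term_frequency(1, "rust")  # computed: 2 occurrences / 7 tokens
term_frequency(1, "rust")  # served from the cache
print(term_frequency.cache_info().hits)  # 1
```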
@rafaelpereiracoias1047 8 months ago
Nice video and nice code, keep up the good work!
@Raven-fu1zz 8 months ago
Remember, never return an over-18 site without an over-18 word in the search request
@polyshrub 8 months ago
This is very impressive, what was the size of the database when indexing was finished? Seems like it would be quite big
@MySachincool 7 months ago
Subscribed & notifications on :) you deserve more recognition bruh
@R_Y_Z_E_N 8 months ago
Google also does the same but with distributed computing to reduce the overall time. Just scale the database horizontally and mimic Google's approach
@turb0004 8 months ago
Please finish your file explorer in Rust fully, because the idea of it is awesome. Love your videos, content is very engaging 🎉
@foqsi_ 8 months ago
Love this dude and his video projects
@conaticus 8 months ago
🙏
@rmt3589 2 months ago
Awesome video! Will help immensely when I eventually make an AI RAG search engine. I wanna see if I can add blacklisted and whitelisted websites. That way things like useless citation sites and spam sites cannot come up, but things like Wikipedia and websites I get good results from show up more.
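A blocklist/allowlist like the one described above could run as a re-ranking pass over already-scored results. This is a minimal sketch with several hypothetical choices: the domain sets, the `boost` factor, and the `(url, score)` result shape. Hostnames are matched exactly, with no subdomain handling.

```python
from urllib.parse import urlparse

# Hypothetical lists; exact hostname matches only.
BLOCKED = {"spam-example.com"}
BOOSTED = {"en.wikipedia.org"}


def rerank(results: list[tuple[str, float]], boost: float = 1.5) -> list[tuple[str, float]]:
    out = []
    for url, score in results:
        host = urlparse(url).hostname or ""
        if host in BLOCKED:
            continue  # drop blocklisted domains entirely
        if host in BOOSTED:
            score *= boost  # favour allowlisted domains
        out.append((url, score))
    # Highest score first.
    return sorted(out, key=lambda r: r[1], reverse=True)


results = [
    ("https://spam-example.com/page", 0.9),
    ("https://blog.example.org/post", 0.6),
    ("https://en.wikipedia.org/wiki/Tf-idf", 0.5),
]
print([url for url, _ in rerank(results)])
# ['https://en.wikipedia.org/wiki/Tf-idf', 'https://blog.example.org/post']
```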
@soulcpp 7 days ago
Definitely a very great video, keep it up!!!
@ExpandedCuber 9 months ago
Let's go, another conaticus video
@dreamsofcode 9 months ago
🔥🔥🔥
@MortonMcCastle 8 months ago
Good! The world needs a new Google Search, one that's more like how it was in the 2000s.
@madalenaferreira3018 8 months ago
great video, gave me PTSD from my information retrieval class though
@alexmoses3215 7 months ago
Programming 🤝 martincitopants…match made in heaven
@stayhappy-forever 8 months ago
that's insane, how's this only at 12k views
@GermanTimecrafter 9 months ago
such a cool video! i love the way you explain what you are doing :) random question but what is your editor font?
@conaticus 8 months ago
Appreciate it :) I'm using JetBrains Mono, it's free to download
@TheRojo387 a month ago
In high school, I could outperform search engines of the time. I don't think I can say the same for today's search engines.
@jsalsman 8 months ago
I believe it's "inverted indexing", as inverse indexing is something else.
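An inverted index maps each term to the set of documents that contain it (its posting list). A query then only touches the documents listed under its terms instead of scanning every page. Below is a minimal in-memory sketch, assuming whitespace tokenization and AND semantics for multi-word queries.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)  # posting list: ids of docs containing term
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    # AND query: intersect the posting lists of every query term.
    postings = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()


docs = {1: "fast search engine", 2: "slow search", 3: "fast car"}
index = build_inverted_index(docs)
print(sorted(search_all(index, "fast search")))  # [1]
```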
@miro5182 6 months ago
You can use a Chrome-like TLS config to avoid getting blocked by Cloudflare in a lot of cases; using a browser for scraping isn’t viable when you're talking about scanning the internet.
@gammongaming9081 8 months ago
yk what would be funny? making the slowest search engine possible without, like, halting the program for a set time, just with maths
@jugurtha292 8 months ago
very nice, built something similar for my info retrieval class. we have to use the Okapi BM25 formula for the ranking but overall very similar: scrape, tokenize, parse, inverted index, rank
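Okapi BM25 differs from plain TF-IDF in two ways: term-frequency saturation (parameter `k1`) and document-length normalization (parameter `b`). The sketch below assumes the common defaults `k1 = 1.5` and `b = 0.75`. It also uses the non-negative `log(1 + ...)` IDF variant. This is not necessarily the exact formula the commenter's class used.

```python
import math
from collections import Counter


def bm25_scores(docs: list[str], query: str, k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    if not tokenized:
        return []
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Saturating TF, normalized by document length relative to the average.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = ["rust search engine", "javascript search", "cooking recipes"]
print(bm25_scores(docs, "rust search"))
```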
@animeworld4775 8 months ago
What things should I know or learn to create projects like these?
@HyperCodec 8 months ago
Bro managed to memleak in JS
@yorailevi6747 8 months ago
how much did you pay for the web scraping service in total?
@ethanstewart1011 8 months ago
How did you manage to get a Node.js memory leak??
@thekwoka4707 8 months ago
How much did the scraping cost if it wasn't free?
@lonelybookworm 8 months ago
Well of course it is very fast, it only has like 200 websites
@maksymilianglowacki1409 8 months ago
Is this engine online, or would it be able to go online for other users so they could also enjoy it? Or was it just a peek at something you made because you were bored or something?
@errplane_ 9 months ago
oh my fuck i saw this on your github last night
@iritesh 8 months ago
Awesome effort ✨
@allenfpascua 9 months ago
Super good editing 🫡🫡🫡🫡
@conaticus 8 months ago
Would not be possible without your breathtaking animations 😄
@gamedirection_us 8 months ago
🍎 👀 .. Apple being like "when will it be ready?".
@gopallohar5534 7 months ago
ain't see rust there!
@guidedorphas10 8 months ago
6:08 nahhhhhhhhhhh whats bro even searching 💀💀💀💀
@joenutt1232 9 months ago
Create your own database engine for shits and giggles
@conaticus 8 months ago
B+Trees 💀
@sleepybraincells 9 months ago
Why is there Rust in the thumbnail? This was written in JavaScript
@conaticus 8 months ago
Used Rust for the API and TF-IDF matching - decided not to keep in much of the footage for that as it was already explained in the animations
@MinecraftRecordings1 4 months ago
what's the link?
@a6gittiworld 8 months ago
Supa dope. I would like to use this search engine of yours
@monotonedevelopment 8 months ago
If only Windows File Explorer could do the same
@SandWire 8 months ago
For that we have a thing called Everything :)
@callowaysutton 8 months ago
Next time use the Common Crawl dataset ;)
@--bountyhunter-- 6 months ago
bro thought he could scrape my web and get away with it.
@SlimyFrog123 9 months ago
Now make your own email system to go along with it. 😉
@synapsenova299-fp7tf 8 months ago
>goes to youtube homepage >finds this video >yipeee >oh >lets try it
@FastCarsLoudMusic 2 months ago
This video is so good. Instant hook.
@mahrezjanati3426 8 months ago
first time watching a vid of yours ... i have one question: why are you vibrating??
@-rate6326 8 months ago
Cause he is vibrator
@InioluwaFalade-Tolulope 6 months ago
don't know either
@schoolbreakyay 5 months ago
Can I not use BrightData?
@playtatus1758 9 months ago
how do you edit your vids
@conaticus 8 months ago
Allen uses Adobe After Effects for the amazing animations - I just use DaVinci to cut things up 😁
@playtatus1758 8 months ago
@conaticus ok thx
@daemonkisure2952 9 months ago
how can i install this search engine?
@conaticus 8 months ago
Instructions are on the GitHub repos :)
@Faeest 8 months ago
Why do Disallow and User-agent matter? Can't you just scrape everything?
@skorp5677 8 months ago
You can but it might be illegal
@AquaQuokka 9 months ago
Rewrite your genetic code in Rust.
@pyyrr 9 months ago
i would rather be bug free so i will pass
@Nerdimo 9 months ago
Impressive, seriously!
@humanontheinternet6510 7 months ago
Auto solve captcha you say🧐
@Miluum 8 months ago
1:06 automatically solve captchas? i knew these things exist just to waste our time and energy
@deepfan14 7 months ago
Bro make a compiler programming language
@Rudransh-hu 8 months ago
You should host it
@Serhii_Volchetskyi 8 months ago
🔥🔥🔥 I was looking for that algorithm and didn't know its name.
@_DarkLiquid 9 months ago
discord clone when
@iCrimzon 6 months ago
Can't wait for you to rewrite JS in binary 🎉🎉
@danielisop3182 8 months ago
What did u mean by the websites u shouldn’t have searched
@binpersonal 8 months ago
"some fucking genius" lmao
@neologicalgamer3437 8 months ago
Bro sounds like WilburSoot
@fangg194 8 months ago
you seem ok
@AttaaH 6 months ago
0:33 🤨
@chiroyce 9 months ago
What are the consequences of scraping sites you aren't allowed to?
@conaticus 8 months ago
Probably not much on its own as long as you're not violating copyright - however it is courteous not to scrape sites forbidden by robots.txt
@314cubed 8 months ago
wastes their resources and yours
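Checking `robots.txt` rules doesn't need a hand-written parser, because Python's standard library includes `urllib.robotparser`. To keep the sketch runnable offline, it parses an inline file rather than fetching one. The rules and the `MySearchBot`/`BadBot` user-agent names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str) -> bool:
    # True if the parsed robots.txt permits this user agent to fetch the URL.
    return parser.can_fetch(user_agent, url)


print(allowed("MySearchBot", "https://example.com/articles/1"))  # True
print(allowed("MySearchBot", "https://example.com/private/x"))   # False
print(allowed("BadBot", "https://example.com/articles/1"))       # False
```

A live crawler would instead call `set_url()` and `read()` on the parser to fetch the site's `robots.txt`, then check `can_fetch()` before each request.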
@Xanmattauri 8 months ago
@google acquire this man
@StellarWeb008 6 days ago
"Always bet on javascript"
@v037_ 8 months ago
I found a worthy opponent
@juniordevmedia 9 months ago
what TF is IDF ?!!
@neofox2526 9 months ago
idk man but watching it makes me feel smart
@jamesbarret4240 9 months ago
Term frequency (how many times a given word shows up in a specific document) multiplied by inverse document frequency (roughly the log of the total number of documents divided by the number of documents that contain the word, so words that show up everywhere count for less). The Wikipedia article is pretty good: en.wikipedia.org/wiki/Tf-idf
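Here is a minimal TF-IDF ranking sketch that makes this definition concrete. Documents and the query become term-weight vectors (TF × IDF), and documents are ranked by cosine similarity to the query. The sketch assumes whitespace tokenization, TF as count divided by document length, and plain `log(N / df)` IDF. The variant used in the video may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / c) for t, c in df.items()}  # rarer term -> larger weight
    vecs = []
    for toks in tokenized:
        counts = Counter(toks)
        vecs.append({t: (c / len(toks)) * idf[t] for t, c in counts.items()})
    return vecs, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(docs: list[str], query: str) -> list[int]:
    vecs, idf = tfidf_vectors(docs)
    q = Counter(query.lower().split())
    qvec = {t: c * idf.get(t, 0.0) for t, c in q.items()}  # unseen terms weigh 0
    scores = [cosine(qvec, v) for v in vecs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = ["the rust compiler", "the javascript runtime", "the rust borrow checker"]
print(rank(docs, "rust compiler"))  # [0, 2, 1]
```

Note how "the", which appears in every document, gets an IDF of `log(3/3) = 0` and so contributes nothing to the ranking.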
@larry_berry 9 months ago
Lol. Got notif after clicking the video.
@AhmedMo-ec4kz 7 months ago
Great video 😊 FYI: bright data is an Israeli company 😮
@gaimnbro9337 8 months ago
Nice job :D
@konstantinsotov6251 8 months ago
we had a hackathon where we basically had to implement TF/IDF - also a search engine of a sort, but for files. we did the interface in python and all mathematics processing in C++. It would have been a fun experience if not for the time limit. we struggled really hard; on test data our solution worked faster by an order of magnitude or two than most other participants, but... we somehow failed on the exam data. we failed fucking IO. and won nothing. I've fucking hated hackathons since then. fuck IDF. also maybe this happened because i had written 75% of the code, while 4 other members did almost nothing. It was (their) responsibility to handle IO, and mine to handle mathematics and processing. I hate working in teams. I know no one cares but i might as well just burst out all of the rage I have towards that experience. once again, fuck team work, fuck hackathons, fuck my teammates, fuck everything and everyone
@skorp5677 8 months ago
skill issue
@konstantinsotov6251 8 months ago
@skorp5677 exactly
@Ayymoss 9 months ago
MAKE LONGER VIDEOS
@ph03n1x_dev 9 months ago
You made a search engine for porn?! That's disgusting... is it on GitHub?! 👀
@conaticus 9 months ago
All open source and ready to play around with 😂
@susannerudolph8469 8 months ago
then brightdata makes captchas useless
@educacionespecialchannel3756 8 months ago
Captcha's effectiveness has been in question for quite some time now.
@Naw1dawg 6 months ago
protects against amateurs but keeps it simple enough that an expert won’t breach/destroy their data to get what they want.
@_sohom 9 months ago
Make a better version of VSCode.
@kavinbharathi 9 months ago
Not to be the 🤓☝️ guy, but "Jana Vembunarayanan" is pronounced 'Ja' as in 'Jarvis' and 'na' as usual. Just fyi
@conaticus 9 months ago
Thank you, I'll do this if I ever pronounce it again 😂
@planktonfun1 8 months ago
Still not fast and scalable enough. The results are not even relevant; you made Bing, not Google
@LaugeHeiberg 8 months ago
wow really? I'm also surprised one single guy didn't manage to make a product rivaling Google
@gamefun2525 6 months ago
wow Sheldon, you got your Nobel yet?
@Horn7xGaming 8 months ago
hub 🎉🎉
@Naw1dawg 6 months ago
So you’re telling me I can access restricted data by telling it to, basically, ignore restrictions??? I have been calling myself dev, admin, ownr, root in vain for far too long
@lukamajcenic1172 8 months ago
This is just an ad for BrightData. Compared to previous videos, very low effort.
@TheRealMangoDev 9 months ago
good vid
@ssoka-m5n 8 months ago
rust is a real badass❤❤
@igrb 8 months ago
nice
@J0Y22 8 months ago
shockedd
@Macellaio94 8 months ago
Liked and subbed