Start building awesome projects with $15 free credits using BrightData today: brdta.com/conaticus1
@AWIRE_onpc 8 months ago
no
@xulaxwtf 8 months ago
no
@aryanszone4963 8 months ago
no
@noviui 7 months ago
no thanks
@user-uv3nv2bc6v 6 months ago
no
@jaymarksum6542 9 months ago
I’m impressed, can’t wait to see you build a multithreaded web server in assembly
@da40au40 9 months ago
Why do I find it super funny 😅😅😅.
@ArthursHD 9 months ago
@da40au40 Me too :D
@DanskeCrimeRiderTV 8 months ago
It's not impressive. Of course querying a few hundred or even a hundred thousand web pages isn't as complicated or slow a task as querying trillions of web pages.
@KibitoAkuya 8 months ago
@DanskeCrimeRiderTV Google also wastes time deciding whether or not you are allowed to see certain sites
@DanskeCrimeRiderTV 8 months ago
@KibitoAkuya what does that have to do with anything? Google is still faster at querying trillions of results than this.
@lifeofme702 9 months ago
I didn't understand what this guy said, and I was still mind-blown by all the effort he puts in
@conaticus 9 months ago
Thanks so much 🙏 It would not be possible without your support
@ccost 8 months ago
7:40 flashing those questionable websites in a sponsored video is quite the move
@twitchizle 8 months ago
You scared of porn?
@coderx8634 9 months ago
Love your content. You and your quality have really improved. Keep it up ❤
@conaticus 9 months ago
Thanks so much, your support means a lot ♥
@coderan5029 8 months ago
This is basically what we learned in my big data class, but we used MapReduce to do the TF-IDF calculations, so it's impressive you figured this out on your own
@JoshuaLonsako 8 months ago
W ad plug, it's 100% relevant and actually necessary to fulfill the premise of this vid.
@6IGNITION9 8 months ago
Filter out JS for another 10x bandwidth savings, or alternatively use an adblocker. (Can Puppeteer do that? It's just Chromium, right?)
@SG-kn2jl 8 months ago
Why did you choose TF-IDF instead of word2vec or any context-aware model?
@skorp5677 8 months ago
+1 Would like to know
@devinlauderdale9635 9 months ago
The problem is this approach is susceptible to SEO spamming/invisible SEO keywords
@conaticus 9 months ago
Yeah for sure, realistically it should be moderated based on user interaction as well
@jjwe2002 2 months ago
@conaticus How would you do that?
@greensporevalley 9 months ago
SERBIA MENTIONED 🎉🎉🎉
@RealMephres 9 months ago
@europa_the_last_battle >goes to comments >sees meme comment >looks at replies >only a LARPer replied lol
@jawadmansoor6064 8 months ago
that name rings a bell, maybe from some kind of Serbian movie?
@RealMephres 8 months ago
@MAXHASS-ph5ib tell that to the LARPer dawg
@slimeyar 8 months ago
@RealMephres tell that to yourself 😊
@RealMephres 8 months ago
@slimeyar you first
@80sVectorz 8 months ago
3:07 Best pronunciation of Euclidean I have ever heard :P
@CrazyDiamondo 8 months ago
Where?
@80sVectorz 8 months ago
@CrazyDiamondo I added a timestamp
@anitaweasel 8 months ago
Nice, you re-invented the Lucene library
@aryakvn6051 6 months ago
You could calculate and cache TF values on the fly so you don’t fill up your RAM as quickly but still get a decent response time.
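A minimal sketch of the caching idea in the comment above, assuming term frequencies are computed per document on first request and kept in a bounded LRU cache. `DOCUMENTS` and `term_frequencies` are hypothetical names standing in for whatever storage and lookup the real project uses; they are not taken from the video's code.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical stand-in for the crawled pages; a real engine would read
# the page text from its database instead of an in-memory dict.
DOCUMENTS = {
    "page-1": "rust search engine rust",
    "page-2": "web scraper",
}

@lru_cache(maxsize=1024)  # bounded, so memory use stays predictable
def term_frequencies(doc_id: str) -> dict:
    """Compute term frequencies for one document the first time it is needed."""
    tokens = DOCUMENTS[doc_id].lower().split()
    total = len(tokens)
    return {term: count / total for term, count in Counter(tokens).items()}

# The first call computes the values; repeated calls are served from the cache.
print(term_frequencies("page-1"))
print(term_frequencies.cache_info())
```

The `maxsize` bound is the trade-off the comment describes: least recently used documents are evicted and recomputed on demand rather than every TF value being held in memory at once.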
@rafaelpereiracoias1047 8 months ago
Nice video and nice code, keep up the good work!
@Raven-fu1zz 8 months ago
Remember, never return an over 18 site without an over 18 word in the search request
@polyshrub 8 months ago
This is very impressive. What was the size of the database when indexing finished? Seems like it would be quite big
@MySachincool 7 months ago
Subscribed & notifications on :) you deserve more recognition bruh
@R_Y_Z_E_N 8 months ago
Google also does the same but with distributed computing to reduce the overall time. Just scale the database horizontally and mimic Google's approach
@turb0004 8 months ago
Please fully finish your file explorer in Rust, because the idea of it is awesome. Love your videos, content is very engaging 🎉
@foqsi_ 8 months ago
Love this dude and his video projects
@conaticus 8 months ago
🙏
@rmt3589 2 months ago
Awesome video! Will help immensely when I eventually make an AI RAG search engine. I wanna see if I can add blacklisted and whitelisted websites. That way things like useless citation sites and spam sites cannot come up, but things like Wikipedia and websites I get good results from show up more.
@soulcpp 7 days ago
Definitely a very great video, keep it up!!!
@ExpandedCuber 9 months ago
Let's go another conaticus video
@dreamsofcode 9 months ago
🔥🔥🔥
@MortonMcCastle 8 months ago
Good! The world needs a new Google Search, one that's more like how it was in the 2000s.
@madalenaferreira3018 8 months ago
great video, gave me ptsd from my information retrieval class though
@alexmoses3215 7 months ago
Programming 🤝 martincitopants…match made in heaven
@stayhappy-forever 8 months ago
that's insane, how's this only at 12k views
@GermanTimecrafter 9 months ago
such a cool video! i love the way you explain what you are doing :) random question but what is your editor font?
@conaticus 8 months ago
Appreciate it :) I'm using JetBrains Mono, it's free to download
@TheRojo387 1 month ago
In high school, I could outperform search engines of the time. I don't think I can say the same for today's search engines.
@jsalsman 8 months ago
I believe it's "inverted indexing", as inverse indexing is something else.
@miro5182 6 months ago
You can use a Chrome-like TLS config to avoid getting blocked by Cloudflare in a lot of cases; using a browser for scraping isn’t viable when talking about scanning the internet.
@gammongaming9081 8 months ago
yk what would be funny? making the slowest search engine possible without like halting the program for a set time, just with maths
@jugurtha292 8 months ago
Very nice, built something similar for my info retrieval class. We had to use the Okapi BM25 formula for the ranking, but overall very similar: scrape, tokenize, parse, inverted index, rank
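The tokenize, inverted index, rank steps listed above can be sketched as follows. This is an illustrative sketch under simple assumptions (whitespace tokenization, a non-empty document list, the commonly used defaults `k1=1.5` and `b=0.75`), not the ranking code from the video or from the commenter's class; `build_index` and `bm25_search` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Inverted index: term -> {doc_id: count}. Also returns document lengths."""
    index = defaultdict(dict)
    lengths = []
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()
        lengths.append(len(tokens))
        for term, count in Counter(tokens).items():
            index[term][doc_id] = count
    return index, lengths

def bm25_search(query, index, lengths, k1=1.5, b=0.75):
    """Return doc ids ranked by Okapi BM25 score (assumes at least one document)."""
    n_docs = len(lengths)
    avg_len = sum(lengths) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        df = len(postings)  # number of documents containing the term
        # A common smoothed IDF variant that never goes negative.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: long documents need more hits to score well.
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores, key=scores.get, reverse=True)

docs = [
    "rust search engine",
    "python web scraper",
    "search engine ranking with bm25 search",
]
index, lengths = build_index(docs)
print(bm25_search("search", index, lengths))
```

Only documents that appear in a query term's postings list are scored, which is the point of the inverted index: the rest of the collection is never touched.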
@animeworld4775 8 months ago
What things should I know or learn to create projects like these?
@HyperCodec 8 months ago
Bro managed to memleak in JS
@yorailevi6747 8 months ago
how much did you pay for the web scraping service in total?
@ethanstewart1011 8 months ago
How did you manage to get a Node.js memory leak??
@thekwoka4707 8 months ago
How much did the scraping cost if it wasn't free?
@lonelybookworm 8 months ago
Well of course it is very fast, it only has like 200 websites
@maksymilianglowacki1409 8 months ago
Is this engine online (or would it be able to be online for other users) so others could also enjoy it? Or was it just a peek, or something you made because you were bored or something?
@errplane_ 9 months ago
oh my fuck i saw this on your github last night
@iritesh 8 months ago
Awesome effort ✨
@allenfpascua 9 months ago
Super good editing 🫡🫡🫡🫡
@conaticus 8 months ago
Would not be possible without your breathtaking animations 😄
@gamedirection_us 8 months ago
🍎 👀 .. Apple being like "when will it be ready?".
@gopallohar5534 7 months ago
didn't see Rust there!
@guidedorphas10 8 months ago
6:08 nahhhhhhhhhhh whats bro even searching 💀💀💀💀
@joenutt1232 9 months ago
Create your own database engine for shits and giggles
@conaticus 8 months ago
B+Trees 💀
@sleepybraincells 9 months ago
Why is there Rust in the thumbnail? This was written in JavaScript
@conaticus 8 months ago
Used Rust for the API and TF-IDF matching - decided not to keep much of the footage for that as it was already explained in the animations
@MinecraftRecordings1 4 months ago
what's the link?
@a6gittiworld 8 months ago
Supa dope. I would like to use this search engine of yours
@monotonedevelopment 8 months ago
If only Windows File Explorer could do the same
@SandWire 8 months ago
For this we have a thing named Everything :)
@callowaysutton 8 months ago
Next time use the Common Crawl dataset ;)
@--bountyhunter-- 6 months ago
bro thought he could scrape my web and get away with it.
@SlimyFrog123 9 months ago
Now make your own email system to go along with it. 😉
@synapsenova299-fp7tf 8 months ago
>goes to YouTube homepage >finds this video >yipeee >oh >lets try it
@FastCarsLoudMusic 2 months ago
This video is so good. Instant hook.
@mahrezjanati3426 8 months ago
first time watching a vid of yours ... i have one question : why are you vibrating ??
@-rate6326 8 months ago
Cause he is a vibrator
@InioluwaFalade-Tolulope 6 months ago
don't know either
@schoolbreakyay 5 months ago
Can I not use BrightData?
@playtatus1758 9 months ago
how do you edit your vids
@conaticus 8 months ago
Allen uses Adobe After Effects for the amazing animations - I just use DaVinci to cut things up 😁
@playtatus1758 8 months ago
@conaticus ok thx
@daemonkisure2952 9 months ago
how can i install this search engine?
@conaticus 8 months ago
Instructions are on the GitHub repos :)
@Faeest 8 months ago
Why do Disallow and User-agent matter? Can't you just scrape everything?
@skorp5677 8 months ago
You can but it might be illegal
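For context on the question above: `Disallow` and `User-agent` are the robots.txt directives a site uses to tell crawlers which paths they are asked to stay away from. A minimal sketch of how a crawler could honor them with Python's standard `urllib.robotparser`; the robots.txt content, the crawler names and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download the file
# from the site's /robots.txt path before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Any crawler may fetch public pages...
print(parser.can_fetch("MyCrawler", "https://example.com/index.html"))    # True
# ...but is asked to stay out of /private/.
print(parser.can_fetch("MyCrawler", "https://example.com/private/data"))  # False
# A crawler named BadBot is asked not to fetch anything at all.
print(parser.can_fetch("BadBot", "https://example.com/index.html"))       # False
```

Nothing in the file is enforced technically: the crawler has to choose to call `can_fetch` before each request.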
@AquaQuokka 9 months ago
Rewrite your genetic code in Rust.
@pyyrr 9 months ago
i would rather be bug free so i will pass
@Nerdimo 9 months ago
Impressive, seriously!
@humanontheinternet6510 7 months ago
Auto solve captcha you say🧐
@Miluum 8 months ago
1:06 automatically solve captchas? i knew these things exist just to waste our time and energy
@deepfan14 7 months ago
Bro make a compiler programming language
@Rudransh-hu 8 months ago
You should host it
@Serhii_Volchetskyi 8 months ago
🔥🔥🔥 I was looking for that algorithm and didn't know its name.
@_DarkLiquid 9 months ago
discord clone when
@iCrimzon 6 months ago
Can't wait for you to rewrite JS in binary 🎉🎉
@danielisop3182 8 months ago
What did u mean by the websites u shouldn’t have searched
@binpersonal 8 months ago
"some fucking genius" lmao
@neologicalgamer3437 8 months ago
Bro sounds like Wilbur Soot
@fangg194 8 months ago
you seem ok
@AttaaH 6 months ago
0:33 🤨
@chiroyce 9 months ago
What are the consequences of scraping sites you aren't allowed to?
@conaticus 8 months ago
Probably not much on its own as long as you're not violating copyright. However, it is courteous not to scrape sites forbidden by the robots.txt
@314cubed 8 months ago
wastes their resources and yours
@Xanmattauri 8 months ago
@google acquire this man
@StellarWeb008 6 days ago
"Always bet on javascript"
@v037_ 8 months ago
I found a worthy opponent
@juniordevmedia 9 months ago
what TF is IDF ?!!
@neofox2526 9 months ago
idk man but watching it makes me feel smart
@jamesbarret4240 9 months ago
Term frequency (how often a given word shows up in a specific document) multiplied by inverse document frequency (a measure of how rare that word is across all documents, so words that appear everywhere count for less). The Wikipedia article is pretty good: en.wikipedia.org/wiki/Tf-idf
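To make the definition above concrete, here is a small sketch using one common formulation: tf is a word's share of the tokens in a document, and idf is log(N / df), where N is the number of documents and df is the number of documents containing the word. The function name `tf_idf` and the sample documents are illustrative assumptions; the video's Rust implementation may weight terms differently.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = ["rust web server", "rust search engine", "search engine ranking"]
for doc, weights in zip(docs, tf_idf(docs)):
    print(doc, weights)
```

In the first document, "web" gets a higher weight than "rust" because "rust" also appears in another document. A word that appears in every document gets a weight of zero, since log(N / N) = 0.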
@larry_berry 9 months ago
Lol. Got notif after clicking the video.
@AhmedMo-ec4kz 7 months ago
Great video 😊 FYI: Bright Data is an Israeli company 😮
@gaimnbro9337 8 months ago
Nice job :D
@konstantinsotov6251 8 months ago
We had a hackathon where we basically had to implement TF-IDF - also a search engine of a sort, but for files. We did the interface in Python and all the mathematical processing in C++. It would have been a fun experience if not for the time limit. We struggled really hard. On test data our solution was faster than most other participants' by an order of magnitude or two, but... we somehow failed on the exam data. We failed fucking IO. And won nothing. I've fucking hated hackathons since then. Fuck IDF. Also, maybe this happened because I had written 75% of the code, while the 4 other members did almost nothing. It was their responsibility to handle IO, and mine to handle the mathematics and processing. I hate working in teams. I know no one cares, but I might as well let out all of the rage I have towards that experience. Once again, fuck teamwork, fuck hackathons, fuck my teammates, fuck everything and everyone
@skorp5677 8 months ago
skill issue
@konstantinsotov6251 8 months ago
@skorp5677 exactly
@Ayymoss 9 months ago
MAKE LONGER VIDEOS
@ph03n1x_dev 9 months ago
You made a search engine for porn?! That's disgusting... is it on GitHub?! 👀
@conaticus 9 months ago
All open source and ready to play around with 😂
@susannerudolph8469 8 months ago
then BrightData makes captchas useless
@educacionespecialchannel3756 8 months ago
Captcha's effectiveness has been in question for quite some time now.
@Naw1dawg 6 months ago
Protects against amateurs but keeps it simple enough that an expert won’t breach/destroy their data to get what they want.
@_sohom 9 months ago
Make a better version of VSCode.
@kavinbharathi 9 months ago
Not to be the 🤓☝️ guy, but "Jana Vembunarayanan" is pronounced 'Ja' as in 'Jarvis' and 'na' as usual. Just fyi
@conaticus 9 months ago
Thank you, I'll do this if I ever pronounce it again 😂
@planktonfun1 8 months ago
Still not fast and scalable enough. The results are not even relevant; you made Bing, not Google
@LaugeHeiberg 8 months ago
Wow, really? I'm also surprised one single guy didn't manage to make a product rivaling Google
@gamefun2525 6 months ago
wow Sheldon, you got your Nobel yet?
@Horn7xGaming 8 months ago
hub 🎉🎉
@Naw1dawg 6 months ago
So you’re telling me I can access restricted data by telling it to, basically, ignore restrictions??? I have been calling myself dev, admin, ownr, root in vain for far too long
@lukamajcenic1172 8 months ago
This is just an ad for BrightData. Very low effort compared to previous videos.