The first thing I thought when you said href will always be a valid link was "not really". And then you showed the descent into madness, love it.
@mathman056912 days ago
I was about to say this lol
@cinderwolf3212 days ago
Client side hash string routing: "I'd like to introduce myself"
@reed651411 days ago
I suggest scraping your local news websites and making a search engine just for local news. It could actually be useful for your community.
@HootMoot16 days ago
Good stuff and nicely scripted and edited. Once you get a mic and some traction, you'll be on your way to 100k subs. Everything but audio is extremely high quality and well done!
@smb139711 days ago
"my hard drive was getting too full" shows 112 gb used. walled garden problems
@ujjwaldimri239214 days ago
amazing video, it is well made and very informative. hard to believe you only have 100 subs
@vitaliykishchenko436512 days ago
This is awesome work! Have no idea how you're doing all this at this age, you have a bright future. Research and editing all done very well. I love the TF-IDF explanation, super concise.
@deepbrar110 days ago
This is awesome. The video is really informative and you explained everything so perfectly. Great content, got a lot to learn from it.
@fakejimhalpert14 days ago
GREAT video this is so cool!! keep going
@kevinnielsendev12 days ago
Nice project! This is really impressive :D
@casperdong16 days ago
the new GOOGLE I cannot wait
@varram348812 days ago
I've been in the web scraping space for years. The robots.txt page is a suggestion. It is completely legal to ignore the robots.txt file. Big tech companies want you to think it's illegal even though it's not. As long as you don't commercialize human-made content you will be fine :D
@hexxt11 days ago
naughty boy
@fujinshu11 days ago
Right now it's legal, but I wouldn't be surprised if Congress and Trump rule that ignoring the robots.txt file is illegal, especially with some light bribery and a compromised SCOTUS.
@cinderwolf3211 days ago
Have you ever been blacklisted for requesting routes that are denied by the file? Especially if your user agent is noticeable
@reed651411 days ago
Web scraping can be illegal depending on what you do with the content. The U.S. Copyright Office has a Fair Use Index online that summarizes findings. Search should be fine in most cases, but it's good to be informed, especially if your use case is remotely iffy. Even better to talk to a lawyer.
@varram348810 days ago
@@reed6514 yep exactly, and the use case in the video falls under fair use. Also, it's really funny how the entire AI industry is based around a grey area (The law is pretty outdated and vague); which we should see resolved really soon through lawsuits going on right now lmfao.
@Gaarlicc12 days ago
Such a cool video, keep it up
@TheRetroEngine11 days ago
Dude I'm new to your site and when the FBI turned up, I spilled my tea.
@zinck_dome707212 days ago
Cool project man
@cake053911 days ago
tfidf sounds like a DOOM cheat code
@LegendBegins12 days ago
This was great! Nice work!
@Mikko-Maggie-More11 days ago
fun fact: google purposefully removes results so that you'll use gemini instead
@ClarkeMacbeth10 days ago
Just in time for 2121!
@casperdong16 days ago
daniel zhang is the next joma tech!
@mwguy13 days ago
Good stuff. You can improve the crawler by splitting it into small worker nodes that coordinate through Kafka/RabbitMQ to parallelize page downloading. Also, just throw away Prisma and use raw SQL to squeeze all the performance out of the database.
@C4CH3S11 days ago
Or yk, don't use toy languages like JS in the backend for performance heavy tasks...
@reed651411 days ago
It is worth considering the load on the sites you're crawling. Don't want to DDoS them by accident.
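To make the load point concrete, here is a minimal sketch of per-host throttling, assuming a crawler that asks before each request how long it should wait. The class name `HostThrottle`, the method `reserve`, and the one-second delay used below are hypothetical illustrations, not anything shown in the video.

```typescript
// Hypothetical helper: spaces out requests to the same host by a minimum delay.
class HostThrottle {
  // Earliest timestamp (ms) at which each host may be requested again.
  private nextAllowed = new Map<string, number>();
  private minDelayMs: number;

  constructor(minDelayMs: number) {
    this.minDelayMs = minDelayMs;
  }

  // Returns how many ms the caller should wait before requesting `url`,
  // and reserves that time slot so the next caller is pushed further out.
  reserve(url: string, now: number = Date.now()): number {
    const host = new URL(url).host;
    const earliest = Math.max(now, this.nextAllowed.get(host) ?? 0);
    this.nextAllowed.set(host, earliest + this.minDelayMs);
    return earliest - now;
  }
}
```

A worker could then sleep for `throttle.reserve(url)` milliseconds before fetching, so parallel workers stay fast across different hosts while any single host sees at most one request per delay window.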
@joshchen7993 1 day ago
Such an informative video! How did you speed up the querying process? I’m also trying to create a search engine and calculating the TF-IDF of every page is taking a long time
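One common way to avoid recomputing TF-IDF for every page at query time (not necessarily what the video does) is to build an inverted index while crawling, so a query only touches the pages that contain its terms. A minimal in-memory sketch, where `InvertedIndex`, `tokenize`, `addDocument`, and `search` are all hypothetical names:

```typescript
type Posting = { docId: number; tf: number };
type Hit = { docId: number; score: number };

// Lowercase and split on anything that is not a letter or digit (an assumption;
// a real tokenizer would also handle stemming, stop words, and non-ASCII text).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

class InvertedIndex {
  // term -> list of documents containing it, with the term frequency stored.
  private postings = new Map<string, Posting[]>();
  private docCount = 0;

  // Called once per page at crawl time.
  addDocument(docId: number, text: string): void {
    const tokens = tokenize(text);
    this.docCount += 1;
    const counts = new Map<string, number>();
    tokens.forEach((t) => counts.set(t, (counts.get(t) ?? 0) + 1));
    counts.forEach((count, term) => {
      const list = this.postings.get(term) ?? [];
      list.push({ docId, tf: count / tokens.length });
      this.postings.set(term, list);
    });
  }

  // Called per query: only the posting lists of the query terms are read.
  search(query: string): Hit[] {
    const scores = new Map<number, number>();
    Array.from(new Set(tokenize(query))).forEach((term) => {
      const list = this.postings.get(term);
      if (!list) return;
      const idf = Math.log(this.docCount / list.length);
      list.forEach(({ docId, tf }) => {
        scores.set(docId, (scores.get(docId) ?? 0) + tf * idf);
      });
    });
    return Array.from(scores.entries())
      .map(([docId, score]) => ({ docId, score }))
      .sort((a, b) => b.score - a.score);
  }
}
```

The same idea carries over to a database: store one row per (term, page) pair with the term frequency, index the term column, and compute the IDF from row counts at query time.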
@MeowVR14 days ago
Very cool video
@casperdong15 days ago
I am ur only female viewer
@jir_UwU12 days ago
nuh uh me too
@casperdong12 days ago
@@jir_UwU us girls gotta stick together
@deleted_handle12 days ago
girls aren't real. stop trolling
@LEMON_2U12 days ago
Nuh uh me too
@Imtitled12 days ago
R30 There are no girls on the Internet
@BiuerBoris14 days ago
Wonderful video and project! Would be interesting to have it spawn the crawler from somewhere other than Wikipedia.
@petermarshall163412 days ago
Amazing video, how does your channel only have 200 subs?
@felixranesberger384612 days ago
Awesome video!
@horntoad16167 days ago
Horntoad here
@staniekkkkkkkkkkkkkkkkkkkkkkkk5 күн бұрын
good shif vru
@Matusevichfilms12 days ago
Kagi is pretty good
@gljames2411 days ago
What is that random sound in the background of your voice? Do you have your hand on your mic? Put a gain cutoff or something.
@garf51016 days ago
I love giggle 😀
@mathman056912 days ago
Should've used the URL API to check for valid URLs
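For anyone wondering what that could look like, here is a minimal sketch using the standard `URL` constructor to resolve an `href` against the page it came from and reject anything that is not a crawlable web link. The helper name `resolveLink` and the decision to drop the fragment are assumptions for illustration, not the video's code.

```typescript
// Hypothetical helper: returns an absolute http(s) URL, or null if the href
// cannot be parsed or points at something a crawler should not follow.
function resolveLink(href: string, pageUrl: string): string | null {
  let url: URL;
  try {
    // Relative hrefs ("/wiki/Cat", "../x", "//host/path") resolve against pageUrl.
    url = new URL(href, pageUrl);
  } catch {
    return null; // not parseable as a URL at all
  }
  // Skip schemes like javascript:, mailto:, and tel:.
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  // Drop the fragment so "#section" variants of one page are not crawled twice.
  url.hash = "";
  return url.href;
}
```

Dropping the fragment is a trade-off: it deduplicates in-page anchors, but sites that do client-side hash routing would collapse into a single page, so a crawler that cares about those would need to keep it.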
@Christian-ry3ol10 days ago
I think I just found my next project to work on. This inspired me. So fucking cool.
@CerebrumReality14 days ago
Nice Video :0
@casperdong12 days ago
I love you.
@exp526111 days ago
@@casperdong thanks
@ShaikhRehanShakil14 days ago
everything is just wikipedia :((
@ironislife985712 days ago
Why not use a faster language that would allow you to search for things faster?
@cinderwolf3212 days ago
The language is probably not relevant to the time it takes. It'd likely be unnoticeable whether you use Python or C++ for this (assuming you don't write terrible code). A network call / database write is multiple orders of magnitude slower than whatever the code is doing
@reed651411 days ago
@@cinderwolf32 caching to text files and then writing bulk queries could speed up the DB stuff. The language speed hardly matters, but a couple of milliseconds per page adds up when you're doing many thousands of pages.
@solmateusbraga11 days ago
I am ur only femboy viewer
@_jb_11 days ago
I enjoyed the animations and the effort. But your coding abilities need to improve. Keep on
@rixanito14 days ago
The video is amazing, but I would still suggest you add more visual effects like zoom-ins and transitions (especially zoom-ins)
@rixanito14 days ago
@bogxd would be a great source of inspiration
@therealpersonion16 days ago
is this manim??
@danielcsthings16 days ago
It's Motion Canvas: motioncanvas.io
@Zhane499412 days ago
YOU LIAR, THERE ARE MORE ALTERNATIVES 😱😱😱😱😱😱😱😱😱😱😱😱😱😱😱😱