The creators of TikTok caused my website to shut down

314,903 views

MattKC

10 months ago

and i thought charli d'amelio was the worst thing bytedance had done to me
▶SUPPORT on Patreon and watch videos like this early and ad-free: / mattkc
▶FOLLOW on Twitter: / itsmattkc
▶FOLLOW on Twitch: / mattkclive
▶FOLLOW on Instagram: / itsmattkc
▶Music by DDRKirby(ISQ) used with permission: ddrkirbyisq.bandcamp.com/
"I Can Feel it Coming" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
creativecommons.org/licenses/b...

Comments: 1,000
@christianwolff497 10 months ago
the biggest crime here is naming it ByteSpider and not SpiderByte
@Your_Average_Stickman_WasTaken 10 months ago
I hate bluey
@ErikKoev 10 months ago
no, the biggest crime is actually naming it ByteSpider and not SpiderDance
@pausebreakreviews 10 months ago
God forbid it bite ya. Don't let 'em bitecha. That SpiderByte! Hurt! Hurt SpiderByte! That SpiderByte HURT!
@codyryan9789 10 months ago
@pausebreakreviews that spider bit me where the good lord split me
@KOMEOyt 10 months ago
SpyderByte
@philo23 10 months ago
You probably want to block them in Cloudflare rather than on your server. Currently they're still wasting your bandwidth (just in a much, much more reduced form). If you block them in Cloudflare, they shouldn't end up wasting any of your bandwidth at all; they'll never even touch your server. A simple page rule should do the trick, and even on the free tier you should get 3 page rules.
@lxpe5269 10 months ago
Cloudflare also gives 5 WAF rules for free. With these, you could create a rule to block the user agent and add any other user agents, IPs, ASNs, etc in the future within a single rule.
@JustAWalter 10 months ago
It says in the video blocking doesn't help
@philo23 10 months ago
@JustAWalter in the video he's talking about Cloudflare's automatic bot detection, which is going to let legitimate web crawlers like ByteSpider through. I'm talking about a custom rule to specifically block that user agent at the Cloudflare level
@porglezomp7235 10 months ago
No, it says in the video that cloudflare’s automated DDOS protection doesn’t help. Explicit traffic rules would help.
@gggkiller 10 months ago
Since the bot follows links, if the homepage returns a 403, it won't spam the other pages as it has no links to follow I assume, but yeah, blocking in CF would still be ideal as it'd mean even avoiding that initial single 403 request's bandwidth.
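The user-agent rule discussed in this thread boils down to a substring match on the User-Agent header. Here is a minimal Python sketch of that logic as it might run on the origin server. It is not Cloudflare's rule syntax, and the blocklist contents, the function name `status_for`, and the sample user-agent strings are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical blocklist: lowercase substrings to look for in the User-Agent header.
BLOCKED_UA_SUBSTRINGS = ("bytespider",)


def status_for(user_agent: Optional[str]) -> int:
    """Return the HTTP status to answer with: 403 if the agent is blocked, else 200."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in BLOCKED_UA_SUBSTRINGS):
        return 403
    return 200


# Illustrative user-agent strings, not copied from the video's logs.
crawler_ua = "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"
print(status_for(crawler_ua), status_for(browser_ua))
```

As the thread points out, doing the same match at the CDN edge is preferable, because a 403 from the origin still costs a little bandwidth per request.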
@GooveG 10 months ago
ByteSpider checking for updates on the Lego Island series every 100 milliseconds
@underscore. 10 months ago
0.1 milliseconds*
@Yazan_Majdalawi 10 months ago
@underscore. 0.1 seconds
@UnitSe7en 10 months ago
Chinese hackers still have not managed to fix the framerate. Spy on the West.
@tomtrublu 10 months ago
0.1 nanoseconds.
@leonkernan 10 months ago
Can’t fault them for wanting an update
@PoignantPirate 10 months ago
I definitely appreciate the PSA, you literally just saved me from having to diagnose the same issue on one of my servers.
@erikkonstas 10 months ago
Wait really? The coincidence!
@NaoPb 9 months ago
@erikkonstas you mean coincibytedance.
@adamkuster 9 months ago
@erikkonstas You mean CoinciDance.
@FlameSoulis 9 months ago
Can confirm. I've been bitten by the stupid spider now that I reviewed my logs. If it isn't Russia trying to access a nonexistent cPanel, it's now this.
@glossymouse7712 9 months ago
@erikkonstas It might not be a coincidence as they are probably launching a huge data gathering campaign for a possible AI.
@asriel09 10 months ago
Looks to me like they're downloading any and all images they can find. Could be for training an AI model. Looks like you have a forum, so that's why there's tonnes of requests coming your way.
@TsoLIt 10 months ago
I've seen this before on my company's website. We host a lot of blog posts for business communications systems. Our site traffic tripled in a span of a week, and I'm pretty sure it was one of these crawlers for AI
@mr.whimsic6902 10 months ago
Imagine a timeline where tiktok makes an ai of mattkc
@TuriGamer 10 months ago
"Could be training for ai" No
@bonkwonkelchip7569 10 months ago
@TuriGamer yes
@Survivalist_Redo 10 months ago
@TuriGamer yes it's likely not training for an AI, could very easily be dataset gathering to then later train an AI though
@Gunbudder 10 months ago
oddly enough, a chinese crawler completely tanked my college professor's homework submission website. it was extremely persistent too! i remember the entire system was down for a few days while they worked out exactly how to block it. from what i remember, they eventually just blocked every IP not from the USA lol
@IdentifiantE.S 10 months ago
You’re just too strong 😂
@Kyrmana 10 months ago
Very sus
@steve_1507 10 months ago
Digital racism
@GavinFromWeb 10 months ago
@steve_1507 no, not really. It's a uni in the US. Unless they let you study online in other countries it shouldn't be a problem.
@erikkonstas 10 months ago
Um, sorry to be a killjoy, but "every IP not from the USA" is not an objective statement... an IP address by itself does not contain information regarding its origin on the planet, the job is usually done by ISPs (of all levels) who hand these addresses out to customers while reporting back to geo-IP database hosts at the same time; if one ISP of a high enough level goes rogue, you're toast...
@TheGreatSteve 10 months ago
You'd think non-malicious inadvertent DDoS would be the easiest thing for Cloudflare to spot and block? Maybe it's whitelisted?
@capsey_ 10 months ago
I mean, is it though? I am no expert, but I think gradually exceeding a bandwidth limit is harder to spot than an active DDoS attack. Why do you think it is easier?
@semaja2 10 months ago
CF can block this traffic: you could deploy rules to block the user agent, or pay for their bot features, but this isn't a DDoS. Alternatively, adjusting the server code to be more cache friendly would also help
@FurriousFox 10 months ago
the caching part would in fact not work, the bytespiders will only scan all urls once, not multiple times, so caching wouldn't solve anything
@monad_tcp 10 months ago
probably
@x_x_w_ 10 months ago
Increase the cloudflare query caching level
@Nik.leonard 10 months ago
At least they had the "decency" of using a proper UA. They could (and I hope they won't) just use the Chrome UA or, worse, weighted random UAs
@JessicaFEREM 10 months ago
may be worth it to block the entire country if it's bad enough.
@FlamesRunner 10 months ago
@JessicaFEREM Blocking countries is more of a last resort, and shouldn't be considered so long as other options are available. Cloudflare, for instance, offers the ability to selectively block user agents, which would do the trick here.
@saiv46 10 months ago
@JessicaFEREM That's why many websites just outright block China (and now Russia, but for other reasons)
@undefinedchannel9916 10 months ago
@JessicaFEREM Apparently they use different hosts like AWS so the country may show up as the US for some requests.
@HappyGick 8 months ago
@undefinedchannel9916 Enough requests appeared with Singapore/China as location, so he could block those countries and he would be fine.
@SlavTiger 9 months ago
I'm just sick of us being expected to foot the bill for something a large corporation does without your consent. These days our data makes us look like little more than dollar signs instead of people to a lot of those tech company execs.
@rudolfpast9243 10 months ago
i dont know if its still a thing, but back in the day i implemented a spidertrap to all of my websites. easy thing. you need a 1x1 pixel transparent image on every site linked to your trap-script and in your robots.txt you declare the script as disallowed. so good spiders wont go there and bad ones will be blocked...
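The spider trap described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the commenter's actual script: the trap path `/trap/`, the class name `SpiderTrap`, and the in-memory ban set are all assumptions (a real setup would persist bans and hook into the web server or firewall).

```python
class SpiderTrap:
    """Ban any client that requests a path which robots.txt tells crawlers to avoid."""

    def __init__(self, trap_path: str = "/trap/") -> None:
        self.trap_path = trap_path       # hypothetical path, linked only from a hidden 1x1 image
        self.banned: set[str] = set()    # in-memory ban list, for illustration only

    def robots_txt(self) -> str:
        # Well-behaved crawlers read this and never visit the trap.
        return f"User-agent: *\nDisallow: {self.trap_path}\n"

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status for a request from `ip` to `path`."""
        if ip in self.banned:
            return 403
        if path.startswith(self.trap_path):
            # Only a crawler that ignored robots.txt ends up here.
            self.banned.add(ip)
            return 403
        return 200


trap = SpiderTrap()
print(trap.robots_txt())
print(trap.handle("203.0.113.7", "/trap/pixel"))  # crawler follows the hidden link
print(trap.handle("203.0.113.7", "/forum/"))      # and is refused from then on
```

One caveat under the same assumptions: a crawler that rotates through many IP addresses would only lose one address per trap hit.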
@ondrejpavlik4210 10 months ago
I'd recommend you set up a simple email notification that the server would send to you if an arbitrary bandwidth threshold you'd consider too high was exceeded. This way you could resolve the issue before any downtime occurs.
@arjix8738 10 months ago
or just implement a cooldown that returns 429 when the same IP makes too many requests in under a specific amount of time
@jacksoncremean1664 10 months ago
@arjix8738 from what he's shown in the access log that will be tricky to pull off since they are crawling very slowly
@jacksoncremean1664 10 months ago
a better idea would be to just set Cloudflare security level to IUAM (I'm Under Attack Mode)
@randomblock1_ 10 months ago
That's what his first email was about. The second was the notification that it ran out
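For reference, the per-IP cooldown suggested earlier in this thread is usually built as a sliding-window counter. The sketch below is one possible version in Python; the class name `SlidingWindowLimiter` and the limits used in the example are assumptions, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client -> timestamps of accepted requests

    def check(self, client: str, now: float) -> int:
        """Return 200 if the request is allowed, or 429 (Too Many Requests) if not."""
        hits = self._hits[client]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 429
        hits.append(now)
        return 200


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
for t in (0.0, 1.0, 2.0, 10.5):
    print(t, limiter.check("203.0.113.7", now=t))
```

As noted above, a crawler that spreads slow requests across many addresses stays under any per-IP threshold, so this helps mostly against a single noisy client.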
@JordanPlayz158 10 months ago
@arjix8738 true, otherwise the crawler has no reason to assume there is a rate limit (perhaps there are even standard crawler headers to dictate how often they should scrape?)
@ZeroRiskAppetite 10 months ago
Maybe the crawler gets into an infinite loop. Might be the classic 'detecting cycles in an undirected graph' problem.
@erikkonstas 10 months ago
Pretty sure that would make more of an exponential curve though, the one I saw in the video was a bit too linear...
@n3ishere 10 months ago
@erikkonstas not necessarily, if it got stuck on some pages that in some way link to each other in a loop, it could be linear like that as more spiders go there and get stuck in a repeating loop (source: I've made web spiders before and this was a problem I had to fix with it)
@some1and297 10 months ago
Yeah, I mean, in this case I can't imagine ByteDance designing a production web crawler so terrible it can't cache URLs. It might have more to do with unique GET request parameters being generated from page links.
@n3ishere 10 months ago
@some1and297 unless the loop has enough pages that the cache gets cleared beforehand
@erikkonstas 10 months ago
@some1and297 I don't think crawlers should take into account whatever follows a question mark in the URL... like yes, there might be that one rare case where it doesn't mean what we think it means, but come on, it's just a spider...
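The two failure modes raised in this thread (revisiting pages in a link loop, and treating reordered query parameters as new pages) are commonly handled with a visited set over normalized URLs. The Python sketch below illustrates that under stated assumptions: the `links` dictionary stands in for actually fetching pages, and the normalization rules (lowercase scheme and host, sorted query parameters, dropped fragment) are one reasonable choice, not what ByteSpider is known to do.

```python
from collections import deque
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Map equivalent spellings of a URL to one canonical string."""
    parts = urlsplit(url)
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, ""))


def crawl(start: str, links: dict) -> list:
    """Breadth-first crawl over `links` (normalized URL -> outgoing URLs), visiting each page once."""
    first = normalize(start)
    seen = {first}
    queue = deque([first])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for target in links.get(url, []):
            canonical = normalize(target)
            if canonical not in seen:  # this check is what breaks link cycles
                seen.add(canonical)
                queue.append(canonical)
    return visited


# Hypothetical two-page site whose pages link back to each other.
site = {
    "http://example.com/": ["http://example.com/a?x=1&y=2"],
    "http://example.com/a?x=1&y=2": ["http://example.com/a?y=2&x=1#top", "http://EXAMPLE.com/"],
}
print(crawl("http://example.com", site))
```

Sorting the parameters keeps pages that genuinely differ by query string distinct, which is a middle ground between ignoring everything after the question mark and treating every spelling as a new page.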
@EpicLPer 10 months ago
OH MY GOD ARE YOU KIDDING ME... so THIS was the reason my site went down too??? I suddenly couldn't reach my website at around July 11th or something too, and a few minutes later my provider sent me a mail saying they temporarily disabled my site till they figure out what's going on, it also looked like a DDoS in the logs and everything... wow... Now that mystery is solved, thanks! :)
@erikkonstas 10 months ago
I'd say check if this was *really* it tho (like "ByteSpider" and everything).
@GeorgeSukFuk 9 months ago
It's the squinty-eyed commies!
@johnbucki5567 9 months ago
When exceeding bandwidth, VPSes should not be suspended. I believe they should just shut off all network access, so the KVM console would still be accessible for troubleshooting. Also, if the reason is a DDoS attack, it will stop reaching the server and you can check where the traffic is coming from.
@shishsquared 9 months ago
Yeah it's crazy that there's not an out of band console
@notniko6914 10 months ago
Sue them for the 10$
@Howtheheckarehandleswit 10 months ago
It's ByteDance, unless they do something bad enough to spark an international incident, the CCP will protect them from the consequences of their actions
@new_simsons 10 months ago
Bruh
@adorable_yangire 10 months ago
@new_simsons Bruh translate to English
@new_simsons 10 months ago
@adorable_yangire wtf?
@Roach18 10 months ago
@adorable_yangire Too bad, I translate to Polish
@blikthepro972 10 months ago
knowing how tiktok spies on and tracks phones like crazy, their web crawlers being extremely overkill just to scrape every last bit of data makes sense
@haileymccurry3756 10 months ago
google et al spy on and track phones like crazy and yet their crawlers are doing fine
@blikthepro972 10 months ago
@haileymccurry3756 true, but google's tracking is still not as bad as tiktok's. how "not as bad" it is i don't know, but that's the vibe i have gotten over the years
@internet_userr 10 months ago
Bing Chilling
@RAFMnBgaming 10 months ago
@haileymccurry3756 google is certainly more practiced at "keeping their heads down", insofar as that's possible for one of the biggest companies around.
@nicepotato5755 10 months ago
the tiktok thing is mostly propaganda, all major tech companies do this.
@OROO111 10 months ago
Thank you, I have exactly the same problem with my website. I don't even host anything on that website besides the default WordPress website but I had a huge amount of "users" accessing my site
@Zowiezo101 10 months ago
Yeah, I'm very glad to know about this as I have my own website as well and now I am prepared if this would happen to me!
@erikkonstas 10 months ago
Were they all "ByteSpider"?
@rkvkydqf 10 months ago
Since auto-regressive language models are so trendy these days, and there might be fears of export bans for using already collected corpus like CommonCrawl, they might be trying to build their own. Maybe some Snapchat-esque annoying "friend" for lonely teens.
@piemadd 10 months ago
Bytespider has been active for years now (4ish last I checked) so this isn't anything new.
@WackoMcGoose 10 months ago
Why do I get the feeling that Bytedance paid Cloudflare to look the other way and ignore their aggressive crawler shenanigans...
@jcfawerd 2 months ago
Not surprised, since cloudflare is allowed to operate in china, coincidence? I don’t think so
@XeZrunner 10 months ago
5:46 Have you tried contacting their email address from the UA string? In case it is a legitimate issue, they might want to hear about it.
@U20E0 10 months ago
if they are actually just making a search engine, i doubt they want to waste their own resources like this
@foreskin 10 months ago
I dont mean to be that guy but they probably legitimately dont care since its already been brought up multiple times before matt
@U20E0 10 months ago
@foreskin probably.
@FirstLast-gw5mg 10 months ago
If they ignore robots.txt I don't think it's likely that they care much about complaining emails.
@XeZrunner 10 months ago
@foreskin In that case, I agree with blocking them in this scenario.
@MagicalPhi 10 months ago
Now to find out if it was Tik or Tok who was responsible for this.
@Junimeek 10 months ago
or their lost cousin Tak
@SomeRandomPiggo 10 months ago
Definitely Tek
@TheRedOwl 10 months ago
I'm pretty sure it was Tuk
@havesomerespectandspoilthe5880 10 months ago
Or Tyk, if they ever come out of exile
@Zowiezo101 10 months ago
Don't forget Tyk and Tøk
@MyHandleIsAplaceholder 10 months ago
I believe Bytedance wants to create a new Chinese web browser to compete with the blocked ones
@zyxwv 10 months ago
Another? What about TouTiao?
@IdentifiantE.S 10 months ago
@zyxwv What is TouTiao?
@zyxwv 10 months ago
@IdentifiantE.S A Chinese Web Browser by Bytedance.
@hi12167pies 10 months ago
make a browser to compete with browsers chinese people can't even access 💀
@f3rny_66 10 months ago
not a browser, but a search engine, I had the same bot and also PetalBot, from the huawei people and their search engine petal crawling client servers. But it can be filtered tho, just needs configuration. bytespyder is banned by default in AWS iirc
@shadowtheimpure 10 months ago
The old adage applies here: Never attribute to malice what can be easily attributed to incompetence.
@GreyMaria 10 months ago
Found the ByteDance employee
@shadowtheimpure 10 months ago
@GreyMaria What? I'm literally calling them stupid rather than malicious. Their web crawlers are not malicious, just very poorly coded.
@itsTyrion 9 months ago
@GreyMaria it's literally just "Hanlon's razor"
@MrTriple3D 9 months ago
evil people make it look like incompetence when it really is malice.
@erwannthietart3602 9 months ago
@MrTriple3D the problem is, if we apply this idea every time incompetence looks evil, you may unjustly treat something actually incompetent, which can be just as useful to the "evil people" as hiding behind a veil of incompetence
@CoolJosh3k 10 months ago
Would be nice if certain user agents, like web crawlers, had a default limit on how often they could access the site. While obviously malicious crawlers could get around this, reputable ones that wish to stay whitelisted by default would obey.
@notalostnumber8660 10 months ago
You can make PHP scripts to rate limit bot crawlers based on user agent. In fact, you can try to Denial-of-Service them by using a GZip bomb or PNG/GIF/WebP bomb, since those can look legitimate, but end up causing havoc for a short while
@erikkonstas 10 months ago
"reputable ones that wish to stay whitelisted by default would obey" nah, malicious would just become the new reputable.
@Zettymaster 10 months ago
UAs are super easy to spoof (since they are supplied by the software that SENDS the request) so that would only force them to crawl using spoofed UAs, which they allegedly already do.
@CoolJosh3k 10 months ago
@Zettymaster Oh. Then I stand corrected.
@someguy4915 9 months ago
This is in part what robots.txt is for but as the video shows, ByteSpider does not obey robots.txt... Used to be, for crawlers to not get blocked by everyone, they had to obey robots.txt, seems like ByteDance didn't get the memo...
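To make the robots.txt point concrete, here is a small Python sketch using the standard library's `urllib.robotparser`, which shows what a compliant crawler is expected to check before fetching. The robots.txt contents, the bot name `SomeOtherBot`, and the example URLs are made up for illustration; `Crawl-delay` is a non-standard directive that some crawlers honour and others ignore.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one crawler outright, slow everyone else down.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch("Bytespider", "https://example.com/forum/"))
print(rules.can_fetch("SomeOtherBot", "https://example.com/forum/"))
print(rules.crawl_delay("SomeOtherBot"))
```

None of this is enforced by the server: robots.txt only works when the crawler chooses to run a check like this, which is exactly the complaint in this thread.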
@chaosmagican 9 months ago
10 bucks for 100GB? Jeez, I'm paying 1€ per TB over here, that is just straight up robbery
@sesad5035 9 months ago
10 aussie dollars.
@thewhitefalcon8539 4 months ago
Sounds like cloud. I get 1€ per TB too.
@sosman64 3 months ago
@sesad5035 then it's even more robbery
@alex13902 1 month ago
@thewhitefalcon8539 for bandwidth? Or storage? The two are very, very different
@CoolJosh3k 10 months ago
Assuming it really was ByteDance, I expect this was not intended behaviour. It would cost them bandwidth too, though maybe so little in comparison that it just looks like regular background noise. An earlier alert would have been very useful here.
@Thesnugglebottom 10 months ago
They would be doing this to thousands if not millions of sites though, so the bandwidth on their side would be gigantic
@f3rny_66 10 months ago
it's the cost of business, just like google crawls the web. the issue with bytespider and other chinese bots is that they ignore robots.txt and do other shady stuff
@spykillergames8402 10 months ago
it probably was... as i reckon they are using images from his site to train an AI model... modified web crawlers can do that
@CoolJosh3k 10 months ago
@Thesnugglebottom I figure maybe it is so rare that only very few sites would have the issue.
@JordanPlayz158 10 months ago
@Thesnugglebottom but if they know they are using a ton, they may opt for servers with no bandwidth limit
@jonmayer 10 months ago
I'm interested if you could get a response or not by emailing the support. Probably not, but it would be funny to see their reply.
@thebunsenburner 10 months ago
That's a wild ride for sure.
@diegopescia9602 10 months ago
Luckily your site has a static limit with a fixed price. Imagine the costs if it were an uncapped pay-as-you-go service like most cloud services
@airnith 10 months ago
this is very useful information. I've been thinking about putting together a website for some friends, so now I know that I might need to look out for this.
@ToadyEN 10 months ago
Worth noting that Twitter / X and lots of other sites have stopped bots from crawling them now, something to do with them training their AI with content from their sites.
@erikkonstas 10 months ago
Except that I believe Twitter's case has become common knowledge to a wider audience, because, well, it did hit actual people with rate limits often too.
@Sammysapphira 9 months ago
Facebook and YouTube are obviously rate limiting. I get the same posts nonstop on Facebook for literal weeks no matter how many times I refresh or even if I open it on a different device. A lot of people are getting the same behavior. Twitter was just the only one that was public about it.
@official-obama 9 months ago
@Sammysapphira it would go "oh no! something went wrong and we can't tell you" instead of doing that. it might be caching or nobody's posting anything
@mrscrewu1199 10 months ago
Feel like Cloudflare should detect this sort of activity and automatically block the user agent, at least temporarily, to see if it stops or continues, instead of suspending the client.
@DorAntCr 10 months ago
It's always a good day when Matt uploads a new video. And rants about a random company as well.
@csbauder 10 months ago
Really interesting stuff. I've considered making a website before, but I wasn't aware of stuff like this. Thanks for the heads-up!
@SeraphimKnight 10 months ago
Good thing this is happening in the age of DDOS-prevention. Imagine getting fucked by a spiderbot back in the days when you'd host your website on your home network and your ISP charged you by data usage.
@gluttonousmaximus9048 10 months ago
...And several years ago here I simply failed to see several of the classic cartoon blogs simply because I was detected using VPN. Tom Scott has warned us well. The internet is a cesspool of patchwork offense and defense, shady strategies and clumsy turf war.
@robyc9545 10 months ago
Kinda ironic that your sub count is 404k now. Stay safe out there
@kur0kiba 10 months ago
i thought for sure that it would be the same as what a friend of mine had about 10 or more years ago. he kept a travel log website where he uploaded photos because he was a nerd who liked to travel. he eventually visited the original Starbucks and uploaded a picture of the original logo. a more popular website used the photo but they didn't download and host the photo themselves. instead they just linked to the photo, so when you loaded up the more popular website it would give your browser a link to where the photo was located on my buddy's website so it could display it. his traffic skyrocketed. i believe there's a name for when people do this but i don't know it. he did find a fix for it where any website linking to any photo on his website like that would then be blocked.
@erikkonstas 10 months ago
I've seen the term "hotlinking" for that, and yep, that's exactly why it's frowned upon.
@DarkGob 10 months ago
It's called hotlinking, and has been a discouraged practice for decades.
@HappyTinfoilCat 9 months ago
That's when you swap out the photo for something like goatse
@thewhitefalcon8539 4 months ago
It's called hotlinking and it's traditional to change the picture to pron. That could be illegal in some countries though.
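The hotlink fix mentioned at the top of this thread is typically a check on the Referer header of image requests. Below is a minimal Python sketch of that check, not the friend's actual fix: the allowed host names and the function name `allow_image_request` are assumptions, and requests with no Referer are let through because many browsers and privacy tools strip it.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical list of sites that are allowed to embed our images.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def allow_image_request(referer: Optional[str]) -> bool:
    """Return True if an image request should be served, judging by its Referer header."""
    if not referer:
        # Direct visits and privacy-conscious clients send no Referer; do not punish them.
        return True
    host = urlsplit(referer).hostname  # lowercased host, or None if the value is malformed
    return host in ALLOWED_HOSTS


print(allow_image_request("https://www.example.com/trips/seattle"))
print(allow_image_request("https://popular-site.example.net/article"))
print(allow_image_request(None))
```

The Referer header is supplied by the client and can be forged, so this deters casual embedding rather than a determined scraper.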
@TravellingTARDIS 10 months ago
funny you use the spider-man 3 in that clip about bytespider because im fairly certain the font from the bytespider logo is that same one from the sam raimi spider-man films lmao
@8ullfrog 9 months ago
It's a shame you can't invoice them for the bytef**king they did.
@LilacMonarch 8 months ago
I mean, you can still try. Just send an invoice and see if they'll pay it lol
@nj5374 9 months ago
Surely as this becomes more common cloudflare may begin to implement a catch for similar overzealous crawlers?
@kennethbeal 10 months ago
Thank you, excellent analysis!
@cptpotatoface386 9 months ago
This reminds me of when i had a minecraft server running for me and my friends. Woke up one day and went to check on it to see that the server command window was full of disconnected messages. Did some stuff like editing the hosts file to make it redirect the IP back to itself or similar (prob did nothing) but eventually just went with running malwarebytes since it blocks suspicious requests
@wesleyfournier6278 10 months ago
cheers on the psa, the more people that share knowledge like this in unbiased ways like this the safer we can all be on the interwebs :)
@grass6317 9 months ago
3:11 who tf uses android 5.0
@rockpie 5 months ago
People who don't want to upgrade
@___aZa___ 10 months ago
always happy to see you upload :)
@RetroJack 9 months ago
Handy to know - thanks for the heads-up!
@General12th 10 months ago
Hi Matt! I love storytime with Matt! You're really fun to listen to.
@TheFinnishTechie 10 months ago
You KNOW it’s going to be a good day when MattKC posts a video. Keep up the good work man
@matthewforan6397 10 months ago
I've also noticed a ton of traffic from Singapore recently, and my domain just has the default parking page!
@Mark_Rober 10 months ago
"and i thought charli d'amelio was the worst thing bytedance had done to me" The description is the best part of this video XD
@zyxwv 10 months ago
I would find it strange for them to be making a new SE. I believe TouTiao would not really need a remake, seeing as it already has over 100 million users daily
@rkvkydqf 10 months ago
Since auto-regressive language models are so trendy these days, and there might be fears of export bans for using already collected corpus like CommonCrawl, they might be trying to build their own. Maybe some Snapchat-esque annoying "friend" for lonely teens.
@zyxwv 10 months ago
@rkvkydqf That does make a lot of sense. However, googling the issue in the video (ByteSpider) shows that this has been going on for a long time. I saw a Stack Overflow post from 2019.
@realcrashie 10 months ago
Not the type of MattKC video we expected, but the one we deserved. Always happy to see you have uploaded, no matter the content ❤
@v1mja 8 months ago
I work at a cloud provider. We have a wide band of customers and I'm afraid to say that we have seen all sorts of issues with search engine bots. Not just from fringe ones either. Even the large ones can cause weird issues. The problems we have observed include big spikes in PHP-FPM processes, tens of gigabytes of cache being generated by weird access patterns and even extremely high database loads... Funny how that goes sometimes.
@pcislocked 10 months ago
ur uncached traffic ratio is really low tbh, maybe also take a look at that to take more load from your webserver.
@Jergling 10 months ago
The fact that the requests were coming from seemingly random Singapore IPs still suggests a botnet. I wonder if there's a Bytedance app doing ill-conceived distributed computing in the background. You wouldn't need any kind of app permissions to browse the web, nor would any one user notice it the way crypto leech apps tend to be noticed.
@d9zirable 10 months ago
nah singapore is just a colony of china
@zwz.zdenek 3 months ago
They are not random at all, they are ranges owned by cloud services.
@LethalBubbles 10 months ago
gotta love their use of the spider-man movie font
@imaxvi 10 months ago
“im not that popular” hits hard 😭
@wchorski 10 months ago
Please more content like this. I host websites and services and this helps me keep up on new threats and how to deal with them
@JulianR2JG 10 months ago
New video from Mr. LEGO Island
@donutsndcoffee 10 months ago
Friggin fascinating mate
@bananapl0 10 months ago
The reveal on stream was epic.
@Serverfrog 10 months ago
fail2ban with the BadBots rule should also do the job ;) It would then temporarily block the IP address in iptables (or whatever firewall fail2ban was configured to use), which further reduces the traffic they produce
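The idea behind that kind of log-driven banning can be sketched in Python: scan the access log, pick out requests whose user agent matches a pattern, and collect the source addresses. This is not fail2ban's configuration format or code; the regular expression assumes the common "combined" access log layout, and the function name `ips_to_ban` and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the "combined" access log format used by default in nginx and Apache setups.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)


def ips_to_ban(lines, needle: str = "bytespider") -> Counter:
    """Count requests per IP whose User-Agent contains `needle` (case-insensitive)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and needle in match.group("ua").lower():
            hits[match.group("ip")] += 1
    return hits


# Made-up sample log lines using documentation-only IP ranges.
sample_log = [
    '203.0.113.5 - - [01/Jan/2024:10:00:00 +0000] "GET /forum/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '203.0.113.5 - - [01/Jan/2024:10:00:01 +0000] "GET /forum/t/1 HTTP/1.1" 200 7300 "-" "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '198.51.100.9 - - [01/Jan/2024:10:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"',
]
print(ips_to_ban(sample_log))
```

Feeding the resulting addresses into a firewall is the part a tool like fail2ban automates, including lifting the ban again after a timeout.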
@JTCF 10 months ago
That was a nice reminder to check my home server nginx access logs. Thank god I set it up correctly before opening up to the world.
@mandarina1367 10 months ago
set it up in a way to avoid this from happening?
@Kyrmana 10 months ago
Happy 404k subs! 😄
@JohnLasseter-ct5in 10 months ago
Old man yells at cloud
@autiboy08 10 months ago
Hi Matt, love the content you make! Looking forward to this watch!
@jps915 10 months ago
nn
@Pesthuf 10 months ago
I hope they won't stop using that user agent string, or else you've got an issue. It's weird how they give you that string, but do everything else in their power to stop you from blocking their crawler.
@BluesM18A1 10 months ago
Learned quite a lot of new things today. I'll have this in mind in case my website gets any run-ins with unwanted attention
@slipperynickels 6 months ago
wow. being unable to view my own access logs because my website’s bandwidth allocation has run out would be a MASSIVE dealbreaker for me. that is ridiculous.
@niepytajdl 10 months ago
truly a chinese moment
@ruairim2283 10 months ago
Openly showing this is the best thing you can do. Even if you can't prove this is malicious, you're still providing info for the Internet. Maybe more OCD users will get to it. Who knows?
@grubdotwebsite 10 months ago
ByteSpider's logo using the Raimi Spider-Man font is incredibly silly
@Sharan25 10 months ago
Matt KC is back fr
@Boxuga 10 months ago
All the crazy tech corporations have been in the news recently: LTT, now Bytedance again. It's crazy. Also, keep up the good work MattKC
@nunyabiznesse6917 10 months ago
They always have been on the news though
@Tigermoto 10 months ago
All two? Have i missed something?
@JustPyroYT 10 months ago
How's the Lego island decompilation doing?
@monkeypox21 10 months ago
NEW MATTKC FINALLY
@bosch5303 10 months ago
Yoo new mattkc vid
@Justinjaro 10 months ago
I got hit by the same thing the past two weeks. My servers were getting every port and route scanned and pinged every half second.... like damn. Spending at least 20-30 mins a day adding IP's to the block list.
@justaneric 9 months ago
Interesting how there can be that many (IP addresses/computers) scanning your website, TikTok really needs to stop this and fix their crawler.
@Geomedge 10 months ago
New Matt KC video 🎉
@CWKEnterprises1024 7 months ago
4:27 Bing Chilling
@Napert 8 months ago
giving them the benefit of the doubt is like giving a serial killer a benefit of the doubt it's just moronic
@johnsmith34 10 months ago
Another thing to note is that your site doesn't have a robots.txt file. I can't say if it matters though.
@CaptainGibbons 10 months ago
The screenshot he showed said they didn't respect it anyways.
@donatj
@donatj 10 ай бұрын
Have you sent an email to the feedback email address in the user agent string?
@Hugocraft
@Hugocraft 10 ай бұрын
Very informative, thanks!
@yukimoe
@yukimoe 10 ай бұрын
I remember it happened to me years ago with another one of these Chinese crawlers, I think it was Yandex or something. Those guys never learn.
@Rainmotorsports
@Rainmotorsports 10 ай бұрын
Didn't see the email contents on mobile, but if your provider wouldn't spin the VPS up with the external IP blocked so you could access it through a virtual console, I'd probably ditch them lol.
@burp2019
@burp2019 10 ай бұрын
the VPS provider likely wouldn't know what was going on and he only got to it after they locked it
@erikkonstas
@erikkonstas 10 ай бұрын
That could very well open it up to abuse tho... no, the client wouldn't earn anything from the abuse, but if the client is evil-minded and delusional they can wreak havoc like that.
@Rainmotorsports
@Rainmotorsports 10 ай бұрын
@@burp2019 You aren't saying anything against this though. Spinning the server up with no connection to the outside world allows the customer to access their logs. Virtual console is a method to replace the crash cart you would use if you were inside the data center.
@Rainmotorsports
@Rainmotorsports 10 ай бұрын
@@erikkonstas How? All you are allowing a customer to do is see their logs and make config changes before deciding what to do. Selling them more bandwidth first is in poor faith and might not last long enough to solve the issue. With absolutely no connection to the outside world except a virtualized KB/VGA (which, by the way, is so much worse than using an SSH client), there isn't much you can do. You won't be able to install software that's not on the machine, and you won't be able to back up and retrieve your files. You can enter text and take screenshots; that's about it.
@erikkonstas
@erikkonstas 10 ай бұрын
@@Rainmotorsports Is that actually very common...??? I was thinking the SSH or similar way, where you could just have another VPS with your credentials stuck to it, but which is open to the whole world, and totally not what is intended to be allowed.
@paulinet68
@paulinet68 10 ай бұрын
not people commenting on the video before even watching it assuming this is about people who publish content on tiktok and immediately jumping the gun, noo, that could never happen to a video that's actually about a web crawler
@Junimeek
@Junimeek 10 ай бұрын
considering that tiktok creators are infamous for committing intentionally malicious acts completely unrestricted, i think that makes perfect sense personally
@MirrorsEdgeGamer01
@MirrorsEdgeGamer01 10 ай бұрын
I just saw a webpage about them launching a search engine in China.
@Space_Reptile
@Space_Reptile 10 ай бұрын
It seems to be grabbing every single image file on your forum. Block that thing ASAP.
@ThePenisMan
@ThePenisMan 10 ай бұрын
So a tech company with way too many resources incompetently played with tech and now everyone else has to pay for it
@brunothedev
@brunothedev 10 ай бұрын
4:03 Where can I apply?
@IllusoryMaze
@IllusoryMaze 10 ай бұрын
I'm curious if you've tried sending a message to that feedback email telling them what's happening. Since you seem to think they're not malicious, maybe it's legitimate to consider that they're just that incompetent and don't realize they're doing damage to websites like yours.
@Light_Dies_07
@Light_Dies_07 10 ай бұрын
NEW MATTKC UPLOAD!!!!!!! AND I FUCKING **MISSED** IT BC I WAS ASLEEP ALL DAY.... 😭
@AsquatranananBananananan
@AsquatranananBananananan 5 ай бұрын
0:26 that SUSPENDED is so ominous lol
@chrisakaschulbus4903
@chrisakaschulbus4903 10 ай бұрын
I was a webmaster once. Then i realised that i want to get older than 30 and stopped.
@brycem8161
@brycem8161 9 ай бұрын
That bad?
@pdlbackup
@pdlbackup 10 ай бұрын
I love this shorter type of informational video! It took me a bit to notice the lack of music, which might be why something felt off to me. Also not seeing you in the video as much as usual felt different. Totally not against you experimenting with it though, cause I can imagine that it would save some time on making the video and for me I don't think the video really suffered too much from it.
@nu52927
@nu52927 10 ай бұрын
man that sucks
@weshuiz1325
@weshuiz1325 10 ай бұрын
5:18 I can't tell whether they use AWS purposely to get around blocks, and I can't be sure whether they ignore robots.txt either, but the rest is certain.
@FirstLast-gw5mg
@FirstLast-gw5mg 10 ай бұрын
Depending on the tools that he has at his disposal, it may be possible to craft a bot trap that automatically detects and IP-bans spiders that ignore robots.txt. A "Disallow" directive can be added to robots.txt telling spiders that they may not access a specific URL, and then a hidden link can be placed on the site that will lead non-compliant spiders to that forbidden URL. Since only spiders can see the link, and they're specifically forbidden from following it, any client that requests it can automatically be IP-banned.
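The bot trap described above could look roughly like this. It's only a sketch, not anything from the video: the trap URL, the `BotTrap` class, and the in-memory ban set are all made up for illustration, and a real site would do the banning in the web server or firewall instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap URL; no real page should ever live here.
TRAP_PATH = "/trap/do-not-follow"

# robots.txt telling every crawler to stay away from the trap URL.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# Link hidden from human visitors; only a crawler parsing the HTML finds it.
HIDDEN_LINK = f'<a href="{TRAP_PATH}" style="display:none" rel="nofollow"></a>'


class BotTrap:
    """Bans any client IP that requests the forbidden trap URL."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.banned = set()  # assumption: in-memory only, lost on restart

    def handle(self, ip, path):
        """Return the HTTP status code this request should receive."""
        if ip in self.banned:
            return 403  # already banned
        if path == self.trap_path:
            self.banned.add(ip)  # ignored robots.txt, so ban it
            return 403
        return 200


def compliant_crawler_may_fetch(path, user_agent="ExampleBot"):
    """Check what a well-behaved crawler would conclude from ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

A crawler that honors robots.txt never requests the trap URL, so it never gets banned. The catch is that a crawler spread across many IP addresses only loses one address per trap hit.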
@JoshuaPeisach
@JoshuaPeisach 10 ай бұрын
Once again, screw TikTok
@Ampd-647
@Ampd-647 10 ай бұрын
yt shorts is shit
@DrakkarCalethiel
@DrakkarCalethiel 10 ай бұрын
Would love a global ban of that CCP spyware that degenerates humanity. But we all know that this will never happen...