Best Practices Working with Billion-row Tables in Databases

64,475 views

Hussein Nasser

1 day ago

In this video, I discuss three methods for working with billion-row tables. This discussion is inspired by a comment on my YouTube video on Twitter system design, right here • Twitter System Design ...
Chapters
Intro 0:00
1. Brute Force Distributed Processing 2:30
2. Working with a Subset of table 3:35
2.1 Indexing 3:55
2.2 Partitioning 5:30
2.3 Sharding 7:30
3. Avoid it altogether (reshuffle the whole design) 9:10
Summary 11:30
🎙️Listen to the Backend Engineering Podcast
husseinnasser.com/podcast
🏭 Backend Engineering Videos
• Backend Engineering (B...
💾 Database Engineering Videos
• Database Engineering
🏰 Load Balancing and Proxies Videos
• Proxies
🏛️ Software Architecture Videos
• Software Architecture
📩 Messaging Systems
• Message Queues & PubSu...
Become a Member
/ @hnasr
Support me on PayPal
bit.ly/33ENps4
Stay Awesome,
Hussein

Comments: 145
@elultimopujilense
@elultimopujilense 3 years ago
The only thing about this channel that makes me feel awful is that I didn't discover it earlier. The topics discussed here are not very common, but they are extremely important in my opinion. I haven't found anything that comes near this channel; some topics are so advanced that I hadn't heard of them at all. It makes me feel like I know nothing, which is the best feeling ever. There is so much to learn! Thank you so much!
@oscarbarajas3610
@oscarbarajas3610 3 years ago
I totally agree with you. This guy makes my time spent worthwhile.
@saurabhsharma7123
@saurabhsharma7123 3 years ago
Exactly! YouTube hasn't recommended me anything similar yet 😂
@zacharythatcher7328
@zacharythatcher7328 2 years ago
This idea of transitioning the delay to the writer and then using queues for writes is true architectural foresight. I love it.
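A minimal sketch of that queue-on-the-write-path idea, in Python with SQLite standing in for the real database; the table name and batch size are illustrative, and a production system would use a durable queue rather than an in-process one:

```python
import queue
import sqlite3
import threading
import time

events = queue.Queue()

def handle_follow_request(follower_id: int, followee_id: int) -> None:
    # The request handler only enqueues; the caller never waits on the DB.
    events.put((follower_id, followee_id))

def writer_loop() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE follows (follower INT, followee INT)")
    while True:
        batch = [events.get()]      # block until at least one event arrives
        while len(batch) < 500:     # then drain up to a full batch
            try:
                batch.append(events.get_nowait())
            except queue.Empty:
                break
        conn.executemany("INSERT INTO follows VALUES (?, ?)", batch)
        conn.commit()               # one commit per batch instead of one per follow

threading.Thread(target=writer_loop, daemon=True).start()
handle_follow_request(1, 2)
time.sleep(0.1)  # give the toy writer a moment before the demo exits
```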
@achraf3117
@achraf3117 3 years ago
I'm a junior software engineer working my way toward specializing in backend. I always learn new concepts from your videos. This one in particular hits home for me because I've worked on a couple of projects where I had to make the design choices for the database. I found it difficult to make the right choices because I always end up building a search endpoint with full-text search and other search parameters. To put it into context, the databases I had problems with were filled with recipes. I had multiple tables with 100k+ entries filled only with IDs (the problem you mentioned). After two projects I switched to the last idea you mentioned, with the list/JSON column, and that one worked best for me: not only does it avoid searching through a big table, it also saves me an extra query to another table. This is somewhat tangential to the video, but when implementing full-text search I think it's better to go with PostgreSQL rather than MySQL, since it supports GIN and GiST indexes and fuzzy searching, which can help build a nice, affordable, quick solution for medium-sized databases. Keep up the nice work.
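For reference, a sketch of the PostgreSQL full-text setup the comment describes, using psycopg2; the recipes table, its columns, and the DSN are hypothetical, and the generated column requires PostgreSQL 12+:

```python
import psycopg2

conn = psycopg2.connect("dbname=recipes_db")  # hypothetical DSN
cur = conn.cursor()

# A generated tsvector column plus a GIN index keeps searches fast even
# as the table grows.
cur.execute("""
    ALTER TABLE recipes
    ADD COLUMN IF NOT EXISTS search tsvector
    GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || body)) STORED
""")
cur.execute("CREATE INDEX IF NOT EXISTS recipes_search_idx "
            "ON recipes USING GIN (search)")
conn.commit()

# Query via the index instead of LIKE '%...%', which cannot use a B-tree.
cur.execute("""
    SELECT title FROM recipes
    WHERE search @@ websearch_to_tsquery('english', %s)
    LIMIT 20
""", ("chicken curry",))
print(cur.fetchall())
```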
@hnasr
@hnasr 3 years ago
Thanks for sharing your thoughts, Ashraf, appreciate it. This is useful.
@0161rt
@0161rt 3 years ago
Isn't Elasticsearch the best text-search DB?
@tikz.-3738
@tikz.-3738 3 years ago
It's kind of a data-structure question: instead of a one-to-one, you're having a one-to-many relation per row.
@andreivilla9009
@andreivilla9009 3 years ago
Man, your content is GOLD. I come from a front-end stack and was underestimating the work with databases, but your content has helped me understand the pitfalls of backend engineering.
@Bilo_7
@Bilo_7 4 days ago
I really like listening to you while doing other stuff, like driving, eating or walking. Always being productive and learning new things on the side from your videos. Habibi ❤
@nilanjansarkar100
@nilanjansarkar100 3 years ago
Generally, it's helpful to think in terms of the read and write paths of your data. On the read side, on top of partitioning, you can add bloom filters to quickly test whether a value exists, to reduce searching the B-tree or other persistent data structures.
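To make the bloom filter idea concrete, a self-contained sketch; the bit-array size and hash count are illustrative, not tuned for any workload:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive k bit positions from a salted SHA-256; fine for a demo.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means "maybe here",
        # so the caller still checks the real index or table.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))   # True
print(bf.might_contain("user:999"))  # almost certainly False
```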
@hnasr
@hnasr 3 years ago
Nice idea, Nilanjan. Completely forgot about the bloom filter probabilistic data structure. For anyone interested, check out this video where I discussed it before: Bloom Filters Explained by Example kzbin.info/www/bejne/nXPcmKFpmLuDbpI
@vinny142
@vinny142 3 years ago
"On the read side, on top of partitioning you can add bloom filters to quickly test whether a value exists." That would mean scanning the bloom filter for every partition, and the results you get will contain false positives, meaning you will still scan the indexes of many partitions that will not contain any matches. It's better than doing an index scan on all partitions, but I get the impression that we're not talking about the same thing when we say "partitioning". The point of partitioning is that you divide rows up by the value of fields, so you can predict the partitions to search by looking at the query. If I partition a phonebook by the first letter of the last name, then querying for "Johnes" will only use the 'J' partition, and the B-tree of that table will be roughly 1/26th of the original, and so will scan much faster. I don't have to pre-filter to find out that I need partition 'J'; the database knows that from the query.
@nilanjansarkar100
@nilanjansarkar100 3 years ago
@vinny142 Yes. This is more from an LSM-tree perspective, where an entry in a partition can reside both in memory and on disk via multiple versions.
@t529615
@t529615 3 years ago
Yet another great video! Truly educational, even for someone who has been in the game for over 15 years. 👏
@PiyushChauhan2011
@PiyushChauhan2011 3 years ago
I'm a front-end developer who acquired backend skills, all because of the good content from this channel. Amazing content and energy, thank you 😊
@OmarBravotube
@OmarBravotube 3 years ago
Couldn't have explained it more simply! Big ups, dude!!
@arianseyedi
@arianseyedi 3 years ago
Maybe I didn't understand the second-to-last section about eliminating the need to update both ends of a connection, but your solution will crumble when person A, who is following person B, closes their account, since the information on who follows person B is only in person B's records. So when person A leaves the platform, we won't know whom to update.
@joaopedromoreiradeoliveira4273
@joaopedromoreiradeoliveira4273 3 years ago
You help not only software engineers with your channel, but also data analysts like me who work a lot with data engineering. Thanks! Greetings from Brazil.
@hnasr
@hnasr 3 years ago
Love to all subs from beautiful Brazil 🇧🇷
@videosforthegoodlife2253
@videosforthegoodlife2253 1 year ago
I had to deal with tables with a few billion records per month, and the MySQL MERGE engine (merge tables) let us slice and dice them any way we needed. You can even have more specific indexes in the underlying tables themselves, as long as each merge index exists in all tables. The downside of merge tables is that they multiply the open file handles on the system, which can be tricky for a machine doing public networking, but with the latest kernels you can get kind of crazy. High memory makes a huge difference, of course.
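A sketch of that MERGE-engine layout via the mysql-connector-python driver; connection details and table names are hypothetical, and MERGE requires MyISAM member tables with identical definitions:

```python
import mysql.connector

conn = mysql.connector.connect(user="app", database="metrics")  # hypothetical
cur = conn.cursor()

# One MyISAM table per month, all with the same columns and keys.
for month in ("2023_01", "2023_02"):
    cur.execute(f"""
        CREATE TABLE IF NOT EXISTS events_{month} (
            id BIGINT NOT NULL,
            created_at DATETIME NOT NULL,
            payload TEXT,
            KEY idx_created (created_at)
        ) ENGINE=MyISAM
    """)

# The MERGE table must declare the same columns and keys as its members.
cur.execute("""
    CREATE TABLE IF NOT EXISTS events_all (
        id BIGINT NOT NULL,
        created_at DATETIME NOT NULL,
        payload TEXT,
        KEY idx_created (created_at)
    ) ENGINE=MERGE UNION=(events_2023_01, events_2023_02) INSERT_METHOD=LAST
""")

cur.execute("SELECT COUNT(*) FROM events_all WHERE created_at >= '2023-02-01'")
print(cur.fetchone())
```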
@4am4i
@4am4i 3 years ago
Good video. The last trick in the video is called denormalization. Also, as soon as you introduce sharding, you also need to add replication, because the probability of failures increases.
@insearchof9090
@insearchof9090 17 hours ago
I love you, man. You are so crystal clear. You are a legend.
@elmehdisaniss2731
@elmehdisaniss2731 3 years ago
Your content is top, bro.
@rajendiranvenkat5992
@rajendiranvenkat5992 3 years ago
I've got many database-side solutions from your videos. Thanks for your support.
@p4s153
@p4s153 3 years ago
Apart from the great content, you inspire me to love and play with technology... you are awesome, brother.
@simonedeluca104
@simonedeluca104 3 years ago
Hi Hussein, thank you for your awesome videos!!! They're really interesting!
@hnasr
@hnasr 3 years ago
Simillo De Simo ❤️❤️ thanks 😊
@sany2k8
@sany2k8 3 years ago
Awesome as always; hands-on examples of sharding, partitioning and indexing would be doubly awesome, bro.
@hnasr
@hnasr 3 years ago
Glad it helped. Thank you so much!
@sachinkalkur
@sachinkalkur 3 years ago
Top-notch content. I just encountered a similar problem at my workplace. Thanks.
@chenxiuming
@chenxiuming 3 years ago
Two extra ideas: 1) table archiving: most large tables are caused by time-series records, so just archive the old records into separate tables and keep the live table small; 2) use modern databases that are more scalable than traditional single-host databases: CockroachDB, Spanner, Aurora, TiDB, Fauna, etc.
@hnasr
@hnasr 3 years ago
Great idea. If it's time series and you can afford to archive, do it.
@vinny142
@vinny142 3 years ago
I'm a big fan of archiving; just make sure that you don't shift the problem from the "live" table to the archive table. If you keep archiving to the same table for a year, that table will become impossibly huge and essentially unusable, and you end up deleting it. A simple cron job can create a new monthly table or something. Partitioning is in effect a type of archiving, but with a transparent API, so you can just run a query and the database will choose the partitions by itself. As for more scalable database engines: that depends on whether they can still deliver the functionality you need. For example, any app dealing with money cannot live without transactions and referential integrity, because one mistake can cost millions. Also, it's not exactly true that traditional databases are single-host: replication is easy to set up to get more nodes involved, and sharding is relatively easy to build into an app or a proxy.
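A sketch of that monthly cron-job idea, shown with SQLite so it is self-contained; the events table and the naming scheme are hypothetical:

```python
import sqlite3
from datetime import date

def archive_older_than_month(conn: sqlite3.Connection, today: date) -> None:
    first_of_month = today.replace(day=1).isoformat()
    archive = f"events_archive_{today.year}_{today.month:02d}"
    # Clone the schema into a fresh, empty per-month archive table.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {archive} AS "
                 f"SELECT * FROM events WHERE 0")
    # Copy then delete inside one transaction so rows are never lost.
    with conn:
        conn.execute(f"INSERT INTO {archive} SELECT * FROM events "
                     f"WHERE created_at < ?", (first_of_month,))
        conn.execute("DELETE FROM events WHERE created_at < ?",
                     (first_of_month,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT, payload TEXT)")
conn.execute("INSERT INTO events VALUES ('2023-01-15', 'old'), ('2023-02-03', 'new')")
archive_older_than_month(conn, date(2023, 2, 10))
print(conn.execute("SELECT COUNT(*) FROM events").fetchone())  # (1,)
```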
@filippetrovic2596
@filippetrovic2596 3 months ago
Are we archiving the entire DB or just a table? How do we join between archived and non-archived DBs? If joined users no longer exist, the archive will not work. I guess we need to snapshot the DB, or at least more than just the one problematic table.
@salehdanbous6783
@salehdanbous6783 1 year ago
Sometimes you cannot redesign the table or escape ending up with billion-row tables. For example, the journal table for a real-time payment system gets very large in a few months. Moreover, you cannot resort to NoSQL databases, since this data is ACID in nature and eventual consistency is not applicable. I guess you will end up using all the aforementioned techniques: indexing (the obvious one), partitioning and sharding. One problem with partitioning, though, is that most ORMs are not partition-aware. On the other hand, you have to be careful with new distributed and horizontally scalable databases like CockroachDB, YugabyteDB, etc., because there is always a catch.
@vilkazz
@vilkazz 9 months ago
I did have a field day implementing partitioning on top of JPA and Postgres a few years back. It works like a charm, but it requires a lot of low-level fiddling to get it working.
@mennahamedo8204
@mennahamedo8204 3 years ago
Thank you for your amazing content. I am really happy that I found your channel.
@hnasr
@hnasr 3 years ago
Thanks Menna 🙏😊
@rashmidewangan8627
@rashmidewangan8627 3 years ago
Fabulous content on your channel. Subscribed. Thanks!! Keep up the good work :)
@Sarwaan001
@Sarwaan001 3 years ago
Whoa, the JSON method literally blew my mind.
@GurpreetSingh-pl8uz
@GurpreetSingh-pl8uz 2 years ago
In the JSON design, the followers-list column can grow very large, to millions of entries, and can cause speed issues.
@alichoobdar9030
@alichoobdar9030 1 year ago
I believe the last concept is actually called denormalization. Another option could be considering NoSQL. By the way, you are awesome.
@lucasbastos8019
@lucasbastos8019 3 years ago
Awesome content, thanks!!!
@treyheaney9483
@treyheaney9483 3 years ago
Is there a method of grouping the data by vector similarity, where the subset that is queried is chosen by a vectorized term in the query?
@natem889
@natem889 3 years ago
I liked all your ideas, although I was hoping you could talk about data archival and purging. Something I tried was archiving infrequently accessed data to another database and purging it from the original database.
@EddyCaffrey
@EddyCaffrey 3 years ago
Great topic. Very helpful 👍🏾. Truthfully, I always go with the last option 😅 because most of my projects will never have a billion rows. Now I have other options too. Thanks.
@hnasr
@hnasr 3 years ago
Glad it was helpful, Eddy! Thanks.
@zhaoxueli315
@zhaoxueli315 3 years ago
Truly a hidden gem.
@brucewayne2480
@brucewayne2480 3 years ago
In the case of a Twitter-like app, I would certainly go for a graph database like Neo4j and use another database to store indexes. Each user is a node and each "following X" relation is an edge; it will limit the join operations.
@hnasr
@hnasr 3 years ago
Bruce Wayne, fantastic thoughts. Graph databases are designed for this use case, so they will be optimized for it; that would be another way to avoid that large table. I don't know how graph databases work internally, though, so I'm not sure about their scale and how they store things.
@brucewayne2480
@brucewayne2480 3 years ago
@hnasr If I'm not wrong, they store the addresses in one file and the data in other ones, so if you find a node using indexes, you will find next to it the physical address of each edge related to it. It's just an O(1) read operation, so you navigate the graph with simple read operations instead of complex joins. And this is an article introducing sharding for Neo4j: neo4j.com/whats-new-in-neo4j/scalability/ Thanks for your videos, I'm learning a lot of new things each day.
@vinny142
@vinny142 3 years ago
Does the Twitter example lend itself to graph? The Twitter database is basically three tables with single-field relations. The joins in the queries don't go deeper than one level. Graph is excellent for large... well, graphs, where nodes are linked to nodes which are linked to nodes, etc., but Twitter doesn't do that (unless you perhaps want to query all the users who follow all users who follow all users who follow Obama, but that's easily solved using recursion in SQL and really doesn't require graph).
@MartinsTalbergs
@MartinsTalbergs 3 years ago
DGraph has nicer query syntax than Neo4j, IMO.
@JoaoPinela
@JoaoPinela 3 years ago
Very nice content 🥇 I would just ask: isn't sharding just a type of partitioning? Sharding = multi-host partitioning? Keep up the good work. Very interesting 👏👏
@serhiihorun6298
@serhiihorun6298 3 years ago
You are amazing, man!
@sureshchaudhari4465
@sureshchaudhari4465 5 months ago
If we partition data, how do we run a sorted SELECT query efficiently, since it will hit multiple partitions?
@mmu200
@mmu200 3 years ago
What are your thoughts on searching a varchar(1000) column in a billion-row table? Note: only one column, and a keyword search, LIKE '%xyz%'. I know the LIKE query will be super slow, so what are the alternatives? How would you solve it?
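One common answer (an addition here, not from the video) for fast '%xyz%' searches in PostgreSQL is a trigram index from the pg_trgm extension; a sketch, assuming a hypothetical docs(body) table:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm")
# gin_trgm_ops lets the planner serve LIKE/ILIKE '%...%' from the index
# instead of a full table scan.
cur.execute("CREATE INDEX IF NOT EXISTS docs_body_trgm "
            "ON docs USING GIN (body gin_trgm_ops)")
conn.commit()

cur.execute("SELECT count(*) FROM docs WHERE body LIKE %s", ("%xyz%",))
print(cur.fetchone())
```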
@krramesh5
@krramesh5 3 years ago
Will this also help in reducing the query time for JOINs? Can someone explain how joins will work?
@dinakaranonline
@dinakaranonline 3 years ago
Thanks for the video. I used YouTube's 'applaud' feature for the first time, with a small token of 100 rupees. Hope you received it. Keep going 🙏🙏🙏
@hnasr
@hnasr 3 years ago
Thank you so much, Dinakaran, for the support!
@positivity1016
@positivity1016 3 years ago
Great content 👍👌
@gmggx8944
@gmggx8944 3 years ago
Can NoSQL store billions of records? Or is using SQL tables good? Or is there a NoSQL + SQL DB, like the row with a JSON column?
@flygonfiasco9751
@flygonfiasco9751 3 years ago
Kinda don't like the idea of having JSON, TBH, because you can't enforce foreign keys and, in this example, you couldn't get a list of your followers. That said, love your content. Thanks for the videos!
@TheRizwanSharp
@TheRizwanSharp 1 year ago
Great video, Hussein. Just wanted to share my thoughts, as I have first-hand experience with a table containing over half a billion rows. I don't believe a billion-row table is that big of a deal for a modern RDBMS if the schema is well designed and proper indexes are in place (to support the common query patterns). I have a database with over half a billion rows which works flawlessly on medium-sized hardware (Azure SQL Database). Indexes go a long way, then partition, and then shard, as you mentioned. Also, before making things complicated by adding shards etc., you can always ask yourself, "Is there data in the table that I can offload/archive/delete to some other table?" The answer in most cases would be yes. This should save you from a lot of additional complexity.
@user-bk5xo1gj7k
@user-bk5xo1gj7k 1 year ago
Rizwan, might I ask what database technology you are using? I have been working on a similar table (well, multiple tables with around half a billion rows) and I have faced performance issues. I have tried it on Postgres as well as MySQL. Indexes definitely make things better; however, in the case of searches, even they are not enough. Iterating over the table sometimes took me days. I didn't design the structure, but I have to maintain it. The only way left for me was sharding. If you have any other tricks and tips, please share!
@TheRizwanSharp
@TheRizwanSharp 1 year ago
@user-bk5xo1gj7k Hello, I am using Azure SQL Database, but I would think others (MySQL, PostgreSQL, etc.) perform very similarly. For instance, I have a table with 213,461,562 rows (almost a quarter billion). I ran a query on it filtering on a column that is indexed (datetime) and selected the top 100 rows, and here are the results: SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. (100 rows affected) SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Completion time: 2023-01-24T08:34:59.6096640-06:00. As you can see, it was barely anything for the server. A few things to look at: 1) Is your table indexed for the query you are running against it? 2) How much data is part of the query result? This is why my original comment stated that you barely need sharding (partitioning should come before sharding) until maybe you get into the multi-billions and space on a single server becomes a problem. Sharding is no easy topic when it comes to properly implementing it. Here is something that may be of help in your optimization scenario: use-the-index-luke.com/ Best of luck, and let me know if you have any questions.
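A self-contained rendition of the shape of that experiment, using SQLite; the idea is the same on SQL Server, Postgres, or MySQL: filter on an indexed datetime column and take the newest 100 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO events (created_at) VALUES (?)",
                 [(f"2023-01-{(i % 28) + 1:02d}",) for i in range(100_000)])
conn.execute("CREATE INDEX idx_events_created ON events (created_at)")

rows = conn.execute("""
    SELECT id, created_at FROM events
    WHERE created_at >= '2023-01-20'
    ORDER BY created_at DESC
    LIMIT 100
""").fetchall()
print(len(rows))  # 100

# EXPLAIN QUERY PLAN shows the index being used instead of a full scan.
print(conn.execute("EXPLAIN QUERY PLAN SELECT id FROM events "
                   "WHERE created_at >= '2023-01-20'").fetchall())
```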
@user-bk5xo1gj7k
@user-bk5xo1gj7k 1 year ago
@TheRizwanSharp Thanks a bunch for the reply!
@andreigatej6704
@andreigatej6704 3 years ago
Thanks for the great video. I'm learning a lot from this channel. Could a junction table be used in this case as well? For example: id (primary key), followed_id (foreign key), follower_id (foreign key), ...; this would mean a single added row each time the "follow" button is pressed and, if needed, an inner join to see the relevant details of the followers.
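A sketch of that junction-table schema in SQLite, with a composite primary key instead of a surrogate id; names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL REFERENCES users(id),
        followed_id INTEGER NOT NULL REFERENCES users(id),
        PRIMARY KEY (follower_id, followed_id)
    );
    -- Reverse index so "who follows X?" is as cheap as "whom does X follow?"
    CREATE INDEX idx_follows_followed ON follows (followed_id);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")
conn.execute("INSERT INTO follows VALUES (1, 2)")  # alice follows bob
print(conn.execute("""
    SELECT u.name FROM follows f JOIN users u ON u.id = f.follower_id
    WHERE f.followed_id = 2
""").fetchall())  # [('alice',)]
```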
@user-bk5xo1gj7k
@user-bk5xo1gj7k 1 year ago
No, I don't think that would work. See, the issue is the number of rows in the end. The junction table would be larger than the two tables it sits between; it could grow to multiple billions of rows and we would end up with a massive table. (I hope I understood your question correctly.)
@andreigatej6704
@andreigatej6704 1 year ago
@user-bk5xo1gj7k Yes, that's right. Thanks! I'm wondering now whether there are any other approaches to this problem, apart from those already mentioned. Maybe as far as the DB schema is concerned.
@toxiclife
@toxiclife 11 months ago
What should I do when I want to overwrite 100 million rows into a new table in minutes? df.write.mode("overwrite").saveAsTable("FINAL"), if you could please help with this?
@_SoundByte_
@_SoundByte_ 3 years ago
Can you please talk about different databases and their pros and cons? LSM, B-tree, column-oriented, RDBMS, etc.
@hnasr
@hnasr 3 years ago
Discussed this somewhat in my database engines crash course: Database Engines Crash Course (MyISAM, Aria, InnoDB, XtraDB, LevelDB & RocksDB) kzbin.info/www/bejne/gWq0lWaLgq2Ih5Y
@tobiowolawi5044
@tobiowolawi5044 3 years ago
I'm currently having this problem with my current company. Another way could be just archiving old data if it is not often accessed, and creating a mechanism to load up the archived data whenever needed, for example for transaction tables.
@hnasr
@hnasr 3 years ago
Correct, archiving is another remedy for this, Tobi. Agreed.
@maikalkumar4382
@maikalkumar4382 3 years ago
Shard - partition - index - parallel processing
@santoshbhatnagar2155
@santoshbhatnagar2155 9 months ago
How can we efficiently search in the JSON data? Can someone share some tips on this?
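One way (an assumption, not from the video) to search JSON efficiently in PostgreSQL is a GIN index on a jsonb column queried with the containment operator; the table name and DSN are hypothetical:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS profiles (
        id  bigint PRIMARY KEY,
        doc jsonb NOT NULL
    )
""")
cur.execute("CREATE INDEX IF NOT EXISTS profiles_doc_idx "
            "ON profiles USING GIN (doc)")
conn.commit()

# @> ("contains") is index-assisted; this finds profiles whose
# "follows" array contains user id 42.
cur.execute("SELECT id FROM profiles WHERE doc @> %s", ('{"follows": [42]}',))
print(cur.fetchall())
```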
@sojuthomas7727
@sojuthomas7727 3 years ago
Hi, thanks for the video. This gives me insight into how to design a very large database. But can we do this partitioning and sharding on a relational database like MySQL, for example? If we do so, I believe we can't leverage the full benefits of MySQL, IMO; in a database with lots of partitioned tables, how will JOINs work?
@videosforthegoodlife2253
@videosforthegoodlife2253 1 year ago
MySQL has something even cooler: merge tables. They don't support putting tables on multiple drives, but there are a few ways to deal with that.
@shubhamchopra5518
@shubhamchopra5518 3 years ago
Hi Hussein, love your videos. Can you please create a video on creating an index on production tables without taking a lock on the table? Maybe you can talk about online DDL, Percona, memory consumption and write-latency impact.
@hnasr
@hnasr 3 years ago
I actually discussed this in my indexing video. You use CREATE INDEX CONCURRENTLY on Postgres: kzbin.info/www/bejne/Y6KxhIuBbNhqldk
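A sketch of what that looks like from psycopg2; CREATE INDEX CONCURRENTLY cannot run inside a transaction block, hence autocommit, and the orders table is hypothetical:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
conn.autocommit = True  # CONCURRENTLY is not allowed inside a transaction
cur = conn.cursor()
# Builds the index without taking a write-blocking lock on the table.
cur.execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS orders_created_idx "
            "ON orders (created_at)")
# Note: if the build fails, it leaves an INVALID index behind; drop and retry.
```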
@nathanbenton2051
@nathanbenton2051 3 years ago
Wow, great video. What about using Spark+Hadoop in conjunction with the index-partition-shard approach? Would such a strategy even make sense?
@hnasr
@hnasr 3 years ago
It works, but that is a lot of infrastructure to support, so the question becomes: is it worth it?
@ghassanaljaafri9368
@ghassanaljaafri9368 7 months ago
Thank you for the content, it's helpful. I think the problem with your design is that when you try to discover all the profiles you follow, you will end up in trouble.
@woltron4o
@woltron4o 3 years ago
Nice video!
@AchwaqKhalid
@AchwaqKhalid 3 years ago
GPU-based querying is the way to go 💥
@minscj
@minscj 2 years ago
How is that done?
@shashibhushan3289
@shashibhushan3289 3 years ago
Your content is dope, bro.
@hnasr
@hnasr 3 years ago
You're dope, thanks Shashi ❤️
@happyandinformedlife1212
@happyandinformedlife1212 3 years ago
@hnasr Can you make a similar one for API design?
@CrzyGazara
@CrzyGazara 3 years ago
Best practice for working with billion-row tables in databases: don't work with billion-row tables :)
@TechPrinc
@TechPrinc 3 years ago
Help!! I am a database user, so creating indexes is not granted to me, and all the columns except id are of datatype text, so I can't partition on those. Partitioning on id is not an option because the data is stored randomly. I am a junior developer, so sharding is not an option either. My requirement is to search for users based on parameters like country, job title, company, or all of them. The data is huge, almost 1B+ rows. Please give me an idea of what I can do.
@hnasr
@hnasr 3 years ago
I don't think you can do much. Can any backend dev/DBA help you with your queries? You need to be able to put the proper indexes on the table to get better performance.
@enlgn7050
@enlgn7050 3 years ago
But... is there not a problem with consistency in the case of sharding?
@auroshisray9140
@auroshisray9140 3 years ago
Effective content
@IdreesDargahwala47
@IdreesDargahwala47 3 years ago
Thanks for the video, man! Loved it! In the redesigning-the-DB part of your video, taking the Profile example where you have a column or a key which stores the list of followers, my doubt is: will that not exceed the maximum size of the row or document, and make it extremely heavy to retrieve one follower? I think it's an open-ended discussion and totally depends on the use case.
@mayankpandey8271
@mayankpandey8271 3 years ago
One thing you could add is vertical partitioning. Otherwise, top content.
@rashadarbab2769
@rashadarbab2769 3 years ago
What about a hashmap-type algorithm where you can be guided to where you need to go?
@tmanley1985
@tmanley1985 3 years ago
I'd like to see an example of how to handle SQL joins when using shards. I'm guessing that since you can't do joins across shards, you'll have to pull everything in on the application side and glue it together.
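A sketch of that application-side glue, using two in-memory SQLite databases as stand-in shards; routing by user_id modulo the shard count is illustrative only:

```python
import sqlite3

SHARDS = [sqlite3.connect(":memory:") for _ in range(2)]
for shard in SHARDS:
    shard.execute("CREATE TABLE follows (follower INT, followee INT)")

def shard_for(user_id: int) -> sqlite3.Connection:
    # Toy routing rule; real systems use consistent hashing or a lookup map.
    return SHARDS[user_id % len(SHARDS)]

shard_for(1).execute("INSERT INTO follows VALUES (1, 42)")
shard_for(2).execute("INSERT INTO follows VALUES (2, 42)")

def followers_of(user_id: int) -> list[int]:
    # No cross-shard SQL join exists, so query every shard and merge
    # the results in application code.
    result = []
    for shard in SHARDS:
        rows = shard.execute(
            "SELECT follower FROM follows WHERE followee = ?", (user_id,)
        ).fetchall()
        result.extend(r[0] for r in rows)
    return result

print(followers_of(42))  # [2, 1]
```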
@hnasr
@hnasr 3 years ago
Correct, that is why I try to avoid sharding when possible.
@tmanley1985
@tmanley1985 3 years ago
@hnasr I guess that's a nightmare for reporting too.
@ryanisthewind
@ryanisthewind 6 months ago
Wait, saving data like that is OK? I had the same mindset, but I thought it was wrong because no one around me did that. Thank you, amazing talk!
@echoky3189
@echoky3189 3 years ago
What if someone writes a bot that just does follows and unfollows? Will that overload the database?
@hnasr
@hnasr 3 years ago
It will, of course; that is why you shouldn't make an API that easily allows anyone to do so. Or charge so much to call that API.
@sarveshsawant7232
@sarveshsawant7232 6 months ago
Great video. A question on the Twitter design: what happens when someone unfollows you?
@decipheringthematrix56
@decipheringthematrix56 5 months ago
Sounds like a description of Cassandra and MongoDB =D. Cheers.
@shei69
@shei69 3 years ago
Can you share your views on FaunaDB?
@Flankymanga
@Flankymanga 3 years ago
Regarding sharding: isn't it better to put the database instances into a cluster and let the master take care of sharding, so that the client will not need to know which DB instance to query; it will only need to know the master? I'm not a DB guru, but I always thought that this is how it works.
@hnasr
@hnasr 3 years ago
Correct. Some database vendors provide such a feature, but not every DB has it, and for the longest time engineers had to build this themselves. I believe Vitess solves this for MySQL. Cassandra also has consistent hashing, which distributes the data across shards.
@vinny142
@vinny142 3 years ago
PostgreSQL can do it with foreign data wrappers: essentially you create a partition for a table and stick it on a different node through FDW. www.percona.com/blog/2019/05/24/an-overview-of-sharding-in-postgresql-and-how-it-relates-to-mongodbs/ If you route all traffic for all shards through the root node, you can still only go as fast as the root node; you're not leveraging the full capacity of each shard. Look at geographical sharding, where each shard holds the data that is used by a region, completely bypassing the "root" node.
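A sketch of the postgres_fdw setup described above, assuming a parent events table already range-partitioned on a date column; the host name, credentials, and the foreign-table-as-partition feature (PostgreSQL 11+) are the assumptions here:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
cur.execute("""
    CREATE SERVER IF NOT EXISTS shard2
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard2.internal', dbname 'app')
""")
cur.execute("""
    CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER
    SERVER shard2 OPTIONS (user 'app', password 'secret')
""")
# A foreign table can serve as one partition of the local parent table,
# so this slice of the data physically lives on the other node.
cur.execute("""
    CREATE FOREIGN TABLE IF NOT EXISTS events_2023
    PARTITION OF events FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
    SERVER shard2
""")
conn.commit()
```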
@user-nw3rz4yp4u
@user-nw3rz4yp4u 10 months ago
Will your last solution work for a user with millions of followers? I cannot wrap my head around it. That's a huge amount of data for a single document/column.
@rabbyhossain6150
@rabbyhossain6150 1 year ago
What about writing to a big Postgres table? Let's say we have to do 1,000 writes per second to a particular table.
@javaprogramming7235
@javaprogramming7235 3 months ago
I am planning to make a streak of watching your videos daily. It's really amazing content 😁✌️ Great work. Let's see how consistent I can be with this activity 🎉 #🔥 (1)
@muhammadkhan-qd-c5996
@muhammadkhan-qd-c5996 3 years ago
Awesome
@pavankumardittakavi5325
@pavankumardittakavi5325 3 years ago
Hussein, how does a day in your life as an engineer pan out? How do you manage to find time for your YouTube videos?
@112BALAGE112
@112BALAGE112 3 years ago
Hussein has done a phenomenal job of impersonating Nigel Thornberry in the thumbnail picture.
@tanzimchowdhury320
@tanzimchowdhury320 1 year ago
I would like to ask you just one thing: how did you gather so much knowledge? As a software engineer myself, I can assure you this isn't something you get from work experience alone. I'm guessing it was books? If so, please do share your recommended reading list.
@claushellsing
@claushellsing 3 years ago
Can't we use something like a time-series DB, Elastic, Postgres or Cassandra? They are famous for being able to handle heavy loads of data. BTW, how does the Bitcoin DB work? There are millions of transactions.
@GistTech
@GistTech 3 years ago
This is a typical job interview question.
@m.m.4589
@m.m.4589 3 years ago
With NoSQL, no problem: just a JSON array of IDs.
@hnasr
@hnasr 3 years ago
Hmmm, I think there must be a limit, so we need to think about scalability.
@nagrajankola1323
@nagrajankola1323 7 months ago
❤❤
@sohahm
@sohahm 2 years ago
Some people like me don't design the table in advance; rather, they look for a solution once there are already 2 billion rows.
@nixietubes
@nixietubes 1 year ago
I keep hearing sharting
@rahulmasurkar6971
@rahulmasurkar6971 3 years ago
What if there are millions of followers?? Would that create an issue for that column??
@rahulgoyal3780
@rahulgoyal3780 3 years ago
Your content is great, but at 10:20 you created a list of users (MongoDB/JSON), and that can grow without bound; also, MongoDB has a limit that one document can't exceed 16 MB in size.
@hnasr
@hnasr 3 years ago
Ohhh, I didn't know Mongo has that limit. I wouldn't say infinite though; Taylor Swift had 50 million followers, which is still manageable, but you are right. Someone mentioned graph databases, which can also be useful. Thanks for sharing your thoughts.
@Iamhomkar
@Iamhomkar 3 years ago
We could maintain a separate table where each record is (user, list of follower IDs), which is queried only when required. You could break the total number of followers into multiple records, something like paging in an OS. Maybe user_id + record_number as your primary key?
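A sketch of that bucketing idea, with SQLite and a JSON column; the bucket size is illustrative and sits well under the 16 MB document cap the thread mentions:

```python
import json
import sqlite3

BUCKET_SIZE = 1000  # max follower ids per row; illustrative

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE follower_buckets (
        user_id   INT,
        bucket_no INT,
        followers TEXT,                 -- JSON array of follower ids
        PRIMARY KEY (user_id, bucket_no)
    )
""")

def add_follower(user_id: int, follower_id: int) -> None:
    # Find the newest bucket; open a fresh one when it is full.
    row = conn.execute(
        "SELECT bucket_no, followers FROM follower_buckets "
        "WHERE user_id = ? ORDER BY bucket_no DESC LIMIT 1", (user_id,)
    ).fetchone()
    if row is None or len(json.loads(row[1])) >= BUCKET_SIZE:
        bucket_no, followers = (0 if row is None else row[0] + 1), []
    else:
        bucket_no, followers = row[0], json.loads(row[1])
    followers.append(follower_id)
    conn.execute("INSERT OR REPLACE INTO follower_buckets VALUES (?, ?, ?)",
                 (user_id, bucket_no, json.dumps(followers)))

for f in range(2500):
    add_follower(7, f)
print(conn.execute("SELECT COUNT(*) FROM follower_buckets "
                   "WHERE user_id = 7").fetchone())  # (3,) three buckets
```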
@rahulgoyal3780
@rahulgoyal3780 3 years ago
Yes, we can. @omkar
@vinny142
@vinny142 3 years ago
@Iamhomkar Technically yes, but if the database cannot hold the amount of data you need it to hold, then you should move to a different database that can.
@a.yashwanth
@a.yashwanth 3 years ago
Putting followers in one column violates 1NF, i.e., atomic values.
@hnasr
@hnasr 3 years ago
Assuming there is a field type called "list", then it shouldn't.
@insearchof9090
@insearchof9090 17 hours ago
You missed: one more such video.
@hurleyd9828
@hurleyd9828 3 years ago
No one talks about scaling publicly like you do, and you do it for free.
@kamillatosinski3054
@kamillatosinski3054 3 years ago
But why are billion-row tables slow? Why are good indexes enough for a couple of million rows, but not for more than a billion rows?
@hnasr
@hnasr 3 years ago
Because the index grows as well, and it eventually becomes slow to traverse.
@mikinyaa
@mikinyaa 3 years ago
@benmoskovits5816
@benmoskovits5816 10 months ago
You had great ideas until the 10-minute mark. What happens when a follower of 3 or n people leaves or changes their name? How will you find the follower unless you scan every single user? Stick to a real relational design and this will never be an issue.
@avinashthakur80
@avinashthakur80 2 months ago
Can you give a reference for the design you are talking about? What is the real relational version of it?
@auseralastname8466
@auseralastname8466 3 years ago
Great idea to set a text column with JSON. TYVM for this kickass content, sir.
@0xfaizan
@0xfaizan 3 years ago
Try to make shorter videos.
@alooooooola
@alooooooola 3 years ago
So I can compress the whole video content into a single quote: "Move to a NoSQL database" xD