Uber Data Engineer Interview: Design a Ride Sharing Schema

Рет қаралды 83,095

Күн бұрын

Пікірлер: 55

@iqjayfeng 4 ай бұрын

If you enjoyed discussing this question with us, explore more on our website! Yelp-Like System: www.interviewquery.com/questions/yelp-like-system? Clickstream Data: www.interviewquery.com/questions/clickstream-data? Click Data Schema: www.interviewquery.com/questions/click-data-schema?

@gr82moro Жыл бұрын

The data model design should not consider actual implementation details, and too early to consider optimizations, which can be added and fine-tuned later on. First thing to do is to understand the functional requirements and draft how many "entities" make sense to the business and then work out the ER diagram to satisfy the requirements and future needs.

@pankm2989 Жыл бұрын

exactly my thoughts at the start but overall good approach from the guy

@vaibhavmaurya8505 2 жыл бұрын

After listening first 3 minutes, I do not agree with the approach. Why you have not started with Entity Relationship model? You have immediately jumped on the technical requirements, I think one should first complete the functional requirement with data modeling.

@PhillyHank Жыл бұрын

🤔 it’s probably always good to talk through ERD and lock that with the interviewer. However, in this specific example, the candidate (TC) quickly chose a NOSQL db as they were optimizing for latency. Generally speaking, NO SQL means there really isn’t a entity relationship. I do think they should have started with 1. Use cases 2a. Filter the interview to 1-2 use cases (e.g., finding a driver/requesting a trip, providing feedback on a trip, payments, etc…) 2b. Talk through limitations (e.g., one language so no L10N, I18N, no accessibility, worldwide or national scale, no saving favorite drivers/trips locally, therefore no need to have a db on the local surface, like a mobile phone) 3. Align on what we are optimizing for… that’s a bit different than use cases, but TC called these “use cases” - e.g., optimizing for speed/performance; optimizing for analytics; optimizing for revenue generation 4. Align on compliance, GDPR, CCPA, etc ## simply mention to interviewer, e.g., “As TC I will assume I must comply with some of the key data governance regulations like GDPR, so the data will have to be stored (in some fashion) where the customers and/or drivers reside.” 5. Design and get feedback 6. Optimize later and then get feedback

@tejtalks1 5 ай бұрын

I feel the approach used here is very unorganised. Jitesh directly jumped on the optimisations, and the modelling part. Also Jay asked about the schema of Dimension & Fact tables, which were not provided explicitly. NoSql database was chosen but the question specifically asks about the join conditions and how would we maintain relationship between different tables. In my view the approach should be like : 1. Clarifying Questions : Use-cases / Volume of data / Analytics or Transactional / Is it particular location based ? / What metrics we are looking at / Final goal of system ? etc . 2. Functional and Non-functional requirements : Discussion on Scalability, Latency, Availability, Consistency etc. CAP theorem can be used. Trade-offs . High read throughput or write ? etc. 3. Key metrics involved (if any) : Daily no. of rides, avg daily active users, avg rides over a month,90days etc. 4. High-level design : Storage , ETL process (if any) , final data warehouse, data lake , relational database or nosql or data warehouse etc. 5 . low-level design : Actual table and schema modelling . Dimension and fact tables with their attributes. Partition or index keys. Any aggregate tables to calculate metrics etc. 6. Security and compliance issue ( if any). Optimisations (if any) 7 .Summarise whole approach and communicate effectively. I am not an expert, but this is what I learned after giving multiple Data system design interviews in companies (including FAANG). Thanks

@yashasvimadhukumar9666 Ай бұрын

Hi, do you mind sharing some of the resources you used for preparation. Your outline is what I'm focusing my preparation on too.

@kavehrezaeimoghadam360 2 жыл бұрын

Dynamo DB has eventual consistency for writes, terrible choice for actual production app at scale. Transactional databases can scale very well. If Facebook can scale on Mysql DB, for sure any other app can as well. Trips is too crucial concept to sacrifice ACID for low cost latency gains with Nosql. Also in dimensional modeling it's not practical to avoid joins and put everything in one table. Scalability is also important in analytics, one need to be able to create new fact table on demand in different grains. Having conformed dimensions and being able to perform proper joins to low grain facts are very important.

@tahiliani22 6 ай бұрын

Do you know where can we get access to such content if we are preparing for Database Design interviews?

@RU-qv3jl 4 ай бұрын

@@tahiliani22 From having sat on both sides of the interview for over a decade, it doesn’t matter nearly as much as you think. If I wanted to fail this guy I would. I am not interested in a perfect response and never have been. There is no right answer. Facebook didn’t not settle on their current solution at the start. They started off with MySQL and built on that until they customised it a lot and automated a load of things around it. In an interview you have to show that you can think, have a good approach and are a team player who is willing to learn and has passion. Whilst the approach this guy took was more than OK, it wouldn’t guarantee you a job. You can always pass or fail depending on the interviewer alone. I hired two different people who I had to train up for over 18 months each. Really enjoyed working with them. They would not have been able to come up with a particularly good database design for this when I hired them. By the time I left the company there were really good, confident and independent. Yes I’ve worked on some really big sharded deployments and also diddly little DB systems. They can all be fun.

@brothermalcolm 2 жыл бұрын

Interesting how it seemed like Jay initially was asking for the schema of the app backend transactional database yet because the interviewer clarified early on the whole question pivoted towards analytical data warehouse design

@ashokkumar-nv8gj Жыл бұрын

Exactly. He has mixed OLTP into OLAP 😂

@Nonchalant2023 Жыл бұрын

I don't understand why this person chose a NOSQL solution based on performance if the objective is to create a OLTP database? The main reason to choose nosql (in my ignorant opinion) is to avoid the rigidness of structured databases, not performance---right? Additionally, how do you jump from a nosql model to a relational db for analytics?

@muribbimehmet9009 9 ай бұрын

It feels like Jay needs to prepare more for these interviews

@arunasingh8617 2 жыл бұрын

This is one of the most useful video for Data Engineer. Great Job (y)

@timz2917 10 ай бұрын

The data warehousing model was actually very impressive

@G-Sinko 3 ай бұрын

Should of built a dimensional schema, it would of taken longer, but it would answer everything and give you an easy time when going to an end user

@PhillyHank Жыл бұрын

It’s interesting this did not immediately cover DQ checks, threshold , error handling, etc which is one of the key responsibilities for a data engineer. ** Of course, in this example the candidate was building a brand new system and therefore there was no data loads, ingestion, ** STILL, I cannot imagine an systems design interview not covering this topic. At most tech companies, there are two loops - 1. Design a system from a database perspective - this type of video ## this would include “coding” i.e., running SQL statements and 2. What are the design considerations for designing an entire DB platform? E.g., how to know when to do a data warehouse vs lake (batch vs streaming), ETL/ELT methods, DQ, put together together shrads (sp?), data recall, etc. Mixing both of these outcomes into one video / interview is hard to do and watching a video the user would likely get lost in the conversation.

@iqjayfeng 2 жыл бұрын

If you’re looking for a tailored interview prep for DE, check out our newly launched Data Engineering learning path. Our learning path will help you practice core skills that are commonly evaluated in data engineering interviews. Try it here: www.interviewquery.com/learning-paths/data-engineering

@sagarpatil1646 2 жыл бұрын

This is one of the best Channel for Analytics part, God Bless you Jay 🙌🏻, Keep up the Good work 🎉🎈

@priyadarshanmohanty277 Жыл бұрын

any reason why Cloud DWHs werent considered such as Snowflake/Big Query ?

@gravenguan 2 жыл бұрын

Have some real talk from experienced data engineer, that's very good content!

@siddharthtripathy3964 2 жыл бұрын

who's Jitesh Soni. Totally love his clarity of thought.

@chunuk4442 2 жыл бұрын

Please upload business intelligence interview questions and answers....

@Expateer 2 жыл бұрын

Rule #1: Never, ever dual-use an operational table for analytics. Also - why did he choose NoSQL this use case?

@alahad4723 2 жыл бұрын

Good point. What would be the best approach for analytics? Create a separate table on the backend?

@Expateer 2 жыл бұрын

@@alahad4723 Exactly. Insert new records in the analytics tables as needed in batches or even mico-batches. And remember that if the information in an analytics table is being used to facilitate transaction processes, then it's no longer an analytics table

@jiteshsoni3030 2 жыл бұрын

I agree and I did not use the operational table twice. Idea is to solve latency use case using No-SQL and the last denormalized table for analytics. Next time, we will use a Drawing/charting tool so it's more clear. -- Jitesh

@Expateer 2 жыл бұрын

@@jiteshsoni3030 Don't mean to pick on you, Jitesh, but this video was wrong on so many levels that I found myself cringing. You're sharding on the initials of a city without any consideration of data distribution or even of dealing with cities with the same name in multiple states (almost every state has a "Springfield"). What if 80% of your trips happen in LA, NYC, and SF? Isn't your sharding going to be worthless? How do you tell the difference between Springfield, OH and Springfield, MO? What if you want to aggregate by State or by country? With no joins, your single table is going to get awfully wide. There were many such issues here, so I won't go through all of them - but I'll say that you're not wrong in answering like this. Despite the poorly designed solution, the interviewer is going to nod in agreement -- and you'll get the job. People like me will get washed out because the interviewer has nothing more than a rudimentary understanding of how indexing works and likely got into the organization to begin with by cramming videos on NoSQL and sharding as opposed to having actual experience in data engineering. This is the first video like this I've seen on DE, and frankly, I'm appalled.

@Expateer 2 жыл бұрын

That sharding is considered before indexing is mind-boggling to me. An unsharded, but well-indexed table would run circles around this solution in terms of speed and flexibility. And no, in a dimensional model, the model does not depend on any one ML algorithm

@thndesmondsaid Жыл бұрын

yeah I agree he should have mentioned indexing. That being said, he was talking about partitioning, not sharding. Sharding is when you split the data in a table across several nodes (shards). You can do partitioning of a table within a single relational db on a single server. Sharding becomes relevant when the data is too large to fit on a single server.

@Expateer Жыл бұрын

@@thndesmondsaid I simply don't see how a data engineering interview can be considered "good" if indexing wasn't even mentioned. Partitioning is great and usually comes after and as a complement to indexing, not a replacement for it.

@zakaria5304 9 ай бұрын

@@Expateer Most data warehouses don't support indexes

@Expateer 9 ай бұрын

@@zakaria5304 Wow .. now you've left me speechless

@kanchidoshi6907 2 жыл бұрын

Thanks for this video. During the actual interview, what if the interviewer is only interested in designing in relational DB style, can you please also share how to proceed in that case?

@iqjayfeng 2 жыл бұрын

Jitesh highlights this in the second half of the video!

@haidar6280 2 жыл бұрын

This channel is fantastic. Thanks kindly for your content!

@miguelamezcua4629 2 жыл бұрын

I agree

@ak6877 11 ай бұрын

How are we talking about optimization at the very beginning?

@iqjayfeng 10 ай бұрын

That’s how we do it

@gueswhom3040 3 ай бұрын

This is pretty bad. They spent more time talking about partitioning than planning a scheme that would contain the basic trip info... for who, by who, when start, where start, when end, where end, ... You recommend nosql then gonna suggest a table storing minute by minute driver position. You guys should have given this more thought and did a practice run and pre-planned the scheme first and faked the interview being live/off the cuff. This was disjointed and too many "what the hell are you thinking moments"

@davidoh0905 4 ай бұрын

where can I get more of these?

@iqjayfeng 4 ай бұрын

You can check out similar questions here: Yelp-Like System: www.interviewquery.com/questions/yelp-like-system? Clickstream Data: www.interviewquery.com/questions/clickstream-data? Click Data Schema: www.interviewquery.com/questions/click-data-schema?

@sergior.m.5694 Жыл бұрын

This is great content, good for my brain to think through some of this and think to myself why I agree and disagree with him

@hydtechietalks3607 2 жыл бұрын

In this interview, Latency is solved by NOSQL.. thats a generic answer, is nt it? and listed attributes of general expected.. regional seggregraton.. thats it?

@WelcomeToDataverse 2 жыл бұрын

Where can I learn such skills?

@pranavreddykallem2025 Жыл бұрын

Great content but too many ads

@avinashb4485 2 жыл бұрын

Very useful and informative video. Thanks a lot!!

@ShaluChaudhary 2 жыл бұрын

Awesome video

@bayusuharso Жыл бұрын

I'm confused, he uses a NoSQL DB but query the data in SQL like RDBMS?

@fbreetz Жыл бұрын

The Transactional Table is NoSQL, but The Analytical is SQL. There would be an ETL process to move from the Transactional Table to the Analytical Table.

@jenilchristo8775 11 ай бұрын

There are sql apis available as well for no sql dbs like say azure cosmos db