Cracking Data Science Business Cases | FAANG Interview Prep

17,143 views

DataInterview

Comments: 18
@aresdan 4 months ago
Good video, but I think you forgot to ask the most important question: how much money can the business save with your model, and is it worth implementing in the first place? In your case, you want a model that predicts delays in real time for passengers who are already at the airport and whose departure is within the next couple of hours. But then either the prediction won't matter (a 10-minute or 1-hour delay), since passengers can't do anything with the information, or the predicted delay is so large (5+ hours) that users will ask for extra compensation and complain that you didn't inform them a day earlier. You assumed the model is for people who are already at the airport, but I would argue that the actual end users are people whose flight is in a day or so. If the company informs users a day in advance that their flight is delayed by 5 hours, they can make adjustments and may be less likely to complain or request a refund than if they learn about the delay when they are already at the airport. It is more important to predict a delay 12+ hours before departure, so that passengers can decide for themselves when to leave home and whether to change other plans, than a couple of hours before departure, when most likely everyone is at the airport and the delay can be worked out with simple math. And once people are at the airport, it is already too late to tell them that their flight is delayed by 24+ hours. In this case you don't need real-time prediction. Next point: I don't agree with the 95% cutoff. Yes, there might be some bugs, but I would not cut off like that, since those delays might actually be the most important ones (the 24+ hour delays).
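To make the concern about the 95% cutoff concrete, here is a minimal sketch of what trimming at the 95th percentile does to a delay distribution. The delay values and the helper `percentile_nearest_rank` are hypothetical and only for illustration; the video's actual data and cutoff procedure are not shown in this thread.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    # Ceiling of pct/100 * n, done in integer arithmetic to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical delays in minutes: mostly short, plus two multi-day-scale delays.
delays_min = [5, 10, 12, 15, 20, 25, 30, 35, 40, 45,
              50, 55, 60, 70, 80, 90, 120, 180, 1500, 1800]

cutoff = percentile_nearest_rank(delays_min, 95)

# Rows above the cutoff are the ones a "drop the top 5%" rule would discard.
kept = [d for d in delays_min if d <= cutoff]
dropped = [d for d in delays_min if d > cutoff]

print("cutoff (min):", cutoff)
print("dropped (min):", dropped)
```

In this toy data the rows removed by the rule are exactly the longest delays, which is the case the comment argues matters most to passengers. Whether such rows are data bugs or genuine long delays would have to be checked against the source records before dropping them.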
@sgbalakrishna 2 years ago
I think weather is also one of the crucial factors that account for delays.
@oorjamathur8459 4 months ago
1. Looking at the past history of the same airline, the preliminary cause of the flight delay might be operational issues. 2. Geography 3. Weather 4. Number of runways: if there are too few, then one delayed flight increases the chances of others being delayed.
@surajbansal1182 2 years ago
Hi, first of all, thank you so much for creating this video. It is definitely useful for understanding how to solve data science business cases. I have a few questions about the types of solution that interviewers may expect.
Clarification - In this video, you started off with the business process, requirements, etc., but I think there was still a lot of scope to narrow down the problem statement. We could have asked questions such as: 1. Is there a specific airline or geography that we are talking about? 2. Are we talking about commercial flights, private flights, etc.? 3. How exactly are we planning to make use of the predictions? (This information may not help us build the model, but it will surely help in evaluating the business metrics, right?) Here comes my first concern: should we actually ask questions 1 and 2, since the answers can be derived from the data too?
Measure of Success - In this video, you did not talk about how we would measure success (from both the model and the business perspective). Isn't this very important? Coming up with metrics would be an important step, and they should be defined beforehand, right?
Solution Phase - Once we have the relevant information, we can move to the solution phase, which is model development as you explained. This is core data science, of course, and all the steps you listed are important. But before diving into the model, should we ask the business or product side about the potential reasons for delays, so that we can use them before model building?
Evaluation - You did not talk about A/B tests or equivalent strategies (I'm not sure exactly how to use A/B tests here) :(. But the model's performance on the ground should be evaluated in some way, and it is the data scientist's responsibility to come up with an approach. So I am wondering how we can evaluate such a model on the ground. One way, of course, is to look at the difference between predicted and actual delays. Could there be a different way that relates to business metrics?
Sorry for posting such a long comment, but I really want to understand how relevant the different aspects of case interviews are and where we should spend more time. Looking forward to your response. Thanks
@Aidan_Au 2 years ago
Thank you Dan for doing a demo of a question and talking about how/when big tech vs startup would ask business case problems. Very thorough! I'll make sure to work through the 40+ business case problems you provided in your course. And yes, practice SQL and business case problems daily
@DataInterview 2 years ago
Thanks Aidan!
@paolathecoach 6 months ago
First I will list the variables, then use the data to test which of them are statistically relevant. Then, among the relevant ones, which is the most reliable source of information, and can we determine seasonality? Which of those variables are under our control? Then I will propose a couple of Failure Mode and Effects scenarios to examine the problem in more depth. Based on that, I would run experiments with existing data to better understand how it looks in practice, and propose high-level scenarios.
@huichen3556 5 days ago
Thinking about the problem again, it's almost useless. When you learn about the delay 30 minutes before departure, you are already at the airport 😂
@Raiseren 1 year ago
Good video; however, I would argue your reasons for picking random forest apply to linear regression even more. Ideally you would comment on the relationship between the engineered features and the target, and discuss how it impacts linear vs. decision-tree-based models.
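As a toy illustration of how the feature-target relationship affects the two model families, the sketch below fits a least-squares line and a single-split decision stump to a step-shaped relationship. Everything here is an assumption for illustration: the feature `congestion` and its values are made up, and a one-split stump only stands in for tree-based models such as random forest.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns a predict function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def mean_abs_error(model, xs, ys):
    return sum(abs(y - model(x)) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical feature: delay (minutes) jumps once congestion crosses a threshold.
congestion = list(range(10))
delay_min = [0] * 5 + [60] * 5

line_mae = mean_abs_error(fit_line(congestion, delay_min), congestion, delay_min)
stump_mae = mean_abs_error(fit_stump(congestion, delay_min), congestion, delay_min)
print("line MAE:", line_mae, "stump MAE:", stump_mae)
```

On a step-shaped target the stump recovers the jump while the line cannot. With a roughly linear relationship the comparison can go the other way, which is why the shape of the feature-target relationship is worth discussing before choosing a model.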
@AndresVeraF 1 year ago
Thank you very much for your content, I have learned a lot!
@AventuS1998 1 year ago
Sir, thank you for explaining in such a simple way. You're the best 😃👌
@Migsfigs 11 months ago
Great content, thank you sir. Any recs/materials on prepping for data engineering instead of data science?
@shaz-z506 2 years ago
Thanks, Dan, for providing such useful information with a detailed walkthrough of each step. I just want to know whether there is any specific reason why you used MAE instead of RMSE. Also, since you have real-time data ingestion for every flight in your database, the dataset will be very large, so k-fold CV could take a very long time for model evaluation. Is there a better approach for large datasets?
@charlottenouwen8553 1 year ago
MAE is less sensitive to outliers than MSE, I'm guessing that's why Dan picked MAE.
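The outlier sensitivity mentioned here can be checked with a small sketch. The delay values below are made up for illustration; the point is only how each metric reacts when one badly missed long delay is added.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical actual vs. predicted delays in minutes, all with small errors.
actual = [10, 20, 30, 40, 50]
predicted = [12, 18, 33, 37, 50]

# Same data plus one flight delayed 600 minutes that the model predicted as 60.
actual_with_outlier = actual + [600]
predicted_with_outlier = predicted + [60]

mae_growth = mae(actual_with_outlier, predicted_with_outlier) / mae(actual, predicted)
rmse_growth = rmse(actual_with_outlier, predicted_with_outlier) / rmse(actual, predicted)

print("MAE grows by a factor of", round(mae_growth, 1))
print("RMSE grows by a factor of", round(rmse_growth, 1))
```

Because RMSE squares each error before averaging, the single large miss dominates it far more than it dominates MAE. Which behavior is preferable depends on whether large misses should be penalized heavily, which ties back to how much the long delays matter to the business.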
@nothingspecific314 2 years ago
Thanks for the video. Isn't it more appropriate to use time-series cross validation? I am also thinking about seasonality indicator variables that might affect the results.
@DataInterview 2 years ago
In terms of TSCV, it would be appropriate when, in a given time series t1, t2, t3, ..., tk, you are using t1 through t(k-1) to predict tk, and it makes sense to do this if you believe the time series is autocorrelated. But in this case the problem is formulated as a regression, and you are always using prior data about the flight information to predict the delay in the future. So k-fold CV should be sufficient. For seasonality, decomposing the datetime into month should be enough.
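For readers unfamiliar with the two ideas in this exchange, here is a minimal sketch of a forward-chaining (time-series) split, in which each test fold comes strictly after its training rows, together with the month decomposition mentioned for seasonality. The function names `expanding_window_splits` and `month_feature` are hypothetical, and rows are assumed to be sorted by time already. Plain k-fold differs only in that test folds may precede some training rows.

```python
from datetime import datetime

def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices); every test fold follows its training data.

    Assumes rows are already sorted by time.
    """
    fold_size = n_samples // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for i in range(1, n_splits + 1):
        train_end = i * fold_size
        # The last fold absorbs any leftover rows.
        test_end = train_end + fold_size if i < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))

def month_feature(scheduled_departure):
    """Seasonality feature: the month (1-12) of the scheduled departure."""
    return scheduled_departure.month

for train_idx, test_idx in expanding_window_splits(10, 3):
    print("train:", train_idx, "test:", test_idx)

print("month:", month_feature(datetime(2023, 12, 24, 18, 30)))
```

With this kind of split the model is never evaluated on flights that happened before the ones it was trained on, which is the property the question is asking about. Whether that property is needed here depends on how strongly the delays are correlated over time.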
@AnkitDasCo 10 months ago
Does the course go into more detail than this video?
@PeterPan-hs5tu 1 year ago
a,along content, thank you sir❤