This is by far the best DS mock interview I have seen on YouTube, in terms of the authenticity of the coding environment, the interview structure/flow/time control, and the interaction between the interviewee and interviewer. Thank you for making this! It would be great if you could provide feedback to the interviewee at the end so we can learn where she did well and where she can further improve. The interviewee is amazing too! I am curious, what is her experience (YOE and job title)? Thanks.
@DrEhrfurchtgebietend · 2 years ago
Watching her struggle with a simple SQL question really made me feel better
@tamabebe5551 · 3 years ago
Hello, I don't know why people are being so cold, you did great on the interview.
@junyanyao6977 · 3 years ago
The case study probably wants to follow this structure: 1. Why do you want to distinguish influencer accounts? [say, for better ad targeting, or to use this information in a recommender system, etc.] 2. What kind of data is available to us (account contextual information and behavioral information)? 3. Clarify which features can be helpful (you can talk about some classification models here, but the focus should mainly be on feature insights). 4. Clarify which features are most important (from product-sense and machine-learning points of view, e.g. permutation importance, Gini importance). 5. Summarize.
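Point 4 mentions permutation importance. A minimal sketch of the idea in plain Python: shuffle one feature column, re-score the model, and record the drop in accuracy. The accounts, the two features (follower_count, posts_per_week), and the rule-based "model" are all hypothetical stand-ins for a real trained classifier and a held-out set; scikit-learn offers the same procedure as `sklearn.inspection.permutation_importance`.

```python
import random

# Hypothetical toy data: (follower_count, posts_per_week) per account.
# Assumption: "influencer" is defined purely by follower_count > 10_000,
# so posts_per_week carries no signal here.
ACCOUNTS = [
    (50_000, 3), (120_000, 10), (800, 14), (2_500, 2), (15_000, 7),
    (300, 1), (40_000, 5), (9_000, 12), (75_000, 4), (1_200, 9),
]
LABELS = [1 if followers > 10_000 else 0 for followers, _ in ACCOUNTS]


def model(row):
    """Hypothetical stand-in classifier: predicts influencer from followers only."""
    followers, _posts = row
    return 1 if followers > 10_000 else 0


def accuracy(rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(rows, labels, feature_idx, n_repeats=20, seed=0):
    """Mean drop in accuracy after randomly shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature_idx] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in position feature_idx.
        permuted = [
            tuple(column[i] if j == feature_idx else v for j, v in enumerate(r))
            for i, r in enumerate(rows)
        ]
        drops.append(baseline - accuracy(permuted, labels))
    return sum(drops) / n_repeats


importances = {
    name: permutation_importance(ACCOUNTS, LABELS, idx)
    for idx, name in enumerate(["follower_count", "posts_per_week"])
}
print(importances)
```

Because the stand-in model never looks at posts_per_week, shuffling it cannot change any prediction, so its importance comes out as exactly 0; shuffling follower_count breaks the rule and lowers accuracy.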
@torinojuve1 · 3 years ago
Hi - was a Facebook DS and I gave many interviews. This is nothing like the Facebook DS interview.
@FuyangLiu · 3 years ago
So how do the real ones differ from this?
@asthasrivastava9564 · 3 years ago
Can you please share the real experience?
@MegaAntimason · 3 years ago
The first SQL answer is incorrect: you can't filter on the rank yet; you have to create a subquery.
@luhan5129 · 2 years ago
agree
@browser1232 · 2 years ago
Ugh, came to the comments just to say this. That was pretty bad.
@ipvikas · 2 years ago
Correct MySQL query is: select user_name, ROW_NUMBER() over w as 'Rank' from Messages window w as (partition by date order by message_sent/message_received desc)
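For comparison, a minimal sketch of the subquery approach, run against SQLite from Python (window functions need SQLite 3.25 or newer). The table layout Messages(date, user_name, message_sent, message_received) is inferred from the queries in these comments, the rows are made up, and the task is assumed to be "for each day, the user(s) with the highest sent/received ratio".

```python
import sqlite3

# Assumed schema, reconstructed from the comments; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages (date TEXT, user_name TEXT,"
    " message_sent INTEGER, message_received INTEGER)"
)
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?, ?, ?)",
    [
        ("2021-01-01", "ann", 10, 5),  # ratio 2.0
        ("2021-01-01", "bob", 9, 3),   # ratio 3.0
        ("2021-01-02", "ann", 4, 4),   # ratio 1.0
        ("2021-01-02", "bob", 6, 2),   # ratio 3.0
        ("2021-01-02", "cat", 7, 1),   # ratio 7.0
    ],
)

# The rank is computed in the inner query; only the outer query can filter on it.
query = """
SELECT date, user_name, ratio
FROM (
    SELECT date,
           user_name,
           CAST(message_sent AS REAL) / message_received AS ratio,
           RANK() OVER (
               PARTITION BY date
               ORDER BY CAST(message_sent AS REAL) / message_received DESC
           ) AS rnk
    FROM Messages
    WHERE message_received > 0
) AS ranked
WHERE rnk = 1
ORDER BY date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

RANK() keeps ties, so two users sharing the top ratio on a day would both be returned; ROW_NUMBER() would pick one of them arbitrarily. The CAST avoids integer division, and the WHERE inside the subquery skips zero denominators.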
@brothermalcolm · 3 years ago
I feel like this is not the typical fb style interview, but I definitely learned something useful here!
@orangethemeow · 3 years ago
It doesn't seem like an analytics role. The FB prep session mentions that ML is not required for the analytics track.
@huanchenli4137 · 1 year ago
@@orangethemeow Most DS roles at FB are just DA or BI, not real DS
@PremiumTrackerSilverStacker · 3 years ago
I don't think she answered the log-odds question correctly. A CI on the log-odds scale is insignificant if it includes 0; a CI for the odds is insignificant if it includes 1.
@datahat642 · 3 years ago
The case study has been worked through in detail. An additional important feature could be whether any other influencers follow the particular user under consideration.
@reanschwarzer1026 · 3 years ago
The third question, about the confidence interval in logistic regression, is kind of misleading and challenging from the interviewee's perspective. More clarification would help, e.g. whether it is in logit format or probability format. First, the question asks whether the log odds (logit) could be 0. I think it is possible: log(p/(1-p)) is definitely zero when p = 1-p. Then you jumped to the confidence interval of the odds ratio, which is tricky if you are treating the odds ratio and log odds as the same thing (the odds ratio is not logged). The odds ratio is exp(beta), so when 1 is included in its CI, beta could be zero since exp(0) = 1, and you fail to reject the null hypothesis, i.e. the beta coefficient is not significant.
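To make the two scales concrete: a Wald interval is built on the log-odds scale as beta ± 1.96·SE, and exponentiating both endpoints gives the interval for the odds ratio exp(beta). Since exp is monotonic and exp(0) = 1, "the log-odds CI contains 0" and "the odds-ratio CI contains 1" are the same statement. A small sketch with made-up coefficient estimates:

```python
import math


def wald_ci(beta, se, z=1.96):
    """Approximate 95% Wald interval for a logistic-regression coefficient (log-odds scale)."""
    return beta - z * se, beta + z * se


def odds_ratio_ci(beta, se, z=1.96):
    """The same interval mapped to the odds-ratio scale; exp() preserves the ordering."""
    lo, hi = wald_ci(beta, se, z)
    return math.exp(lo), math.exp(hi)


results = []
# Hypothetical (beta_hat, standard error) pairs.
for beta, se in [(0.40, 0.15), (0.10, 0.15)]:
    lo, hi = wald_ci(beta, se)
    or_lo, or_hi = odds_ratio_ci(beta, se)
    contains_zero = lo <= 0 <= hi
    contains_one = or_lo <= 1 <= or_hi
    results.append((contains_zero, contains_one))
    print(f"beta={beta}: log-odds CI=({lo:.3f}, {hi:.3f}), contains 0: {contains_zero}; "
          f"odds-ratio CI=({or_lo:.3f}, {or_hi:.3f}), contains 1: {contains_one}")
```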
@qifeizhang4834 · 3 years ago
Your comment is very helpful!
@sujaykha · 3 years ago
Yup, exactly what I thought! Thanks to the uploader for the great content though 👌
@jlh530i1 · 3 years ago
... a friend of mine was asked to write an algorithm for search autofill during the case portion of their interview
@jaeen7665 · 2 years ago
Dang, the coefficient question would've gotten me right off the bat. I'd have said run a regression and print the summary... whoops.
@simonhafner4750 · 3 years ago
Thanks a lot for sharing! May I ask which level this mock interview is meant for?
@ni12907 · 3 years ago
Hey, the font size is too small. Can you please post the questions somewhere?
@DataInterview · 3 years ago
Sure thing. Noted for the next video.
@ajitkirpekar4251 · 2 years ago
Thank god it wasn't expected to derive the MLE. Also, I am a bit surprised FB expects someone to remember the OLS matrix equations for beta coefficients. I mean, it was lasered into my brain, sure, but I am not sure that's proof of anything other than that I happened to commit it to memory. I also happened to commit the equations for the generalized method of moments to memory, but that's also not proof of anything.
@joelwillis2043 · 2 years ago
Well, she commuted the solution, but it's not commutable: her matrix product is not compatible. If you can't derive it from the residual sum of squares, you probably don't understand anything from calculus.
@tuanseattle · 2 years ago
Yeah, I thought the answer would simply be OLS (because we do not do it by hand...). But it looks like the equations need to be memorized lol
@joelwillis2043 · 2 years ago
@@tuanseattle Again, there is literally nothing to remember. Just take the derivative of the residual sum of squares, set it to 0, and solve. It is a very simple calculation, the analog of what you learned in first-semester calculus.
@stanislavdidenko8436 · 2 years ago
@@joelwillis2043 I can derive it, but it takes an hour or so sitting with pen and paper. It is not trivial, because you are dealing with matrix forms, and at some point you have to move from partial derivatives to the gradient-form solution. Deriving it is not an interview-format task. I am a mid-level DS with 3 years of experience, and there has not been a single day in my career where this skill was needed.
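The derivation discussed in this thread, written out in matrix form. It assumes X is n × p with full column rank, so that X'X is invertible; the expansion uses the fact that the scalar y'Xβ equals β'X'y, and the gradient uses ∇(β'Aβ) = 2Aβ for the symmetric matrix A = X'X.

```latex
\begin{aligned}
\mathrm{RSS}(\beta) &= (y - X\beta)^\top (y - X\beta) \\
  &= y^\top y - 2\,\beta^\top X^\top y + \beta^\top X^\top X \beta \\
\nabla_\beta \mathrm{RSS}(\beta) &= -2\,X^\top y + 2\,X^\top X \beta = 0 \\
\Longrightarrow\quad X^\top X \hat\beta &= X^\top y
  \quad\Longrightarrow\quad \hat\beta = (X^\top X)^{-1} X^\top y
\end{aligned}
```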
@vnpikachu4627 · 3 years ago
For the first SQL question, you have to create a subquery, or use HAVING instead of WHERE.
@weiyangshi4729 · 3 years ago
Would filtering using HAVING work here? I thought SELECT is executed after HAVING. Correct me if I'm wrong!
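In standard SQL, WHERE, GROUP BY and HAVING are all evaluated before window functions, so neither WHERE nor HAVING can see a rank computed in the same SELECT. A quick check against SQLite with made-up rows (the Messages layout is inferred from the comments). The exact error text varies by engine and version, so the sketch only records whether each statement was rejected. Some warehouses, such as Snowflake and BigQuery, add a nonstandard QUALIFY clause for exactly this case; otherwise a subquery or CTE is the portable fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages (date TEXT, user_name TEXT,"
    " message_sent INTEGER, message_received INTEGER)"
)
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?, ?, ?)",
    [("2021-01-01", "ann", 10, 5), ("2021-01-01", "bob", 9, 3)],
)

rank_expr = (
    "RANK() OVER (PARTITION BY date "
    "ORDER BY CAST(message_sent AS REAL) / message_received DESC)"
)
attempts = {
    # WHERE runs before the window function has been computed.
    "WHERE": f"SELECT user_name, {rank_expr} AS rnk FROM Messages WHERE rnk = 1",
    # HAVING filters groups, which also happens before window functions run.
    "HAVING": (
        f"SELECT user_name, {rank_expr} AS rnk FROM Messages "
        "GROUP BY date, user_name HAVING rnk = 1"
    ),
}

errors = {}
for clause, sql in attempts.items():
    try:
        conn.execute(sql).fetchall()
        errors[clause] = None  # the engine accepted the statement
    except sqlite3.Error as exc:  # the engine rejected the statement
        errors[clause] = str(exc)

for clause, message in errors.items():
    print(f"{clause}: {message}")
```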
@techsavy5669 · 3 years ago
How many years of experience do the interviewer and interviewee have?
@bcws · 1 year ago
Isn't the beta of a logistic regression the change in Y (or the log odds, in this case) given a 1-unit change in X? If so, then it is possible for beta to be 0 (or for 0 to be in beta's confidence interval), as that implies a 1-unit change in x produces no change in the log odds. However, if we want to look at odds, then we need to take the exponential of beta, in which case it is not possible for the confidence interval of exp(beta) to contain 0. The confidence interval here does not refer to the log odds themselves, but to the change in log odds given a change in x.
@chemtech7 · 3 years ago
I have never been asked these types of statistics questions, or to derive formulas or coefficients, in a data science interview.
@redcloud6975 · 3 years ago
Y’all getting interviewed?😂😭
@Hephasto · 3 years ago
What questions do you get then?
@jimbocho660 · 3 years ago
@@Hephasto Brief explanation of the idea behind ensembling models; advantages and disadvantages of decision trees; the hyperparameters of a random forest classifier; detecting and explaining multicollinearity; basic probability, especially simple conditional probability calculations; how to regularize neural networks; basic SQL; and so on.
@Tusharchitrakar · 10 months ago
@@jimbocho660 But these questions seem way easier than ones that need deeper mathematical insight. I guess it depends on the company.
@AniltonNeto · 3 years ago
13:04 is bad, because the result of the division is undefined when the denominator is 0. In this case, use NULLIF(field, 0) instead :-P and filter out the zero values :)
@StraightCrossing · 3 years ago
I would prefer to filter the data so there just isn't a null or 0, with WHERE message_received > 0
@vvalk2vvalk · 3 years ago
@@StraightCrossing My thoughts exactly.
@orangethemeow · 3 years ago
@@StraightCrossing Same. Then we don't have to worry about those 0s
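The two fixes discussed here behave differently, and engines also differ on the unguarded case (PostgreSQL raises a division-by-zero error, while SQLite returns NULL). A sketch against SQLite with made-up rows; the column names follow the comments above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages (user_name TEXT, message_sent INTEGER, message_received INTEGER)"
)
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?, ?)",
    [("ann", 10, 5), ("bob", 3, 0), ("cat", 7, 2)],
)

# NULLIF(x, 0) turns a zero denominator into NULL, so the ratio becomes NULL.
# CAST to REAL avoids integer division (7 / 2 would give 3 in SQLite).
with_nullif = conn.execute(
    "SELECT user_name, CAST(message_sent AS REAL) / NULLIF(message_received, 0) "
    "FROM Messages ORDER BY user_name"
).fetchall()

# Filtering first removes the zero-denominator rows entirely.
with_where = conn.execute(
    "SELECT user_name, CAST(message_sent AS REAL) / message_received "
    "FROM Messages WHERE message_received > 0 ORDER BY user_name"
).fetchall()

print(with_nullif)
print(with_where)
```

NULLIF keeps the zero-denominator user in the output with a NULL ratio, which can be useful if you still want to report them; the WHERE filter drops them before any ranking happens.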
@naraendrareddy273 · 2 years ago
WTF? I didn't know they would go so deep into statistics. Multivariate regression? Derive the Beta coefficient? Wow, I'm stumped right at the beginning. :(
@vvalk2vvalk · 3 years ago
Thank you for the video. Pretty informative. This shows imposter syndrome is real. I understand that there were follow-up interviews and further rounds, but it does give me much more confidence, given that it is a SENIOR interview at FACEBOOK. I am now actually considering trying out the Data Scientist path some time in the future.
@DataInterview · 3 years ago
A lot of people have imposter syndrome to some degree even those with many years of industry experience. I've been a data scientist for 5 years (2 years non-tech and 3 years in tech), and I still experience the syndrome at times. But, over time, you experience it less as you gain more experience.
@naraendrareddy273 · 2 years ago
Oh thank you for letting me know this is a senior DS interview. I'm trying to become a junior DS first.
@genuinebasilnt · 3 years ago
I read the title *A Facebook data scientist mocks interviews*
@maddoo23 · 2 years ago
Um, the expression for beta is wrong (first question). It's beta = (X'X)^(-1)X'Y
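A minimal sketch of that formula in plain Python (no NumPy, to stay self-contained), on made-up noise-free data so the true coefficients are known. It solves X'X β = X'y rather than forming the inverse explicitly, which is the usual numerically safer choice; the helper names are illustrative, not from any library.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(X, y):
    """beta_hat = (X'X)^(-1) X'y, computed by solving the normal equations."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [row[0] for row in matmul(Xt, [[v] for v in y])]
    return solve(XtX, Xty)


# Made-up data from y = 1 + 2*x1 - 0.5*x2 with no noise; column 0 is the intercept.
X = [[1, 0, 1], [1, 1, 0], [1, 2, 3], [1, 3, 1], [1, 4, 2]]
y = [1 + 2 * x1 - 0.5 * x2 for _, x1, x2 in X]
beta = ols(X, y)
print(beta)
```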
@hotmilkritata · 2 years ago
Like the stat questions
@mehmetedex · 3 years ago
I imagine her keyboard has keys made of ten-inch springs with wooden tops :D
@LouisChiaki · 3 years ago
It sounds like a very expensive mechanical keyboard!
@dsgarden · 3 years ago
Dude you need a shave asap, will make you 1000 yr younger
@toshb1384 · 3 years ago
3:15 - isn’t (X’X)^(-1)X’y derived from the maximum likelihood estimate? I thought the correct answer would be stochastic gradient descent.
@DataInterview · 3 years ago
The equation is the Least Squares Method that provides an unbiased estimation of a regression parameter. Maximum likelihood estimation is a different parameter estimation technique that maximizes the likelihood of a model given data. You can use SGD to run MLE. But, unlike the least-squares method, the maximum likelihood estimation does not always lead to an unbiased estimation of a regression parameter.
@DataInterview · 3 years ago
Hope this clarifies your question :)
@toshb1384 · 3 years ago
@@DataInterview thanks for the response. I guess what I’m trying to say: isn’t MLE the same thing as least squares? You can derive the least squares solution directly from maximum likelihood estimation, and you get the same solution. stats.stackexchange.com/questions/143705/maximum-likelihood-method-vs-least-squares-method
@mrblahblihblih · 3 years ago
Yeah, I think it's true that you use SGD when there is no closed form, but deriving the beta coefficients from least squares and from MLE should actually give you the same result, (X'X)^(-1)X'y
@DataInterview · 3 years ago
@Tosh B and @wjyu_, thanks for the comments. It's actually not correct to assume that the MLE is the same as the Least Squares Method. Even the author of the Stack Exchange comment notes that it's "equivalent" under certain conditions. That is, it provides the same solution under certain conditions (in this case the estimation of parameters in a linear model). But, just because the solutions are the same, it doesn't mean that the methods are the same. The Least Squares Method minimizes the distance between the target and projection vectors with no stochastic assumptions. MLE, on the other hand, estimates parameters by maximizing a likelihood function such that the observed data is most likely. Additionally, the Least Squares Method leads to unbiased estimations of model parameters. However, MLE can sometimes lead to biased estimations.
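The textbook case behind both points in this thread: for linear regression with Gaussian errors, maximizing the log-likelihood over the coefficients is equivalent to minimizing the residual sum of squares, so the MLE of beta equals the OLS estimate, but the MLE of the error variance, RSS/n, is biased downward compared with RSS/(n - p). A small simulation sketch in plain Python, with hypothetical true parameters:

```python
import random


def fit_simple_ols(xs, ys):
    """Closed-form least-squares intercept and slope for y = a + b*x
    (also the Gaussian MLE of a and b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def variance_estimates(xs, ys):
    """Return (MLE, unbiased) estimates of the error variance."""
    a, b = fit_simple_ols(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    n, p = len(xs), 2  # p = number of fitted coefficients
    return rss / n, rss / (n - p)


rng = random.Random(42)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]          # small n makes the bias easy to see
true_a, true_b, sigma2 = 1.0, 2.0, 1.0  # hypothetical true parameters
n_sims = 20_000
mle_sum = unbiased_sum = 0.0
for _ in range(n_sims):
    ys = [true_a + true_b * x + rng.gauss(0.0, sigma2 ** 0.5) for x in xs]
    mle, unbiased = variance_estimates(xs, ys)
    mle_sum += mle
    unbiased_sum += unbiased

mean_mle = mle_sum / n_sims
mean_unbiased = unbiased_sum / n_sims
# Theory: E[RSS/n] = (n - p)/n * sigma^2 = 0.6 here; E[RSS/(n - p)] = sigma^2 = 1.0.
print(f"mean MLE variance:      {mean_mle:.3f}")
print(f"mean unbiased variance: {mean_unbiased:.3f}")
```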
@pvss2000 · 3 years ago
For the influencer versus non-influencers, could you do something where first you identify those who actually have content that has products that are being 'advertised', then you correlate the presence/views of that video with sales of that product. If correlation reaches above a certain point then they are an influencer.
@pal999 · 3 years ago
It would be helpful to post the correct answers at some point in the future
@konataizumi5829 · 3 years ago
They never do. It sucks.
@ПавлоСкляр-д6т · 3 years ago
@@konataizumi5829 at least some kind of grade for the interviewee would be pretty informative
@kristofmeszaros4924 · 3 years ago
I may be wrong but the sql question seemed pretty straightforward, shouldn't the solution just be select m.user_name , m.date, max(m.message_sent/m.message_received) as "Ratio" from Messages as m where m.message_received > 0 group by m.user_name , m.date order by Rate desc
@huzuvettin · 3 years ago
@@kristofmeszaros4924 Looks much better to me tbh. Except the "rate" should be "ratio", as aliased earlier, right?
@huzuvettin · 3 years ago
Exactly. I get that even watching these interviews gives us some knowledge, but without the correct answers, what should we take as the ground truth? Am I right B-)
@HardawayLong · 1 year ago
For the first question, do we really need maximum likelihood estimation to get the beta coefficients for a regression problem? I think it's only used in classification, right? Would gradient descent be the correct answer?
@ecotrix132 · 9 months ago
OLS, MLE, and gradient descent are different ways
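One way to see how these fit together: OLS and MLE define which coefficients you want, while gradient descent is one way to compute them numerically. A minimal sketch on made-up data, assuming a squared-error loss and a hand-picked step size: full-batch gradient descent ends up at the same intercept and slope as the closed-form least-squares solution.

```python
def ols_closed_form(xs, ys):
    """Closed-form least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def ols_gradient_descent(xs, ys, lr=0.05, steps=5_000):
    """Minimize mean squared error for y = a + b*x by full-batch gradient descent."""
    n = len(xs)
    a = b = 0.0
    for _ in range(steps):
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        grad_a = 2.0 / n * sum(residuals)
        grad_b = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]  # made-up, roughly y = 1 + 2x
closed = ols_closed_form(xs, ys)
descended = ols_gradient_descent(xs, ys)
print(closed)
print(descended)
```

The step size matters: too large a learning rate makes the iterates diverge, which is one reason the closed form is preferred when it is cheap, and (stochastic) gradient descent when the data or model makes the closed form impractical.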
@ASOT666 · 2 years ago
amazing, super helpful!
@phyrajkumarverma4412 · 1 year ago
Hi, I would also like to do a mock interview. Could you take it, please? I am an undergraduate, currently in my 3rd year of computer science, and I want to get good at data science.
@adamdreier · 3 years ago
That function in JavaScript is annoying me; please use an ES6 arrow function for binding.
@oliesting4921 · 3 years ago
Can hardly see anything... it's dark and the font is too small
@xiaowenkang9598 · 5 months ago
👍thank you so much
@oaasal · 3 years ago
Is that a junior level interview?
@DataInterview · 3 years ago
Senior
@oaasal · 3 years ago
@@DataInterview That sounds easier than I thought. Maybe I should change my job.
@DataInterview · 3 years ago
@@oaasal Doesn't hurt :) Do note that this emulates a phone screen. On-site interviews and case studies are another set of challenges, often much more challenging than the phone screen. If you are interested in prep content, make sure to check out www.topds.io
@orangethemeow · 3 years ago
@@DataInterview This domain doesn't exist anymore :(
@DataInterview · 3 years ago
@@orangethemeow Go to datainterview.com
@cooldudesheks · 3 years ago
Thanks for such insightful content! I have a clarification question on the 3rd stats problem. You asked whether the log odds, i.e. the logit value, can be 0 or not. Since the logit scale runs from -infinity to +infinity, the log odds can take the value 0, can't they? She answered that it cannot be 0 but has a minimum of 1. I would appreciate it if you could clarify whether that was the right answer or I am missing something here. Thanks again! 👍
@neethualan6543 · 3 years ago
The interviewer asked what it means if the CI of the log odds contains zero. However, the answer was based on odds = 1 (no association between the independent and dependent variables). When log odds = 0, there is no statistical significance. The answer is correct in the sense that odds = 1 means log odds = 0, but either the question or the answer should have been clearer.
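The log odds log(p/(1 - p)) can take any real value, and log odds = 0 corresponds exactly to odds = 1, i.e. p = 0.5. That is why a log-odds CI containing 0 and an odds CI containing 1 express the same thing. A tiny check:

```python
import math


def logit(p):
    """Log odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1 - p))


for p in (0.1, 0.5, 0.9):
    odds = p / (1 - p)
    print(f"p={p}: odds={odds:.3f}, log odds={logit(p):.3f}")
```

Probabilities below 0.5 give negative log odds and probabilities above 0.5 give positive ones, symmetric around 0.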
@ariss3304 · 3 years ago
I’m going with a reverse engineering path into college, please tell me I don’t have to learn these things.
@ariss3304 · 3 years ago
Specifically the beta coefficient part
@DataInterview · 3 years ago
If someone asked me these topics when I started learning stats 7 years ago, I would have been frightened myself. But, years of diligent studies, and working in multiple DS jobs helped me develop confidence. I'm confident you will go through a similar experience as well. Here's a video with a commentary that should provide a more "gentle" introduction to the interview in DS: kzbin.info/www/bejne/oqXLc56Kg52Jps0&lc=UgxNhFz1NZlAPWn1pI54AaABAg
@jeremythompson-seyon5463 · 3 years ago
Where do I start if I want to learn the skills needed to go into data science? I just started a statistics class and I've been really interested in the modeling and practical applications. I only barely understand the basics of R and SQL, to give you an idea of where my knowledge is. Thanks for the video
@DataInterview · 3 years ago
I would start with Kaggle. Emulate worked out examples provided by the community members on the site.
@caremacosta1435 · 3 years ago
It depends on the knowledge you already have. Learn Python; high-school math is enough. I would recommend the book Machine Learning for Absolute Beginners: it is not that long and it summarizes basic concepts very well. Kaggle has many courses for learning the basics of machine learning, SQL, data visualization, etc. And practice, practice, practice.
@superfreiheit1 · 3 years ago
Can't see anything, too small. Zoom in.
@DataInterview · 3 years ago
Thanks Joe. Duly noted for the next video.
@superfreiheit1 · 3 years ago
@@DataInterview Can you see something on the video?
@zakarie · 2 years ago
Actually, the confidence interval is not interpreted as the chance that the true value falls in the interval; the accurate interpretation is that there is a 95% probability that the random interval covers the true value.
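This frequentist reading can be checked by simulation: fix a true mean, draw many samples, build a 95% interval from each, and count how often the interval contains the true value. A minimal sketch assuming normally distributed data with known sigma (a z-interval), with made-up parameters:

```python
import random


def coverage(true_mean=5.0, sigma=2.0, n=30, n_sims=10_000, z=1.96, seed=0):
    """Fraction of simulated 95% z-intervals (sigma known) that contain true_mean."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # The interval is what varies between samples; the true mean stays fixed.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_sims


rate = coverage()
print(f"share of intervals covering the true mean: {rate:.3f}")
```

The share lands close to 0.95, but any single computed interval either contains the true mean or it does not.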
@sirongzeng4096 · 3 years ago
Does anyone want to join a mock interview group for data analyst roles? I'm looking for people to practice with on coding, behavioral questions, and resumes. Thanks!
@sirongzeng4096 · 3 years ago
Or if there is a group or Slack channel, please let me know! Thanks!
@md.imrulhasan8757 · 2 years ago
I want to join... Can you please include me?
@miraarora8142 · 3 years ago
solution for 1st SQL Question: select t_date, user_name from messages where message_received != 0 group by t_date, user_name order by sum(message_sent)::float/sum(message_received) desc limit 1;
@srk312 · 3 years ago
for each day...one row..not one row overall
@CommentaryCentral · 3 years ago
This is the sort of stuff we covered in the MSc Data Science course in the UK; I can't believe it's a senior-level interview
@Cooldownman197 · 3 years ago
You just covered it, not implemented it
@DataInterview · 3 years ago
Some of these are covered at the undergraduate level as well, but when you are under pressure and a breadth of topics is covered in 30 minutes, it's a much different experience than getting a homework assignment.
@CommentaryCentral · 3 years ago
@@DataInterview yeah, I'm sure you are right
@tuanseattle · 2 years ago
"Covered" does not mean it is not hard, because people forget stuff they do not use frequently. University covers a lot of stuff.
@ipvikas · 2 years ago
Sql#1: Correct MySQL query is: select user_name, ROW_NUMBER() over w as 'Rank' from Messages window w as (partition by date order by message_sent/message_received desc)
@md.imrulhasan8757 · 2 years ago
done
@sourabhsharma9830 · 3 years ago
That is not the confidence interval, that is a credible interval. A confidence interval means that 95% of the time the estimated beta coefficient will predict the correct result, which is "y". To get 95% confidence for the beta coefficient, we need to use Bayesian parameter estimation, which gives a posterior distribution of the beta coefficient with a 95% credible interval.
@MJ-pz6tn · 2 years ago
no
@sourabhsharma9830 · 2 years ago
@@MJ-pz6tn yes
@beaglesnlove580 · 3 years ago
Lol, these questions are a joke. I broke into FB. Least squares, MLE or gradient descent? Ans: logistic regression, or some classifier. Confidence intervals: these are estimates of regression variables. For the presence of 0, you have to do a t-test on the individual variable.