Facebook Data Scientist Mock Interview

Facebook Data Scientist Mock Interview - Segment Influencers

Views: 124,501

Days ago

Comments: 111

@nanfengbb 3 years ago

This is by far the best DS mock interview I have seen on youtube, in terms of the authenticity of the coding environment, interview structure/flow/time control and interaction between the interviewee and interviewer. Thank you for making this! It would be great if you can provide feedback/comments to the interviewee at the end so we can learn what areas she did great/where she can further improve. The interviewee is amazing too! I am curious, what is her experience (YOE and job title)? Thanks.

@DrEhrfurchtgebietend 2 years ago

Watching her struggle with a simple SQL question really made me feel better

@tamabebe5551 3 years ago

Hello, I don't know why people are being so cold, you did great on the interview.

@junyanyao6977 3 years ago

The case study probably wants to follow this structure: 1. Why do you want to distinguish influencer accounts? [let's say it's for better ad targeting, or to use this information in a recommender system, etc.] 2. What kind of data is available to us (account contextual information and behavioral information)? 3. Clarify which features can be helpful (can talk about some classification models here, but mainly it should be feature insights). 4. Clarify which features are most important (from product sense and machine learning points of view, e.g. permutation importance, Gini importance). 5. Summarize it.

@torinojuve1 3 years ago

Hi - was a Facebook DS and I gave many interviews. This is nothing like the Facebook DS interview.

@FuyangLiu 3 years ago

So how do the real ones differ from this?

@asthasrivastava9564 3 years ago

Can you please share the real experience, please?

@MegaAntimason 3 years ago

The first SQL answer is incorrect: you can't filter on rank yet, you have to create a subquery.

@luhan5129 2 years ago

agree

@browser1232 2 years ago

Ugh, came to the comments just to say this. That was pretty bad.

@ipvikas 2 years ago

Correct MySQL query is: select user_name, ROW_NUMBER() over w as 'Rank' from Messages window w as (partition by date order by message_sent/message_received desc)

@brothermalcolm 3 years ago

I feel like this is not the typical fb style interview, but I definitely learned something useful here!

@orangethemeow 3 years ago

It doesn't seem like an analytic role. FB prep session mentions that ML is not required for the analytics track.

@huanchenli4137 1 year ago

@@orangethemeow Most DS roles at FB are just DA or BI, not real DS

@PremiumTrackerSilverStacker 3 years ago

I don't think she answered the question on the log odds correctly. A CI for the log odds is insignificant if it includes 0. A CI for the odds is insignificant if it includes 1.

@datahat642 3 years ago

The case study has been worked in detail. An additional important feature could be whether any other influencers follow the particular user under consideration.

@reanschwarzer1026 3 years ago

The third question about the confidence interval of logistic regression is kind of misleading and challenging from the interviewee's perspective. More clarification would help, such as whether it is in logit format or probability format. First, the question is asking if the log-odds (logit) could be 0. I think it is possible: log(p/(1-p)) is definitely zero when p=1-p. Then you jumped to the confidence interval of the odds ratio, which is kind of tricky if you are treating the odds ratio and log odds as the same thing (the odds ratio does not take the log). The odds ratio format should be exp(beta), so when 1 is included in the CI, that means beta could be zero since exp(0)=1, and you fail to reject the null hypothesis, i.e. the beta coefficient is not significant.
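
The relationship described above can be checked numerically. This is only a minimal sketch, assuming a Wald-style 95% interval (beta ± 1.96 × standard error); the coefficient and standard error are made-up values and `odds_ratio_ci` is a hypothetical helper, not something shown in the video.

```python
import math

def odds_ratio_ci(beta_hat, std_err, z=1.96):
    """Return (log-odds CI, odds-ratio CI) for one logistic regression coefficient.

    Assumes a Wald interval: beta_hat +/- z * std_err on the log-odds scale.
    """
    lo = beta_hat - z * std_err
    hi = beta_hat + z * std_err
    # Exponentiating the endpoints moves the interval to the odds-ratio scale.
    return (lo, hi), (math.exp(lo), math.exp(hi))

# Hypothetical estimate, chosen only for illustration.
log_odds_ci, or_ci = odds_ratio_ci(beta_hat=0.10, std_err=0.08)

# The log-odds interval contains 0 exactly when the odds-ratio interval contains 1.
contains_zero = log_odds_ci[0] <= 0.0 <= log_odds_ci[1]
contains_one = or_ci[0] <= 1.0 <= or_ci[1]
```

Because exp() is strictly positive and increasing, the odds-ratio interval can never contain 0, while the log-odds interval can.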

@qifeizhang4834 3 years ago

Your comment is very helpful!

@sujaykha 3 years ago

Yup, exactly what I thought! Thanks to the uploader for great content though 👌

@jlh530i1 3 years ago

... a friend of mine was asked to write an algorithm for search autofill during the case portion of their interview

@jaeen7665 2 years ago

Dang coefficient would've gotten me off the bat. Idda said run regression and print the summary...whoops.

@simonhafner4750 3 years ago

Thanks a lot for sharing! May I ask which level this mock interview is meant for?

@ni12907 3 years ago

Hey the font size is too small, can you please post the questions somewhere?

@DataInterview 3 years ago

Sure thing. Noted for the next video.

@ajitkirpekar4251 2 years ago

Thank god it wasn't expected to derive the MLE. Also, I am a bit surprised FB expects someone to remember the OLS matrix equations for beta coefficients. I mean, it was lasered into my brain sure, but I am not sure that's proof of anything other than I happened to commit it to memory. I also happened to commit the equations for generalized method of moments, but that's also not proof of anything.

@joelwillis2043 2 years ago

Well, she commuted the solution but it's not commutable. Her matrix product is not compatible. If you cant derive it from the residual sum of squares you probably don't understand anything from calculus.

@tuanseattle 2 years ago

Yeah, i thought the answer would be simply said OLS (because we do not do it by hand...). But it looked like equations need to be remembered lol

@joelwillis2043 2 years ago

@@tuanseattle Again, there is literally nothing to remember. Just take the derivative of the residual sum of squares, set it to 0 and solve. It is a very simple calculation. The analog of what you learned in 1st-semester calculus.

@stanislavdidenko8436 2 years ago

@@joelwillis2043 I can derive it, but it would take an hour or so, sitting with pen and paper. It is not trivial, because you are dealing with matrix forms and at some point you have to abstract from partial derivatives to the gradient-form solution. It is not an interview-format task to derive it. I am a mid-level DS with 3 years of experience. There has not been a single day in my career where this skill was needed.
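
For reference, the derivation debated in this thread can be sketched in matrix form. The notation is an assumption rather than anything taken from the video: $X$ is the $n \times p$ design matrix with full column rank, $y$ is the response vector, and $\beta$ is the coefficient vector.

```latex
\begin{aligned}
\mathrm{RSS}(\beta) &= (y - X\beta)^\top (y - X\beta)
  = y^\top y - 2\,\beta^\top X^\top y + \beta^\top X^\top X \beta \\
\frac{\partial\,\mathrm{RSS}}{\partial \beta} &= -2\,X^\top y + 2\,X^\top X \beta = 0 \\
X^\top X \beta &= X^\top y \\
\hat{\beta} &= (X^\top X)^{-1} X^\top y
\end{aligned}
```

The last step needs $X^\top X$ to be invertible, which is where the full-column-rank assumption comes in.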

@vnpikachu4627 3 years ago

The first sql you have to create a subquery, or use HAVING instead of WHERE.

@weiyangshi4729 3 years ago

Would filtering using HAVING work here? I thought SELECT is executed after HAVING. Correct me if I'm wrong!
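
A minimal sketch of the subquery approach, run through Python's built-in `sqlite3` so it can be executed as is. The table layout (`Messages` with `date`, `user_name`, `message_sent`, `message_received`) and the sample rows are assumptions pieced together from the queries posted in these comments, and the task is assumed to be "top user per day by sent/received ratio". Window functions are evaluated after WHERE, GROUP BY and HAVING, so the rank has to be filtered in an outer query. SQLite 3.25 or newer is assumed for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages ("
    "date TEXT, user_name TEXT, message_sent INTEGER, message_received INTEGER)"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?, ?, ?)",
    [
        ("2021-01-01", "ann", 10, 5),  # ratio 2.0
        ("2021-01-01", "bob", 9, 3),   # ratio 3.0
        ("2021-01-02", "ann", 8, 2),   # ratio 4.0
        ("2021-01-02", "bob", 6, 6),   # ratio 1.0
    ],
)

query = """
SELECT date, user_name, ratio
FROM (
    -- Inner query: compute the ratio and rank users within each day.
    SELECT date,
           user_name,
           1.0 * message_sent / message_received AS ratio,
           RANK() OVER (
               PARTITION BY date
               ORDER BY 1.0 * message_sent / message_received DESC
           ) AS rnk
    FROM Messages
    WHERE message_received > 0
) AS ranked
-- Outer query: the rank is an ordinary column here, so WHERE can filter it.
WHERE rnk = 1
ORDER BY date
"""
top_per_day = conn.execute(query).fetchall()
```

With the sample rows above, `top_per_day` holds one row per day: bob for 2021-01-01 and ann for 2021-01-02.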

@techsavy5669 3 years ago

What was the experience in years for interviewer & interviewee ?

@bcws 1 year ago

Isn't the Beta of the logistic regression the change in Y (or log odds in this case) given a 1 unit change in X? If so, then it is possible for Beta to be 0 (or 0 to be in beta's confidence interval) as that implies a 1 unit change in x does not have any change in log odds. However, if we want to look at odds, then we need to take the exponential of Beta, in which case it is not possible for the confidence interval of exponential of Beta to contain 0. The confidence interval here is not referring to the log odds, but the change in log odds given a change in x.

@chemtech7 3 years ago

I have never been asked these types of statistics questions or to derive formulas or coefficients on a data science interview.

@redcloud6975 3 years ago

Y’all getting interviewed?😂😭

@Hephasto 3 years ago

What questions do you get then?

@jimbocho660 3 years ago

@@Hephasto brief explanation of the idea behind ensembling of models; advantages and disadvantages of decision trees; the hyperparameters of a random forest classifier; detecting and explaining multicollinearity; basic probability especially simple conditional probability calculations; how to regularize neural networks; basic SQL and so on.

@Tusharchitrakar 10 months ago

@@jimbocho660 but these questions seem way easier than ones that need deeper insight into mathematical revelations. I guess it depends on the company

@AniltonNeto 3 years ago

13:04 is bad, cuz the result for the division is undefined, in this case, you change NULLIF(field, 1) instead :-P and filter zero values :)

@StraightCrossing 3 years ago

I would prefer to filter the data so there just isn't null or 0 with WHERE message_received > 0

@vvalk2vvalk 3 years ago

@@StraightCrossing My thoughts exactly.

@orangethemeow 3 years ago

@@StraightCrossing Same. Then we don't have to worry about those 0s
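
The two ways of handling a zero denominator discussed in this thread can be compared side by side. This is a rough sketch using Python's built-in `sqlite3`; the simplified `Messages` table and its rows are made up for illustration. `NULLIF(message_received, 0)` keeps the row but makes the ratio NULL, while the WHERE filter drops the row entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages (user_name TEXT, message_sent INTEGER, message_received INTEGER)"
)
# Made-up rows: bob has received no messages, so his ratio is undefined.
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?, ?)",
    [("ann", 10, 5), ("bob", 4, 0)],
)

# Option 1: NULLIF turns a zero denominator into NULL, so the ratio becomes NULL.
with_nullif = conn.execute(
    "SELECT user_name, 1.0 * message_sent / NULLIF(message_received, 0) "
    "FROM Messages ORDER BY user_name"
).fetchall()

# Option 2: filter out rows with a zero denominator before dividing.
filtered = conn.execute(
    "SELECT user_name, 1.0 * message_sent / message_received "
    "FROM Messages WHERE message_received > 0 ORDER BY user_name"
).fetchall()
```

Which option fits depends on whether users with no received messages should still appear in the result (with a NULL ratio) or be excluded.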

@naraendrareddy273 2 years ago

WTF? I didn't know they would go so deep into statistics. Multivariate regression? Derive the Beta coefficient? Wow, I'm stumped right at the beginning. :(

@vvalk2vvalk 3 years ago

Thank you for the video. Pretty informative. This shows imposter syndrome is real. I do understand that there were follow-up interviews and further rounds, but it does give much more confidence, given that it is a SENIOR interview at FACEBOOK. I am now actually considering to try out Data Scientist path some time in the future.

@DataInterview 3 years ago

A lot of people have imposter syndrome to some degree even those with many years of industry experience. I've been a data scientist for 5 years (2 years non-tech and 3 years in tech), and I still experience the syndrome at times. But, over time, you experience it less as you gain more experience.

@naraendrareddy273 2 years ago

Oh thank you for letting me know this is a senior DS interview. I'm trying to become a junior DS first.

@genuinebasilnt 3 years ago

I read the title *A Facebook data scientist mocks interviews*

@maddoo23 2 years ago

Um, the expression for beta is wrong (first question). It's: beta = (X'X)^(-1)X'Y
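
To make the formula concrete, here is a small pure-Python sketch for the simplest case, an intercept plus one feature, where X'X is a 2×2 matrix that can be inverted by hand. The function name `ols_simple` and the data are hypothetical and chosen so the answer is easy to verify.

```python
def ols_simple(xs, ys):
    """Solve (X'X) beta = X'y for X = [1, x], i.e. intercept plus one feature."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X'X is singular; the feature has no variation")
    # Apply the closed-form inverse of the 2x2 matrix X'X to X'y.
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

# Made-up data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = ols_simple(xs, ys)
```

On this noise-free data the fit recovers the intercept 1 and slope 2 used to generate it.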

@hotmilkritata 2 years ago

Like the stat questions

@mehmetedex 3 years ago

her keyboard I imagine made of keys made of ten inch springs with wooden top :D

@LouisChiaki 3 years ago

It sounds like a very expensive mechanical keyboard!

@dsgarden 3 years ago

Dude you need a shave asap, will make you 1000 yr younger

@toshb1384 3 years ago

3:15 - isn’t (X’X)^(-1)X’y derived from the maximum likelihood estimate? I thought the correct answer would be stochastic gradient descent.

@DataInterview 3 years ago

The equation is the Least Squares Method that provides an unbiased estimation of a regression parameter. Maximum likelihood estimation is a different parameter estimation technique that maximizes the likelihood of a model given data. You can use SGD to run MLE. But, unlike the least-squares method, the maximum likelihood estimation does not always lead to an unbiased estimation of a regression parameter.

@DataInterview 3 years ago

Hope this clarifies your question :)

@toshb1384 3 years ago

@@DataInterview thanks for the response. I guess what I’m trying to say: isn’t MLE the same thing as least squares? You can derive the least squares solution directly from maximum likelihood estimation, and you get the same solution. stats.stackexchange.com/questions/143705/maximum-likelihood-method-vs-least-squares-method

@mrblahblihblih 3 years ago

yeah I think that's true to use SGD for non closed form, but deriving the beta coefficients from least squares and from MLE should actually give you the same, (X’X)^(-1)X’y

@DataInterview 3 years ago

@Tosh B and @wjyu_, thanks for the comments. It's actually not correct to assume that the MLE is the same as the Least Squares Method. Even the author of the Stack Exchange answer notes that it's "equivalent" under certain conditions. That is, it provides the same solution under certain conditions (in this case the estimation of parameters in a linear model). But, just because the solutions are the same, it doesn't mean that the methods are the same. The Least Squares Method minimizes the distance between the target and projection vectors with no stochastic assumptions. MLE, on the other hand, estimates parameters by maximizing a likelihood function such that the observed data is most likely. Additionally, the Least Squares Method leads to unbiased estimations of model parameters. However, MLE can sometimes lead to biased estimations.
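
The condition under which the two methods agree can be written out. This is a sketch under the usual textbook assumption, not something stated in the video: $y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, where $X$ is $n \times p$ with full column rank.

```latex
\begin{aligned}
\log L(\beta, \sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\,(y - X\beta)^\top (y - X\beta) \\
\hat{\beta}_{\mathrm{MLE}} &= \arg\max_{\beta}\, \log L(\beta, \sigma^2)
  = \arg\min_{\beta}\, (y - X\beta)^\top (y - X\beta)
  = (X^\top X)^{-1} X^\top y = \hat{\beta}_{\mathrm{OLS}} \\
\hat{\sigma}^2_{\mathrm{MLE}} &= \frac{1}{n}\,(y - X\hat{\beta})^\top (y - X\hat{\beta}),
  \qquad
  \mathbb{E}\big[\hat{\sigma}^2_{\mathrm{MLE}}\big] = \frac{n - p}{n}\,\sigma^2
\end{aligned}
```

So under Gaussian errors the coefficient estimates coincide, while the MLE of the error variance is biased downward, which is one standard example of a biased maximum likelihood estimate.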

@pvss2000 3 years ago

For the influencer versus non-influencers, could you do something where first you identify those who actually have content that has products that are being 'advertised', then you correlate the presence/views of that video with sales of that product. If correlation reaches above a certain point then they are an influencer.

@pal999 3 years ago

It would be helpful to post the correct answers at some point in the future

@konataizumi5829 3 years ago

They never do. It sucks.

@ПавлоСкляр-д6т 3 years ago

@@konataizumi5829 at least some kind of grade for the interviewee would be pretty informative

@kristofmeszaros4924 3 years ago

I may be wrong but the sql question seemed pretty straightforward, shouldn't the solution just be select m.user_name , m.date, max(m.message_sent/m.message_received) as "Ratio" from Messages as m where m.message_received > 0 group by m.user_name , m.date order by Rate desc

@huzuvettin 3 years ago

@@kristofmeszaros4924 Looks much better to me tbh. Except the "rate" should be "ratio" as aliased earlier, right?

@huzuvettin 3 years ago

Exactly, I get that even watching this interviews applies somewhat knowledge to us but without the correct answers what values should we take as ground truth table am I right B-)

@HardawayLong 1 year ago

The first question: do we really need a maximum likelihood estimate to get beta coefficients for a regression problem? I think it will only be used in classification, right? Will gradient descent be the correct answer?

@ecotrix132 9 months ago

OLS, MLE, gradient descent are different ways

@ASOT666 2 years ago

amazing, super helpful!

@phyrajkumarverma4412 1 year ago

Hi, I also want to give my mock interview. Could you take it please? I am doing my graduation and currently, I am in 3rd year of computer science. I want to be good in data science

@adamdreier 3 years ago

That function in JavaScript is annoying me, please use ES6 arrow function for binding.

@oliesting4921 3 years ago

Hardly see anything...dark and font too small

@xiaowenkang9598 5 months ago

👍thank you so much

@oaasal 3 years ago

Is that a junior level interview?

@DataInterview 3 years ago

Senior

@oaasal 3 years ago

@@DataInterview That sounds easier than I thought. Maybe I should change my job.

@DataInterview 3 years ago

@@oaasal Doesn’t hurt :) Do note that this emulates a phone screening. On-site and case studies are another set of challenges. Often much more challenging than phone screening. If you are Interested in prep content, make sure to check out www.topds.io

@orangethemeow 3 years ago

@@DataInterview This domain doesn't exist anymore :(

@DataInterview 3 years ago

@@orangethemeow Go to datainterview.com

@cooldudesheks 3 years ago

Thanks for such an insightful content! I have a clarification question on 3rd stat problem. You asked if log-odds i.e. logit value can be 0 or not. Since the logit scale is -infinity to +infinity, log-odds can have 0 values dont they? She answered cannot have 0 but minimum of 1. I would appreciate if you can clarify if that was the right answer or I am missing something here. Thanks again! 👍

@neethualan6543 3 years ago

Interviewer asked what if CI of log odds contains zero. However answer was based on odds=1 (there is no association between independent and dependent variable). When log odds = 0 then there is no statistical significance. Answer is correct as odds = 1 means log odds = 0. Then either question or answer should’ve been more clear.

@ariss3304 3 years ago

I’m going with a reverse engineering path into college, please tell me I don’t have to learn these things.

@ariss3304 3 years ago

Specifically the beta coefficient part

@DataInterview 3 years ago

If someone asked me these topics when I started learning stats 7 years ago, I would have been frightened myself. But, years of diligent studies, and working in multiple DS jobs helped me develop confidence. I'm confident you will go through a similar experience as well. Here's a video with a commentary that should provide a more "gentle" introduction to the interview in DS: kzbin.info/www/bejne/oqXLc56Kg52Jps0&lc=UgxNhFz1NZlAPWn1pI54AaABAg

@jeremythompson-seyon5463 3 years ago

Where do I start if I want to learn the skills needed to go into data science? I just started a statistics class and Ive been really interested in the modeling and practical applications. I only barely understand the basics of R and SQL to give you an idea of where my knowledge is. Thanks for the video

@DataInterview 3 years ago

I would start with Kaggle. Emulate worked out examples provided by the community members on the site.

@caremacosta1435 3 years ago

Depends on the knowledge you already have. Learn Python. High school math is enough. I would recommend the book Machine Learning for Absolute Beginners; it is not that long and it summarizes basic concepts very well. Kaggle has many courses for you to learn the basics of machine learning, SQL, data visualization, etc. And practice, practice, practice.

@superfreiheit1 3 years ago

Can't see anything, too small. Zoom in

@DataInterview 3 years ago

Thanks Joe. Duly noted for the next video.

@superfreiheit1 3 years ago

@@DataInterview Can you see something on the video?

@zakarie 2 years ago

Actually the confidence interval is not interpreted as the chance that true value falls in the interval but the accurate interpretation should be there is 95% probability that the random interval falls on the true value.
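
The "random interval" reading can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: normally distributed data with a known standard deviation, a z-interval for the mean, and made-up parameter values. The true mean stays fixed while the interval changes from sample to sample, and about 95% of those intervals should cover it.

```python
import random

def coverage(true_mean=5.0, sigma=2.0, n=30, trials=2000, z=1.96, seed=0):
    """Fraction of simulated z-intervals for the mean that cover the true mean."""
    rng = random.Random(seed)
    # With sigma known, every interval has the same half-width.
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # The interval is random; the true mean is a fixed number.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials

rate = coverage()
```

The resulting `rate` should land close to 0.95, which is a statement about the long-run behavior of the procedure rather than about any single computed interval.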

@sirongzeng4096 3 years ago

Anyone want to come and join a group of mock interview for data analyst? I'm looking for people to mock together, in aspects of coding, behavioral questions, and resume. Thanks!

@sirongzeng4096 3 years ago

Or there is a group or slack channel, please let me know! Thanks!

@md.imrulhasan8757 2 years ago

I want to join... Can you please include me ?

@miraarora8142 3 years ago

solution for 1st SQL Question: select t_date, user_name from messages where message_received != 0 group by t_date, user_name order by sum(message_sent)::float/sum(message_received) desc limit 1;

@srk312 3 years ago

for each day...one row..not one row overall

@CommentaryCentral 3 years ago

This is the sort of stuff we covered on the Msc Data Science course in the UK, I cant believe its a senior level interview

@Cooldownman197 3 years ago

You just covered not implemented

@DataInterview 3 years ago

Some of these are covered at the undergraduate level as well but, when you are under pressure, and a breadth of things is covered in 30 minutes, it’s a much different experience than getting a homework assignment.

@CommentaryCentral 3 years ago

@@DataInterview yeh im sure you are right

@tuanseattle 2 years ago

"Covered" does not mean it is not hard because people forget stuffs that they do not use frequently. University covers a lot of stuffs

@ipvikas 2 years ago

Sql#1: Correct MySQL query is: select user_name, ROW_NUMBER() over w as 'Rank' from Messages window w as (partition by date order by message_sent/message_received desc)

@md.imrulhasan8757 2 years ago

done

@sourabhsharma9830 3 years ago

That is not the confidence interval, that is credible interval. Confidence interval means 95 % of the time the estimated beta coefficient will predicted the correct result which “y”. To get 95 % confidence of beta coefficient we need to use Bayes parameter estimation which will give you a posterior distribution of beta coefficient with 95% credible interval.

@MJ-pz6tn 2 years ago

@sourabhsharma9830 2 years ago

@@MJ-pz6tn yes

@beaglesnlove580 3 years ago

Lol these questions are a joke. I broke into fb. Least squares, MLE or gradient descent. Ans: Logistic regression, or some classifier. Confidence intervals: these are estimates of regression variables. Presence of 0, u have to do t-test on the individual variable