Top Data Science Interview Questions in 2021 [3 concepts tested on all interviews]

  Рет қаралды 6,837

StrataScratch

StrataScratch

Күн бұрын

Пікірлер: 27
@debjyotiroy842
@debjyotiroy842 3 жыл бұрын
You are amazing Nate. Your videos are really insightful. God bless ya :)
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching the vids! Glad you find it useful.
@vastavtailwal
@vastavtailwal Жыл бұрын
At 9:20 can we use " HAVING sum(non_paying) > sum(paying) " after group by clause. We can also use pivot table.
@stratascratch
@stratascratch Жыл бұрын
test it out and see if you get the same output. I think you will though.
@amberyang802
@amberyang802 3 жыл бұрын
Why we need subquery? Can we just use HAVING statement right after the GROUP BY statement? (sort by ascending date is default also.)
@stratascratch
@stratascratch 3 жыл бұрын
The HAVING is to evaluate non-paying and paying customers which were evaluated using a CASE statement. You'd need to add the CASE statement to the HAVING clause, which I don't think would work. There's a link to the question in the description that will bring you to the platform I was using to code. Give it a try and let me know. But I think you'd need to create a subquery or CTE first and then apply the HAVING.
@amberyang802
@amberyang802 3 жыл бұрын
@@stratascratch i did try and it didn't work. i am confused why HAVING clause is not working on the aggregated function. for example - select date, sum(non_pay_dl) as non, sum(yes_pay_dl) as yes from ( select date , case when paying_customer='no' then downloads end as non_pay_dl, case when paying_customer='yes' then downloads end as yes_pay_dl from ms_user_dimension a left join ms_acc_dimension b on a.acc_id=b.acc_id left join ms_download_facts c on a.user_id=c.user_id ) w group by date order by date having sum(non_pay_dl)>5 -- having sum(non_pay_dl) >sum(yes_pay_dl) ;
@stratascratch
@stratascratch 3 жыл бұрын
@@amberyang802 OK I'd have to take a look closer. Can you ask this question in the forum provided in the question? My team and I would take a look and you'd get a reply back much faster! Thanks!
@joaopedroreissilva7075
@joaopedroreissilva7075 3 жыл бұрын
Amazing! Thank you, Nate!
@yingqixu7002
@yingqixu7002 3 жыл бұрын
Thanks for posting these great videos! I have a quick question. Why do we need to add GROUP BY t.date t.paying and t.nonn_paying in the end? Thanks!
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching! To answer your question, it's because t.paying and t.non_paying are aggregates (sum()) so we need a GROUP BY since date is not an aggregate.
@yingqixu7002
@yingqixu7002 3 жыл бұрын
@@stratascratch Sorry I am still confused. We group by date in the subquery t and the three columns in t are all at aggregate level already. So I do not see the reason to do group by in the main query. Please let me know where I am wrong. Thanks!
@stratascratch
@stratascratch 3 жыл бұрын
@@yingqixu7002 Oh sorry I get what you mean. I am de-duplicating the rows with that 2nd group by. In theory, you don't really need to do that unless you want to get rid of dups.
@chinmaypanchal3654
@chinmaypanchal3654 3 жыл бұрын
Thanks Nate for a good solution overview and just a quick query was ORDER BY Date necessary in the subquery or we can avoid it?
@stratascratch
@stratascratch 3 жыл бұрын
You can avoid the ORDER BY in the subquery in the final solution. It was just there to help explain a concept.
@chinmaypanchal3654
@chinmaypanchal3654 3 жыл бұрын
@@stratascratch Thanks a lot Nate for your quick reply :)
@rameespudussery2700
@rameespudussery2700 3 жыл бұрын
Hi Nate. Can't we use where clause instead of group by and having in last step
@stratascratch
@stratascratch 3 жыл бұрын
Yup, you can! And it's probably more efficient to do so also. Great catch.
@noobshady
@noobshady 2 жыл бұрын
I used where in the outer query and I have the same result select * from( select date, sum(case when paying_customer = 'yes' then downloads end) as paying, sum(case when paying_customer ='no' then downloads end) as non_paying from ms_user_dimension a left join ms_acc_dimension b on a.acc_id=b.acc_id left join ms_download_facts c on a.user_id=c.user_id group by date order by date asc) as t where non_paying > paying
@ayeoh47
@ayeoh47 2 жыл бұрын
Why cant you just do a WHERE clause nonpaying > paying easier
@stratascratch
@stratascratch 2 жыл бұрын
Because nonpaying and paying are derived in the SELECT clause and executed at the same time as the WHERE clause so it won't know what nonpaying and paying are. HAVING is executed later so the nonpaying and paying variables have already been created. That's the reason why i chose to use it in the HAVING clause.
@ramumandava8813
@ramumandava8813 3 жыл бұрын
Thank you Nate. 🙂
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching! Let me know if you have any feedback.
@classkori5507
@classkori5507 3 жыл бұрын
Excellent
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching.
@mysteriousbd3743
@mysteriousbd3743 3 жыл бұрын
Sir, Thank You So Much
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching. Let me know if you have any topic ideas.
Tricky Data Science Interview Question [By Facebook]
12:26
StrataScratch
Рет қаралды 5 М.
We Attempted The Impossible 😱
00:54
Topper Guild
Рет қаралды 56 МЛН
99.9% IMPOSSIBLE
00:24
STORROR
Рет қаралды 31 МЛН
SQL Case Statements For Data Science Interviews in 2021
14:44
StrataScratch
Рет қаралды 45 М.
Top 6 Coding Interview Concepts (Data Structures & Algorithms)
10:51
Coding Interviews Be Like
5:31
Nicholas T.
Рет қаралды 6 МЛН
Python Coding Interview Tips for Data Scientists
12:39
Emma Ding
Рет қаралды 84 М.
Three Tricky Analytics Interview Questions with Andrew
25:03
Jay Feng
Рет қаралды 82 М.
Advanced Facebook Data Science SQL interview question [RANK()]
24:27