Hard SQL Interview Question From FACEBOOK | Data Science Coding Interviews (Popularity Percentage)

  Рет қаралды 26,817

StrataScratch

StrataScratch

Күн бұрын

This SQL data science interview question was asked by Facebook. I’ll cover both the question and answer and give a detailed explanation of the approach. I walkthrough each step of my answer, assumptions, approach, and explain every line of code I write. This is literally how I would answer every data science interview question and prepare for every data science interview at FAANG companies and others.
Link to question: platform.stratascratch.com/co...
This question is marked as hard from Facebook. The question involves manipulating your datasets so that you first are calculating total number of users in the table using a UNION of the two columns. You’ll then calculate the number of friends a user has by also using another UNION. These two queries become subqueries and you’ll be using a SQL JOIN ON 1=1. These concepts are what makes the question hard. Once you have the two SQL subqueries, you can implement the percentage formula.. This question covers concepts that are commonly found in data science interviews at Facebook and Google.
______________________________________________________________________
This series is for both beginner and intermediate data scientists and analysts interested in learning how to solve common data science interview questions in SQL. These are real data science interview questions. For some background context and an intro about what this series is about: bit.ly/36kKbxG
👉 Subscribe to my channel: bit.ly/2GsFxmA
👉 Playlist for more data science interview questions and answers: bit.ly/3jifw81
👉 Playlist for data science interview tips: bit.ly/2G5hNoJ
👉 Practice more real data science interview questions: platform.stratascratch.com/co...
______________________________________________________________________
Timestamps:
Intro: (0:00)
Interview Question: (0:11)
Exploring The Datasets: (0:44)
Developing The Framework For The Solution: (1:09)
Coding The Solution (Total Number Users On Platform): (4:21)
Coding The Solution (Total Friends): (7:03)
The Trick!: (8:00)
Coding The Solution (Percentage): (9:50)
Trick 2! JOINING 1=1: (10:40)
______________________________________________________________________
About The Platform:
I'm using StrataScratch, a platform that allows you to practice real data science interview questions. There are over 1000+ interview questions that cover coding (SQL and python), statistics, probability, product sense, and business cases.
I created this platform because I wanted to build a resource to specifically help prepare data scientists for their technical interviews and to generally improve their analytical skills. Over my career as a data scientist, I never was able to find a dedicated platform for data science interview prep. LeetCode and HackerRank were the closest but these platforms specifically serve the computer developer community so their questions focus more on algorithms that working with data.
So, if you want more interview practice with real data science interview questions, visit platform.stratascratch.com/co.... All questions are free and you can even execute SQL and python code in the IDE, but if you want to check out the solutions from me or from other users, you can use ss15 for a 15% discount on the premium plans.
______________________________________________________________________
Contact:
If you have any questions, comments, or feedback, please leave them here!
Feel free to also email me at nathan@stratascratch.com
______________________________________________________________________

Пікірлер: 83
@tanvimehta7960
@tanvimehta7960 2 жыл бұрын
That's insightful. Using a CTE helped me shorten the query. Also when using UNION, you get rid of duplicates with the individual tables as well. So using distinct is not required. with total as ( select user1, user2 from facebook_friends union select user2, user1 from facebook_friends) select distinct user1, cast(count(user2) over (partition by user1) as float)*100/(select count(distinct user1) from total) from total order by user1;
@dwaipayansaha4443
@dwaipayansaha4443 Жыл бұрын
This is one of the best union combined with cte problem I have come across My solution:- with t1 as (select * from facebook_friends union all select user2,user1 from facebook_friends), t2 as (select user1, count(user2) no_friends from t1 group by user1 order by user1), t3 as (select user1, no_friends,count(*) over() total from t2) select user1,(no_friends/total)*100.0 pop_per from t3
@rashvinganesh
@rashvinganesh 3 жыл бұрын
Thank you so much Nate. Find it very easy to understand what’s the solution you’re taking. As a beginner in SQL this really helps me understand more clear.
@stratascratch
@stratascratch 3 жыл бұрын
That's great. Hope you enjoy the entire series of SQL problems then. Let me know if you have any questions or feedback.
@joaopedroreissilva7075
@joaopedroreissilva7075 2 жыл бұрын
Thank you, Nate. Really good approach.
@caiyu538
@caiyu538 2 жыл бұрын
Great solution, without your explanation, it is hard to figure it out. Thank you.
@avanichheda
@avanichheda 2 жыл бұрын
Great video and content! I was able to use count(user2) OVER (Partition by user1) after doing the union to get to the friends of the users. Was able to confirm by submitting on the platform too :)
@okygy2125
@okygy2125 2 жыл бұрын
Excellent!
@gagansingh3481
@gagansingh3481 Жыл бұрын
Awesome dear you did a great explanation kindly make more videos like this 😀
@stratascratch
@stratascratch Жыл бұрын
Thank you. I am glad you find it very helpful.
@StanleySI
@StanleySI 2 жыл бұрын
I found this platform is very helpful because it allows me to run code and let me see result step by step. In this way, I can better understand what I'm doing and where I got wrong.
@stratascratch
@stratascratch 2 жыл бұрын
That's for a great testimonial =)
@finvestomate
@finvestomate 2 жыл бұрын
Thank you for the insights. I also tried doing it in similar manner, with a shorter query select user1, Round((Friends/max(users_id) over())*100,3) as PP from (select *, count(user1) as Friends, row_number() over(order by user1) as users_id from (select * from facebook_friends UNION ALL select user2,user1 from facebook_friends) as A group by user1 order by user1) as B;
@yashovardhan9841
@yashovardhan9841 3 жыл бұрын
My approach - with temp as ( select user1 as u, count(*) as tot_friends from facebook_friends group by 1 union all select user2 as u, count(*) as tot_friends from facebook_friends group by 1 ) select u, (sum(tot_friends)/(select count(distinct u) from temp)::float)*100 as popularity_percentage from temp group by 1 order by 1
@meghasyam427
@meghasyam427 2 жыл бұрын
Tricky question but in reality all the users even in the user2 column should be present in the user1 column too. This is the basis of a user table. This question is all about what if all the users are not present in user1 column, then your code is the ultimate solution. Hatsoff to your thinking which considers the worst case i didn't expect.
@stratascratch
@stratascratch 2 жыл бұрын
I agree that if this was a user table then you'd have all the users in the user1 column. But what if this was a mapping table? That assumption might not be valid. It's always good to ask the interview to clarify any assumptions. I always assume the worse case =) Thanks for watching and for the great detailed comment.
@shyamsundark1556
@shyamsundark1556 Жыл бұрын
Exactly ! the only catch in this Q is the fact that they dont explicitly state the structure of the data.
@Hotobu
@Hotobu Жыл бұрын
This is a "hard" question? This is SUPER easy!
@stratascratch
@stratascratch Жыл бұрын
You are a cut above the rest.
@thepogchamp7783
@thepogchamp7783 2 жыл бұрын
Hey Nate! Thanks for the video - they are really insightful, and helps a lot when I do the questions! Quick question - is there a reason to join the tuu value? What I did was just to put it as the denominator in the select statement, which I thought should save some time since you don't have to join. Thanks!
@stratascratch
@stratascratch 2 жыл бұрын
I think it was just to ensure that I've captured all users since there is a user1 and user2. I would validate your approach with the platform. We capture most use cases/edge cases on the platform, so if your thinking is right, your solution should validate correctly.
@Soulfulreader786
@Soulfulreader786 Жыл бұрын
my solution with cte as (select user1,user2 from facebook_friends union select user2,user1 from facebook_friends), cte2 as (select distinct user1,count(user2) over(partition by user1) as total_friends from cte) select user1,total_friends/9.0*100 as perc_pop from cte2;
@rithikgaur3830
@rithikgaur3830 3 жыл бұрын
Thank you so much for videos and stratscratch , i had general query , as in the above video unless we see tha data and if we only go with the schema mentioned we tend to believe that user 1 column has all users already , so in a real facebook or top company interviews do we actually have the data available for us or do we actually have to ask the interviewer on the limitations on data set .
@stratascratch
@stratascratch 3 жыл бұрын
In my experience, you won't have the data so you need to ask the interviewer what the data set looks like. Basically what are some of the limitations and/or example values. Then you can start to develop your solution and identify edge cases.
@danielxing1034
@danielxing1034 2 жыл бұрын
Great walkthrough! But still think the number of friend of each user should not be computed using union because A friends with B does not mean B friends with A. Those who only show up in the second column should be considered having no friends.
@iceindia007
@iceindia007 2 жыл бұрын
u can simply use something like this: select a.* , a.i/(sum(a.i) over()) as aasd from ( select requester_id as id, count(accepter_id ) as i from request_accepted group by 1 union select accepter_id as id , count(requester_id) as i from request_accepted group by 1 ) a order by 3 desc
@poopah4497
@poopah4497 2 жыл бұрын
Hey Nate , I must over-simplify it. Why I cannot use a sub query to calculate total user and window function to calculate friends per user. Select user1, (select count(distinct user2) from xx) as totalususer, count(distinct user2 ) over (partition by user1) from …
@stratascratch
@stratascratch 2 жыл бұрын
You can probably use a subquery to solve it. Give it a try on the platform to see if you get the output you want
@sandeepvenkatasairam9922
@sandeepvenkatasairam9922 3 жыл бұрын
Thanks Nate for keeping us educated ! Just a logical Question I had , Since you are saying user1 is the user and user2 is their friend list , I hope even the friend should be a user on Platform , So can we cant consider select distinct user1 from facebook_friends as total users , Correct me if I am missing something .
@stratascratch
@stratascratch 3 жыл бұрын
That's a great assumption. That's something you'd want to talk to the interviewer about to see if your assumption makes sense. Usually when it's captured on a db table then the user's friend should also be on the platform. Why is it that you can't consider distinct user1 from the table?
@ismafoot11
@ismafoot11 3 жыл бұрын
if you use "union" you don't need distinct before user1 and user2. union will do that for you
@stratascratch
@stratascratch 3 жыл бұрын
Yup that's true! I just added the DISTINCT to be explicit.
@TheRaju991
@TheRaju991 2 жыл бұрын
This was uniquely tough to think through.
@chandraprakash0211
@chandraprakash0211 2 жыл бұрын
Hey Nate ! Can we use self join and not union for this problem ?
@stratascratch
@stratascratch 2 жыл бұрын
Yes! There are many ways to solve the problem. You can even try your self join approach on the platfrom. There's a link in the description.
@ghislaineamrani2775
@ghislaineamrani2775 3 жыл бұрын
Hi Nate, thank you ! This solution below seems to work also but I used WITH instead of subqueries, is it an issue ? Thank you for your feedback ! WITH total_users as ( select distinct user as all_users from facebook_friends union select distinct friend from facebook_friends ), consolidated_table as( select all_users, count(user) as number_user, (select count(*) from total_users) as total_u from total_users cross join facebook_friends where all_users = user OR all_users = friend group by 1 ) select all_users, (cast(number_user as float)/cast(total_u as float))*100 as percent from consolidated_table;
@stratascratch
@stratascratch 3 жыл бұрын
It's not at all an issue if you used a CTE (i.e., using a WITH) instead of a subquery. The reason why I used a subquery is because some SQL flavors don't allow using WITH so I wanted to write the solution so that the code can be used on any SQL flavor. Thanks for watching and more thanks for trying out this question!
@SudhirKumar-rl4wt
@SudhirKumar-rl4wt 2 жыл бұрын
I am not sure if we need to use DISTINCT if we are using union already
@AmanSingh-od2ue
@AmanSingh-od2ue 3 жыл бұрын
Hey Nate, I do have concern from the top query where the FROM clause ends at total_unique_users. Could you please explain as to why it is ending there and why not at the end where the LEFT JOIN is completed?
@stratascratch
@stratascratch 3 жыл бұрын
It's because the total_unique_users is the 1st subquery and then I do another subquery called tuu that uses total_unique_users. I use tuu to join to user_friends. Once I join those 2 subqueries, I take the aggregate count. So the main reason why it ends at tuu is because I'm doing a subquery within a subquery. Hope that makes sense. If you want to troubleshoot, I would try to see how the data is aggregated at each level of subquery. Then write more code and see how the data is aggregated again. Keep doing this until the entire query is built. That should help you understand what's going on.
@AmanSingh-od2ue
@AmanSingh-od2ue 3 жыл бұрын
@@stratascratch Thank you. I got your point, Nate.
@Baddiemuan4real
@Baddiemuan4real 3 жыл бұрын
Hi Nate at Strata Scratch pls can you RECOMMEND a platform where I can study ADVANCE SQL. Thanks in advance.
@stratascratch
@stratascratch 3 жыл бұрын
For advanced level SQL, I would recommend leetcode, stratascratch, interviewquery, sqlpad.io as great platforms that will give you some advanced level skills. If you need something even more difficult then I would suggest doing real projects to improved. Real projects tend to have much longer sql scripts that are necessary to be written.
@radhikashroff2643
@radhikashroff2643 2 жыл бұрын
Hi Nate, To calculate number of friends each user has, can we do this ? SELECT user1, COUNT(user2) FROM facebookfriends GROUP BY user1 so this would group all the friends of each user in user1 since right column are the friends right ?
@stratascratch
@stratascratch 2 жыл бұрын
Yes, so long as all users are found in the user1 column. But our assumption is that not all users are found in user1. So the question is slightly more complicated.
@GouthamSudini
@GouthamSudini 2 жыл бұрын
@@stratascratch wouldn't user1 carry all users based on the table structure? How can a user2 be a friend without being a user in 1st place?
@ohh_nina_nyc
@ohh_nina_nyc 3 жыл бұрын
Hi, why do you use MAX() to get all _users number? Can we just divide by tuu.all_users?
@stratascratch
@stratascratch 3 жыл бұрын
Actually, yes you could just use all_users since it's the count() and you'll just be given 1 number anyways. good catch.
@celiahan3787
@celiahan3787 3 жыл бұрын
since you are doing union to get all the users. So how do you think the user1 column did not contain all the users and we need to do the union?
@stratascratch
@stratascratch 3 жыл бұрын
I thought about edge cases where Friend A might be friends with Friend B, but Friend B might not be friends with A. Many platforms are like this (FB might not be one of them but I know IG is) so dong the UNION takes care of that. You can always ask the interviewer what they think. I don't think there's a right or wrong answer here since you have to assume certain things for you to end up coding a solution.
@celiahan3787
@celiahan3787 3 жыл бұрын
@@stratascratch Thank you so much, I got you. You are talking about the followers on IG. Yes, if in that case, it makes more sense
@96satenik
@96satenik 2 жыл бұрын
Can total number of users be defined as Max(users1) ?
@vijayjayaraman5990
@vijayjayaraman5990 2 жыл бұрын
While I agree with the solution, why wouldn't Bob be a user1 in the original table? That doesn't sound intuitive at all.. I would have done the SELECT user1, count(distinct user2) right away because user1 was supposed to have all the users on Facebook..
@stratascratch
@stratascratch 2 жыл бұрын
I didn't have the assumption that user1 should have all the users on FB. I could make that case if it was a user table (like a copy from production) but if it's a mapping table or some other type of table used for analytics, I wouldn't feel comfortable making that assumption without asking the interviewer first (or checking it if I was at work).
@nanditha4477
@nanditha4477 2 жыл бұрын
Hi Nate ..Can u please have a Sql session once. Can u please ping me if thats possible
@123admini2r
@123admini2r 3 жыл бұрын
damn, understanding the data provided is the tough part
@stratascratch
@stratascratch 3 жыл бұрын
To me it's the data schema that's hard because there's so many assumptions you'll need to make based on the structure of the table. And then once you have an understanding of that, the values for each column can add complexity. So this part takes a lot of time when on an interview.
@philintheblank4360
@philintheblank4360 3 жыл бұрын
it seems there's an assumtption that person a can be friends with person b, but person b is not friends with person a... I don't think that's how this platform or friendship works. Clarifying a simplifying assumption with the interviewer would streamline a lot of this.
@stratascratch
@stratascratch 3 жыл бұрын
Yea you're right. There are some assumptions that are needed to solve the question the right way. And you should talk to the interviewer to get the assumptions clarified. Even the question here has changed a bit since I created this video. Thanks for pointing this out.
@techiewithcamera
@techiewithcamera Жыл бұрын
My Solution: with total_population(pop_cnt) as (select count(user) from (select distinct user1 as user from facebook_friends union select distinct user2 as user from facebook_friends)x), total_pairs as (select distinct user1 as frnd1, user2 as frnd2 from facebook_friends union select distinct user2 as frnd1, user1 as frnd2 from facebook_friends) select frnd1, count(distinct frnd2)/max(pop_cnt)::float*100 as popularity_percentage from total_pairs, total_population group by 1;
@saikatdasgupta2006
@saikatdasgupta2006 3 жыл бұрын
Another solution select user1, (b.sum1/c.cnt)*100 as popularity_percent from (select count(distinct user1) as cnt from (select distinct user1 from facebook_friends union select distinct user2 from facebook_friends) d) c, ( select user1, sum(cnt) sum1 from (select user1,count(*) cnt from facebook_friends group by user1 union select user2,count(*) from facebook_friends group by user2) a group by user1) b order by user1
Advanced Facebook Data Science SQL interview question [RANK()]
24:27
когда повзрослела // EVA mash
00:40
EVA mash
Рет қаралды 1,1 МЛН
孩子多的烦恼?#火影忍者 #家庭 #佐助
00:31
火影忍者一家
Рет қаралды 2,1 МЛН
MEU IRMÃO FICOU FAMOSO
00:52
Matheus Kriwat
Рет қаралды 30 МЛН
OMG🤪 #tiktok #shorts #potapova_blog
00:50
Potapova_blog
Рет қаралды 17 МЛН
Tricky Data Science Interview Question [By Facebook]
12:26
StrataScratch
Рет қаралды 5 М.
Crack the Facebook Data Scientist Interview
29:42
DataInterview
Рет қаралды 19 М.
SQL Case Statements For Data Science Interviews in 2021
14:44
StrataScratch
Рет қаралды 45 М.
Three Tricky Analytics Interview Questions with Andrew
25:03
Jay Feng
Рет қаралды 79 М.
когда повзрослела // EVA mash
00:40
EVA mash
Рет қаралды 1,1 МЛН