Advanced Data Science SQL Interview Question [Amazon] (window functions & aliasing)

  Рет қаралды 32,692

StrataScratch

StrataScratch

Күн бұрын

This SQL data science interview question was asked by Amazon and will test your date manipulation and window function skills. I’ll cover both the question and walk you through the approach. I’ll also talk about my 4 step approach to solving any data science interview question. This is literally how I would answer every data science interview question and prepare for every data science interview at FAANG companies and others.
Link to the question: platform.stratascratch.com/co...
______________________________________________________________________
This series is for both beginner and intermediate data scientists and analysts interested in learning how to solve common data science interview questions in SQL. These are real data science interview questions. For some background context and an intro about what this series is about: bit.ly/36kKbxG
👉 Subscribe to my channel: bit.ly/2GsFxmA
👉 Playlist for more data science interview questions and answers: bit.ly/3jifw81
👉 Playlist for data science interview tips: bit.ly/2G5hNoJ
👉 Practice more real data science interview questions: platform.stratascratch.com/co...
______________________________________________________________________
Timeline:
Intro: (0:00)
Question: (1:18)
4-Step framework to solve the question: (2:02)
Explore underlying data: (2:18)
Identify required columns: (3:45)
Visualize the output: (4:12)
Build solution step-by-step and test: (4:44)
Coding: (4:57)
Format data to YYYY-MM: (6:27)
Calculate current month's revenue: (7:46)
Calculate previous month's revenue: (8:36)
Aggregate to year-month: (10:25)
Implement month-over-month difference formula: (11:26)
Apply a window alias: (13:15)
Clean up formatting: (14:06)
Conclusion: (14:50)
______________________________________________________________________
About The Platform:
I'm using StrataScratch (platform.stratascratch.com/co..., a platform that allows you to practice real data science interview questions. There are over 1000+ interview questions that cover coding (SQL and python), statistics, probability, product sense, and business cases.
______________________________________________________________________
Contact:
If you have any questions, comments, or feedback, please leave them here!
Feel free to also email me at nathan@stratascratch.com
______________________________________________________________________
#datascience​interview #sqlinterviews

Пікірлер: 114
@tamaboy311
@tamaboy311 11 ай бұрын
I love SQL. It's almost like a game.
@stratascratch
@stratascratch 11 ай бұрын
I always thought of them as puzzles!
@bandhammanikanta6302
@bandhammanikanta6302 2 жыл бұрын
The way you explain is excellent, Nate. Keep it up, please. Thank you :)
@aztxaz
@aztxaz 2 жыл бұрын
This is an awesome video! I learned something new. Would love to see more videos like this.
@KumarSameer
@KumarSameer 2 жыл бұрын
Thanks for breaking it down in this great detail. It really gives me a structured approach to solving interview SQL questions!
@stratascratch
@stratascratch Жыл бұрын
Very welcome!
@cabaymau5132
@cabaymau5132 2 жыл бұрын
This content is absolutely underrated, why didn't I find you sooner :< Pls do more topics on SQL
@aleixmp95
@aleixmp95 2 жыл бұрын
This is a really good video. Very clear and easy to understand. Thank you !
@ashutoshsingh5568
@ashutoshsingh5568 2 жыл бұрын
What a great and simple explanation it was. You’re a great teacher. I hope you will teach more important topics for SQL. I appreciate your hard work. Thanks
@vibekdutta6539
@vibekdutta6539 2 жыл бұрын
Solid explanation, looking forward to some more awesome contents, thanks a lot
@azadehmshchn4114
@azadehmshchn4114 2 жыл бұрын
speechless! brilliant!
@Seikenguy
@Seikenguy 11 ай бұрын
This was a "Hard" question on stratascratch, but you explained it in a way that one might think it's elementary. Since I've had experiences with teachers explaining these concepts where I understood virtually nothing, I think you're a really great teacher. Thank you!
@stratascratch
@stratascratch 11 ай бұрын
Glad it was helpful!
@shalinigarg859
@shalinigarg859 2 жыл бұрын
Nate! Your efforts are much appreciated 👍 Thank you so much for uploading these helpful videos 🔥 You are the best teacher in SQL series. Love from India 😊
@stratascratch
@stratascratch 2 жыл бұрын
Thank you for watching!
@nareshvadlamudi
@nareshvadlamudi 2 жыл бұрын
thanks a lot.! great example to cover many topics!
@mauriciovarela2337
@mauriciovarela2337 3 жыл бұрын
Thank you Nathan, well explained and easy to follow. cheers
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching and more to come!
@marinamondadorigessinger6390
@marinamondadorigessinger6390 2 жыл бұрын
Wow! you are the best!
@someguy9164
@someguy9164 3 жыл бұрын
This was great, very helpful. Thanks!
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching!
@rollinas1
@rollinas1 3 жыл бұрын
I'm easy, I see Nate's vid, I click on it.
@stratascratch
@stratascratch 3 жыл бұрын
haha thanks man. That comment made my day! Glad I'm providing something useful. Wait for the next series to come out next week! I got some new new interview questions from various companies and have been analyzing the concepts they're testing for. Should be useful to help us all focus on just the relevant skills needed.
@prathameshmahankal4180
@prathameshmahankal4180 3 жыл бұрын
Such an elegant solution!
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching!
@rick2591
@rick2591 2 жыл бұрын
I never knew about a window alias. That is cool. PostgreSQL has a lot of cool features. embedding an update within a with clause allows one to enter and leave a critical section within one SQL statement....
@bandhammanikanta6302
@bandhammanikanta6302 2 жыл бұрын
Thank you for putting this video explanation. but When I searched for this question "Monthly percentage difference stratascratch", youtube suggested me this video but I wasn't sure if this the exact one until I hit the question link from description. Please change the title according to the question. Thank you.
@ItsWithinYou
@ItsWithinYou 2 жыл бұрын
Many thanks for sharing this example!
@stratascratch
@stratascratch Жыл бұрын
Glad it was helpful!
@priyankasarkar6600
@priyankasarkar6600 3 жыл бұрын
Morning Bliss Tutorial!
@stratascratch
@stratascratch 3 жыл бұрын
Thank you for watching!
@pallavi_joseph
@pallavi_joseph 2 жыл бұрын
You are a star
@nargisparvin4267
@nargisparvin4267 3 жыл бұрын
Excellent Video
@stratascratch
@stratascratch 3 жыл бұрын
Thanks so much for watching! Let me know if you have any requests.
@flipcase
@flipcase 3 жыл бұрын
This channel is amazing! Very well broken down🙏🙏 thank you My feedback is please keep making such amazing videos
@stratascratch
@stratascratch 3 жыл бұрын
Thank you so much! Will keep making these videos!
@flipcase
@flipcase 3 жыл бұрын
@@stratascratch I cracked the technical round of one of these organisations thanks to you. Will keep you posted! I am endorsing you to all my friends. Keep it going Nate!
@stratascratch
@stratascratch 3 жыл бұрын
@@flipcase Congrats Sachin! Good luck to you on the next rounds! Keep prepping and you'll do well. Best of luck!
@flipcase
@flipcase 3 жыл бұрын
@@stratascratch Hey Nate I made it! Thank you so much for your videos. How can I email you?
@stratascratch
@stratascratch 3 жыл бұрын
@@flipcase Very cool and congrats! I'd love to get a testimonial to add to the site! Email me at nate@stratascratch.com
@hiovanycubillosgomez5901
@hiovanycubillosgomez5901 2 жыл бұрын
Hey man that's great video, beginning with this, can you recommend sites and also tools to practice topics T-SQL, really i appreciate it, thank you so much.
@immanuelsuleiman7550
@immanuelsuleiman7550 3 жыл бұрын
Very awesome Thank you
@stratascratch
@stratascratch 3 жыл бұрын
Thanks again for watching!
@prabhaker9031
@prabhaker9031 2 жыл бұрын
What a great insight into the solution besides the solution itself.
@stratascratch
@stratascratch 2 жыл бұрын
Yes, of course. Try stratascratch.com =).
@brianchuck9264
@brianchuck9264 2 жыл бұрын
Great content.. Keep it coming! If possible please reference differences in syntaxes across the different rdms.For me, the explanation of the double colon helped...as an example.. I use snowflake and terradata
@stratascratch
@stratascratch 2 жыл бұрын
Will do! I use HIVE at work so I get what you're saying. Will add more explanations between nuances. Thanks for the feedback.
@lizard_sinno
@lizard_sinno 3 жыл бұрын
Great!
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching man!
@arkamukherjee457
@arkamukherjee457 2 жыл бұрын
For MySQL, I used: date_format(created_at, '%Y-%m')
@SuddenlySubtle
@SuddenlySubtle 2 жыл бұрын
Thank you for the video! What would happen if a particular month had no values? Then lag of 1 would not work... Self join on month and the calculated version of previous month, in a CTE could work... A month key = year*12 + month...prev month = month - 1
@myskillset5833
@myskillset5833 2 жыл бұрын
Good 👍 explanation. Is there possibility for data set so more helpful for practice @ local 🖥
@kylehuang7926
@kylehuang7926 3 жыл бұрын
Great video, an alternative solution is to self join the table but a.month = b.month - 1 (assuming only 2019)
@stratascratch
@stratascratch 3 жыл бұрын
Yea that would work too! Thanks for watching the videos.
@majafuntv4538
@majafuntv4538 3 жыл бұрын
thanks
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching!
@morninglory8988
@morninglory8988 Жыл бұрын
I have a question about that TO_CHAR part. It basically converted the date data type to a string data type. So for example, If I wanted to do some calculations on these dates (for example how many days have passed from one date to another) I wouldn't be able to do so, right? That TO_CHAR part is only to let us order by the month, right?
@priyankalad7789
@priyankalad7789 3 жыл бұрын
Crisp n clear explanation. Is it always after group by we have to specify Window alias ??
@stratascratch
@stratascratch 3 жыл бұрын
No I don't think it needs to be but I"m also not sure. I always put it at the bottom so it's away from the main part of the code.
@priyankalad7789
@priyankalad7789 3 жыл бұрын
@@stratascratch Thanks. that helps
@mithunnambiar1433
@mithunnambiar1433 2 жыл бұрын
Amazing vid!!! Great going....Could you also the schema used in the creation of this problem?
@stratascratch
@stratascratch 2 жыл бұрын
Sorry, could I what with the schema?
@mithunnambiar1433
@mithunnambiar1433 2 жыл бұрын
@@stratascratch typo*... Could you also upload the schema on your site
@stratascratch
@stratascratch 2 жыл бұрын
@@mithunnambiar1433 In the description you'll see a link to the platform. And it has the schema below the question. Hope that helps!
@bodwiser100
@bodwiser100 2 жыл бұрын
Thanks for the video! quick question - in the over() clause, why did we not use partition by instead of order by?
@stratascratch
@stratascratch Жыл бұрын
Do you mean to simply replace the ORDER BY clause with the PARTITION BY? In this case, the code would be: SELECT to_char(created_at::date, 'YYYY-MM') AS year_month, (sum(value) - lag(sum(value), 1) OVER (PARTITION BY to_char(created_at::date, 'YYYY-MM')))/ lag(sum(value), 1) OVER (PARTITION BY to_char(created_at::date, 'YYYY-MM')) FROM sf_transactions GROUP BY year_month This wouldn't work because, in the LAG() and LEAD() functions, the ORDER BY clause is mandatory. In other words, these functions won't work without it. Also, the PARTITION BY and ORDER BY clauses have different purposes in the window functions. They can't be simply replaced. If that's what you're asking. The PARTITION BY groups data into subsets according to the specified column. ORDER BY determines the order of the window function execution. In this case, it's from the oldest to the newest month. Does that answer your question?
@localmartian9047
@localmartian9047 Жыл бұрын
Here partition by was not needed because we need to apply lag on whole table not just inside a window over some col. Like this. We could have taken an empty partition by in spark sql which would be the same. While order by yyyy_mm was needed because we want to find data of previous month to find the revenue difference. All window functions like lead, lag, rank, dense rank, row number need the window to be ordered so that applying the function makes sense unlike normal agg functions like avg sum
@henryyin2731
@henryyin2731 2 жыл бұрын
very well explained... do you think this is a medium level interview question or hard level?
@stratascratch
@stratascratch 2 жыл бұрын
I would think this is a hard level question for an interview.
@TheHamous
@TheHamous 2 жыл бұрын
interesting, but i doubt that this is an amzon test interview because i think window aliasing is not available in aws redshift isnt it?
@mohars
@mohars 3 жыл бұрын
MySQL Syntax: select date_format(created_at,'%Y-%m') as "year_month", ROUND(((sum(value)-lag(sum(value),1,0) over(order by date_format(created_at,'%Y-%m')))/(lag(sum(value),1,0) over(order by date_format(created_at,'%Y-%m')))*100.00),2) from sf_transactions group by date_format(created_at,'%Y-%m') order by date_format(created_at,'%Y-%m');
@stratascratch
@stratascratch 3 жыл бұрын
Nice. That would also work in postgres right?
@zuzuyun4944
@zuzuyun4944 2 жыл бұрын
Quick question! Based on the SQL query execution sequence rule, shouldn''t we use the full query, to_char(created at::date, 'YYYY-MM', instead of year month next to GROUP BY?
@stratascratch
@stratascratch 2 жыл бұрын
I'm not sure what the sequence rule is. Can you share your code that would use a full query? The YYYY-MM and group by merely aggregates all the date to a YYYY-MM format so we can group the data by year-month
@awsomeboyz8112
@awsomeboyz8112 2 жыл бұрын
Hi, i notice in some of your videos you say that you cant put the alias in the group by clause, but here you do. Is there general rules of thumb in terms of when you can group by an alias vs writing the full derived column out?
@stratascratch
@stratascratch 2 жыл бұрын
Great attention to detail =). It depends on the SQL engine dialect you're on. With postgres, you can use an alias. But with other engines like HIVE, you can't. The platform, StrataScratch, is on postgres and MySQL so you can use an alias. Hope that helps.
@AlirezaAminian
@AlirezaAminian Жыл бұрын
Hi, thanks for the video. I have a question though. When you the GROUP BY clause, you only put year_month. However, when using GROUP BY, you have to indicate all the columns specified in SELECT in the GROUP BY clause otherwise it will through an error. Can you please advise?
@stratascratch
@stratascratch Жыл бұрын
This is the case, but not when there are window functions. The window functions are not allowed in the GROUP BY clause. They work on the query result; that's why it works without putting the revenue_diff column in the GROUP BY.
@mohars
@mohars 3 жыл бұрын
MySQL Query: select date_format(created_at,'%Y-%m') ,ROUND((SUM(value) - lag(SUM(value),1) over(w))/lag(SUM(value),1) over(w)* 100.00,2) as revenue_diff FROM sql_practice.sf_transactions group by 1 WINDOW w as (order by date_format(created_at,'%Y-%m')) order by 1;
@laurak5196
@laurak5196 2 жыл бұрын
Thanks so much for sharing. I’m new to this concept. What was the window clause called? And why was it at the end? I was thinking of a cte which would be at beginning along with the with clause? So this was something new for me. Thanks again for your help.
@stratascratch
@stratascratch 2 жыл бұрын
It's just called an ALIAS. It's in the end just because that's where it's usually written. It's not something that's used a lot but it does help with code readability. You don't have to use it. Glad you learned something!
@laurak5196
@laurak5196 2 жыл бұрын
@@stratascratch I always learn something from your work, even if it’s just hearing your thought process. I get impatient when I get stuck on a problem, and I know I need more practice. You and others here on KZbin make it look so easy - does it take awhile to be able to quickly solve problems like this, or am I just slow lol?
@stratascratch
@stratascratch 2 жыл бұрын
@@laurak5196 It takes a while. I wrote in SQL everyday for several years at my job to a point where I could write SQL better than English ;). The thought process and approach to solving the problem is something that you'll keep learning though. Problems get harder as you get more experience but you'll get the syntax down after a while. Good luck!
@laurak5196
@laurak5196 2 жыл бұрын
@@stratascratch thanks so much, this means a lot coming from you, Nate 🙏🏻 I really loved SQL from the start, and sometimes it does get frustrating but hearing that from you really encourages me to keep pressing forward. Thanks again for your time and expertise; you’re helping a lot of people. 😀
@VinodKumar-nn7go
@VinodKumar-nn7go 2 жыл бұрын
Hi Nate, can we not use PARTITION BY inside the window function instead of Group by ? Will that work? I will try and let you know if that works may be I will have to use a CTE
@stratascratch
@stratascratch 2 жыл бұрын
It could work. There are many ways to solve the problem. I think a PARTITION BY should work and would eliminate the need for a GROUP BY so give it a try on the platform!
@Lakshmi1010
@Lakshmi1010 Жыл бұрын
Hi Sir, I am learning advanced SQL and have a query here ... How its working when Year_month is not given in group by clause before aggregation of data by year_month wise ?
@NotFound-iu8wx
@NotFound-iu8wx 3 жыл бұрын
✌️
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching! See you on reddit =)
@shobhamourya8396
@shobhamourya8396 3 жыл бұрын
Window function aliases allow the complex calculations to be reused and with CETs programming logic can be modularized for readability and reusability with cet1 as ( -- extract YYYY-MM from created_at select id, created_at, to_char(cast(created_at as date), 'YYYY-MM') as year_month, value from sf_transactions ), cet2 as ( -- calculate the monthly revenue select year_month, sum(value) monthly_revenue from cet1 group by year_month order by year_month ), cet3 as( -- get the revenue from previous month and calculate the diff wrt current month select year_month, monthly_revenue, LAG(monthly_revenue,1) over(order by year_month) prev_revenue, (monthly_revenue - LAG(monthly_revenue,1) over(order by year_month)) revenue_diff from cet2 ) -- calculate percentage change in revenue from previous month -- select * from cet3 select year_month, round((revenue_diff*1.00/prev_revenue*1.0)*100,2) percent_change from cet3
@stratascratch
@stratascratch 3 жыл бұрын
Love it. That's why I like using CTEs!
@shobhamourya8396
@shobhamourya8396 3 жыл бұрын
@@stratascratch I just realised it's CTE and not CET 😀
@stratascratch
@stratascratch 3 жыл бұрын
@@shobhamourya8396 No worries. I got what you mean. CTE = common table expression
@privateaccount5917
@privateaccount5917 3 жыл бұрын
Thumps up :D, Can i user extract? to_char, extract which one is common? less messy
@stratascratch
@stratascratch 3 жыл бұрын
So to_char() was used to get the date to a YYYY-MM format which also turns it into a varchar data type. With extract() or date_part(), you'd only be able to take out one component of the date like the YYYY or the MM (unless I'm mistaken). Also, extract() and date_part() will keep the date as a date data type. Just depends on what you want to use it for.
@privateaccount5917
@privateaccount5917 3 жыл бұрын
@@stratascratch thank so much your clear answer 👏👏💯
@kritiverma1342
@kritiverma1342 2 жыл бұрын
How would this look like in SQL Server sql query: SELECT CONCAT(DATEPART(yy, created_at), ‘-‘, DATEPART(mm, created_at)) as year_month, ((SUM(value) - LAG(SUM( value), 1) OVER ( ORDER BY year_month))/ LAG(SUM( value), 1) OVER ( ORDER BY year_month))*100 FROM sf_transactions) GROUP BY year_month ORDER BY year_month
@saikatdasgupta2006
@saikatdasgupta2006 3 жыл бұрын
Here is one more approach with cte as(select to_date(concat(date_part('year',created_at),'-',date_part('month',created_at)),'yyyy-mm') dt, sum(value) sum1 from sf_transactions group by dt order by dt) select ct2.dt, round(((ct2.sum1 - ct1.sum1)/ct1.sum1)*100,2) from cte ct1 , cte ct2 where date_part('month',ct2.dt)=date_part('month',ct1.dt)+1 order by dt
@anggipermanaharianja6122
@anggipermanaharianja6122 2 жыл бұрын
yours is basically true, for readibility you can re-write even easier to read like this: with cte as ( select to_char(cast(created_at as date), 'YYYY-MM') as year_month, sum(value) as curr_revenue, lag(sum(value), 1) over w as prev_revenue from sf_transactions group by 1 window w as (order by to_char(cast(created_at as date), 'YYYY-MM')) ) select year_month, round(100 * (curr_revenue - prev_revenue) / prev_revenue, 2) as pct_change from cte order by 1 asc;
@iklintsov
@iklintsov 2 жыл бұрын
Is this only me understood "last month's revenue" as revenue for "2019-12"? Wouldn't it be better to call "next month's revenue"?
@amberyang802
@amberyang802 2 жыл бұрын
why not using partition by in the window function instead of group by?
@stratascratch
@stratascratch 2 жыл бұрын
Yea you can probably do that too. Many solutions to these types of problems. That's the beauty of it!
@joaopedroreissilva7075
@joaopedroreissilva7075 2 жыл бұрын
Hi, Nate! So, first of all, as always, impressive! Thank you for sharing. But, I tried to solve it by myself before and, I couldn’t undertand why my code stoped working at this point, as below: WITH t1 AS (SELECT to_date (to_char(created_at :: date ,'YYYY-MM'), 'YYYY-MM-01') AS month_year, SUM(value) AS tt_value FROM sf_transactions GROUP BY 1 ORDER BY 1) SELECT month_year, tt_value, LEAD (tt_value, 1, 0) OVER (ORDER BY month_year ASC) AS val_2 FROM t1 Could you help me? I reviewed it several times, but I couldn’t find my mistake. Regardless, I understood that, in this case, lag is more apropriate than lead...
@stratascratch
@stratascratch 2 жыл бұрын
Thanks for watching the video and giving it a try. Can you post your question on the discussion forum of the question? The link to the question is in the description. Someone from my team or myself will be able to get to it within a few days!
@joaopedroreissilva7075
@joaopedroreissilva7075 2 жыл бұрын
@@stratascratch You're welcome. Of course, thanks for your attention.
@adityabhardwaj8007
@adityabhardwaj8007 3 жыл бұрын
In SQL Server , to_Char is not working, what we need to use?
@stratascratch
@stratascratch 3 жыл бұрын
I think format() should work. stackoverflow.com/questions/47042286/ms-sql-server-equivalent-of-to-char/47042371
@adityabhardwaj8007
@adityabhardwaj8007 3 жыл бұрын
@@stratascratch working, Thank you!!
@aashi9781
@aashi9781 2 жыл бұрын
Hey Nate or anyone who has appeared for Amazon DS interviews, I have questions- 1) Do you think these questions are the part of initial round in Amazon interview? Do you think that the question may be hard/medium at this first one hour interview or are they more likely to be the part of interview further down the process? 2)Also I see that in Strata stat we get option of Python/PostgreSQL. Do you think that in the interviews these options are given to the candidate?
@stratascratch
@stratascratch 2 жыл бұрын
I have never been on an Amazon interview but all those questions on StrataScratch come from real Amazon interviews. Typically you'll get easier questions in the initial round (medium level) and hard questions come in the last rounds. Hope that helps.
@aashi9781
@aashi9781 2 жыл бұрын
@@stratascratch Thanks for you reply Nate. Gaining a lot from watching your videos.
@techiewithcamera
@techiewithcamera Жыл бұрын
My Solution: select to_char(created_at::date, 'YYYY-MM') year_month, round(((sum(value) - lag(sum(value)) over (order by to_char(created_at::date, 'YYYY-MM')))/lag(sum(value)) over (order by to_char(created_at::date, 'YYYY-MM')))*100, 2) revenue_diff from sf_transactions group by 1 order by 1;
Children deceived dad #comedy
00:19
yuzvikii_family
Рет қаралды 5 МЛН
Khóa ly biệt
01:00
Đào Nguyễn Ánh - Hữu Hưng
Рет қаралды 20 МЛН
⬅️🤔➡️
00:31
Celine Dept
Рет қаралды 47 МЛН
MEU IRMÃO FICOU FAMOSO
00:52
Matheus Kriwat
Рет қаралды 30 МЛН
Advanced Facebook Data Science SQL interview question [RANK()]
24:27
WINDOW FUNCTIONS | Advanced SQL
25:47
Mo Chen
Рет қаралды 16 М.
Top 5 Data Analyst Interview Mistakes
5:28
Alex The Analyst
Рет қаралды 79 М.
The Full Stack Data Scientist - Python Libraries You Need To Know
8:52
Three Tricky Analytics Interview Questions with Andrew
25:03
Jay Feng
Рет қаралды 79 М.
Children deceived dad #comedy
00:19
yuzvikii_family
Рет қаралды 5 МЛН