Amazon Data Science Interview Question Walkthrough | How to Deal with Streaks in SQL

  Рет қаралды 11,392

StrataScratch

StrataScratch

Күн бұрын

Пікірлер: 30
@shanaebrown1563
@shanaebrown1563 2 жыл бұрын
Is it weird that i want to build my skills and work for strata scratch I love the website . Keep it up guys I’m coming to work for you guys in the future lol
@TheRaju991
@TheRaju991 2 жыл бұрын
This was a pretty tough question!! Enjoyed it greatly.
@alfatmiuzma
@alfatmiuzma 2 жыл бұрын
Very good explanation 👌👌..love to see more such videos..maybe some medium level questions 😊
@stratascratch
@stratascratch 2 жыл бұрын
More coming in the medium to hard level!
@alfatmiuzma
@alfatmiuzma 2 жыл бұрын
@@stratascratch Thanks for considering my request.
@guinhaa
@guinhaa 2 жыл бұрын
Superb video! I would love to see this same format of video but focusing on a Interview for a Data Analyst position
@stratascratch
@stratascratch 2 жыл бұрын
We'll have a few more coming out soon. Glad you like our new format.
@supriya1316
@supriya1316 2 жыл бұрын
Thanks for your video 👍
@Isaiah094
@Isaiah094 2 жыл бұрын
The second code example was hell. All in all a really tough question to follow, I’ll just have to practice myself
@stratascratch
@stratascratch 2 жыл бұрын
Sorry to hear. It's a tough question overall. We'll try to break it down with more detail next time!
@Isaiah094
@Isaiah094 2 жыл бұрын
@@stratascratch you guys did well! It was just hell for me. Like I’m saying it’s a tough question overall not that you guys are bad! Haha
@danchensun9686
@danchensun9686 2 жыл бұрын
thank you for the clear explanation !
@stratascratch
@stratascratch 2 жыл бұрын
Glad it was helpful!
@ClaudioBOsorio
@ClaudioBOsorio 2 жыл бұрын
Great! Thanks for the video
@touchsoumya
@touchsoumya 2 жыл бұрын
@StrataScratch I think we would miss out displaying the player ids who has not won any matches. But as per the actual question, we should still be able to display all player ids and the streak length should be 0 for the player who has not won any matches. Could you please clarify?
@touchsoumya
@touchsoumya 2 жыл бұрын
select a.player_id,coalesce(longest_streak,0) longest_streak from ( (select distinct player_id from Matches ) a left join ( select distinct player_id, cnt as longest_streak from ( select player_id,streaks,count(*) as cnt, row_number() over(partition by player_id order by count(*) desc) as rn from ( select player_id , match_day , result, row_number() over (partition by player_id order by match_day ) as rn1, row_number() over (partition by player_id,result order by match_day ) as rn2, (row_number() over (partition by player_id order by match_day )) - (row_number() over (partition by player_id,result order by match_day )) streaks from Matches )a where result = 'win' group by player_id,streaks )b where rn=1 )c on a.player_id = c.player_id ) --This works in SQL Server
@stratascratch
@stratascratch 2 жыл бұрын
Since we are asked only to get the win streak and the longest one, the players that haven't won any match don't influence the solution.
@shima1960ify
@shima1960ify 2 жыл бұрын
The second solution is really difficult. The first solution is really cleaver one. Do you think there is another way to solve the question?
@stratascratch
@stratascratch 2 жыл бұрын
There can be a number of different ways to solve a question. You might find a solution that can work for you.
@SP-db6sh
@SP-db6sh 2 жыл бұрын
Easier options there in pandas to solve this problem , can't we take this into Pandas dataframe ?
@siddhantkumar6145
@siddhantkumar6145 2 жыл бұрын
I have some questions of SQL, how can i get the answers of those. can you make videos on that questions. and how can i send you those questions.
@yuthpatirathi2719
@yuthpatirathi2719 2 жыл бұрын
@nate or anyone from the stratascratch team show us how this question can be solved using lag and lead . I got a same question in a grubhub interview and they mentioned can you please use lag and lead rather than this method. Would really appreciate the guidance? Thank you nate
@stratascratch
@stratascratch 2 жыл бұрын
Hi yuthpati rathi, Sorry, but I can't imagine a good and robust solution with lag or lead functions. If streak is == 5, then you would need to have 4 lagged rows along with current row to determine streak is 5. But if it's 6 then you'd need 5 lagged rows along with the current row. Here is another solution that is maybe more clear: -- CTE: calculate streak_end variable that is 1 when loss happens -- CTE2: use window function sum of streak_end variable to determine streak_id -- CTE_STREAK: COUNT player, streak id combinations -- OUTER QUERY: output players with longest streak WITH cte AS (SELECT player_id, match_date, match_result, CASE WHEN match_result = 'L' THEN 1 ELSE 0 END AS streak_end FROM players_results ORDER BY 1, 2), cte2 AS (SELECT *, SUM(streak_end) OVER(PARTITION BY player_id ORDER BY match_date) AS streak_id FROM cte), cte_streak AS (SELECT player_id, count(*) AS streak FROM cte2 WHERE match_result = 'W' GROUP BY player_id, streak_id) SELECT player_id, streak FROM cte_streak WHERE streak = (SELECT max(streak) FROM cte_streak) Lag and lead functions can only work if you already know the longest streak and you just need to determine the player/entity with that streak.
@emiliogarza6446
@emiliogarza6446 2 жыл бұрын
@@stratascratch I think I might've come up with a solution with lead and lag that works without previously knowing the longest streak, I'd really appreciate it if I could get feedback/criticism on it! It's like 25 lines without comments. Really loving the premium questions, btw. Much better than leetcode :) --Creates a table that replaces dates with an ordered match number WITH match_number AS ( SELECT player_id, match_result, ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY match_date) AS match_number FROM players_results ), -- Adds a column that integrates results of upcoming matches into the respective records next_match AS ( SELECT *, LEAD(match_result) OVER(PARTITION BY player_id) AS next_match_result FROM match_number ), -- Adds a boolean value to determine if the next or previous rows are consecutive -- by using lead() and lag() to check neighboring match numbers. Becomes -- meaningful in the next CTE when checking for consecutive wins. next_consecutive_bool AS( SELECT *, LEAD(match_number) OVER (PARTITION BY player_id) - match_number = 1 OR LAG(match_number) OVER (PARTITION BY player_id) - match_number = -1 AS next_conse FROM next_match WHERE match_result = next_match_result ), -- Filters the results to exclude losses. Using a ROW_NUMBER() function, this CTE adds -- a column that serves to identify consecutive wins by using the boolean value "next_conse" -- from the previous CTE. -- -- » If "next_conse" column = FALSE, this indicates a pair of consecutive wins only -- » If next_conse is TRUE, this indicates a presence of 3 or more consecutive wins, with the -- total number of wins being determined by the maximum row_num of that group. -- e.g. id 402 has a MAX row_num of 4, indicating 5 consecutive wins (row_num + 1) final_result_table AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY player_id, next_conse ORDER BY player_id, match_number, next_conse DESC) AS row_num FROM next_consecutive_bool WHERE match_result = 'W' ) -- Retrieves the player(s) who had the maximum amount of consecutive wins, determined -- by "MAX(row_num) from final_results_table". Adds +1 to the row_num to properly display -- the result. This +1 exists because our next_consecutive_bool table filters out the first -- match as to not get any non-consecutive pairs. SELECT player_id, row_num + 1 AS win_streak FROM final_result_table WHERE row_num = ((SELECT MAX(row_num) FROM final_result_table))
@oscararmandocisnerosruvalc8503
@oscararmandocisnerosruvalc8503 2 жыл бұрын
I tried everything but common patterns can be repeated, hence, it´s really difficult. Even if you are labeling your data and you use comparison through those windows functions it´s really hard
@techiewithcamera
@techiewithcamera 2 жыл бұрын
with consecutive_win_data as (select player_id, match_date, match_result, row_number() over (partition by player_id order by match_date) as row_num_1, row_number() over (partition by player_id , match_result order by match_date) as row_num_2 from players_results) select player_id, cnt, rank() over (order by cnt desc) from (select player_id, count(*) as cnt, rank() over (partition by player_id order by count(*) desc) as rank from (select player_id, match_date, match_result , (row_num_1-row_num_2) as diff from consecutive_win_data where match_result='W')x group by player_id, diff)y where rank=1 order by cnt desc limit 2;
@nikhilp2351
@nikhilp2351 2 жыл бұрын
The first solution wouldn't always work. For example, if the streak is WLWWW, it wouldn't work.
@duckpond2672
@duckpond2672 2 жыл бұрын
WWLLLWL wouldn't work neither... That method is often (like in this video) misapplied without discussing its use case.
@johnwig285
@johnwig285 2 жыл бұрын
I don't understand why they would ask this instead of logistics related questions like number of deliveries, shortest distance etc which would be more practical
@stratascratch
@stratascratch 2 жыл бұрын
They probably do ask those questions too. I've experienced simple, practice questions and longer, more complex questions later in the interview series.
Support each other🤝
00:31
ISSEI / いっせい
Рет қаралды 81 МЛН
1% vs 100% #beatbox #tiktok
01:10
BeatboxJCOP
Рет қаралды 67 МЛН
Sigma Kid Mistake #funny #sigma
00:17
CRAZY GREAPA
Рет қаралды 30 МЛН
Three Tricky Analytics Interview Questions with Andrew
25:03
Jay Feng
Рет қаралды 82 М.
SQL Syntax Best Practices: How to Structure Your SQL Code
16:35
StrataScratch
Рет қаралды 23 М.
Practice SQL Interview Query | Big 4 Interview Question
14:47
Solving a tricky SQL Interview Query
19:24
techTFQ
Рет қаралды 50 М.
Advanced Facebook Data Science SQL interview question [RANK()]
24:27