Solving a LinkedIn Data Science SQL Interview Question

  Рет қаралды 33,422

Jay Feng

Jay Feng

4 жыл бұрын

Hi everyone, today I go over how to tackle a medium difficulty interview question asked on a data science interview at LinkedIn!
Try writing your own solution and check your answer here for the Repeat Job Postings SQL problem: www.interviewquery.com/questi...
👉 Subscribe to my data science channel: bit.ly/2xYkyUM
🔥 Get 10% off your next data science interview prep: www.interviewquery.com/pricin...
❓ Check out our data science interview course: www.interviewquery.com/course...
🔑 Get professional coaching here: www.interviewquery.com/coachi...
🐦 Follow us on Twitter: / interview_query
More from Jay:
Read my personal blog: datastream.substack.com/
Follow me on Linkedin: / jay-feng-ab66b049
Find me on Twitter: / datasciencejay
Relevant Links
Three SQL Concepts You Need to Know: www.interviewquery.com/blog-t...
SQL Case Study Guide: www.interviewquery.com/p/sql-...

Пікірлер: 59
@ryezizzle
@ryezizzle 3 жыл бұрын
instead of average, could you use MAX?
@icvsmiglani8790
@icvsmiglani8790 3 жыл бұрын
I am trying to imitate how you explain your thinking, I think it is very important in a live interview! Great video jay.
@neversaynever22
@neversaynever22 4 жыл бұрын
Nice. Thanks for these videos. I wouldn’t have thought of using an average. Are you going to do a video on the 2nd question?
@sophial7317
@sophial7317 3 жыл бұрын
Hey, keep up the good work! The trick of avg is super smart!
@rishw824
@rishw824 3 жыл бұрын
Max could have been another solution i beleive
@GadgetsOfHitanshYadav
@GadgetsOfHitanshYadav 3 жыл бұрын
I think the query where your rolling up for getting final output is incorrect. You should be using count of userid in outer query not sum as aggregate. Instead of using avg. Using max could be a better option.
@adibhatlavivekteja2679
@adibhatlavivekteja2679 3 жыл бұрын
Window Function of Rank on number of times posted partitioned by user_id, jobId and order by desc of number of times Filter for 1 thereby getting the max case when this max > 1 then more than once else less than once we can just use a max as well at the same hierarchy level
@jaysahu1117
@jaysahu1117 3 жыл бұрын
The answer to problem# 2: with posting_cnt as (select user_id, job_id, count(date_posted) as num_posting from job_posting j1 inner join job_posting j2 on j1.user_id=j2.user_id and j1.job_id=j2.job_id and j1.date_posted=j2.date_posted group by user_id, job _id having count(date_posted)>1) select (count(distinct pc.user_id)/(select count( distinct user_id) from job_posting))*100 as '% of users posted again in 180 days', (1-(count(distinct pc.user_id)/(select count( distinct user_id) from job_posting))*100 as '% of users posted once', from posting_cnt pc
@mariakamynina4550
@mariakamynina4550 4 жыл бұрын
Hi Jay, thanks so much for your video, this is so helpful! Is it ok to go through the same thought process (draw the output, and then dummy table) during the interview, or are you supposed to write the code straight away?
@iqjayfeng
@iqjayfeng 4 жыл бұрын
yeah definitely, it's all about showing your thought process
@Leon_1218
@Leon_1218 2 жыл бұрын
My solution - select sum(case when n_job_id = n_job_posted then 1 else 0 end) as post_one_time ,sum(case when n_job_id < n_job_posted then 1 else 0 end) as post_multiple_times from (select user_id ,count(distinct job_id) as n_job_id ,count(*) as n_job_posted from job_postings group by 1) a
@TheElementFive
@TheElementFive 3 жыл бұрын
This seems like a good use case for a self join
@nanfengbb
@nanfengbb 3 жыл бұрын
Is the whole problem#2 part missing? What about the "within 180 days of posting the same job again" constraint? I think you'll need a self join to do that.
@vandanalunia2307
@vandanalunia2307 3 жыл бұрын
It would be great to hear your thoughts on this query. It might be bit short but the credit for crux of the code goes to Jay. With cte ( Select user_id, job_id, count(distinct date_posted) as job_posted from job_posting Group by 1,2) Select sum(case when max(job_posted) over () = 1 then 1 end) as user_once, sum(case when max(job_posted) over() != 1 then 1 end) as user_multiple from cte Group by user_id;
@oliveiralgm
@oliveiralgm 2 жыл бұрын
I think this should work: select count(distinct case when num_post >1 then 1 end) as num_post_mult , count(distinct user_id) - num_post_mult as num_post_1 from( select user_id ,job_id ,count(distinct date_posted) as num_post from job_postings)
@shruthinalpetadimurthy6224
@shruthinalpetadimurthy6224 3 жыл бұрын
I think this should work for the 1st question - with cte(user_id, freq) as (select user_id, case when sum(ct) = count(job_id) then 'Y' else 'N' end as freq from (select user_id, job_id, count(id) as ct from job_postings group by user_id, job_id) group by user_id) select count(case when freq = 'Y' then 1 else null end) as only_once , count(case when freq = 'N' then 1 else null end) as more_than once from cte;
@poopah4497
@poopah4497 2 жыл бұрын
Did I oversimplify? Group by user ID, count post ID and count distinct post ID. Case when
@vincysagar4962
@vincysagar4962 2 жыл бұрын
Hi Jay, Thanks for the videos. Really helpful content. What do you think about the following solution? with user_job_agg as ( select user_id, count(distinct job_id) as disctinct_job_count, count(distinct date_posted) as distinct_post_date from jobs_tb group by user_id ) select sum(case when disctinct_job_count = distinct_post date then 1 else 0 end ) as posted_once, sum(case when disctinct_job_count < distinct_post date then 1 else 0 end) as posted_mult_times from user_job_agg ; And one clarification, while considering the users who posted only once, shouldn't we consider those users also who have posted a particular job once and another job multiple times, both?
@iqjayfeng
@iqjayfeng 2 жыл бұрын
You can try running your solution here to see if it's correct or not!: www.interviewquery.com/questions/repeat-job-postings
@jacquelineroronoad2254
@jacquelineroronoad2254 3 жыл бұрын
Hey Jay, here's my solution, please do let me know what you think🙈 I'm a bit unsure about the group by in the main query with t as (select user_id, job_id, count(date_posted) num_posted from job_postings group by 1, 2) select sum(count(job_id)=sum(num_posted)) user_once, sum(count(job_id)!=sum(num_posted)) user_multiple from t group by user_id If the above code doesn't work, the following should do, hopefully with t as (select user_id, job_id, count(date_posted) num_posted from job_postings group by 1, 2), tt as (select count(job_id) num_jobs, sum(num_posted) sum_num_posted from t group by user_id) select sum(num_jobs = sum_num_posted) user_once, sum(num_jobs != sum_num_posted) user_multiple from tt
@mechanicaltypewriteroperat9885
@mechanicaltypewriteroperat9885 3 жыл бұрын
Who does this in real life work problems. Is mostly QBE nowadays plus if anything there should be more data enhancements such as location, status, sub tasks, salary, perks, digital sign, file formats, size, manual fax just in case the system needs 'maintenance' or lockouts. I would be worried if LinkedIn ask those questions.
@shouryasrivastava6559
@shouryasrivastava6559 3 жыл бұрын
@@mechanicaltypewriteroperat9885 wtf has this got to do with her code wiseass
@FunnyComedian3875
@FunnyComedian3875 3 жыл бұрын
@@shouryasrivastava6559 The above user has said his opinion on the situation if linkedin asks more complex questions as well. What is wrong with u my friend. Any grudge? Please dont get offend with this too.
@arshiailaty
@arshiailaty 6 ай бұрын
5:27 I think we have two users that have posted their jobs only once. Userid 1 has posted a job id = 2 only once. So, we should consider this user as the one who has posted their jobs only once. Is my assessment accurate?
@garyboy7135
@garyboy7135 2 жыл бұрын
if each time user posts a job that creates a row, and assume that's keyed with unique ID with ID column, why don't you just count number of ids (which is essentially user-job_id-date_posted pair combination), and aggregate at user level. So it can be as simple as: with job_posting_user_agg as ( SELECT user_id, count(id) as num_postings FROM job_posting GROUP BY 1 ) SELECT count_if(num_jobs = 1) as single_posters, count_if(num_jobs > 1) as multiple_posters FROM job_posting_user_agg
@shirbarel7097
@shirbarel7097 2 жыл бұрын
with kpi as ( select user_id, count(distinct job_id) total_job, count(distinct date_posted) total_posts from job_posts group by 1) select sum(case when total_posts > total_job then 1 else 0 end) users_multiple, sum(case when total_posts = total_job then 1 else 0 end) users_once from kpi
@yanhanzhu8795
@yanhanzhu8795 2 жыл бұрын
Like this answer!! so clean and clever!
@aromax504
@aromax504 4 жыл бұрын
Is this postgres syntax?
@kartiikss8515
@kartiikss8515 3 жыл бұрын
Thank you for your help Jay. How would this query work? WITH user_job AS ( SELECT user_id, job_id, COUNT(DISTINCT date_posted) AS #_posted FROM job_postings GROUP BY user_id, job_id ) SELECT SUM(IF(MAX(#_posted) = 1, 1, 0 ) AS #_once, SUM(IF(MAX(#_posted) > 1, 1, 0) AS more_than_once FROM user_job GROUP BY user_id
@mechanicaltypewriteroperat9885
@mechanicaltypewriteroperat9885 3 жыл бұрын
I think I would disconnect because of the legacy system they probably have to many problems even with paperwork processes or just general reporting...
@brothermalcolm
@brothermalcolm 3 жыл бұрын
with jobs as ( select user_id, job_id, count(*) as num_posts from job_postings group by 1,2 ) select count(distinct case when num_posts > 1 then user_id end) as posted_at_least_one_job_multiple_times, (select count(distinct user_id) from job_postings) - count(distinct case when num_posts > 1 then user_id end) as posted_jobs_once from jobs
@MashiroRedo
@MashiroRedo 2 жыл бұрын
Shouldnt user be outputted as well?
@0yustas0
@0yustas0 2 жыл бұрын
WITH t AS ( SELECT user_id , COUNT(job_id) AS cnt_job_posting , COUNT(DISTINCT job_id) AS count_distinct FROM job_postings GROUP BY user_id ) SELECT SUM(CASE WHEN cnt_job_posting=count_distinct THEN 1 ELSE 0) once , SUM(CASE WHEN cnt_job_posting!=count_distinct THEN 1 ELSE 0) multiple FROM t
@mohammedghouse9088
@mohammedghouse9088 3 жыл бұрын
Using Average was a good idea to find uniqueness
@bbear1123
@bbear1123 2 жыл бұрын
Good problem. Max could have been used instead of avg
@sb-eb7yv
@sb-eb7yv 3 жыл бұрын
Does anyone know what SQL editor Jay is using?
@jinysong2990
@jinysong2990 3 жыл бұрын
sublime text
@limesrawsm
@limesrawsm 3 жыл бұрын
I cannot believe this was a "medium" level question, I would have totally expected that to be a hard difficulty level! haha guess I need more practice...
@l_Ryi_l
@l_Ryi_l 3 жыл бұрын
A bit confused as you assumed all job ids are the same job which I would say the opposite. Each job Id is a unique job so you taking to average would not be right. Where in the problem statement does it say all job ids are the same for a given user?
@charusamaddar6550
@charusamaddar6550 3 жыл бұрын
I guess this should work: I did not consider date coz that was not in specification, if a job id is posted multiple times, it can be on same day or any other day. Would appreciate your response if I missed anything : Select sum(case when totals= 1 then 1 else 0) as posted_once, sum(case when totals >1 then 1 else 0) as Posted_multiples from select userid, jobid, count(*) as totals from jobposting group by userid, jobid
@DelpreteVictor
@DelpreteVictor 3 жыл бұрын
Mic sound is a bit low
@carlosmercado9371
@carlosmercado9371 3 жыл бұрын
--MYSQL answer I came up with CREATE VIEW user_job_jobpost AS SELECT user_id, job_id, COUNT(job_id) AS num_job_posts FROM jobs GROUP BY user_id, job_id ORDER BY num_job_posts DESC; SELECT SUM(poster_status = "multiposter") AS count_multiposters, SUM(poster_status = "singleposter") AS count_singleposters FROM (SELECT user_id, IF(AVG(num_job_posts) > 1, "multiposter", "singleposter") AS poster_status FROM user_job_jobpost GROUP BY user_id) as user_poster_status;
@yogimaster007
@yogimaster007 2 жыл бұрын
for the second question, it seems as simple as this: SELECT count(A.user_id)/(SELECT(count distinct user_id) from job_postings) from ( select a.user_id, a.job_id, abs(a.date_posted - b.date_posted) as diff from job_postings a join job_postings b on a.job_id = b.job_id group by 1,2,3 qualify (ROW_NUMBER() over ( partition by a.user_id order by diff desc) =1 ) ) A where diff
@annachan6706
@annachan6706 3 жыл бұрын
uhhmmm uhhh uhhh
@ni12907
@ni12907 3 жыл бұрын
dude you unnecessarily complicated it, could have used count(*) as frequency, a switch case, and a having clause that checks count(*) >1
@mechanicaltypewriteroperat9885
@mechanicaltypewriteroperat9885 3 жыл бұрын
Who does this in real life work problems. Is mostly QBE nowadays plus if anything there should be more data enhancements such as location, status, sub tasks, salary, perks, digital sign, file formats, size, manual fax just in case the system needs 'maintenance'. I would be worried if LinkedIn ask those questions.
@ni12907
@ni12907 3 жыл бұрын
@@mechanicaltypewriteroperat9885 i have written full sql queries at work, yes there is qbe but not everyone has access to it. if your role is technical they expect you to write queries, and you aren't given access to qbe unless you urge... and i work for fortune 100, still dont hve qbe access.
@mechanicaltypewriteroperat9885
@mechanicaltypewriteroperat9885 3 жыл бұрын
@@ni12907 Thanks. Since we are dealing with various British English text depending on the system(s) structure is usually manipulated at higher programming languange(s) is much faster to save the QBE, some are proprietary, exceptions are SQL servers or PL/SQL, then to barbarically type each character. If anything expanding data is more important. Perhaps is clearance. Anyway I walk out if LinkedIIn ask these questions. Perhaps their system is for children to lacking narrow AI. I think HR duties are fully automated nowadays is just whether want to pay for it or build it with various teams which still costs money and is waste of labor resources. I assume your badge does not allow you to the safe and request the employee file manually with the custodian(s) signin and write the reason why The #### Department, Agency , and etc. need it. Anyway there are not really alot of advance computations with HR systems. If anything is expanding data structures, character count, and able to capture all written human languages with low logic/computations. I assume this is not a multinational corporation operating in almost all countries.
@learnlife8055
@learnlife8055 3 жыл бұрын
Kindly please share your thoughts on the below query for the above requirement Thanks a lot SELECT SUM(CASE WHEN [totalNumOFJobsPosted] > 1 THEN 1 END) AS 'JOB POSTED MULTIPLE TIMES', SUM(CASE WHEN [totalNumOFJobsPosted] = 1 THEN 1 END) AS 'JOB POSTED ONLY ONCE' FROM ( SELECT USER_ID,COUNT(DATE_POSTED) AS [totalNumOFJobsPosted] FROM [dbo].[JOB_POSTINGS] GROUP BY USER_ID ) AS T
@rick2591
@rick2591 2 жыл бұрын
with cntr as( select distinct user_id, count(user_id) over (partition by user_id) cnt from dbo.job_postings) select sum(case when cnt=1 then 1 else 0 end) once, sum(case when cnt>1 then 1 else 0 end) from cntr;
@kails1989
@kails1989 2 жыл бұрын
What about this? select Count(case when max = 1 then user_id end) as posted_once, Count(case when max > 1 then user_id end) as posted_more_than_once from (select user_id, max(n) as max from (select j.user_id, j.job_id , row_number() over(partition by user_id, job_id) as n from Job_postings j) group by 1)
@meliksahcelik
@meliksahcelik 2 жыл бұрын
There is a simpler way; GROUP BY user_id HAVING COUNT(job_id) = 1 --> posted once GROUP BY user_id HAVING COUNT(job_id) > 1 --> posted multiple
Three Tricky Analytics Interview Questions with Andrew
25:03
Jay Feng
Рет қаралды 78 М.
Acing the Python Data Science Interview Questions
15:53
Jay Feng
Рет қаралды 60 М.
Indian sharing by Secret Vlog #shorts
00:13
Secret Vlog
Рет қаралды 32 МЛН
Glow Stick Secret (part 2) 😱 #shorts
00:33
Mr DegrEE
Рет қаралды 54 МЛН
КАХА и Джин 2
00:36
К-Media
Рет қаралды 2,9 МЛН
Acing the SQL Data Science Interview Questions
9:12
Jay Feng
Рет қаралды 22 М.
Ace Your Data Science Interview: Top SQL Questions to Master
12:49
Data Science SQL Mock Interview - Analyze Marketing Channels
23:23
Meta and Uber SQL Problems | Data Science Interview
14:04
DataInterview
Рет қаралды 8 М.
Real SQL Interviews: Amazon's TOP 2021 question REVEALED?
14:18
Real SQL Interview questions for Data Analysts
12:31
Jay Feng
Рет қаралды 4,5 М.
Обманет ли МЕНЯ компьютерный мастер?
20:48
Харчевников
Рет қаралды 182 М.
Vortex Cannon vs Drone
20:44
Mark Rober
Рет қаралды 15 МЛН
A4 Reset to zero
0:26
STYLE YT
Рет қаралды 17 М.