Solving SQL Interview Query for Data Analyst asked by a Product based company

  Рет қаралды 503,509

techTFQ

techTFQ

Күн бұрын

Пікірлер: 337
@7aMooDeTeMooN
@7aMooDeTeMooN Жыл бұрын
Why everyone is over complicating this? Check this simple solution : It could be done with where clause to filter years instead of writing all these rows for each year, also the divisor amount could considered when we turn null values into 0: Select customer_id, customer_name, Avg(case when billing is null then 0 else billing) as billing, From table Where Extract ( year from date_column ) in (“2019”, “2020”) Group by customer_id, customer_name I wrote this from my phone so iam sorry if it is not clear, i use this code in plSql could be there are some differences such as extract year but iam sure there is an equivalent in sql Thanks
@50_saifalikhan33
@50_saifalikhan33 Жыл бұрын
Bhai aapke approach me wo year ko consider nhi kiya ja rha jiska record exist nhi kr rha table me between 19 and 21. Suppose if we consider the customer A then according to table we have two records for year 20 and one record for year 21 but no record for year 19. If there would have been a record for year 19 with billingamt as null then your query may work well. But for now its of no use. That's what makes this question tricky and lengthy. According to your query the first output record would be: 1 A 116.666 but the expected output mentioned in ques is: 1 A 87.5
@7aMooDeTeMooN
@7aMooDeTeMooN Жыл бұрын
@@50_saifalikhan33hmmm i see that if we have no record of year 2019 we should add a record with 0 value. I understand now my answer could be not correct
@timothynathanael2475
@timothynathanael2475 Жыл бұрын
A little browsing, we can use recursive cte to generate date rows between start and end, And we just do right/left join from main table and the generated rows with year(tbl1.date) and year(tbl2.date) From that join, we will get result null if there's no trx for that user in that year Last, we just group by year(date) and customer_id, and select sum(trx amount)/count(trx amount)
@muditmishra9908
@muditmishra9908 Жыл бұрын
Nice question and a great explanation. Thanks a lot. I also tried my own and used string functions to come up with the solution. with cte1 as ( select customer_id, customer_name, count(*) as ct , sum(billed_amount) as amount, string_agg(cast(extract (year from billing_creation_date) as varchar), ',' ) as all_year from billing where extract(year from billing_creation_date) in (2019,2020,2021) group by 1,2 ) , cte2 as ( select *, case when all_year not like '%2019%' then 1 else 0 end as is_2019, case when all_year not like '%2020%' then 1 else 0 end as is_2020, case when all_year not like '%2021%' then 1 else 0 end as is_2021 from cte1 ) select customer_id, customer_name, round(amount*1.0/(ct+is_2019 + is_2020+is_2021),2)as avg from cte2
@KavishSrivastava
@KavishSrivastava 10 ай бұрын
well written query, quite simple and well concise query.
@muditmishra9908
@muditmishra9908 10 ай бұрын
@@KavishSrivastava thanks for the comment , I got a chance to revisist this question again, but now when i solved it again,i used different approach using the joins. sharing the recent solution below: with cte_year as ( SELECT 2019 AS year UNION ALL SELECT 2020 UNION ALL SELECT 2021 ) , cte_customer as ( select distinct customer_id from billing where year(billing_creation_date) in(2019,2020,2021) ) , cte_customer_with_year as ( select * from cte_customer cross join cte_year ) select cte_customer_with_year.customer_id , round(sum(billed_amount)/count(*),2) as average_billing_amount from cte_customer_with_year left join billing on cte_customer_with_year.customer_id = billing.customer_id and cte_customer_with_year.year = year(billing.billing_creation_date ) group by cte_customer_with_year.customer_id
@KavishSrivastava
@KavishSrivastava 10 ай бұрын
@@muditmishra9908 again a good approach which filters extra steps compare to your previous one, but the current query's output is incomplete which is missing customer_name in the result, which will eventually requires some changes in the code.
@KavishSrivastava
@KavishSrivastava 10 ай бұрын
here a try I gave : with c as ( SELECT f.customer_id, f.customer_name ,year, billed_amount FROM (SELECT DISTINCT customer_id, customer_name FROM billing) f CROSS JOIN (SELECT 2019 AS year UNION SELECT 2020 AS year UNION SELECT 2021 AS year) y LEFT JOIN billing b ON b.customer_id = f.customer_id AND YEAR(b.billing_creation_date) = y.year ORDER BY customer_id, year, billed_amount) SELECT customer_id, customer_name , ROUND(AVG(COALESCE(billed_amount, 0)),2) AS avg_billed_amount from c GROUP BY customer_id,customer_name ;
@vacksdevlopers9234
@vacksdevlopers9234 Жыл бұрын
select customer_name, sum(billing_amount)/(count(*) - count(distinct year(billing_creation_date)) +3) AS AVG_Billed_amount from table1 where year(billing_creation_date) BETWEEN '2019' and '2021' group by customer_name;
@miguelescalantemilke7204
@miguelescalantemilke7204 11 ай бұрын
Nice!! I’ve never seen someone explain SQL problems for interviews and gotta admit I love the format and the way you explain it. Insta-subscribed🎉. I’ve been practicing in codewars but I always feel like they’re either too simple tasks asking for a JOIN or something I just have never thought before. This really helped me improving my problem solving skills and the way I tackle SQL problems
@Siddharth_Matada
@Siddharth_Matada Жыл бұрын
Superb explanation...even a guy who doesnt even hear about SQL can understand from your video....great
@techTFQ
@techTFQ Жыл бұрын
Glad it helped bro
@shubhamagrawal7068
@shubhamagrawal7068 Жыл бұрын
Very complex approach in the video. Here is the most simplest approach (MySQL) : - SELECT customer_id, customer_name, ROUND(SUM(billed_amount) / (3 - COUNT(DISTINCT YEAR(billing_creation_date)) + COUNT(*)),1) AS avg_billed_amount FROM billing WHERE YEAR(billing_creation_date) BETWEEN 2019 and 2021 GROUP BY 1, 2
@amadei2
@amadei2 Жыл бұрын
Im confused by the second part of the AVG_billed_amount ( after the / ), wouldn't a simple AVG statement work since it's grouped ?
@himanshiparashar6545
@himanshiparashar6545 Жыл бұрын
@shubhamagrawal7068 I have written the below code for this but not getting the desired output. Can you help me find the mistake please. select customer_id,customer_name,sum(billing_amount)/(count(*)+3-count(distinct(year(x.billing_year)))) from( select *,year(billing_creation_date) as billing_year from billing where year(billing_creation_date) between 2019 and 2021) x group by customer_id,customer_name;
@AlleinArk
@AlleinArk Жыл бұрын
this is a banger answer
@justforfunpagla
@justforfunpagla 11 ай бұрын
Perfect!
@KavishSrivastava
@KavishSrivastava 10 ай бұрын
@@amadei2 No, it won't. Try to grasp the concept behind it. Suppose you have given 3 years to evaluate for, as in our case - 2019 to 2021. As for records- consider 1st id , it has 3 records, for 2020 twice and 2021. No. of years to evaluate for (2019 to 2021) = 3 No. of years (in 1st record )= 3 (i.e 2020, 2020, 2021) No. of distinct years(in 1st record) = 2 (i.e 2020, 2021) so the formula is - { No. of years (to evaluate for) - No.of distinct years } + No. of years = { 3 - Count(Distinct(years)) } + Count * = {3-2} + 3 = 4 for 1st record i.e id = 1, name =A Sum = 350 Count = 4 Avg = 350/4 = 87.5 . Hoping, now it is cleared.
@niazmorshed7847
@niazmorshed7847 Жыл бұрын
Great Explanation ! but if date range increases in will be difficult using case . Here I have dynamic Query WITH integer_sequence(n) AS ( SELECT 2019 -- starting value UNION ALL SELECT n+1 FROM integer_sequence WHERE n < 2021 -- ending value ) Select customer_id,customer_name,AVG(Amount) from ( Select A.n as bill_Year,A.customer_id ,A.customer_name,ISNULL(Amount,0) AS Amount from ( SELECT * FROM integer_sequence A cross join (Select distinct customer_id,customer_name from Test_SQL)B ) A left outer join Test_SQL B on A.n=DATEPART(YEAR,B.ddate) and A.customer_id=B.customer_id ) B group by customer_id,customer_name
@gomojo1024
@gomojo1024 Жыл бұрын
That was wild. A bunch of lights came on as I have been studying simple SQL terminology and basic query s cool thx
@robbiecarlos4537
@robbiecarlos4537 Жыл бұрын
I would be asking the interviewer if they wanted the average transaction billing for a customer for all transactions between 2019 to 2021 or if they wanted the average for each customer for each year (2019,2020,2021). Throwing in a transaction for 0 just because the customer did not have any other transactions that year makes no sense.
@blabberblabbing8935
@blabberblabbing8935 11 ай бұрын
Either the author of the post copied the problem's details wrongly or the problem was wrongly stated in the first place. In any case what average is wanted by the interviewer should be more clearly stated. If only the given info were available I'd assume that you should make an average over the single year for a single customer ( E_Y = Sum[transactions_yearY]/#transactions_yearY] ) and then average over the years ( E = Sum[ E_Y1, E_Y2, ..., E_YN]/N]. With this formula customer A in the table would have an average of 75, not 87.5.
@mithunkt1648
@mithunkt1648 Жыл бұрын
Hi Taufiq, First of all thank you for your service. You are doing a wonderful job for citizen data analyst like me. Hence I am sharing my code for review. SQL flavor - PostgreSQL. with year as ( Select extract(year from billing_creation_date) as year from billing where extract(year from billing_creation_date) is between '2019' and '2021' ) Select b.customer_id as customer_id, b.customer_name as cust_name, avg(coalesce(b.billing_amount,0)) as avg_bill_amt from year y left join billing b on y.year = extract(year from b.billing_creation_date) group by y.year
@sachinvishwakarma5868
@sachinvishwakarma5868 Жыл бұрын
The answer and approach is nice and this is good solution, I admire that. However, had I been interviewer then would have asked now can you write the same for between year 1990 - 2022? Would you write case statement for those 32 years? The approach should have been that the query will work for dynamic input..... PS : I am also trying to think of a dynamic solution.
@yogeshwarbhosikar1597
@yogeshwarbhosikar1597 Жыл бұрын
if u get the dynamic solution pls paste here, it will be helpful
@bragemogstad7124
@bragemogstad7124 Жыл бұрын
For average total: select id, name, sum(amount)/sum(counter) as avg_billed from ( select customer_id as id, customer_name as name, billing_amount as amount, if amount>0 then 1 else 0 end if as counter where billing_creation_date between '1990-01-01' and '2021-12-31' ) A group by id,name; For average per year goup by column year(billing_creation_date) as year.
@matthewthornton4082
@matthewthornton4082 Жыл бұрын
For a dynamic solution you would use a date scaffold
@BrainStroming1789
@BrainStroming1789 Жыл бұрын
very bad solution, outer join on sub select years (can be replace by args values) and groupe by average. More more more simple. Can be also write by using "with as ..." for more clarity
@maxwellotto4202
@maxwellotto4202 Жыл бұрын
Create a sequence of years in a cte based on year start and end. Create another cte that includes year as a column calculated from the original table. Right outer join 1st cte on to 2nd cte on year. Create a subsequent cte that groups on year and other columns of interest, aggregating measures of interest. In this case, avg(billingAmount). Boom, now you can aggregate over any year range including years with no billing.
@akash4517
@akash4517 Жыл бұрын
Hi Toufiq , good problem statement and video. My Soluition for the problemt . %sql WITH CTE AS( select distinct customer_id,customer_name,Year from Billing join( select '2019' as Year UNION select '2020' as Year UNION select '2021' as Year ) Y ) select C.customer_id,C.customer_name, Round(AVG(coalesce(B.billed_amount,0)),2) as avg_billing_amount from CTE C left join billing B ON C.customer_id=B.customer_id AND C.customer_name=B.customer_name AND C.Year=EXTRACT(year from B.billing_creation_date) group by C.customer_id,C.customer_name order by 1,2
@almassheraz9412
@almassheraz9412 Жыл бұрын
The way u explain is simply awesome.. initially I thought it must be very complex but after ur explanation it looks simple…
@georgezirbo4276
@georgezirbo4276 Жыл бұрын
I find the solution you provided to be quite complex, inefficient and a bit too specific. Here's my solution: Table Definition: Billing ( cid int, cname varchar(10), bid varchar(10) primary key, bdate date, bamount float ) Query: SELECT b.cid, b.cname, SUM(b.bamount) / (COUNT(*) + 3 - COUNT(DISTINCT(YEAR(bdate)) AS average FROM billing b WHERE YEAR(b.bdate) BETWEEN 2019 AND 2021 GROUP BY b.cid; Explanation: Instead of having so many cases, we can add together the total no of billings per customer + the no of years that don't have billings (3 - COUNT(DISTINCT(YEAR(bdate)).Moreover, before grouping by customer, we filter using "WHERE YEAR(b.bdate) BETWEEN 2019 AND 2021". Hope it's useful! :))
@yousifabdalla6215
@yousifabdalla6215 Жыл бұрын
Great video! Thank you for the insightful explanation. I applied a Nested CASE WHEN approach to solve this problem, and it produced the same result as shown in the video. Here's the query I used: SELECT customer_id, customer_name, (SUM(CASE WHEN strftime('%Y', billing_creation_date) IN ('2019','2020','2021') THEN billing_amount END) / (CASE WHEN SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2019' THEN 1 ELSE 0 END) = 0 THEN 1 ELSE SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2019' THEN 1 ELSE 0 END) END + CASE WHEN SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2020' THEN 1 ELSE 0 END) = 0 THEN 1 ELSE SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2020' THEN 1 ELSE 0 END) END + CASE WHEN SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2021' THEN 1 ELSE 0 END) = 0 THEN 1 ELSE SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2021' THEN 1 ELSE 0 END) END )) AS avg_billing_amount FROM billing GROUP BY customer_id, customer_name;
@g.sridhar3084
@g.sridhar3084 Жыл бұрын
For the first condition A they have mentioned billing amount should be taken for a year then the total billing amount for 2020 would be 100+150$=250$ and then we have to take the average which would be 0+250+100=350/3=116.66 rather than dividing it by 4. I think this is also a typo error.
@agusta2021
@agusta2021 Жыл бұрын
I think so too, it confusing me at first. What about it @techFTQ?
@mikatu
@mikatu Жыл бұрын
Yes, that is clearly an error. The average is per year therefore the division must be done always by 3 years, not four. Unless the average is per billing event, then we need to ignore the cases with zero.
@timopheim5479
@timopheim5479 Жыл бұрын
@@mikatu Wrong, learn english
@stevebennett7094
@stevebennett7094 Жыл бұрын
Thank You, been looking through the comments to see if anyone else had the same query. So the amounts should be : 1 A 116.67 2 B 200.00 3 C 183.33
@siddarameshwaruh5103
@siddarameshwaruh5103 Жыл бұрын
Sir your the dictionary for my SQL practice, Thanks for the video..!!😀😀
@techTFQ
@techTFQ Жыл бұрын
Glad to be helpful bro
@erdenealtan6482
@erdenealtan6482 Жыл бұрын
very interesting
@arturoramirez712
@arturoramirez712 Жыл бұрын
I create rows not columns. Not as sophisticated as some below but here is my version with a cross join and each step in a subquery. Yeah too long, but fun :) with bill_qry as ( select customer_id, customer_name, year(billing_creation_date) as bill_year, convert( decimal (10,2) , sum(billing_amount) ) as bill_sum , convert( decimal (10,2) , count(billing_id) ) as bill_cnt from billing where year(billing_creation_date) > 2018 -- should be a prompt group by customer_id, customer_name, year(billing_creation_date) ), d_year as ( select min(bill_year) as min_year, max(bill_year) as max_year -- one row from bill_qry ), y_range as ( select n = 0, (min_year) as t_year --year 1 from d_year union all select n+1, (min_year) as t_year --year 1 from y_range, --recursive query d_year -- one row where n < max_year - min_year -- can't be over 100 rows ), year_table as ( select t_year + n as t_year from y_range --use this table to cross join the data table ), bill_qry2 as ( select t_year , customer_id, customer_name ,case when t_year = bill_year then bill_sum else 0 end as bill_sum2 --create rows instead of columns ,case when t_year = bill_year then bill_cnt else 0 end as bill_cnt2 --create rows instead of columns from year_table cross join bill_qry ), bill_qry3 as ( select t_year, customer_id, customer_name, sum(bill_sum2) as bill_sum3, --years with no entry reduced to one row, zero amount, zero instance sum(bill_cnt2) as bill_cnt3 --years with no entry reduced to one row, zero amount, zero instance from bill_qry2 group by t_year, customer_id, customer_name ), bill_qry4 as ( select customer_id, customer_name, bill_sum3 as bill_sum4, case when bill_cnt3 = 0 then 1 else bill_cnt3 end as bill_cnt4 from bill_qry3 ) --final query select customer_id, customer_name, convert ( decimal (10,2), ( sum(bill_sum4) / sum(bill_cnt4) ) ) as avg_pay_final from bill_qry4 group by customer_id, customer_name
@prasaddalvi5230
@prasaddalvi5230 Жыл бұрын
with prasad as ( select *, extract (year from billing_creation_date) as years from billing where billing_creation_date between '2019-01-01' and '2021-12-31' ) select customer_id, sum(billed_amount)/ (case when count(distinct years)=1 then count(years)+2 when count(distinct years)=2 then count(years)+1 else count(years) end) as billed_amount from prasad group by 1;
@subhamagarwal5097
@subhamagarwal5097 Жыл бұрын
I learned something new .the way you explained it cleared my concept thank you so much .
@el-mehdichouki6355
@el-mehdichouki6355 Жыл бұрын
I create rows not columns. 😃 I used SQLLITE drop table billing; create table billing ( customer_id int , customer_name varchar(1) , billing_id varchar(5) , billing_creation_date DATE , billed_amount int ); insert into billing values (1, 'A', 'id1', DATE('2020-10-10'), 100); insert into billing values (1, 'A', 'id2', DATE('2020-11-11'), 150); insert into billing values (1, 'A', 'id3', DATE('2021-11-12'), 100); insert into billing values (2, 'B', 'id4', DATE('2019-11-10'), 150); insert into billing values (2, 'B', 'id5', DATE('2020-11-11'), 200); insert into billing values (2, 'B', 'id6', DATE('2021-11-12'), 250); insert into billing values (3, 'C', 'id7', DATE('2018-01-01'), 100); insert into billing values (3, 'C', 'id8', DATE('2019-01-05'), 250); insert into billing values (3, 'C', 'id9', DATE('2021-01-06'), 300); select * from billing; -- Soltion ❤ with billingTable as ( select customer_id, customer_name, strftime('%Y',billing_creation_date) as year, billed_amount from billing where year in ('2019', '2020', '2021') ), missingYears as ( select customer_id, customer_name, 3 - count(distinct year) as missingYears from billingTable group by customer_id, customer_name ), sumBilling as ( select customer_id, customer_name, sum(billed_amount) as amount_sum, count(year) as countYears from billingTable group by customer_id, customer_name ), result as ( select missingYears.customer_id, missingYears.customer_name, sumBilling.amount_sum, (sumBilling.countYears + missingYears.missingYears) as countYears from missingYears inner join sumBilling on missingYears.customer_id = sumBilling.customer_id and missingYears.customer_name = sumBilling.customer_name ) select customer_id, customer_name, ROUND(amount_sum*1.0 / countYears, 2)||'$' as avg_billed_amount from result
@bubs4552
@bubs4552 Жыл бұрын
Thank you for this! More concept understanding of every step you are taking and thinking
@abhishekgupta43380
@abhishekgupta43380 Жыл бұрын
We can create a cte with their customer name and number of time it comes in between 2019 to 2021 and then we can find the sum of the values for the years between 2019 and 2021 and then divide it by the count values which we get from cte and group by the customer names
@ברונוספורטס
@ברונוספורטס Жыл бұрын
I would resolve it like that: with all_year as (select 2019 as year from dual union select 2020 as year from dual union select 2021 as year from dual), all_ids_sal as( select distinct id, b.year, case when b.year in (select year from billing c where a.id=c.id) then (select salary from billing d where d.id=a.id and d.year=b.year) Else 0 end as salary from billing a, all_year b) select id, round(avg(salary)) from all_ids_sal group by id
@zouhairqantar4593
@zouhairqantar4593 Жыл бұрын
Following is my dynamic solution to the problem: With c_tot as { Select customer_id ,customer_name ,SUM(billing_amount) over( partition by customer_id) as sm ,COUNT(billing_creation_date) over (partition by customer_id) as nbr ,COUNT(DISTINCT YEAR(billing_creation_date)) over ( partition by customer_id) as discnbr From Where YEAR(billing_creation_date) ≥2019 and YEAR(billing_creation_date) ≤2021 } Select customer_id ,customer_name. sm/(nbr+(3-discnbr)) as average_billing From c_tot P.s: (end_year - start_year)+1 =3
@ajaykumargaudo6685
@ajaykumargaudo6685 Жыл бұрын
Thank you thoufiq for the question.
@techTFQ
@techTFQ Жыл бұрын
Your welcome bro
@anyany5946
@anyany5946 Жыл бұрын
I think Below Query Is more Dynamic with smaller output and less character But I was Learn From Your video. declare @f_y int = 2019,@l_y int = 2021 ;with ctc as( select @f_y as y union all select y+1 as y from ctc where y
@jaguara01
@jaguara01 11 ай бұрын
this works for me: SELECT customer_id, customer_name, sum(billing_amount) AS total_billing_amt, count(billing_id) as billing_cnt, 3-count(distinct year) as missing_billing_cnt, sum(billing_amount)/(count(billing_id)+3-count(distinct year)) as avg FROM (select * from df where year >= 2019 and year
@allanfernandes245
@allanfernandes245 Жыл бұрын
Great Vid !!!! I really enjoy your way of explaining complex things so easily !!! Keep going !!
@techTFQ
@techTFQ Жыл бұрын
Thank you Allan ☺️
@adarshagarwal9352
@adarshagarwal9352 Жыл бұрын
How about this? select customer_id, customer_name, concat(round((sum(billing_amount)/(count(distinct billing_id) - count(distinct year(billing_creation_date)) + 3)),2),"$") as avg_bill_amt from data where bill_date between '2019-01-01' and '2021-12-31' group by 1,2;
@saschagotz331
@saschagotz331 Жыл бұрын
Honestly, this is a terrible solution. This is my approach: Also 20 lines of sql code (excluding generation of data in the first CTE), but easily expandalbe to infintive number of years. --mssql syntax with data as ( select 1 as customer_id,'A' as customer_name,'id1' as billing_id,convert(date,'2020-10-10') as billing_creation_date,100 as billed_amount union all select 1 ,'A','id2',convert(date,'2020-11-11'),150 union all select 1 ,'A','id3',convert(date,'2021-11-12'),100 union all select 2 ,'B','id4',convert(date,'2019-11-10'),150 union all select 2 ,'B','id5',convert(date,'2020-11-11'),200 union all select 2 ,'B','id6',convert(date,'2021-11-12'),250 union all select 3 ,'C','id7',convert(date,'2018-01-01'),100 union all select 3 ,'C','id8',convert(date,'2019-01-05'),250 union all select 3 ,'C','id9',convert(date,'2021-01-06') ,300 ), allyears as (select distinct year(billing_creation_date) as [year] from data where year(billing_creation_date) between 2019 and 2021), allcustomers as (select distinct customer_ID,customer_name FROM data), allyearsandcustomers as ( select c.customer_id ,c.customer_name ,y.year from allcustomers c full outer join allyears y on 1=1 ), allyearsandcustomerswithdata as ( select allyearsandcustomers.customer_id ,allyearsandcustomers.customer_name ,convert(decimal(11,2),avg(isnull(d.billed_amount,0))) as billed_amount from allyearsandcustomers left outer join data d on d.customer_id = allyearsandcustomers.customer_id and year(d.billing_creation_date) = allyearsandcustomers.year group by allyearsandcustomers.customer_id ,allyearsandcustomers.customer_name ) select * from allyearsandcustomerswithdata
@viktorponomarev4048
@viktorponomarev4048 Жыл бұрын
Hi TFQ, thanks for sharing this problem and all your efforts!
@debasishpadhi1837
@debasishpadhi1837 Жыл бұрын
PLEASE PROVIDE THE TABLE CREATION AND DATA INSERTAION QUERIES WHEN U DO SUCH KIND OF VIDEOS.THAT WILL HELP SAVE TIME
@SANDATA764
@SANDATA764 Жыл бұрын
Big thanks bhai, you are doing amazing job
@techTFQ
@techTFQ Жыл бұрын
thanks for the constant support Ahmed ❤
@bhaskar9781
@bhaskar9781 Жыл бұрын
Great one , enjoyed every bit of it , thanks for sharing this 🙂
@techTFQ
@techTFQ Жыл бұрын
glad to hear that
@howto86421
@howto86421 Жыл бұрын
Easy .. SELECT customer_name, SUM(billing_amount) / (COUNT(*) - COUNT(DISTINCT YEAR(billing_creation_date)) + (YEAR('2021') - YEAR('2019') + 1)) AS AVG_Billed_amount FROM table WHERE YEAR(billing_creation_date) BETWEEN '2019' AND '2021' GROUP BY customer_name; ``` In this modified query, `(YEAR('2021') - YEAR('2019') + 1)` calculates the difference in years between 2019 and 2021 (inclusive), which is then added to the denominator of your formula. This way, you're accounting for the dynamic year range without hardcoding the value "3".
@ramshataqdees5570
@ramshataqdees5570 8 ай бұрын
Please solve more interview questions like this
@akshayb451
@akshayb451 Жыл бұрын
my approach: with c as (select * from (select *,row_number() over(partition by y,customer_id order by customer_id) rn from (select *, year(billing_creation_date) y from billing) t) t1 where rn=1 and y >= 2019 and y
@keepup2106
@keepup2106 Жыл бұрын
"SELECT customer_id, customer_name, AVG(ISNULL(avg_billing_amount,0)) FROM billing WHERE YEAR(billing_creation_date) BETWEEN 2019 AND 2021 GROUP BY customer_id, customer_name" Whats wrong with this??
@leoprabhakar4166
@leoprabhakar4166 Жыл бұрын
Wonderful explanation Toufiq 👌👌
@df170
@df170 Жыл бұрын
A lot of code. You can do select customer_name, customer_id, sum(billed_amount)/3 where year(billing_creation_date) between 2019 and 2021 group by customer_name, customer_id
@joelarackal2729
@joelarackal2729 Жыл бұрын
Exactly what I thought a lot of code for a simple solution. I think this is where understanding basic principles of math really help simplify code.
@edimathomas-cr4km
@edimathomas-cr4km Жыл бұрын
This is a smart and simpler way to do this. Everyone else is just out here writing unnecessarily compounded and lengthy codes.
@Biplob-ff8mp
@Biplob-ff8mp Жыл бұрын
Wouldn't work
@AnxiousMechtron
@AnxiousMechtron Жыл бұрын
We can also use avg() if using groupby
@attilapekkaszabo-iv6tl
@attilapekkaszabo-iv6tl Жыл бұрын
This won't work. If there are multiple transactions in a year. Avg() won't work as years with no data should be treated as 0 with a count of 1. This code is good as it is.
@brunob.7792
@brunob.7792 Жыл бұрын
Journalist here. I would do that with the Help of knime software, using joiner node and math formula node, also, a little bit or rule engine node.
@raushankumar-rq5yk
@raushankumar-rq5yk Жыл бұрын
Thanks for explanation ☺
@techTFQ
@techTFQ Жыл бұрын
your welcome
@venkateshbabu9391
@venkateshbabu9391 Жыл бұрын
This is a generic one... you can pass the year range as parameters. with y as ( select yr from generate_series(2019, 2021, 1) yr ), c as ( select distinct(cust_id) from bill ) select x.cust_id, sum(coalesce(y.sum_amt, 0)), avg(coalesce(y.sum_amt, 0)) from ( select yr, cust_id from c cross join y ) x left join ( select cust_id, date_part('year', bill_date) as billyr, sum(bill_amt) as sum_amt from bill group by cust_id, date_part('year', bill_date) ) y on x.cust_id = y.cust_id and x.yr = y.billyr group by x.cust_id order by x.cust_id;
@abhishekgowda1776
@abhishekgowda1776 Жыл бұрын
Hi bro, I learnt a lot after watching your videos, please make more videos
@ameybadami014
@ameybadami014 Жыл бұрын
Interesting: classic case of overfitting with the test sample. The averages should be calculated by 3(considering 3 years), unless this is a made up scenario where they are looking for per year average and the denominator is the count of bill amounts
@sansha3881
@sansha3881 Жыл бұрын
You nailed it.
@flyeagle320
@flyeagle320 Жыл бұрын
Wonderful thoufique . It was really helpful
@sripree
@sripree Жыл бұрын
Nice video. Very clear explanation. Keep it up.
@udaysaiathyakula1543
@udaysaiathyakula1543 Жыл бұрын
Super brooo...u rocked it maaaaaannnnn really u are SQL god
@muhammadtanveerislam5998
@muhammadtanveerislam5998 Жыл бұрын
🙂Very Nice Explanation.
@techTFQ
@techTFQ Жыл бұрын
Thank you 🙏🏼
@baloney_sandwich
@baloney_sandwich Жыл бұрын
Great practice
@amitahlawat7378
@amitahlawat7378 Жыл бұрын
easier solution- select customername,customerid, case when max(dr) = 3 then round(sum(billing_amount)/count(customerid),2) when max(dr) = 2 then round(sum(billing_Amount)/(count(customerid)+1),2) when max(dr) = 1 then round(sum(billing_Amount)/(count(customerid)+2),2) end as dlsf from(select *,dense_rank() over(partition by customerid,customername order by billing_cd) as dr from averagee where year(billing_cd) in('2019','2020','2021')) group by customerid,customername
@felipesignorellireis7839
@felipesignorellireis7839 Жыл бұрын
I tried with chat gpt, but failed to bring the correct value
@sravankumar1767
@sravankumar1767 Жыл бұрын
Superb explanation 👌 👏 👍
@Howto-ty4ru
@Howto-ty4ru Жыл бұрын
My Query with bill_summary as (select customer_id, customer_name, sum(billed_amount) as total_bill, count(customer_name) as bill_count, count(distinct(extract(year from billing_creation_date))) as distinct_years from billing where extract(year from billing_creation_date)>='2019' group by customer_id, customer_name) select customer_id, customer_name, round(total_bill/(bill_count+3-distinct_years),2) as average from bill_summary
@pssvkrn
@pssvkrn Жыл бұрын
Excellent explanation bro...
@michael_mancuso
@michael_mancuso Жыл бұрын
Hi Thoufiq - Great explanation. I was thinking, what if the problem statement and data set was expanded to 10 or 20 years and how would that impact the solution.
@techTFQ
@techTFQ Жыл бұрын
I feel the solution would work fine without much performance issue but still need to test it out with large data to be sure..
@vector4100
@vector4100 Жыл бұрын
I think this solution works nicely for a small range like the one he shown, but if we were to expand it, I'd rather use nested selects in order for it to be scalable
@SergeyKudrenko
@SergeyKudrenko Жыл бұрын
Hi, thanks for video. I believe your solution does not scale well. Imagine there are more than 3 years to take. In this particular case it's better to generate a "table" with all combinations of customers and billing years and join it with the actual data. Here is an example for postgresql: with users as (select distinct customer_id, customer_name from billing), period as (select generate_series(2019,2021) p_year) select u.customer_id, u.customer_name, avg(coalesce(b.billing_amount,0)) avg_billing_amount from users u cross join period p left join billing b on b.customer_id = u.customer_id and p.p_year = date_part('year', b.billing_creation_date)::int group by u.customer_id, u.customer_name
@StopWhining491
@StopWhining491 Жыл бұрын
Imagine a customer wasn't actually a customer during the defined billing period. A 0 wouldn't indicate no billing for that year because no billing data would be possible. A customer that had a large outlier billing amount for only one out of the 4 billing periods would yield a misleading result. For business analysis, I'd question the purpose of the results of the query.
@honey0011123
@honey0011123 Жыл бұрын
Hi Thoufiq! Great way to explain. Thank you. Please let me know if you can help with solving any particular interview query that is related to Day 1 retention rate for users. Really appreciate it.
@mathianandhan
@mathianandhan Жыл бұрын
I have used some other approach , with table1 as( select customer_id,customer_name,cast(billed_amount as decimal(6,2)) as billed_amount, year(billing_creation_date) as formatted_year from billing where year(billing_creation_date) > 2018 ), table2 as( select customer_name,case when count(distinct formatted_year) < 3 then cast(count(formatted_year) + 1 as decimal (3,2)) else cast(count( formatted_year) as decimal(3,2)) end as total_count from table1 t1 group by customer_name) select t1.customer_id,t1.customer_name,cast(sum(billed_amount)/max(total_count) as decimal(5,2)) as mm from table1 t1 left join table2 t2 on t1.customer_name = t2.customer_name group by t1.customer_id,t1.customer_name
@octotube463
@octotube463 Жыл бұрын
Dont think this works, what if we got a row for just one year (2020)? which is less than 3 and your code will add +1 but you actually need to add +2. I also think your total count calculation is not correct. Good luck running your query.
@matthewthornton4082
@matthewthornton4082 Жыл бұрын
Would a better, more dynamic solution be a date scaffold? Reduce the need for lots of case statements
@raddastronaut
@raddastronaut Жыл бұрын
Great work. Lots of fun. 👍🏽
@bragemogstad7124
@bragemogstad7124 Жыл бұрын
select id, name, sum(amount)/sum(counter) as avg_billed from ( select customer_id as id, customer_name as name, billing_amount as amount, if amount>0 then 1 else 0 end if as counter where billing_creation_date between '2019-01-01' and '2021-12-31' ) A group by id,name;
@tejeshgoteti
@tejeshgoteti 3 ай бұрын
Making the query complex without any need. We can use the below query for the same data retrieve SELECT customer_id, customer_name, COALESCE(AVG(billing_amount), 0) AS avg_billing_amount FROM Billing WHERE YEAR(billing_creation_date) BETWEEN 2019 AND 2020 GROUP BY customer_id, customer_name;
@anudeepreddy5559
@anudeepreddy5559 8 ай бұрын
Good Explanation ❤
@ride2cafe
@ride2cafe Жыл бұрын
More videos please🙂
@techTFQ
@techTFQ Жыл бұрын
noted, will try
@kurrwa
@kurrwa Жыл бұрын
interesting, i still have a lot to learn about SQL
@sakeenasammi1768
@sakeenasammi1768 Жыл бұрын
Keep it up 😃 I just love the way you explain minute things about sql ....thank you for not making me think that sql is complex 😇
@techTFQ
@techTFQ Жыл бұрын
Thank you 🙏🏼 glad you liked it
@LomeshSoni
@LomeshSoni Жыл бұрын
Hi . I have done this in a different approach . PLz give it a look. with cte1 as (select customer_id, customer_name, case when 2*count - dis_year < 3 then 3 else 2*count - dis_year end as to_divide,total from ( select customer_name, customer_id, count( Distinct year) as dis_year, count(year) as count,sum(billed_amount) as total from ( select *, extract(year from billing_creation_date) as year from billing) as p where year >= 2019 and year
@omarz2145
@omarz2145 Жыл бұрын
with total as (select customer_id, customer_name, date_trunc(billing_creation_date,year) as year, sum(billing_amount) as sum_billing from billing where year between '01-01-2019' and '01-01-2021' group by customer_id) ,final as (select customer_id,customer_name,sum_billing/count(distinct year) as avg_billing_amount) from total ) select * from final Wrote this solution down in notepad but didn't get to test it.
@user-yd4kz8nl5p
@user-yd4kz8nl5p Жыл бұрын
::decimal (only this part) isn't working in microsoft sql server..whats the other way of changing this avg_bill_amount into decimal?
@venupingali651
@venupingali651 Жыл бұрын
Great explanation sir....👍 I have doubt Can't we use aggregate function to sum_2019,20,21 to print average of them
@df170
@df170 Жыл бұрын
Definitely. This does not require any if statements
@unpluggedsaurav3186
@unpluggedsaurav3186 Жыл бұрын
Select customer_id, avg(Billing_amount) from table where date between 2019 and 2020 PS: PSUEDO Code
@sammoh5120
@sammoh5120 Жыл бұрын
Thanks
@techTFQ
@techTFQ Жыл бұрын
Your welcome 🙏🏼
@aaakel
@aaakel Жыл бұрын
Fun challenge! My take on it: WITH cte AS ( SELECT DISTINCT YEAR(b1.billing_creation_date) AS 'year' , b2.customer_id , b2.customer_name , COALESCE(b3.billed_amount,0) AS billing_amount FROM billing b1 LEFT JOIN billing b2 ON 1=1 LEFT JOIN billing b3 ON b2.customer_id = b3.customer_id AND YEAR(b1.billing_creation_date) = YEAR(b3.billing_creation_date) WHERE YEAR(b1.billing_creation_date) IN (2019,2020,2021) ) SELECT customer_id , customer_name , AVG(billing_amount) AS avg_billing_amount FROM cte GROUP BY 1,2 ORDER BY 1,2
@martineavila5708
@martineavila5708 Жыл бұрын
amazing.
@vineetpaulson9336
@vineetpaulson9336 Жыл бұрын
this will return the same result i believe.. pls let me know in case of any corrections: select customer_id, customer_name, coalesce(AVG(billing_amount),0) as avg_bill from billing where billing_creation_date between 01-01-2019 and 31-12-2021 group by 1,2 order by 1
@pathansharukh3411
@pathansharukh3411 Жыл бұрын
Can we do this without case statement by just using.."between date1 and date2" along with where condition, group by ??
@StopWhining491
@StopWhining491 Жыл бұрын
If this were a business problem for a real company, it seems like this solution leaves out some assumptions: What if customers weren't in the database for the entire period? What if there were outlier billing amounts? Could SQL IF be used to tighten up some of this variability?
@alessiodaini7907
@alessiodaini7907 Жыл бұрын
wow an easy question
@karthikbala8474
@karthikbala8474 Жыл бұрын
Sir, kindly upload Date and Time functions in SQL which are using organization
@AnalyticsWithVipul
@AnalyticsWithVipul Жыл бұрын
Hi Karthik, you can check a detailed video on SQL functions on my KZbin channel. here is the link: kzbin.info/www/bejne/gmGXaX-DbZ2cj5Ysi=5XYsHnEycSECwfoy
@ramshataqdees5570
@ramshataqdees5570 8 ай бұрын
How are we getting count as 3 for all the cells
@VITORB82
@VITORB82 Жыл бұрын
I like SQL a lot.
@Mahesh18225
@Mahesh18225 Жыл бұрын
Which one u r using for execution
@Sharukali5582
@Sharukali5582 Жыл бұрын
with recursive cte as ( select distinct customer_name ,2019 as year,0 as billed_amount from billing union select customer_name,year+1,billed_amount from cte where year < (select max(year(billing_creation_date)) from billing)) select customer_name,round(avg(billed_amount),2) as billed_amount from ( select a.customer_name,year,coalesce(b.billed_amount,a.billed_amount) as billed_amount from cte a left join billing b on a.customer_name=b.customer_name and a.year=year(b.billing_creation_date) ) a group by customer_name;
@bleepmaster23
@bleepmaster23 Жыл бұрын
Why isn’t it Select customer_id, customer_name, avg(billing_amount) as name_avg_biling_amount From table_name Group by customer_name Where billing_creation_date between 01-01-2019 and 12-31-2021
@Agrajag22
@Agrajag22 Жыл бұрын
It looks like the example accidentally added one to the count for every account, even if they didn’t have a $0 billing year ($600/4 = $200)
@cseveer
@cseveer Жыл бұрын
Thank you sir. How do we approach it if we have about 50 years worth of data and obviously we don't want to write CASE statement for each year.
@techTFQ
@techTFQ Жыл бұрын
we can solve it without case statement too. there is always multiple ways to solve a sql problem. just need to spend some time to think alternate ways
@cseveer
@cseveer Жыл бұрын
@@techTFQ That's True. Thank you for all your efforts in uploading these videos.
@naash9137
@naash9137 Жыл бұрын
We can use CTE to create a temp table carrying distinct years from the input data and then join it back with the data again to compute the sum n count values without running case for each year !!
@arturoramirez712
@arturoramirez712 Жыл бұрын
@@naash9137 With cross join? That worked for me. But if there are millions of customers not sure if cross join is efficient.
@souhaielmensi
@souhaielmensi Жыл бұрын
hello Thoufiq, assume you're working on a huge data, is it gonna be the perfect solution to provide? second why didn't you use the avg function with a where clause testing on the extracted year from billing_ date which should be between 2019 and 2021 ? I mean this way : with billing_2019 as ( select customer_id,customer_name, extract(year from billing_creation_date) as billing_year, billed_amount as billing_amount from p_work.billing) select customer_id,customer_name,round(avg(billing_amount),2) as billed_amount from billing_2019 where billing_year between 2019 and 2021 group by 1,2 order by 1 asc ;
@gaspermilitello7838
@gaspermilitello7838 Жыл бұрын
Did you try running the query? That doesn't meet the answer/criteria. You need to account for years customers were not billed, which changes the averages. Yes we are trying to find the average billing amount, but you need to include 0 for the years the customer wasn't billed which is broken down in this video.
@dixitchouhan3626
@dixitchouhan3626 Жыл бұрын
Why we weren't change null to 1?,it can be show 1 instead of null value?
@richarajani8438
@richarajani8438 10 ай бұрын
Select customer_id, customer_name, coalesce(avg( CASE WHEN YEAR(billing_creation_date)=2019 THEN BILLING AMOUNT END,0) as avg_2019 coalesce(avg( CASE WHEN YEAR(billing_creation_date)=2019 THEN BILLING AMOUNT END,0) as avg_2019 coalesce(avg( CASE WHEN YEAR(billing_creation_date)=2019 THEN BILLING AMOUNT END,0) as avg_2019 From billing Group by customer_id, customer_name Order by customer_id
@kanwalhemant
@kanwalhemant Жыл бұрын
Check errors: SELECT Customer_id, Customer_name, CAST(AVG (billing_amount) AS DECIMAL(10,2)) AS AvgAmount FROM billing WHERE YEAR(billing_creation_date) IN (2019, 2020, 2021) GROUP BY Customer_id, Customer_name;
@amareshkarnan7462
@amareshkarnan7462 Жыл бұрын
The below is my answer for this using analytics function not a full answer just an idea SELECT DISTINCT b.customer_id, b.customer_name, round(AVG(b.billed_amount) OVER(PARTITION BY b.customer_id),2) AS avg_spending FROM billing b;
@pirikitim
@pirikitim Жыл бұрын
I tried myself before seeing the solution and I took a completely different approach. with CTE as: (SELECT customer_id, customer_name, billing_creation_date and AVG(billing_amount) as avg_amount FROM billings GROUP BY customer_id, customer_name, billing_creation_date) SELECT customer_id, customer_name, coalesce(avg_amount,0) as avg_billing_amount FROM cte WHERE YEAR(billing_creation_date) BETWEEN '2019' and '2021'. It seems a lot simpler and works if the data gets bigger as well. What are your thoughts? Thank you in advance
@jqts6490
@jqts6490 Жыл бұрын
Why not: SELECT customer_id, Customer_name, AVG(billed_amount) as average FROM billing WHERE billing_creation_date BETWEEN ‘2019-01-01’ AND ‘2021-31’ GROUP BY 1,2 ORDER BY 1; ?
@Divine_Serpent_Geh
@Divine_Serpent_Geh Жыл бұрын
Exactly what I thought.
@sidharthmandal9957
@sidharthmandal9957 Жыл бұрын
Look at the ques, the condition is if there is no bill amount generated in any particular year then it is 0. For eg: look for A , there is no record for 2019. We need to think on how we should consider this case.
@mikatu
@mikatu Жыл бұрын
The answer is the average per year, not per billing. You always divide by 3, making the average wrong.
@e4caspnet
@e4caspnet Жыл бұрын
@@mikatu no the question asks for average per bill, not average per year, but it is poorly worded. I missed it the first time.
@01kumarr
@01kumarr Жыл бұрын
How many case statements we would write if we r asked to find average from 2001 to 2022 or even larger no of year....any other solution please ?
@wendylefty
@wendylefty Жыл бұрын
I think an easier solution is to aggregate the sum amount divided by the number of years and use a where clause to limit the billing year.
@swathireddy.cswathi2090
@swathireddy.cswathi2090 Жыл бұрын
I'm not clear about( with cte as) why we use this and what for all the queries we use the same statement. otherwise suggest me some other way
@ameenabdulbarr8342
@ameenabdulbarr8342 Жыл бұрын
I couldn't download the dataset just the SQL script available for download.
Остановили аттракцион из-за дочки!
00:42
Victoria Portfolio
Рет қаралды 3,6 МЛН
БЕЛКА СЬЕЛА КОТЕНКА?#cat
00:13
Лайки Like
Рет қаралды 1,8 МЛН
Поветкин заставил себя уважать!
01:00
МИНУС БАЛЛ
Рет қаралды 6 МЛН
Practice SQL Interview Query | Big 4 Interview Question
14:47
SQL Interview Query for Data Analyst
29:51
techTFQ
Рет қаралды 58 М.
The 25 SQL Questions You MUST Know for Data Analyst Interviews
32:47
KSR Datavizon
Рет қаралды 226 М.
Data Analyst Mock Interview
17:45
Avery Smith | Data Analyst
Рет қаралды 10 М.
Solving a tricky SQL Interview Query
19:24
techTFQ
Рет қаралды 50 М.
Real SQL Interview questions for Data Analysts
12:31
Jay Feng
Рет қаралды 7 М.
Остановили аттракцион из-за дочки!
00:42
Victoria Portfolio
Рет қаралды 3,6 МЛН