
109. Databricks | Pyspark | Coding Interview Question: Pyspark and Spark SQL

15,124 views

Raja's Data Engineering

1 day ago

Azure Databricks Learning: Coding Interview Exercise: Pyspark and Spark SQL
=================================================================================
Coding exercises are very common in big data interviews, so it is important to develop coding skills before appearing for Spark/Databricks interviews.
In this video, I have explained a coding scenario: finding the start and end dates of data buckets (consecutive runs of the same event status). Watch the video for the full walkthrough; a sketch of the approach follows the tags below.
#CodingInterviewQuestion, #ApacheSparkInterview, #SparkCodingExercise, #DatabricksCodingInterview, #SparkWindowFunctions, #SparkDevelopment, #DatabricksDevelopment, #DatabricksPyspark, #PysparkTips, #DatabricksTutorial, #AzureDatabricks, #Databricks, #Databricksforbeginners, #datascientists, #datasciencecommunity, #bigdataengineers, #machinelearningengineers
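
The notebook itself isn't linked here, but for reference, here is a minimal PySpark sketch of the change-flag-plus-running-sum approach discussed in the comments below. The sample data and the df/event_date/event_status names are assumptions taken from those comment threads, not the video's exact code:

```
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

data = [
    ("2020-06-01", "Won"), ("2020-06-02", "Won"), ("2020-06-03", "Won"),
    ("2020-06-04", "Lost"), ("2020-06-05", "Lost"), ("2020-06-06", "Lost"),
    ("2020-06-07", "Won"),
]
df = spark.createDataFrame(data, ["event_date", "event_status"]) \
          .withColumn("event_date", F.to_date("event_date"))

# single global ordering; fine for a small sample, but note it pulls
# all rows into one partition on real data
w = Window.orderBy("event_date")

result = df \
    .withColumn("change_event",
                # 1 whenever the status differs from the previous row's status
                F.when(F.lag("event_status").over(w) != F.col("event_status"), 1)
                 .otherwise(0)) \
    .withColumn("group_id", F.sum("change_event").over(w)) \
    .groupBy("group_id", "event_status") \
    .agg(F.min("event_date").alias("event_start_date"),
         F.max("event_date").alias("event_end_date")) \
    .orderBy("event_start_date")

result.show()
```

Each consecutive streak of the same status gets its own group_id, so the two separate Won streaks come out as two rows instead of being merged into one range.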

Comments: 37
@prasadtelu9873 · 1 year ago
First time I saw scenario-based and interview-based solutions in KZbin videos. Thanks for your commitment and for sharing the knowledge.
@rajasdataengineering7585 · 1 year ago
Thanks, Prasad, for your comment! Hope it helps people in the big data community.
@adiityagupta-wu1tz · 11 months ago
Please continue this series; it will be very helpful for cracking interviews. And thanks for starting this series.
@rajasdataengineering7585 · 11 months ago
Sure, I will create many more videos in this series.
@Vk-gw6ii · 4 months ago
Excellent 👌
@rajasdataengineering7585 · 4 months ago
Thank you!
@harithad1757 · 3 months ago
Can I get the code copy-pasted in the description, or maybe a link to the notebook?
@prabhatgupta6415 · 1 year ago
Thanks, sir. Please create a playlist of frequently asked coding questions.
@rajasdataengineering7585 · 1 year ago
Sure, I will create a playlist and add more coding scenarios using Pyspark and SQL.
@landchennai8549 · 4 months ago
Here is my SQL query for the same:

```
declare @Event_Table table (Event_date date, Event_status varchar(8));

insert into @Event_Table
select getdate() + value,
       case when value > 3 and value < 7 then 'Lost' else 'Won' end
from generate_series(1, 10, 1);

-- the overall row_number minus the per-status row_number stays constant
-- within each consecutive streak, so the difference works as a group id
with cte as (
    select *,
           row_number() over (order by Event_date)
             - row_number() over (order by Event_status, Event_date) as GroupId
    from @Event_Table
)
select GroupId,
       min(Event_status) as Event_status,
       min(Event_date) as Start_date,
       max(Event_date) as End_Date,
       count(1) as Consecutive_Events
from cte
group by GroupId;
```
@rajasdataengineering7585 · 4 months ago
Thanks for sharing your approach
@landchennai8549 · 4 months ago
Keep on posting more like this, @rajasdataengineering7585.
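
The same row_number-difference trick translates directly to PySpark; a minimal sketch, reusing the sample df from the sketch in the description above:

```
from pyspark.sql import functions as F
from pyspark.sql.window import Window

w_all = Window.orderBy("event_date")
w_status = Window.partitionBy("event_status").orderBy("event_date")

df.withColumn("group_id",
              # constant within each consecutive streak of one status
              F.row_number().over(w_all) - F.row_number().over(w_status)) \
  .groupBy("event_status", "group_id") \
  .agg(F.min("event_date").alias("start_date"),
       F.max("event_date").alias("end_date"),
       F.count("*").alias("consecutive_events")) \
  .orderBy("start_date") \
  .show()
```

Grouping by both event_status and group_id avoids the collision that grouping by the difference alone could cause when two different statuses happen to produce the same group id.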
@rawat2608 · 10 months ago
Thank you, sir.
@rajasdataengineering7585 · 10 months ago
You are welcome!
@saurabh011192 · 8 months ago
This solution will work only when the dates are in order with respect to events. I tried jumbling them and it didn't work.
@prabhatgupta6415 · 1 year ago
One more suggestion: please put the dataset in the description.
@rajasdataengineering7585 · 1 year ago
Sure, will add the dataset to the description.
@jinsonfernandez · 19 days ago
Thanks for this video. But I am curious why you didn't directly use min/max with groupBy, which would have fetched the same result:

```
from pyspark.sql import functions as F  # the original snippet assumes this import

result = df.withColumn("event_date", F.to_date("event_date")) \
    .groupBy("event_status") \
    .agg(
        F.min("event_date").alias("event_start_date"),
        F.max("event_date").alias("event_end_date")
    ) \
    .orderBy("event_start_date")

result.show()
```
@rajasdataengineering7585 · 19 days ago
Thanks for sharing your approach. Yes, there are various approaches.
@boyasuresh5998 · 19 days ago
This won't work; please check your code.
@rajasdataengineering7585 · 19 days ago
It worked
@funworld8659 · 12 hours ago
@rajasdataengineering7585 It did not show the last Won streak's start date and end date.
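
funworld's point can be reproduced with the sample df from the sketch in the description above: grouping by event_status alone merges the two separate Won streaks into one range.

```
# groupBy on status alone collapses the two separate "Won" streaks:
df.groupBy("event_status") \
  .agg(F.min("event_date").alias("event_start_date"),
       F.max("event_date").alias("event_end_date")) \
  .show()
# "Won" comes back as a single 2020-06-01 .. 2020-06-07 row, so the
# standalone Won streak on 2020-06-07 never appears as its own range.
```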
@DataEngineering-ni2ot · 11 months ago
At 08:10, shouldn't the change event be 1 in the first row, since the previous value is not the same as the first row's event status? Why is it coming out as 0?
@johnsonrajendran6194 · 4 months ago
For the first row, the previous value is null, and we cannot compare a null value with anything, so by default our logic goes to the else condition, which is 0 in this case.
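
To make the null handling concrete, a minimal sketch reusing df, F, and Window from the sketch in the description above: lag() returns null for the first row, a null comparison evaluates to null, and when() does not treat a null condition as matched, so the flag falls through to otherwise(0).

```
w = Window.orderBy("event_date")
df.withColumn(
    "change_event",
    # lag() is null on the first row; NULL != 'Won' evaluates to NULL,
    # which when() does not treat as true, so otherwise(0) applies
    F.when(F.lag("event_status").over(w) != F.col("event_status"), 1).otherwise(0)
).show()
```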
@arrooow9019 · 1 year ago
Hi sir, could you please share the notebook and dataset in the description? It will be helpful for our practice. Thanks in advance.
@namratachavan6317 · 1 year ago
Hi sir, could you please share the notebook and the GitHub repository link to access the code?
@bonysrivastava7779 · 3 months ago
🙌
@rajasdataengineering7585 · 3 months ago
🙌
@roshniagrawal4777 · 16 days ago
This solution will not work if you have data like this; maybe some tweak will be needed:

```
data = [
    ("2020-06-01", "Won"),
    ("2020-06-02", "Won"),
    ("2020-06-03", "Won"),
    ("2020-06-03", "Lost"),
    ("2020-06-04", "Lost"),
    ("2020-06-05", "Lost"),
    ("2020-06-06", "Lost"),
    ("2020-06-07", "Won")
]
```
@Tushar0797 · 1 year ago
How do you explain a data engineering project in an interview?
@rajasdataengineering7585 · 1 year ago
I will create a mock interview video soon. Hope that will help you
@Tushar0797 · 1 year ago
@rajasdataengineering7585 Thank you so much 👍❣️🙏
@rajasdataengineering7585 · 1 year ago
Welcome
@user-cm7ei2dp9t · 1 year ago
Thank you
@starmscloud · 1 year ago
I did it something like this, by using a default date, a running number, and datediff:

```
from pyspark.sql.functions import to_date, row_number, asc, date_add, lit, datediff, min, max
from pyspark.sql.window import Window

# rank maps each status's rows to consecutive days from a fixed date, so
# datediff(event_date, startDate) is constant within a consecutive streak
eventDF.withColumn("event_date", to_date(col="event_date", format="dd-MM-yyyy")) \
    .withColumn("rank", row_number().over(
        Window.partitionBy("event_status").orderBy(asc("event_date")))) \
    .withColumn("startDate", date_add(lit("1900-01-01"), "rank")) \
    .withColumn("datediff", datediff("event_date", "startDate")) \
    .groupBy("datediff", "event_status") \
    .agg(min("event_date").alias("start_date"), max("event_date").alias("end_date")) \
    .drop("datediff") \
    .sort("start_date") \
    .show()
```
@rajasdataengineering7585 · 1 year ago
Great!
Related videos:
111. Databricks | Pyspark | SQL Coding Interview: Exchange Seats of Students · 22:50 · Raja's Data Engineering · 6K views
115. Databricks | Pyspark | SQL Coding Interview: Number of Calls and Total Duration · 16:52
117. Databricks | Pyspark | SQL Coding Interview: Total Grand Slam Titles Winner · 19:08
10 recently asked Pyspark Interview Questions | Big Data Interview · 28:36