One of the best interview series. Thank you Sumit sir.
@sumitmittal07 · 11 months ago
glad to know that you liked it.
@adityatomar9820 · 11 months ago
One of the greatest explanations so far on YouTube. I wish I could afford your course :(
@souradeep.official · 5 months ago
Need more Pyspark Interview Solutions like this 😊
@ritikadamani2008 · 3 months ago
Best selection of questions and very good explanation.
@abhyaravya421 · 3 months ago
Thanks a lot, Sumit! I am a senior data engineer with 5 years of experience, but since we mostly don't work with DataFrames or PySpark, I am not able to do these simple things.
@praptijoshi9102 · 9 months ago
You are doing a great job posting these❤
@veerugandhad3437 · 11 months ago
Very useful, informative video that gives more confidence to big data aspirants. Thanks Sumit.
@MamtaChoudhary-c4i · 19 days ago
Thank You sir for the best explanation. Can you please come up with more examples?
@singhjirajeev · 9 months ago
00:03 Recently asked PySpark coding questions
02:37 Writing and executing PySpark pseudo code
05:21 Creating a Spark DataFrame from input and performing a group-by aggregation
08:04 Using aggregation functions and collect_list in PySpark
11:15 Spark SQL solution for creating a DataFrame and running queries
14:18 Understanding the DataFrame reader API for reading JSON, and the usage of the explode function
17:11 Creating a Spark DataFrame and performing operations on it
19:44 Converting a string to a date and performing a group-by in a PySpark DataFrame
22:32 Finding the average stock value using PySpark
25:38 Practice more on DataFrames for interviews
28:15 Practice more to gain confidence in writing correct PySpark syntax
@gudiatoka · 11 months ago
Sir... we need more, please continue this playlist.
@naveenkumar-oq6zi · 15 days ago
Hi Sumit, regarding the last question about aggregation and the max average of a stock: there should be a time along with the date, because stock prices actually change at different times during the day. We would then need to convert it into yyyy-MM-dd format to get the day-specific stock prices, take their average, and then the max of the averages. Just thought of sharing; the overall implementation would still be the same :) cheers
@venugopal-nc3nz · 11 months ago
It would be great if you put the questions in a comment. Others could then try without looking at the solution first.
@SusheelGajbinkar · 5 months ago
Thank you sir😄
@satishutnal · 11 months ago
Best explanation sir thanks
@sumitmittal07 · 11 months ago
I am happy to hear this
@rohit-ll3rj · 9 months ago
We can apply distinct() too I guess for avoiding duplicate values in df.
@sravankumar1767 · 10 months ago
Superb
@NextGen_Tech_Hindi · 11 months ago
Thanks Sumit, make more videos like this.
@sumitmittal07 · 11 months ago
definitely
@2412_Sujoy_Das · 11 months ago
Much needed sir.....!!!
@sumitmittal07 · 11 months ago
Sujoy, I am sure you will enjoy watching this.
@anjibabumakkena · 11 months ago
Nice explanation sir, kindly post scenario-based questions.
@sumitmittal07 · 11 months ago
yes for sure
@shashankgupta2776 · 8 months ago
Thank you Sir, greatly explained. It would be good if you could also post the data/schemas in the description box, so we can query and practice hands-on. Thanks! :)
@prasoonvijay5775 · 11 months ago
Hi Sumit, could you please create a video explaining end-to-end pipelines on AWS Databricks, along with their orchestration?
@NextGen_Tech_Hindi · 11 months ago
What about the remaining 10 questions on PySpark? You said you would cover them in the next video, but you still haven't uploaded it. When will you upload it? We are waiting for the remaining 10 PySpark questions. Thank you ❤
@mdasif2411 · 11 months ago
Hi Sir, can we not write it in Spark SQL in the interview, since there is no difference in performance?
@Nikhil-qi4oz · 11 months ago
Amazing sir
@sumitmittal07 · 11 months ago
Nikhil, I am sure you will find it useful.
@TheUMESH34 · 11 months ago
This is great!
@sumitmittal07 · 11 months ago
thank you Umesh
@sharankarchella2688 · 11 months ago
Nice video
@sumitmittal07 · 11 months ago
thank you
@rudrakasha-t1v · 10 months ago
In question number 2, do we not need to remove duplicates at the end? Can you please clarify this for me?
@VinodKumarChouhan-o8c · 11 months ago
Hello sir, how can I run PySpark code online? Are you using an online utility to run the PySpark code shown in this video? Could you please share the source? It would be very helpful.
@sonurohini6764 · 7 months ago
Sir create coding interview playlist
@RAHULKUMAR-px8em · 1 month ago
Q2.

Data = [('a', 'aa', 1), ('a', 'aa', 2), ('b', 'bb', 5), ('b', 'bb', 3), ('b', 'bb', 4)]
data_schema = "col1 string, col2 string, col3 int"
df_data = spark.createDataFrame(data=Data, schema=data_schema)
df_data.display()

from pyspark.sql.functions import *
from pyspark.sql.types import *

result = (
    df_data.groupBy(col('col1'), col('col2'))
           .agg(collect_set(col('col3')))
)
result.display()