One of the best interview series. Thank you Sumit sir.
@sumitmittal07 · 11 months ago
glad to know that you liked it.
@adityatomar9820 · 11 months ago
One of the greatest explanations so far on YouTube. I wish I could afford your course :(
@souradeep.official · 5 months ago
Need more Pyspark Interview Solutions like this 😊
@ritikadamani2008 · 3 months ago
Best selection of questions and very good explanation.
@abhyaravya421 · 3 months ago
Thanks a lot, Sumit! I am a senior data engineer with 5 years of experience, but since we mostly don't work with DataFrames or PySpark, I am not able to do these simple things.
@praptijoshi9102 · 9 months ago
You are doing a great job posting these❤
@veerugandhad3437 · 11 months ago
Very useful, informative video that gives more confidence to big data aspirants. Thanks Sumit.
@MamtaChoudhary-c4i · 19 days ago
Thank You sir for the best explanation. Can you please come up with more examples?
@singhjirajeev · 9 months ago
00:03 Recently asked PySpark coding questions
02:37 Writing and executing PySpark pseudo code
05:21 Creating a Spark DataFrame from input and performing a group-by aggregation
08:04 Using aggregation functions and collect_list in PySpark
11:15 Spark SQL solution for creating a DataFrame and running queries
14:18 Understanding the DataFrame reader API for reading JSON, and the usage of the explode function
17:11 Creating a Spark DataFrame and performing operations on it
19:44 Converting a string to a date and performing a group-by in a PySpark DataFrame
22:32 Finding the average stock value using PySpark
25:38 Practice more on DataFrames for interviews
28:15 Practice more to gain confidence in writing correct PySpark syntax
@gudiatoka · 11 months ago
Sir... we need more, please continue this playlist.
@naveenkumar-oq6zi · 15 days ago
Hi Sumit, regarding the last question about aggregation and the max average of a stock: there should be a time along with the date, because stock prices actually change at different times during the day. We would then need to convert it into yyyy-MM-dd format to get the day-specific stock prices, take their average, and then the max of the averages. Just thought of sharing; the overall implementation would still be the same :) cheers
@venugopal-nc3nz · 11 months ago
It would be great if you put the questions in a comment. Others could then try without looking at the solution first.
@SusheelGajbinkar · 5 months ago
Thank you sir😄
@satishutnal · 11 months ago
Best explanation sir thanks
@sumitmittal07 · 11 months ago
I am happy to hear this
@rohit-ll3rj · 9 months ago
We can apply distinct() too I guess for avoiding duplicate values in df.
@sravankumar1767 · 10 months ago
Superb
@NextGen_Tech_Hindi · 11 months ago
Thanks Sumit, make more videos like this.
@sumitmittal07 · 11 months ago
definitely
@2412_Sujoy_Das · 11 months ago
Much needed sir.....!!!
@sumitmittal07 · 11 months ago
Sujoy, I am sure you will enjoy watching this.
@anjibabumakkena · 11 months ago
Nice explanation sir, kindly post scenario-based questions.
@sumitmittal07 · 11 months ago
yes for sure
@shashankgupta2776 · 8 months ago
Thank you Sir, greatly explained. It would be good if you could also post the data/schemas in the description box, so we can query and practice hands-on. Thanks! :)
@prasoonvijay5775 · 11 months ago
Hi Sumit, could you please create a video explaining end-to-end pipelines on AWS Databricks, along with their orchestration?
@NextGen_Tech_Hindi · 11 months ago
What about the remaining 10 questions on PySpark? You said you would cover them in the next video, but you still haven't uploaded it. When will you upload it? We are waiting for the remaining 10 PySpark questions. Thank you ❤
@mdasif2411 · 11 months ago
Hi Sir, can we not write it in Spark SQL in the interview, since there is no difference in performance?
@Nikhil-qi4oz · 11 months ago
Amazing sir
@sumitmittal07 · 11 months ago
Nikhil, I am sure you will find it useful.
@TheUMESH34 · 11 months ago
This is great!
@sumitmittal07 · 11 months ago
thank you Umesh
@sharankarchella2688 · 11 months ago
Nice video
@sumitmittal07 · 11 months ago
thank you
@rudrakasha-t1v · 10 months ago
In question number 2, do we not need to remove duplicates at the end? Can you please clarify this for me?
@VinodKumarChouhan-o8c · 11 months ago
Hello sir, how can I run PySpark code online? Are you using an online utility to run the PySpark code shown in this video? Could you please share the source? It would be very helpful.
@sonurohini6764 · 7 months ago
Sir create coding interview playlist
@RAHULKUMAR-px8em · 1 month ago
Q2.

Data = [('a', 'aa', 1), ('a', 'aa', 2), ('b', 'bb', 5), ('b', 'bb', 3), ('b', 'bb', 4)]
data_schema = "col1 string, col2 string, col3 int"
df_data = spark.createDataFrame(data=Data, schema=data_schema)
df_data.display()

from pyspark.sql.functions import *
from pyspark.sql.types import *

result = (
    df_data.groupBy(col('col1'), col('col2'))
           .agg(collect_set(col('col3')))
)
result.display()