14. PepsiCo Azure Databricks interview question and answer | Azure data engineer interview Q&A

3,725 views

SS UNITECH

1 day ago

Comments: 25
@goluSingh-su1xs · 9 months ago
Superb explanation, loved watching it
@ssunitech6890 · 9 months ago
Thank you so much 🙂
@amritasingh1769 · 9 months ago
Very nice question, your explanation in Excel is awesome 👍
@ssunitech6890 · 9 months ago
Thanks a lot
@rammohanrao1369 · 2 months ago
Simple solution: groupBy(machineid, activity_type).agg(sum(timestamp)), then lag and take the difference
@ssunitech6890 · 1 month ago
Thanks 👍
@aranijayachandra0078 · 9 months ago
Great sir❤
@ssunitech6890 · 9 months ago
Thanks 🙏
@azuredb · 9 months ago
thank you sir
@ssunitech6890 · 9 months ago
Keep learning
@prashanthkammari6402 · 9 months ago
Can you please share the code, or maintain it in a public GitHub repo?
@ssunitech6890 · 9 months ago
I maintain the code in the description of each video; I will add the code here today.
@durgad4763 · 7 months ago
We can use the pivot function to convert the rows into separate columns, right?
@ssunitech6890 · 7 months ago
Yes, we can.
@tejaspise4638 · 5 months ago
df2 = df1.withColumn('timestamp', when(df1.activityid == 'start', col('timestamp') * -1).otherwise(col('timestamp')))
df3 = df2.groupBy('Machine_id', 'processid').agg(sum('timestamp').alias('total_time'))
res = df3.groupBy('Machine_id').agg(avg('total_time').alias('avg_processing_time'))
@ssunitech6890 · 5 months ago
Thanks 👍
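The sign-flip trick used in the solution above (negate every 'start' timestamp so that summing per process yields end minus start, then average per machine) can be sketched in plain Python, with no Spark session required. The sample rows and column layout below are hypothetical, chosen only to illustrate the logic:

```python
from collections import defaultdict

# Hypothetical sample rows: (machine_id, process_id, activity_type, timestamp)
rows = [
    (0, 0, "start", 0.712), (0, 0, "end", 1.520),
    (0, 1, "start", 3.140), (0, 1, "end", 4.120),
    (1, 0, "start", 0.550), (1, 0, "end", 1.550),
    (1, 1, "start", 0.430), (1, 1, "end", 1.420),
]

def avg_processing_time(rows):
    # Sign-flip: counting 'start' timestamps as negative means the sum
    # per (machine_id, process_id) is exactly end - start.
    per_process = defaultdict(float)
    for machine_id, process_id, activity, ts in rows:
        per_process[(machine_id, process_id)] += -ts if activity == "start" else ts

    # Average the per-process durations for each machine.
    per_machine = defaultdict(list)
    for (machine_id, _), duration in per_process.items():
        per_machine[machine_id].append(duration)
    return {m: round(sum(d) / len(d), 3) for m, d in per_machine.items()}

print(avg_processing_time(rows))  # e.g. {0: 0.894, 1: 0.995} for the sample above
```

This mirrors what the `when(...).otherwise(...)` + two `groupBy` steps do in the PySpark version, and avoids the row-ordering assumptions that a lag/window approach depends on.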
@satishgs5355 · 2 months ago
df = df.withColumn("lag", lag(col("Timestamp")).over(Window.orderBy("Machine_id"))) \
       .withColumn("time_diff", round(col("Timestamp") - col("lag"), 2)) \
       .filter(col("Activity_type") == "end") \
       .groupBy(col("Machine_id")).agg(avg("time_diff"))
df.show()
@enisertem9738 · 1 month ago
df1 = (df1.groupBy(df1.Machine_id, df1.processid)
          .agg(sum(when(df1.activityid == "start", df1.timestamp * -1)
                   .otherwise(df1.timestamp)).alias("process_time")))
df2 = (df1.groupBy(df1.Machine_id)
          .agg(mean(df1.process_time).alias("avg_process_time")))
df2.show()
@viduAndtwins · 9 months ago
from pyspark.sql.window import Window
from pyspark.sql.functions import *

windowSpec = Window.partitionBy("Machine_id", "processid").orderBy(col("processid"), col("activityid").desc())
df.withColumn("lag", col("timestamp") - lag(col("timestamp"), 1).over(windowSpec)) \
  .groupBy("Machine_id").avg("lag").alias("avg_processingtime").show()
@neeraj.g · 7 months ago
better approach
@ssunitech6890 · 7 months ago
👍