Job, Stage and Task in Apache Spark | PySpark interview questions

2,089 views

The Big Data Show

Day ago

Comments
@Simrankotiya10 4 months ago
Great explanation
@ChetanSharma-oy4ge 7 months ago
What if we use the count function along with a variable and a transformation?
@TheBigDataShow 7 months ago
count() is a tricky action, and most Data Engineers get confused by it. count() is an action, so you would expect it to trigger a brand new job that scans the data. But Apache Spark is a very smart computing engine: it works with its source through predicate pushdown and pruning, and if the source stores the row count in its metadata, Spark can fetch the value of count() from that metadata rather than scanning all the data.
@ChetanSharma-oy4ge 7 months ago
@TheBigDataShow Great, thanks for answering. Do you have some other examples as well, or resources where I can learn these concepts?
@siddheshchavan2069 7 months ago
Can you make end-to-end data engineering projects?
@TheBigDataShow 7 months ago
I have already created one. Please check the channel. There is no prerequisite for this 3-hour video and project; you just need to know the basics of PySpark. Here is the link: kzbin.info/www/bejne/eJ26hGecpLNsmbssi=qL0ZSXBELEEKe2L2
@siddheshchavan2069 7 months ago
@TheBigDataShow Great, thanks!
@debabratabar2008 7 months ago
Is the below correct?
df_count = example_df.count() ----> transformation
example_df.count() ---> job?
@NiteeshKumarPinjala 5 months ago
No, count() itself is an action. The first line will already create a job.
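To make the point above concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the LazyFrame class and its jobs_run counter are hypothetical names invented for illustration. It only mimics the idea that a transformation such as filter() records a step without running anything, while an action such as count() runs the plan, whether or not the result is assigned to a variable.

```python
class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical, NOT Spark's API)."""

    jobs_run = 0  # counts how many "jobs" the actions have triggered

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # pending predicates, not yet executed

    def filter(self, predicate):
        # Transformation: only records the step and returns a new lazy frame.
        return LazyFrame(self._rows, self._plan + (predicate,))

    def count(self):
        # Action: executes the recorded plan right now, i.e. triggers a "job".
        LazyFrame.jobs_run += 1
        return sum(1 for row in self._rows if all(p(row) for p in self._plan))


example_df = LazyFrame(range(10))

evens = example_df.filter(lambda x: x % 2 == 0)  # transformation, nothing runs
jobs_after_filter = LazyFrame.jobs_run

df_count = evens.count()  # action: runs even though the result is assigned
jobs_after_first_count = LazyFrame.jobs_run

evens.count()  # same action again: another "job"
jobs_after_second_count = LazyFrame.jobs_run

print(jobs_after_filter, df_count, jobs_after_first_count, jobs_after_second_count)
```

In this toy model, assigning the result to df_count changes nothing about when the work happens; both count() calls do the work immediately. How many jobs real Spark shows for a given count() depends on the query plan and the data source, so check the Spark UI for your own case.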