Top 15 Spark Interview Questions in less than 15 minutes Part-2

13,654 views

Sumit Mittal

1 day ago

To enhance your career as a Cloud Data Engineer, check trendytech.in/?src=youtube&su... for curated courses developed by me.
I have trained over 20,000 professionals in the field of Data Engineering in the last 5 years.
Want to Master SQL? Learn SQL the right way through the most sought-after course - the SQL Champions Program!
"An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solve an unseen problem."
Here is how you can register for the Program -
Registration Link (Course Access from India): rzp.io/l/SQLINR
Registration Link (Course Access from outside India): rzp.io/l/SQLUSD
These are the most commonly asked interview questions when you are applying for data-based roles such as data analyst, data engineer, data scientist, or data manager.
Links to the free SQL & Python series developed by me are given below -
SQL Playlist - • SQL tutorial for every...
Python Playlist - • Complete Python By Sum...
Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!
Social Media Links :
LinkedIn - / bigdatabysumit
Twitter - / bigdatasumit
Instagram - / bigdatabysumit
Student Testimonials - trendytech.in/#testimonials
Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs

Comments: 5
@vaibhavj12 · 3 months ago
Helpful ❤
@piyushjain5852 · 1 month ago
How is the number of stages = number of wide transformations + 1?
@sugunanindia · 1 month ago
In Apache Spark, the number of stages in a job is determined by the wide transformations present in the execution plan. Here's a detailed explanation of why the number of stages equals the number of wide transformations plus one:

### Transformations in Spark

#### Narrow Transformations
Narrow transformations are operations where each input partition contributes to exactly one output partition. Examples include:
- `map`
- `filter`
- `flatMap`

These transformations do not require data shuffling and can be executed within a single stage.

#### Wide Transformations
Wide transformations are operations where each input partition can contribute to multiple output partitions. These transformations require data shuffling across the network. Examples include:
- `reduceByKey`
- `groupByKey`
- `join`

Wide transformations result in a stage boundary because data must be redistributed across the cluster.

### Understanding Stages
A stage in Spark is a set of tasks that can be executed in parallel on different partitions of a dataset without requiring any shuffling of data. A new stage is created each time a wide transformation is encountered because the data needs to be shuffled across the cluster.

### Calculation of Stages
Given the nature of transformations, the rule "number of stages = number of wide transformations + 1" can be explained as follows:
1. **Initial Stage**: The first stage covers the initial set of narrow transformations until the first wide transformation is encountered.
2. **Subsequent Stages**: Each wide transformation requires a shuffle, which ends the current stage and begins a new one.

Thus, for `n` wide transformations, there are `n + 1` stages: the initial stage, plus one additional stage for each wide transformation.

### Example
Consider the following Spark job:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Sample RDD
rdd = sc.parallelize([(1, 2), (3, 4), (3, 6)])

# Narrow transformation: map
rdd1 = rdd.map(lambda x: (x[0], x[1] * 2))

# Wide transformation: reduceByKey (requires a shuffle)
rdd2 = rdd1.reduceByKey(lambda x, y: x + y)

# Another narrow transformation: filter
rdd3 = rdd2.filter(lambda x: x[1] > 4)

# Wide transformation: groupByKey (requires a shuffle)
rdd4 = rdd3.groupByKey()

# Action: collect
result = rdd4.collect()
print(result)
```

**Analysis of Stages**:
1. **Stage 1**: `parallelize` and `map` - all narrow transformations, running up to the shuffle required by `reduceByKey`.
2. **Stage 2**: Begins after the first shuffle; it performs the reduce side of `reduceByKey` and the narrow `filter`, up to the shuffle required by `groupByKey`.
3. **Stage 3**: Begins after the second shuffle; it performs the grouping side of `groupByKey` and the `collect` action.

So there are two wide transformations (`reduceByKey` and `groupByKey`) and three stages (number of wide transformations + 1).

### Conclusion
The number of stages in a Spark job is driven by the need to shuffle data between transformations. Each wide transformation introduces a new stage due to the shuffle it triggers, resulting in the formula: `number of stages = number of wide transformations + 1`. This understanding is crucial for optimizing and debugging Spark applications.
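A quick way to check this stage breakdown in practice (a minimal sketch, assuming a local PySpark session is available) is to print the RDD lineage with `toDebugString()`; shuffle boundaries appear as separately indented blocks in the output, and those blocks correspond to the stages described above:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd = sc.parallelize([(1, 2), (3, 4), (3, 6)])
rdd2 = rdd.map(lambda x: (x[0], x[1] * 2)).reduceByKey(lambda x, y: x + y)
rdd4 = rdd2.filter(lambda x: x[1] > 4).groupByKey()

# toDebugString() returns the lineage as bytes; the indentation groups
# are separated by shuffle dependencies, one group per stage.
print(rdd4.toDebugString().decode("utf-8"))
```

Running an action on `rdd4` and opening the Jobs tab of the Spark UI (by default at localhost:4040) should show the same three stages.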
@epicdigger4110 · 15 days ago
Bhai said it straight: if bapu is visible, then bapu is visible.
@SabyasachiManna-sw2cg · 9 days ago
To answer your question: assume you have one wide transformation, reduceByKey(). It will create two stages, stage 0 and stage 1, with a shuffle in between. I hope this helps.
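For example (a minimal sketch, assuming a local PySpark session), a single reduceByKey() produces exactly one shuffle and therefore two stages:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# One wide transformation (reduceByKey) -> one shuffle -> two stages:
#   stage 0: parallelize + map (map side of the shuffle)
#   stage 1: reduce side of reduceByKey + collect
counts = (
    sc.parallelize(["a", "b", "a", "c", "b", "a"])
      .map(lambda w: (w, 1))
      .reduceByKey(lambda x, y: x + y)
      .collect()
)
print(counts)
```

If you run this locally, the Spark UI should show the job split into exactly those two stages.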
15 Data Engineering Interview Questions in less than 15 minutes Part-1 #bigdata #interview
12:44
Sumit Mittal
11K views
10 recently asked Pyspark Interview Questions | Big Data Interview
28:36
Sumit Mittal
28K views
Double Stacked Pizza @Lionfield @ChefRush
00:33
albert_cancook
85M views
Who has won ?? ๐Ÿ˜€ #shortvideo #lizzyisaeva
00:24
Lizzy Isaeva
65M views
Survival Skills: Amazing Basket for Extreme Conditions. #survival #camping #bushcraft #lifehacks
00:26
Sergio Outdoors
88M views
Spark Interview Question | How many CPU Cores | How many executors | How much executor memory
5:58
Learning Journal
21K views
Cloud Data Engineer Mock Interview | PySpark Coding Interview Questions |Azure Databricks #question
31:45
Sumit Mittal
16K views
Question 10: PWC Interview Questions | data engineers | #pyspark #bigdata #pwc #interview
11:34
pysparkpulse
3.5K views
Data Engineering Mock Interview | Spark Optimization Interview Questions | Best Coding Practices
43:49
Sumit Mittal
9K views
Meet our TECH team at bluelearn! How to work at a startup?
9:07
Curious Harish
271K views
Mock Interview for Data Engineers | Spark Optimizations | Real-time Project Challenges and Scenarios
45:21
Sumit Mittal
11K views
10 PySpark Product Based Interview Questions
39:46
The Data Tech
15K views
How Leaving Bangalore Helped This Couple Get Rich?
14:50
Wint Wealth
339K views
PySpark Interview Questions & Answers | PySpark Interview Questions
9:59
learn by doing it
7K views
Learn Apache Spark in 10 Minutes | Step by Step Guide
10:47
Darshil Parmar
277K views