Data Engineer Mock Interview | SQL | PySpark | Project & Scenario based Interview Questions

  Views: 52,294

Sumit Mittal

A day ago

To enhance your career as a Cloud Data Engineer, check trendytech.in/... for curated courses developed by me.
I have trained over 20,000 professionals in the field of Data Engineering in the last 5 years.
Want to master SQL? Learn SQL the right way through the most sought-after course - SQL Champions Program!
"An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solve an unseen problem."
Here is how you can register for the program -
Registration Link (Course Access from India) : rzp.io/l/SQLINR
Registration Link (Course Access from outside India) : rzp.io/l/SQLUSD
30 INTERVIEWS IN 30 DAYS- BIG DATA INTERVIEW SERIES
This mock interview series is launched as a community initiative under Data Engineers Club, aimed at aiding the community's growth and development.
Our highly experienced guest interviewer, Ankur Bhattacharya, / ankur-bhattacharya-100... shares invaluable insights and practical advice drawn from his extensive experience, catering to aspiring data engineers and seasoned professionals alike.
Our talented guest interviewee, Praroop Sacheti, / praroopsacheti has a remarkable approach to answering the interview questions in a very well-articulated manner.
Links to the free SQL & Python series developed by me are given below -
SQL Playlist - • SQL tutorial for every...
Python Playlist - • Complete Python By Sum...
Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!
Social Media Links :
LinkedIn - / bigdatabysumit
Twitter - / bigdatasumit
Instagram - / bigdatabysumit
Student Testimonials - trendytech.in/...
Discussed Questions : Timestamp
1:30 Introduction
3:29 When you are processing the data with a Databricks PySpark job, what is the sink for your pipeline?
4:58 Are you incorporating fact and dimension tables, or any schema, in your project's database design?
5:50 What amount of data are you dealing with in your day-to-day pipeline?
6:33 What are the different types of triggers in ADF?
7:45 What is incremental load? How can you implement it through ADF?
10:03 Difference between a Data Lake and a Data Warehouse?
11:41 What is columnar storage in a data warehouse?
13:38 What were some challenges encountered during your project, and how were they resolved? Describe the strategies implemented to optimize your pipeline.
16:18 Optimizations related to Databricks or PySpark?
20:41 What is a broadcast join? What exactly happens when we broadcast the table?
23:01 SQL coding question
35:46 PySpark coding question
Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs

Comments: 49
@PradyutJoshi 10 months ago
Good initiative. This is quite helpful on how to answer the scenario-based questions, with an example. Thank you sir, Ankur and Praroop! 🙌
@sumitmittal07 10 months ago
So nice of you
@harshitgoel2985 10 months ago
Please attach, in the description, a link (in view mode) to the list of questions asked in the mock interview.
@HrudanandaMahanta-j9z A month ago
Thanks, very helpful, as I'm newly trying to move my domain to the data field.
@rajnarayanshriwas4653 10 months ago
For incremental load, why go for MERGE or UPSERT? We use MERGE or UPSERT to implement SCD types. For an incremental load, what we want is to copy newly arrived data into ADLS. For that, we keep track of some reference key through which we can recognize the new data. For example, in an Order fact table, let's say it is Order_ID, which keeps increasing whenever we get a new order.
@prashanththokala1820 16 days ago
What if some order is returned?
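The key-tracking idea in this thread can be sketched in a few lines of plain Python. This is a minimal illustration, not the approach shown in the video: `incremental_load`, the sample `orders` rows, and the `amount` field are hypothetical, and only `Order_ID` comes from the comment. It copies the rows whose key is above the stored high-water mark and then advances the mark.

```python
def incremental_load(source_rows, last_watermark, key="Order_ID"):
    """Return (new_rows, new_watermark) for rows whose key exceeds the watermark."""
    # Keep only rows that arrived after the previous run.
    new_rows = [row for row in source_rows if row[key] > last_watermark]
    # Advance the watermark; keep the old one if nothing new arrived.
    new_watermark = max((row[key] for row in new_rows), default=last_watermark)
    return new_rows, new_watermark


# Hypothetical sample data standing in for the Order fact table.
orders = [
    {"Order_ID": 101, "amount": 250},
    {"Order_ID": 102, "amount": 400},
    {"Order_ID": 103, "amount": 120},
    {"Order_ID": 104, "amount": 90},
]

# Previous run stopped at Order_ID 102, so only 103 and 104 are picked up.
new_rows, watermark = incremental_load(orders, last_watermark=102)
print([row["Order_ID"] for row in new_rows], watermark)
```

On the reply's point: a returned order changes a row whose Order_ID is already below the watermark, so this filter would skip it. Picking up such updates generally needs a last-modified column as the watermark, or a MERGE on the target.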
@Journey_with_Subham 10 months ago
Great Initiative Sumit Sir !
@sumitmittal07 10 months ago
thank you. A big thanks to people who are participating in this.
@yifeichen5198 10 months ago
great content! very insightful questions and answers!
@sumitmittal07 10 months ago
Glad you enjoyed it!
@MrPython100 4 months ago
A better way to handle the location question would be to create a hash map and use it to fetch the complete location. This hash map can be extended in the future too. You can broadcast the hash map to make it more optimised if you are dealing with TBs of data.
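A rough sketch of the hash-map idea in plain Python follows. The `ref_id` layout (`DIV-<code>_<number>`) is an assumption inferred from the other solutions in this thread, and `LOCATION_MAP` and `expand_location` are hypothetical names. In a PySpark job the same dictionary could be shipped to executors with `SparkContext.broadcast`, which is not shown here.

```python
import re

# Lookup table; new codes can be added without touching the parsing logic.
LOCATION_MAP = {
    "CHN": "CHENNAI",
    "HYD": "HYDERABAD",
    "AP": "ANDHRA PRADESH",
    "PUNE": "PUNE",
}


def expand_location(ref_id, mapping=LOCATION_MAP):
    """Extract the location code from ref_id and return its full name."""
    # "DIV-CHN_1001" splits into ["", "CHN", "1001"]; index 1 is the code.
    parts = re.split(r"DIV-|_", ref_id)
    code = parts[1] if len(parts) > 1 else ref_id
    # Unknown codes fall back to the code itself instead of failing.
    return mapping.get(code, code)


for sample in ["DIV-CHN_1001", "DIV-AP_2002", "DIV-BLR_3003"]:
    print(sample, expand_location(sample))
```

Compared with a chain of `when` conditions, adding a location here means adding one dictionary entry.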
@ShubhamYadav-gq6fe 10 months ago
Please provide the interview feedback in a few minutes at the end, to help more with this.
@WadieGamer 10 months ago
Great video for new data engineers like me.
@sumitmittal07 10 months ago
Glad you enjoyed it
@VaidehiH-v2l 10 months ago
Thank you so much, Sumit sir, it's really helpful.
@sumitmittal07 10 months ago
Happy to share more such informative videos for the community!
@BooksWala 10 months ago
Please also make a video about the kinds of problems data engineers face in their day-to-day work.
@sumitmittal07 10 months ago
noted, will bring a video on this soon
@3A3A11 10 months ago
Sir, please make videos on topics like "Someone who has worked in Tech Support for the past 5 years and is now moving to Data Engineer": what they should write in their resume, like in the experience section... whether they should try as a fresher, or whatever.
@ravichakraborty3878 10 months ago
Sir, I also have the same question.
@Shivamyogi10 10 months ago
Yes, that is very valuable. Most people are working in different roles, but those of us in support roles in the data field are interested in switching to data engineering.
@sumitmittal07 10 months ago
surely will release a video on this soon
@NabaKrPaul-ik2oy 10 months ago
Hi Sir, thanks for this series, very insightful. Just a query: do the majority of interviews go as far as the coding part, or is it mostly theory only? Or is it mix and match?
@sumitmittal07 10 months ago
Yes they do
@harish7548 4 months ago
We need to control the flow with a cfg file for incremental data load, not MERGE or UPSERT.
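One possible reading of the config-file suggestion is to persist the last loaded key in a small `.cfg` file between runs. The sketch below uses only the standard-library `configparser`; the file name `pipeline.cfg`, the section `orders`, and the option `last_order_id` are hypothetical, and the comment itself does not specify a layout.

```python
import configparser
import os
import tempfile


def read_watermark(path, section="orders", option="last_order_id"):
    """Read the last loaded key from the cfg file, or 0 if it is not set yet."""
    cfg = configparser.ConfigParser()
    cfg.read(path)  # A missing file is silently ignored.
    return cfg.getint(section, option, fallback=0)


def write_watermark(path, value, section="orders", option="last_order_id"):
    """Store the new key so the next run starts after it."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    if not cfg.has_section(section):
        cfg.add_section(section)
    cfg.set(section, option, str(value))
    with open(path, "w") as fh:
        cfg.write(fh)


# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "pipeline.cfg")
    first = read_watermark(cfg_path)   # No file yet, so the fallback is used.
    write_watermark(cfg_path, 104)     # Pretend the run loaded up to key 104.
    second = read_watermark(cfg_path)  # The next run resumes from here.
    print(first, second)
```

The pipeline would read the value at the start of a run, load only rows above it, and write the new maximum at the end.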
@adityanjsg99 A month ago
Ankur is everywhere!!
@gopalgaihre9710 10 months ago
Please make videos for freshers as well, because these days no one is looking for freshers for data engineering roles...
@sumitmittal07 10 months ago
will make a video for sure
@Raghavendraginka 10 months ago
Sir, please make a complete video on SQL and mock interviews too.
@sumitmittal07 10 months ago
Definitely, will be covered in the upcoming videos
@salonisacheti7350 10 months ago
Good Work Praroop ❤
@sumitmittal07 10 months ago
Praroop has rocked it.
@swapnildande4706 10 months ago
Hi Sir, I request you to please upload more videos on Data Engineer mock interviews.
@sumitmittal07 10 months ago
one video daily for next 30 days
@ashwinigadekar2956 10 months ago
Please make an interview session for freshers.
@sumitmittal07 10 months ago
surely
@shivanisaini2076 9 months ago
I want to give a mock interview.
@saurabhgavande6728 10 months ago
Can you make a video for AWS cloud, as you did for Azure?
@sumitmittal07 10 months ago
surely
@digantapurkait6231 10 months ago
wahh
@joerokcz 10 months ago
😅
@data_eng_tuts 10 months ago
Really?
@karthikeyanr1171 10 months ago
Solution for the PySpark problem:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Map a short location code to its full name; unknown codes pass through.
def location_f(loc):
    if loc == 'CHN':
        return 'CHENNAI'
    elif loc == 'AP':
        return 'ANDHRA PRADESH'
    elif loc == 'HYD':
        return 'HYDERABAD'
    else:
        return loc

re_location = F.udf(location_f, StringType())

# Split ref_id on "DIV-" or "_": element 1 is the location code, element 2 the id.
df1 = df.withColumn('ref_id1', F.split('ref_id', r'DIV-|_')).drop('ref_id')
df2 = df1.withColumn('ref_id', F.col('ref_id1')[2]).withColumn('location', re_location(F.col('ref_id1')[1]))
df3 = df2.select('name', 'ref_id', 'salary', 'location')
df3.show()
@ArunKumar-mr7pc 8 months ago
from pyspark.sql.functions import col, when

# Derive LOCATION from the prefix of REF-ID.
df_employee.withColumn(
    "LOCATION",
    when(col("REF-ID").like("DIV-CHN%"), "CHN-CHENNAI")
    .when(col("REF-ID").like("DIV-HYD%"), "HYD-HYDERABAD")
    .when(col("REF-ID").like("DIV-AP%"), "AP-ANDHRA PRADESH")
    .when(col("REF-ID").like("DIV-PUNE%"), "PUNE-PUNE"),
).show()
@RahulSaini-ng6po 10 months ago
Hi Folks, below is the solution to the PySpark problem written in >>SCALA
@rabbadi6126 2 months ago
Another way of doing it is:

from itertools import chain
from pyspark.sql import functions as f

city_dict = {
    "CHN": "CHENNAI",
    "HYD": "HYDERABAD",
    "AP": "ANDHRA PRADESH",
    "PUNE": "PUNE",
}
# Build a Spark map literal from the dict: key1, value1, key2, value2, ...
mapping_expr = f.create_map([f.lit(x) for x in chain(*city_dict.items())])

# Plain strings are used for the pattern and replacement in regexp_replace.
(df.withColumn("city_code", f.regexp_replace(f.col("ref-id"), "DIV-", ""))
   .withColumn("city_code", f.split(f.col("city_code"), "_").getItem(0))
   .withColumn("location", mapping_expr[f.col("city_code")])
   .show())

PS: This is very helpful for avoiding lots of WHEN conditions when you have many mappings.
@vishaldeshatwad8690 10 months ago
from pyspark.sql.functions import col, split, when

# Extract the location code: the text between "-" and "_" in refid.
# The nested split avoids referencing an alias inside the same select.
df_new = df.select(
    col("name"), col("refid"), col("salary"),
    split(split(col("refid"), "-")[1], "_")[0].alias("loc"),
)
final_result_df = df_new.withColumn(
    "location",
    when(col("loc") == "CHN", "CHENNAI")
    .when(col("loc") == "HYD", "HYDERABAD")
    .when(col("loc") == "AP", "ANDRA_PRADESH")
    .when(col("loc") == "PUN", "PUNE"),
).drop("loc")