Big Data Engineering Mock Interview | Big Data Pipeline | AWS Cloud Services | Project Architecture

Views: 9,350

Sumit Mittal


To enhance your career as a Cloud Data Engineer, check trendytech.in/?src=youtube&su... for curated courses developed by me.
I have trained more than 20,000 professionals in Data Engineering over the last 5 years.
Want to master SQL? Learn SQL the right way through the most sought-after course - SQL Champions Program!
"A 8 week Program designed to help you crack the interviews of top product based companies by developing a thought process and an approach to solve an unseen Problem."
Here is how you can register for the Program -
Registration Link (Course Access from India) : rzp.io/l/SQLINR
Registration Link (Course Access from outside India) : rzp.io/l/SQLUSD
30 INTERVIEWS IN 30 DAYS - BIG DATA INTERVIEW SERIES
This mock interview series is a community initiative under Data Engineers Club, aimed at aiding the community's growth and development.
Our highly experienced guest interviewer, Satinder (/ satinder-singh-699aab2b), shares invaluable insights and practical advice drawn from his extensive experience.
Our talented guest interviewee, Aditya Patil (/ ap-patil), answers the interview questions in a well-articulated manner.
Links to the free SQL and Python series developed by me are given below -
SQL Playlist - • SQL tutorial for every...
Python Playlist - • Complete Python By Sum...
Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!
Social Media Links :
LinkedIn - / bigdatabysumit
Twitter - / bigdatasumit
Instagram - / bigdatabysumit
Student Testimonials - trendytech.in/#testimonials
Discussed Questions : Timestamp
2:34 Brief overview of projects.
3:19 Describe your data pipeline flow and architecture.
5:10 What transformations do you use, and in which format do you write data to Redshift?
6:44 How do you handle null values?
9:03 Which file format do you use for end-user data?
9:50 Why is Parquet preferred over ORC?
11:10 What are the join types in Hive?
12:07 Which types of joins are used to avoid shuffling in Hive and PySpark? Do you know the specific term?
12:53 Explain how broadcast join avoids shuffling.
14:07 Which property controls broadcast join in Spark?
14:40 How do you start a Spark application in PySpark?
16:09 What does the builder do in Spark session creation?
17:43 What are the partitioning types in Hive?
18:36 Difference between managed and external tables in Hive.
19:16 Have you performed Spark performance tuning?
19:36 Difference between repartition and coalesce in Spark?
20:25 Have you used NoSQL databases?
21:02 SQL coding question
Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs

Comments: 53
@imranhossain1660 4 months ago
Parquet is a columnar storage format, so it is very efficient when a query reads only some of the columns. That reduces I/O and network bandwidth. It also has built-in support for compression codecs such as Snappy, which reduces storage space. Another point is that a Parquet file has three parts: header, body, and footer. The header holds the magic number (PAR1) that marks the file as Parquet. The body is the actual data, stored in row groups. The footer holds the metadata, which includes the minimum and maximum values of the columns. When we query data stored in Parquet, this metadata allows data skipping, which speeds up query execution. Hope it helps.
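The data-skipping idea in the comment above can be sketched in plain Python. This is only a toy model under stated assumptions: `RowGroup`, `make_row_group`, and `scan_greater_than` are hypothetical names invented for illustration, not part of any Parquet library, and a real reader works on column chunks inside files rather than Python lists. It only shows how stored minimum/maximum values let a reader skip chunks without looking at their rows.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for one chunk of a single column plus its statistics."""
    values: List[int]   # the column values stored in this chunk
    min_value: int      # statistic kept in the (simulated) footer
    max_value: int      # statistic kept in the (simulated) footer


def make_row_group(values: List[int]) -> RowGroup:
    # Statistics are computed once, at write time.
    return RowGroup(values=list(values), min_value=min(values), max_value=max(values))


def scan_greater_than(row_groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in row_groups:
        if group.max_value <= threshold:
            # No value in this group can match, so skip it entirely.
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


# Hypothetical data: three row groups of a "marks" column.
row_groups = [
    make_row_group([10, 25, 40]),
    make_row_group([55, 60, 75]),
    make_row_group([80, 95, 99]),
]

matches, groups_read = scan_greater_than(row_groups, 70)
print(matches, groups_read)
```

With a filter of "greater than 70", the first row group is skipped because its maximum is 40, so only two of the three groups are read.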
@pallavigosavi6851 3 months ago
Thank you!! 👍
@mojibshaikh4092 3 months ago
Informative and excellent interview.
@sauravroy9889 3 months ago
Really nice interview sir. ❤
@sruthiselvakumar9817 4 months ago
This interview is really great, as Satinder explained some concepts, like the property for broadcast joins, more clearly. Thanks Sumit Sir!! Expecting more videos like this.
@sumitmittal07 4 months ago
Satinder will be conducting more interviews.
@KiyanshLife 4 months ago
Best interview I have ever seen. Both of you are too good at your level.
@sumitmittal07 4 months ago
Yes, this interview was next level.
@tanujarora4906 2 months ago
Satinder sir is awesome, always something to learn from his questions.
@mohammedalikhan9819 4 months ago
The interview was more focused on PySpark and SQL. We expect the interviewer to ask more questions on AWS cloud as well, because PySpark has been asked a lot in most of the interview videos posted. If questions on AWS had been asked, it would have been very helpful.
@sumitmittal07 4 months ago
Hi Mohammed, will definitely have some interviews planned specifically for AWS in the upcoming days.
@mohammedalikhan9819 4 months ago
Thank you sir 😊
@avinash7003 4 months ago
@mohammedalikhan9819 From what I see, about 70% of the questions are on PySpark and SQL, and the rest are on cloud.
@goldykarn5922 3 months ago
Best interview session so far.
@abhishekmodak8496 4 months ago
This was a good interview and Satinder has good experience as an interviewer.
@akshaykumarverma8644 4 months ago
This was a very good video.
@user-im6ui9zd8v 4 months ago
This was a good interview, different from the earlier ones. Satinder's questions and advice were very good.
@sumitmittal07 4 months ago
This interview has really gone well.
@safarnama65 4 months ago
Very informative. One of the best mock interviews, with proper answers and details.
@sumitmittal07 4 months ago
Keep watching for more such insightful interviews.
@ashwenkumar 2 months ago
Aditya, you need to be strong in the basics and always answer straightforwardly and crisply, on point. Don't beat around the bush.
@grim_rreaperr 4 months ago
Hi Sumit Sir, in the first SQL problem, where we need to find subject-wise toppers, row_number() fails when two top scorers have the same marks in a subject. Please check the example below (the last column is the derived rank):
student_name, subject, marks, rank
stud_1, maths, 90, 1
stud_2, maths, 90, 1
stud_1, economics, 95, 1
stud_2, economics, 90, 2
stud_3, economics, 88, 3
Instead of row_number(), we can use either rank() or dense_rank(), since we only need the first rankers (based on the highest marks scored in each subject). My approach is as follows:
WITH top_scorers AS (
    SELECT student_name, subject, marks,
           DENSE_RANK() OVER (PARTITION BY subject ORDER BY marks DESC) AS rnk
    FROM student_marks
)
SELECT student_name, subject, marks
FROM top_scorers
WHERE rnk = 1;
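The tie case in the comment above can be checked with a small script. This is a minimal sketch that runs the commenter's query through Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (needed for window functions). The table name, columns, and rows come from the comment; the helper `toppers` is a hypothetical name used only here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_marks (student_name TEXT, subject TEXT, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO student_marks VALUES (?, ?, ?)",
    [
        ("stud_1", "maths", 90),
        ("stud_2", "maths", 90),      # tie for first place in maths
        ("stud_1", "economics", 95),
        ("stud_2", "economics", 90),
        ("stud_3", "economics", 88),
    ],
)


def toppers(rank_fn: str):
    """Return the rank-1 rows per subject using the given window function."""
    # rank_fn is only ever one of the fixed strings passed below.
    query = f"""
        WITH top_scorers AS (
            SELECT student_name, subject, marks,
                   {rank_fn}() OVER (PARTITION BY subject ORDER BY marks DESC) AS rnk
            FROM student_marks
        )
        SELECT student_name, subject, marks
        FROM top_scorers
        WHERE rnk = 1
        ORDER BY subject, student_name
    """
    return conn.execute(query).fetchall()


row_number_result = toppers("ROW_NUMBER")
dense_rank_result = toppers("DENSE_RANK")

print(len(row_number_result))   # one row per subject, so one maths topper is dropped
print(dense_rank_result)        # both maths toppers are kept
```

ROW_NUMBER() assigns distinct numbers even to tied rows, so it returns exactly one row per subject and which of the two maths toppers survives is arbitrary. DENSE_RANK() and RANK() both give tied rows the same rank, so either works when only rank 1 is needed.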
@DataJourneyHuub 4 months ago
It's really helpful sir. Thank you so much.
@sumitmittal07 4 months ago
Most welcome.
@sabyspeaksonline 3 months ago
What's the difference between the Parquet and Delta formats?
@abhishekkmalik4399 4 months ago
Very informative video, liked the point of view by Satinder Sir.
@sumitmittal07 4 months ago
Satinder is a very knowledgeable person.
@DesireIsIrrelevant 4 months ago
Thanks for uploading such a great interview video Sir!
@sumitmittal07 4 months ago
Glad you found the interview informative!
@Sagar0155 4 months ago
The interview was insightful. Learnt core concepts of Spark from Satinder.
@sumitmittal07 4 months ago
Glad that it helped you.
@AliKhanLuckky 4 months ago
Sir, I personally want to see more of Satinder sir's interviews 😊
@sumitmittal07 4 months ago
Yes definitely, he will be conducting more interviews.
@zaffer2024 4 months ago
Excellent
@sumitmittal07 4 months ago
Thanks
@amritmanash7950 4 months ago
Very nice interview
@sumitmittal07 4 months ago
Glad that you liked it.
@Abhishek-14 4 months ago
Sir, please continue the Python course along with this 🙏
@sumitmittal07 4 months ago
Yes, one video is coming tomorrow at 7 pm.
@Abhishek-14 4 months ago
@sumitmittal07 Thank you so much sir, that's a relief to hear.
@doyouwanttoknow3366 4 months ago
Please upload a GCP data engineer interview video, sir.
@sumitmittal07 4 months ago
Very soon.
@mohitbutola1140 4 months ago
Has anyone taken the course?
@sumitmittal07 4 months ago
Please share your contact number if you would like to know more about the courses that I offer.
@ameygoesgaming8793 4 months ago
My SQL would be: SELECT student_id, max(marks) FROM class GROUP BY subject
@grim_rreaperr 4 months ago
Every non-aggregated column in your SELECT statement must be included in the GROUP BY clause. Here student_id is a non-aggregated column, so it should be in your GROUP BY clause. The reverse also matters: you group by the subject column but never select it.
@ameygoesgaming8793 4 months ago
@grim_rreaperr Oh yes, it's a typo. It should be: SELECT subject, max(marks) FROM class GROUP BY subject
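The corrected query in this thread returns the top mark per subject but not who scored it. A minimal sketch of one way to get both is shown below, run through Python's sqlite3 module. The table and column names (class, student_id, subject, marks) come from the thread; the sample rows and the join-back approach are assumptions added for illustration. The original query is not executed here, because SQLite accepts bare columns in aggregate queries and would not raise the error that stricter engines report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (student_id TEXT, subject TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?)",
    [
        # Hypothetical sample rows, not taken from the video.
        ("stud_1", "maths", 90),
        ("stud_2", "maths", 90),
        ("stud_1", "economics", 95),
        ("stud_2", "economics", 90),
    ],
)

# The corrected query from the thread: every selected column is either
# grouped (subject) or aggregated (MAX(marks)).
top_marks = conn.execute(
    "SELECT subject, MAX(marks) FROM class GROUP BY subject ORDER BY subject"
).fetchall()

# Joining the per-subject maximum back to the table recovers the students,
# including ties.
toppers = conn.execute(
    """
    SELECT c.student_id, c.subject, c.marks
    FROM class AS c
    JOIN (
        SELECT subject, MAX(marks) AS top_marks
        FROM class
        GROUP BY subject
    ) AS m
      ON c.subject = m.subject AND c.marks = m.top_marks
    ORDER BY c.subject, c.student_id
    """
).fetchall()

print(top_marks)
print(toppers)
```

The join-back form uses only GROUP BY and a join, so it also works on engines without window functions.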
@ameygoesgaming8793 4 months ago
What is the NC SQL way?
@SB-ix7db 4 months ago
ANSI
@ameygoesgaming8793 4 months ago
@SB-ix7db So ANSI SQL is the normal SQL syntax that we write, right?
@zaffer2024 4 months ago
Why do data engineer roles have such easy questions?
@sumitmittal07 4 months ago
We make it look easy, else it's complex.. haha
@akhilsingh3801 15 days ago
Bro is cheating in a mock interview with zero fundamental knowledge of Spark or Hadoop 😂😂😂. At least the interviewer asked questions that let us get something out of this video.