Big Data Engineer Live Mock Interview | Topics: Pyspark, Delta Lake, Data Profiling, Data Governance

  23,314 views

Sumit Mittal

A day ago

Comments: 16
@sampadapatil5225 12 days ago
Salting is the right fix for data skew: it adds a random suffix to the key so that the rows spread evenly across partitions. Repartitioning on the same skewed key does not help, because it only shuffles the data again; the rows for the hot key still land in one partition, so the uneven chunk of data is still there.
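To make the point above concrete, here is a minimal pure-Python sketch, not Spark code. It assumes a hash partitioner (`zlib.crc32` modulo the partition count is only a stand-in for whatever Spark uses internally), and the key names, partition count, and salt count are all made up for illustration.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed partition count (hypothetical)
NUM_SALTS = 8       # assumed number of salt values (hypothetical)


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic stand-in for a hash partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def rows_per_partition(keys):
    # Count how many rows end up in each partition.
    return Counter(partition_of(k) for k in keys)


# Skewed input: one hot key dominates the data set.
keys = ["hot"] * 9000 + [f"user_{i}" for i in range(1000)]

# Partitioning on the original key: every "hot" row hashes to the same partition.
plain = rows_per_partition(keys)

# Salting: a random suffix turns the hot key into NUM_SALTS distinct keys.
rng = random.Random(42)
salted_keys = [f"{k}_{rng.randrange(NUM_SALTS)}" for k in keys]
salted = rows_per_partition(salted_keys)

print("largest partition without salt:", max(plain.values()))
print("largest partition with salt:   ", max(salted.values()))
```

In a real join, the other side would also need its key expanded with every salt value so that the salted keys still match.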
@AshishStudyDE 5 months ago
The interviewer (Chandrali) definitely has a deep understanding of the topics. It would be even more helpful to get some interview videos for senior positions where she is the one being interviewed on scenario-based questions.
@rutvikkokane2997 1 month ago
Really insightful video. As a fresher, I got to learn a lot.
@MeowUniX 6 months ago
Data governance includes access permissions for a user or a group: we can restrict which locations a user is allowed to read, or mask sensitive data. Second, Unity Catalog also stores metadata.
@ravulapallivenkatagurnadha9605 6 months ago
Please continue the Python and SQL series.
@SusheelGajbinkar 4 months ago
Insightful! Thank you, Chandrali and Abhirup.
@jithindev9185 5 months ago
If you think repartition happens at the driver, so that gigabytes of data travel from all executors to the driver, which then divides the data and hands it back to the executors, then your understanding of Spark is wrong. A repartition is a shuffle between executors; the driver only coordinates it.
@gauravgaikwad2939 1 month ago
I think the answers were not on point. Take the difference between DataFrame and Dataset, for example: they are basically the same, with slight differences. Dataset provides type safety and compile-time checks, meaning errors such as wrong column names or data types are caught during compilation, whereas with the untyped DataFrame those errors only surface at runtime. If you ask me which is preferred, in my opinion it is DataFrame, because Dataset comes with overhead: DataFrame rows stay in the Tungsten binary format, while typed Dataset operations have to convert rows to and from JVM objects through encoders, which is slower. So using Dataset helps cut down on developer mistakes, but at the extra cost of casting and more expensive serialization.
@skybluelearner4198 4 months ago
Her question on rank and dense rank was incomplete: the candidate was never told what the ranking should be based on.
@AkashRusiya 1 month ago
Right. Also, the PARTITION BY clause was not even required if she just wanted to test the difference among the three functions on the given data.
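For readers following this thread, here is a small pure-Python sketch of how the three functions differ. It assumes the three are ROW_NUMBER, RANK and DENSE_RANK, that no PARTITION BY is used, and that rows are ordered by a single value in descending order; the `window_ranks` helper and the salary figures are hypothetical, not the data from the video.

```python
def window_ranks(values):
    """Return (value, row_number, rank, dense_rank) tuples, ordered descending by value."""
    ordered = sorted(values, reverse=True)
    result = []
    rank = 0
    dense = 0
    prev = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != prev:
            # A new distinct value: rank jumps to the row position,
            # dense rank advances by exactly one.
            rank = row_number
            dense += 1
            prev = value
        result.append((value, row_number, rank, dense))
    return result


salaries = [300, 200, 200, 100]  # hypothetical sample data
for row in window_ranks(salaries):
    print(row)
```

Ties share the same rank and dense rank; after a tie, rank skips ahead while dense rank does not, and row number is always unique.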
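@sandhu01 23 days ago
import re

def comp_str(x: str):
    # Expect exactly two words
    if len(x.split()) != 2:
        print("wrong input")
        return
    a = re.search(r"(^[a-zA-Z].*?)(\s[a-zA-Z].*)", x)
    if a is None:
        # Words that do not start with a letter do not match the pattern
        print("wrong input")
        return
    # group(2) starts with the whitespace, so its first letter is at index 1.
    # Remove lower() if the comparison should be case-sensitive.
    if a.group(1)[0].lower() == a.group(2)[1].lower():
        print("True")
    else:
        print("false")

lis = "Crazy Chocolate"
comp_str(lis)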
@sandhu01 23 days ago
If you don't want to use 're', here is a simpler version:

def comp_str(x: str):
    # Expect exactly two words
    if len(x.split()) != 2:
        print("wrong input")
        return
    a = x.split()
    # Compare the first letters of the two words, ignoring case
    if a[0][0].lower() == a[1][0].lower():
        print("true")
    else:
        print("false")

lis = "Crazy Chocolate"
comp_str(lis)
@AMM2012 2 months ago
interesting
@hdr-tech4350 4 months ago
Topics: data lake vs Delta Lake, Unity Catalog, data profiling, data governance
@usmanahmed1835 1 month ago
29:00 - 29:30 👀