The interviewer (Chandrali) definitely has a deep understanding of the topics. It would be even more helpful to get some interview videos for senior positions where she is the one answering scenario-based questions.
@sampadapatil5225 2 months ago
Salting for data skew is correct: it adds a salt to the key, which helps spread the data out evenly. But repartition is not the fix in a data skew scenario, because it just shuffles the data again. If it repartitions on the same skewed key, the uneven chunk of data is still there and is not split up properly.
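The salting idea in the comment above can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: `partition_of` is a hypothetical stand-in for a hash partitioner (CRC32 modulo the partition count), the partition and salt counts are made up, and the salt is assigned round-robin to keep the output deterministic, whereas a real job would more likely use a random salt.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count, for illustration only
SALT_BUCKETS = 8    # assumed number of distinct salt values


def partition_of(key: str) -> int:
    """Hypothetical stand-in for a hash partitioner: CRC32 of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    """Count how many rows land in each partition."""
    return Counter(partition_of(k) for k in keys)


# Skewed input: one hot key dominates the data set.
keys = ["hot"] * 1000 + ["a"] * 10 + ["b"] * 10 + ["c"] * 10

# Without salt, every "hot" row hashes to the same partition.
unsalted = partition_sizes(keys)

# With salt, the key gets a suffix, so "hot" becomes several distinct keys.
# Round-robin salt is used here for determinism (an assumption of this sketch).
salted_keys = [f"{k}_{i % SALT_BUCKETS}" for i, k in enumerate(keys)]
salted = partition_sizes(salted_keys)

print("largest partition without salt:", max(unsalted.values()))
print("largest partition with salt:", max(salted.values()))
```

Hashing the unsalted keys again would place all the hot rows in one partition again, which is the point the comment makes about repartitioning on the same key.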
@adityanjsg99 1 month ago
A high-quality discussion. The interviewer asked very conceptual questions.
@Jimmy-jc6pb 1 month ago
One of the most knowledgeable interviews.
@rutvikkokane2997 3 months ago
Really insightful video. As a fresher, I got to learn a lot.
@MeowUniX 8 months ago
Data governance includes access permissions for a user or a group: we can restrict which locations a user is allowed to read, and we can also apply data masking. Second, Unity Catalog also stores metadata.
@ravulapallivenkatagurnadha9605 8 months ago
Please continue the Python and SQL series.
@SusheelGajbinkar 6 months ago
Insightful! Thank you, Chandrali and Abhirup.
@avicool08 1 month ago
Professional interview 👍
@jithindev9185 7 months ago
If repartition happened at the driver, do you think gigabytes of data would travel from all the executors to the driver, and the driver would then divide it and hand it back to the executors? If so, your understanding of Spark is wrong.
@gawlianilnrayan 14 days ago
In repartition, the data is reshuffled across all available partitions.
@jithindev9185 14 days ago
@ Yeah, it won't go to the driver.
@abhishek_kumar0709 1 month ago
Which company's interview is this?
@gauravgaikwad2939 3 months ago
I think the answers were not on point. For example, the difference between DataFrame and Dataset: they are basically the same, with a slight difference. A Dataset provides type safety and compile-time checks, meaning errors (e.g., wrong column names or data types) are caught during compilation. A DataFrame, on the other hand, is untyped, so those errors only show up at runtime. If you ask me which is preferred, in my opinion DataFrame is preferred, because Dataset comes with overhead. Both are stored in Tungsten's binary format, but typed Dataset operations have to convert rows to and from JVM objects through encoders, which is slower. So using a Dataset helps cut down on developer mistakes, but it comes with the extra cost of casting and expensive conversion between objects and the binary format.
@skybluelearner4198 6 months ago
Her question on rank and dense rank was not complete. The candidate was not told what the ranking should be based on.
@AkashRusiya 2 months ago
Right. Also, the PARTITION BY clause was not even required if she actually wanted to understand the difference among the 3 functions based on the given data.
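The difference being discussed can be sketched in plain Python. This is only an illustration under assumptions: the comments do not name the third function, so `row_number` is assumed here alongside rank and dense rank, the sample values are made up, and the ordering is by value descending with no partitioning.

```python
def window_ranks(values):
    """Return (value, row_number, rank, dense_rank) tuples, ordered by value descending."""
    ordered = sorted(values, reverse=True)
    result = []
    rank = 0
    dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # rank skips ahead by the number of tied rows
            dense_rank += 1    # dense_rank always increases by exactly one
            previous = value
        result.append((value, row_number, rank, dense_rank))
    return result


# Hypothetical sample data with one tie at the top.
rows = window_ranks([300, 200, 300, 100])
for row in rows:
    print(row)
```

Tied values share the same rank and dense rank but get different row numbers, and only rank leaves a gap after the tie.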
@sandhu01 2 months ago
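import re

def comp_str(x: str):
    # Expect exactly two whitespace-separated words.
    if len(x.split()) != 2:
        print("wrong input")
        return
    # Group 1 is the first word; group 2 is the whitespace plus the second word.
    a = re.search(r"(^[a-zA-Z].*?)(\s[a-zA-Z].*)", x)
    if a is None:  # a word does not start with a letter
        print("wrong input")
        return
    # Remove lower() if the case needs to match as well.
    if a.group(1)[0].lower() == a.group(2)[1].lower():
        print("true")
    else:
        print("false")

lis = "Crazy Chocolate"
comp_str(lis)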
@sandhu01 2 months ago
If you don't want to use 're', here is another, simpler version:

def comp_str(x: str):
    words = x.split()
    # Expect exactly two whitespace-separated words.
    if len(words) != 2:
        print("wrong input")
        return
    # Compare the first letters case-insensitively.
    if words[0][0].lower() == words[1][0].lower():
        print("true")
    else:
        print("false")

lis = "Crazy Chocolate"
comp_str(lis)
@AMM2012 4 months ago
Interesting.
@hdr-tech4350 6 months ago
Data lake vs Delta Lake, Unity Catalog, data profiling, data governance