AQE in spark | Lec-19

  17,190 views

MANISH KUMAR

1 day ago

Comments: 63
@piyushjain5852 1 year ago
Manish bhai, I studied AQE from this video before my interview. Today the interviewer asked how I would optimize skewed data. I mentioned coalesce and split, but then he asked how the split happens internally, and I could not explain that. He then told me that salting is needed. That was the one topic I had not studied, because I thought it would not be important and you also skipped it in the video, and that is exactly what he asked. I answered everything else thanks to your videos, but he rejected me. Anyway, I'm feeling more confident for upcoming interviews. Keep up the good work. You're helping guys like me a lot.
@Matrix_Mayhem 9 months ago
Thanks for sharing, Piyush. Can we connect on LinkedIn? I am preparing too.
@vipul7010 1 year ago
Best explanation I have found anywhere on YouTube. Great work!
@deepjyotimitra1340 3 months ago
Every one of your videos is a master class. ❤
@greendaywithtrading7408 1 year ago
Great explanation. Kindly provide data for practice.
@Fr4529 9 months ago
Great. Very precise and detailed explanation of AQE. Do you also have the GitHub repo or the DBC file for the example?
@Aryan-lu5js 1 year ago
Azure Synapse was launched in Nov 2019, but Indian recruiters want someone who has 5+ years of experience in Azure Synapse 😢
@manish_kumar_1 1 year ago
😂😂
@piyushjain5852 1 year ago
Lol
@isharkpraveen 5 months ago
The recruiter has gone mad. 😂
@SouvindSajeev 4 months ago
🤣🤣🤣
@prashantmehta2832 6 months ago
Hello sir, at 11:40, is there any rule for coalescing when AQE is used?
@BishalKarki-pe8hs 6 months ago
Same question here.
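On the coalescing rule asked about above: with AQE, `spark.sql.adaptive.coalescePartitions.enabled` lets Spark merge adjacent small shuffle partitions so that each merged partition approaches `spark.sql.adaptive.advisoryPartitionSizeInBytes` (64 MB by default). The sketch below is a simplified pure-Python model of that idea, not Spark's actual implementation; the function name `coalesce_partitions` and the sample sizes are made up for illustration.

```python
def coalesce_partitions(sizes_mb, target_mb=64):
    """Greedily merge adjacent partitions while the merged size stays within target_mb.

    Returns a list of groups; each group holds the indices of the original
    partitions that would be read together by one task.
    """
    groups, current, current_size = [], [], 0
    for idx, size in enumerate(sizes_mb):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical shuffle partition sizes in MB after a filter shrank the data.
sizes = [10, 20, 30, 70, 5, 5]
groups = coalesce_partitions(sizes)
print(groups)
```

Only neighbouring partitions are merged in this model, and a partition that is already larger than the target stays on its own. Six shuffle partitions become three tasks here.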
@ajaypatil1881 1 year ago
Best explanation, bhaiya.
@mission_possible 1 year ago
Thanks for the videos, Manish ❤
@RohitSingh-we9mo 3 months ago
@Manish Kumar You told us that with coalesce the partition size is reduced, but here in dynamic coalescing why are we adding partitions together? Could you explain this?
@aryankhandelwal8517 1 year ago
Why does it create duplicate partitions for the join? It simply takes Table 2 partitions and joins them with both Table 1 partitions.
@gauravkothe9558 1 year ago
If we split, how will the join be performed on that split data? Please explain.
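On the two questions above about duplicated partitions and joining split data: a rough pure-Python model, assuming that the skewed partition of one table is cut into sub-partitions and the matching partition of the other table is read again for each of them. This is a sketch of the idea only, not Spark's code; `split_skewed_join` and `chunk_size` are hypothetical names. In Spark the skew detection itself is governed by `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`.

```python
def split_skewed_join(left_partition, right_partition, chunk_size):
    """Join a skewed left partition against its matching right partition in chunks."""
    results = []
    for start in range(0, len(left_partition), chunk_size):
        # Each chunk models one sub-partition handled by its own task.
        chunk = left_partition[start:start + chunk_size]
        # The matching right partition is duplicated: every chunk reads a full copy.
        right_copy = list(right_partition)
        results.extend(
            (key, left_value, right_value)
            for key, left_value in chunk
            for right_key, right_value in right_copy
            if key == right_key
        )
    return results


left = [("A", i) for i in range(6)]    # skewed: every row has key "A"
right = [("A", "x"), ("B", "y")]       # matching partition of the other table

whole = split_skewed_join(left, right, chunk_size=len(left))  # no split
split = split_skewed_join(left, right, chunk_size=2)          # three sub-partitions
print(sorted(split) == sorted(whole))
```

Because every sub-partition sees the complete matching partition of the other side, the union of the chunk results equals the result of the unsplit join. That is why the duplication is needed.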
@isharkpraveen 5 days ago
Sir ji, no videos have come for a long time. Please upload videos.
@stevedz5591 1 year ago
Sir, please make a video on orchestration and HDFS.
@AyushMandloi 1 month ago
Why is AQE not always enabled?
@shadabalam17 3 months ago
I have a small confusion. When the final plan is updated and AQE chooses a broadcast join instead of the sort-merge join shown in the DAG, we are in fact avoiding shuffling. Yet at the end of the part on dynamic switching of join strategy you said that we cannot avoid shuffling. But we know that a broadcast join avoids shuffling. Can anyone please clarify?
@siddharthdas9952 19 days ago
"AQE converts sort-merge join to broadcast hash join when the runtime statistics of any join side is smaller than the adaptive broadcast hash join threshold. This is not as efficient as planning a broadcast hash join in the first place, but it's better than keep doing the sort-merge join, as we can save the sorting of both the join sides, and read shuffle files locally to save network traffic". This is from the Spark AQE documentation. One side is still shuffled; the shuffle of the other side can be avoided if the first one turns out to be below the broadcast limit.
@SubhamKumar-or8vc 1 year ago
Why would the shuffling still happen when we have 8 GB and 10 MB tables after filtering? The shuffling you talked about, is it before the tables are filtered or after the tables are filtered?
@Matrix_Mayhem 9 months ago
Before
@kartikgupta2299 4 months ago
@Matrix_Mayhem But where did a join happen earlier for any shuffling to occur? First the transformation ran, and after that the size dropped, so there was no shuffle before, and afterwards there will be none at all because the join was switched to a broadcast join. Otherwise what is the benefit of AQE, if the damage is already done?
@shadabalam17 3 months ago
@kartikgupta2299 Yes, I also feel you are correct. I was going through this video and got confused by the claim that shuffling will still happen.
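On the shuffle question in this thread: AQE can only use runtime statistics once a shuffle stage has written its output, so the switch to a broadcast hash join comes after that shuffle write, not before it. The pure-Python sketch below models only the decision, under the assumption that the threshold is 10 MB (the default of `spark.sql.autoBroadcastJoinThreshold`); `pick_join_strategy` and the byte sizes are hypothetical, and the exact comparison Spark performs may differ.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Assumed threshold: 10 MB, the default of spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD_BYTES = 10 * MB


def pick_join_strategy(left_bytes, right_bytes, threshold=BROADCAST_THRESHOLD_BYTES):
    """Pick a broadcast hash join when the smaller side fits under the threshold."""
    if min(left_bytes, right_bytes) <= threshold:
        return "broadcast_hash_join"
    return "sort_merge_join"


# Planning time: the optimizer only has size estimates for the unfiltered tables.
planned = pick_join_strategy(8 * GB, 2 * GB)

# Runtime: after the filter, one side's shuffle output measured only 5 MB.
replanned = pick_join_strategy(8 * GB, 5 * MB)

print(planned, "->", replanned)
```

In this model the plan starts as a sort-merge join and is replaced once the measured size is known. What is saved is the sort on both sides and, with local shuffle reads, part of the network traffic.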
@jai3863 1 year ago
Have you created the Discord group? Link please, if you have.
@manish_kumar_1 1 year ago
discord.gg/6r9QF6TE3F
@ansarmobile9758 1 year ago
Sir, how many more lectures are pending, pls?
@SouAzu-b8e 1 year ago
Thank you. Please explain salting and how to implement it in the next video.
@manish_kumar_1 1 year ago
Will do soon
@Amarjeet-fb3lk 6 months ago
At 27:54 you put in your notes that after a skew join a single partition can end up with, for example, 8 GB of data. But one partition is stored and processed on one core. So if an 8 GB partition is formed after the join but the core has only 4 GB of memory, how will it be processed?
@manish_kumar_1 6 months ago
It will be processed in iterations.
@mission_possible 1 year ago
Please make a video on how to read the physical plan and the DAG.
@rohitsharma-mg7hd 1 month ago
Sir, please share the skewed fact table data for practice.
@ritikpatil4077 1 year ago
When I was going through your Job, Stage and Task video, I was getting only 1 task even when I used groupBy and repartition. Now I know why :)
@akhiladevangamath1277 6 months ago
Because of AQE?
@ritikpatil4077 6 months ago
@akhiladevangamath1277 Yes
@ajaypatil1881 1 year ago
Top Class Video
@tanushreenagar3116 10 months ago
Best content 👌
@kartikgupta2299 4 months ago
Why did shuffling happen when AQE changed the sort-merge join into a broadcast join precisely so that there would be no shuffle?
@shadabalam17 3 months ago
Looking for a reply from Manish on this. I got confused by the line that we cannot avoid shuffling!
@tahersadri4220 1 year ago
Hey Manish, can you cover the Auto Loader and COPY INTO features of Databricks, using configs and incremental load?
@thedatamaster 1 year ago
Databricks Auto Loader kzbin.info/aero/PL7S7dD8r4QdWGQ1XT8FEM0ebf7I1RRAy2
@mohammedasif4349 2 months ago
Does AQE get applied internally on its own, or do we have to set some config in the code?
@manish_kumar_1 2 months ago
It is applied on its own if you are using Spark 3.2 or later, where AQE is enabled by default. On Spark 3.0 and 3.1 you will have to add the config.
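For anyone who wants to set AQE explicitly instead of relying on the default, the relevant keys are `spark.sql.adaptive.enabled`, `spark.sql.adaptive.coalescePartitions.enabled` and `spark.sql.adaptive.skewJoin.enabled`. A small sketch that renders them as `spark-submit` arguments; the helper `to_submit_args` is a hypothetical name, and the same key/value pairs could instead be passed one by one to `SparkSession.builder.config(key, value)`.

```python
# AQE-related settings as plain key/value strings.
AQE_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(AQE_CONF)
print(" ".join(submit_args))
```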
@madanmohan6487 1 year ago
Hi Manish, can you make a detailed video on fact and dimension tables?
@manish_kumar_1 1 year ago
Sure
@sauravbhattacharjee6297 1 year ago
Can you create a course of AWS for data engineers?
@manish_kumar_1 1 year ago
No, because I am not good at any cloud technology.
@sanooosai 8 months ago
Great, sir, thank you.
@dharmendersingh7165 2 months ago
Tell me one thing, Manish, and you need to be clear this time. Assume you have 2 executors and 2 partitions on each executor after the shuffle. Will the rows for a key lie in different partitions on the same executor, or in the same partition on one executor?
@younevano 15 days ago
Sub-partitions formed for skewed partition after AQE skew handling might reside on different executors or physical partitions. They can be distributed across the cluster. But even when data for a single key (skewed partition) is physically split into sub-partitions (for AQE skew optimization), it is still logically grouped. Spark uses the logical partition ID and maintains metadata to track which sub-partitions belong to the same key. Spark ensures that all sub-partitions for the same key are processed as a unit during aggregation or joining. Spark ensures correctness by logically merging results post-processing based on the logical partition key.
@saikumarjakki3802 1 year ago
Hi guys, does anyone have an idea how to solve a Java heap space error in a Spark cluster?
@KartikKumar-xr8gu 1 year ago
Bhai, can you tell us how much more time it will take to finish the Spark series?
@manish_kumar_1 1 year ago
It will be done by September.
@arpittrivedi6636 1 year ago
Sir, do you teach data science from scratch? If yes, please share the video link.
@manish_kumar_1 1 year ago
No, I don't teach data science.
@unpluggedsaurav3186 1 year ago
Salting: To mitigate data skew, you can apply a salting technique. Salting involves adding a random prefix to each key so that the rows of one hot key are spread over several salted keys. (A prefix derived from a hash of the key would give every row of that key the same prefix and redistribute nothing.) After a partial aggregation on the salted keys, the prefix is stripped and the partial results are combined. The snippet assumes an existing SparkSession named spark.

    import random

    # Sample data with skewed keys ("A" dominates)
    data = [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("A", 5)]

    # Apply salting by adding a random prefix (0-2) to each key
    salted_data = [(str(random.randrange(3)) + "_" + key, value) for key, value in data]

    # Stage 1: partial sums per salted key, spread across partitions
    partial = spark.sparkContext.parallelize(salted_data) \
        .reduceByKey(lambda a, b: a + b)

    # Stage 2: strip the salt and combine the partial sums per original key
    result = partial \
        .map(lambda kv: (kv[0].split("_", 1)[1], kv[1])) \
        .reduceByKey(lambda a, b: a + b)

    result.collect()
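For joins, which is what the interview question at the top of the thread was about, salting is usually applied to both sides: the large skewed side gets a random salt per row, and the small side is replicated once per salt value so every salted key still finds its match. A minimal pure-Python sketch of that idea, with no Spark involved; `NUM_SALTS` and all function names are assumptions made for illustration.

```python
import random

NUM_SALTS = 4  # assumed number of salt buckets


def salt_large_side(rows, num_salts=NUM_SALTS, seed=0):
    """Attach a random salt to each key so one hot key spreads over several buckets."""
    rng = random.Random(seed)
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_small_side(rows, num_salts=NUM_SALTS):
    """Replicate every small-side row once per salt value."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def hash_join(left, right):
    """Plain in-memory hash join on the (key, salt) pair; the salt is dropped from the output."""
    lookup = {}
    for salted_key, value in right:
        lookup.setdefault(salted_key, []).append(value)
    return [
        (salted_key[0], left_value, right_value)
        for salted_key, left_value in left
        for right_value in lookup.get(salted_key, [])
    ]


large = [("A", i) for i in range(8)] + [("B", 100)]  # "A" is the hot key
small = [("A", "a-dim"), ("B", "b-dim")]

joined = hash_join(salt_large_side(large), explode_small_side(small))

# Reference result: the same join with a constant salt, i.e. no salting at all.
plain = hash_join([((k, 0), v) for k, v in large], [((k, 0), v) for k, v in small])
print(sorted(joined) == sorted(plain))
```

The salted join returns the same rows as the plain join. The cost is that the small side grows by a factor of `NUM_SALTS`, which is the usual trade-off when choosing the number of salts.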
@krupas8277 1 year ago
No practical videos.
@agarwalankita504 5 months ago
Your videos are very useful, but there is too much talking, which makes the videos long and boring.
@sankuM 1 year ago
At 24:24, I think it's projection pruning @manish_kumar_1 🤓🤓