Transformation and Action in Spark

43,344 views


Comments: 258

@gaurav_singh1017 3 months ago

Hello Manish, I am presently working as a Data Engineer at Hyland, and it is your theory and practical playlists that I've been following to get up and running with Spark. Your content is very useful not just to me but to many professionals out there. Please do keep up the good work. Although some may be casually watching, there are plenty of us who are seriously following through in the long term. Your work is making a real difference, and I really value your commitment. Thank you for all that you do, and please continue the great work! Best regards, Gaurav.

@abc_987 3 months ago

Give this guy some medal for Quality

@Shivam-Tiwari11 10 months ago

I am literally watching this at 3 a.m. and my eyes are not blinking; that is how interesting your way of teaching is. I hope you will not stop creating content because of any kind of demotivation, such as a low subscriber count (I can imagine it is disheartening sometimes when you put in that much effort). You have just increased my craving for learning. Heartfelt thanks, brother ❤

@tusharkantighosh4641 7 months ago

The same happens with me, bro. This content is a diamond.

@ranjithbohara3775 4 months ago

@tusharkantighosh4641 Same with me. I was about to quit Spark, but after this video I became much more interested in it.

@renukasrivastava1167 2 months ago

Hi Sir, I am very thankful that I came across your playlist. Trust us, many of us are implementing whatever you explain, so don't be demotivated. We understand the amount of extra work you are doing for us, and we won't let your effort go in vain. Your playlist is a hidden gem.

@panyamaravind852 2 months ago

Because of you I’m confidently attending interviews. Thank you sooo much

@Samar009-m 4 days ago

I just have one word for this ..awesome

@khurshidhasankhan4700 1 year ago

This is very beneficial for interviews ❤

@apoorvkansal9266 6 months ago

Hello Sir, you are doing a great job, and there are very few humble and selfless Data Engineers like you who dedicate their valuable time to sharing their knowledge and experience. Please continue creating these videos on PySpark. I am preparing for Data Engineering interviews for the 3rd time, i.e. for my 3rd switch of companies. Interviews are getting scheduled, and I am looking to this series to crack them. Thank you in advance for creating such informative and detailed videos. Jai Mata Di! 🙏

@ARGTalks-gx6xs 6 months ago

Please continue this series, brother. I just started watching it and found it to be a fantastic resource. Keep up the good work.

@prashantmehta2832 6 months ago

I am preparing for the interview, and I can say you are playing a main part. I take notes and follow all the steps. Thanks for the support.

@RaviKumar-xo5bz 8 months ago

You really do know this very well ... one place for all IT courses

@harshitmittal1434 20 days ago

Manish, you are doing great work, giving to society for free. Keep it up.

@sanooosai 8 months ago

Sir ji, don't lose motivation. We are following with full dedication. Sometimes even people who already have the implementation knowledge watch your videos for revision or to brush up on the basics. Thank you for all the hard work and for providing this great content in Hindi.

@ranjithbohara3775 4 months ago

To be honest, the way he explains the fundamentals is amazing. For beginners, the course is really worth watching instead of buying some random expensive course. Thank you, brother, for such amazing and useful content 👍👍👍

@ghvendrashra4912 8 months ago

Brother, you are the best teacher for data engineering. The concepts are very clear. Thanks.

@manvika 1 year ago

Your series is really good. I started watching it today. Please keep making it; it is really helpful.

@udittiwari8420 9 months ago

Very helpful series, sir. Thank you for giving us your valuable time ❣

@ashwinichoure3203 2 days ago

Hi Sir, I am very thankful that I came across your playlist. Very nice, please keep it up.

@ranidalvi1064 9 months ago

Most informative and needful video. thank u so much.

@shravanshenoy3873 7 months ago

Fantastic explanation, understood the concepts very clearly. Thanks for making such amazing content.

@abhishekkumar-gupta 1 year ago

Very informative video. Must watch for someone beginner or intermediate in pyspark.

@alokmishra5367 10 months ago

No, no, sir. I am honestly following your course.

@harmanpreetkaur8049 18 hours ago

This is the best series related to Spark that I have seen so far. Tried so many other videos and couldn't complete those. You make it so easy to understand. Hands down the best teacher! Thanks for all your efforts. Subscribed :)

@deeksha6514 8 months ago

I do not have words to praise this masterpiece. Keep creating this awesome content.

@lifeisfun9 1 year ago

You are a gentleman, seriously :)

@PraveenkumarVinukonda 4 months ago

I have been watching and following for the past week.

@RiyaBiswas-r1p 8 months ago

The way you explained everything made it so simple to understand. Earlier it used to be difficult for me to understand the architecture concept, and I would forget it after some time. Now I feel I won't forget it, because this video felt like a story and everything was explained in detail and in a very simple way. It's such great content.

@vaibhavmore7936 1 year ago

We got the file and used it as you taught. I didn't feel the need to comment about this, given that you are giving us so much for free. Thanks for the awesome content.

@Azure-Mahesh 11 months ago

Your inputs are really helping in understanding complicated things easily, thank you

@mmohammedsadiq2483 1 year ago

I mean to say in my last comment, you have explained very well with simple examples

@Varunsharma-sg2nt 5 months ago

Sir, Awesome explanation Thank you so much

@shyammohan3611 8 months ago

Sir, I got your link in the last video. I am following both of your playlists, practical and theory. Please don't stop making videos.

@jigsparikh7961 9 months ago

You are awesome. You make it simple to understand. Thanks.

@lazycool3611 8 months ago

So far, everything you have explained is going really well; we will score 100 marks now.

@dilipkuncha5728 1 year ago

Hi, I'm learning well through your course. I only realised after watching this video about the CSV file, because I had to make a new one myself using the data. Thanks for the motivation, and please stay motivated!! 😊

@vishwajeetkushwaha64 9 months ago

Most informative. Thanks, bro.

@vilaspatil-r3q 1 year ago

Manish, you are simply awesome. I'm glad that I have found the best videos on KZbin to learn Spark.

@bforbhakti1 11 months ago

Your teaching is really good and helpful. Thanks a lot.

@sarpeshmishra6739 14 days ago

very nice , please keep it up

@rakeshjadhav_O 7 months ago

Great explanation Manish. Thank you very much for such informative videos

@manojkaransingh5848 1 year ago

Outstanding tutorial. Really enjoyed it, sir. @manish

@khurshidhasankhan4700 1 year ago

Sir, your lectures are clearing up a lot of my doubts.

@omkarm7865 1 year ago

Very very helpful content.. Please keep it up

@SANJAYYADAV-hm2bs 9 months ago

Sir, you are doing a really awesome job. Please keep making your videos, as they will help others in the future as well. I am working as an ETL tester, and I still keep watching your videos to get some insight into the data engineer roadmap.

@adityaabhinav4171 1 year ago

Most underrated teacher on KZbin. I wish you all the best. One day you will be famous, because the way you teach is awesome.

@manish_kumar_1 1 year ago

That you know me is enough 😂😂

@bobbygupta830 2 months ago

Thank you manish bro,

@praveenkumarrai101 1 year ago

Your bitter truth (kadva sach) is very, very true.

@ayeshaagrawal4987 1 year ago

Best teacher ever met , thank you sir

@pushkarratnaparkhi2205 9 months ago

Thank you, Manish bhai. 🙏

@divyanshusingh3966 2 months ago

Bro you are doing a great job keep going..

@ranjithbohara3775 4 months ago

I was about to quit Spark, but after this video I became much more interested in it.

@automationwithwasi 5 months ago

Manish bhaiya your videos are very valuable. I learn new things every time.

@ujjalroy1442 1 year ago

Phenomenal lectures loved it✌️👍

@divyanshusingh3966 2 months ago

Bro your content is very good. We need tutors like you. Hats off to you.

@rohit-ll3rj 7 months ago

Really appreciate your efforts, Manish! You teach really well. I work as a Data Engineer at one of the Big 4 and go through your videos whenever I need an in-depth understanding of a topic.

@pritiiBisht 11 months ago

Very Informative.

@SanjayKumar-rw2gj 6 months ago

Wonderful explanation, bro. I had read a couple of articles about narrow and wide transformations but could not grasp the idea completely. After watching your video, things are clear.

@younevano 1 month ago

What resources can a beginner educate themselves from?

@ashishkumar9538 9 months ago

Thank you for sharing your knowledge among us and I am really enjoying learning the concepts. I am sure these informative videos are helping a lot of people. Keep on teaching us!!

@MrFirstScientist 1 year ago

Appreciate your hard work.

@hubspotvalley580 1 year ago

You are doing great job. Your lectures are awesome.

@mohdrizwanahmed5537 2 months ago

I am understanding it very well, brother.

@muizzrehan1433 1 year ago

Amazing work. Thanks for lectures...👏

@KumarKumar-en5xq 3 months ago

Hi Manish, Thanks for your efforts. It's really good.

@t1mt0m97 1 year ago

Manish Ji....excellent hands-on course.. Keep adding and let your channel grow !

@Podcast-Bites-Hub 28 days ago

Really great content ✨

@samirdeshmukh9886 1 year ago

Very helpful sir.. Thank you..

@pankajsolunke3714 1 year ago

This series is phenomenal.❤

@RishikaJain-f4m 4 months ago

Thanks a lot for your course

@adityasaini8437 1 year ago

Manish bhai is real 💎!!

@lucky4-vj 10 months ago

Amazing explanation Sir 🤩

@abidsyman 1 year ago

good morning sir, ur lectures are great

@sureshydv724 1 year ago

You are outstanding. I love the way you are teaching. Thanks Manish bhai explaining along with flowchart👏

@Daily_Code_Challenge 4 days ago

Thank you ❤

@amarnaik1819 6 months ago

Sir, your videos are very helpful.

@abegpatel5300 1 year ago

You are just awesome ❤

@akash4606 1 year ago

Manish bhai, you are teaching really well. Keep at it; we are right there with you. I have only just started, but I will definitely complete it.

@seethroughmyeyes423 6 months ago

You make each and every topic very simple to understand. Thank you so much and keep up the work. 👍 If possible, please make some videos on Kafka, Kubernetes, Azure, Databricks. Your videos are really very helpful!

@AbhisekLipun 1 year ago

That's the truth, sir.

@DpIndia 1 year ago

Nice Video, got lots of learning

@souravdas-kt7gg 9 months ago

content is very good

@aasthajain4814 11 months ago

Appreciate it truly

@karansinghrajpurohit3500 1 year ago

Please continue this series

@manish_kumar_1 1 year ago

Sure

@mantukumar-qn9pv 1 year ago

Honestly, I now promise myself that I will practice.

@shivakrishna1743 1 year ago

I want to tell you that I am following your videos and also implementing them. Whenever possible, I found the links in your videos and got the doc or data file. Once, I couldn't find the data, so I asked you :). Please don't lose motivation for making these videos!!

@Icelander00 1 year ago

Manish I am implementing all side by side

@nilavnayan4521 2 months ago

Sir, one question. At 14:03, you showed the output of question 1 (the list of people younger than 18). The output contains redundant rows. Once both executors have returned their outputs, which appear to be appended one after the other in the output you showed, what happens next to remove the redundancy? Does Spark run a 'distinct'?

@younevano 1 month ago

Only if you write 'DISTINCT' in your code! Then Spark performs the wide-dependency transformation, I believe!
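To make the reply above concrete, here is a minimal sketch in plain Python. It is a toy model, not the Spark API: the two lists stand in for partitions, the names and ages are made-up data, and `narrow_filter` and `wide_distinct` are hypothetical helpers. It only illustrates why a filter leaves duplicates in place while a distinct needs rows to move between partitions.

```python
# Toy model in plain Python (not the Spark API).
# Two "partitions" of hypothetical (name, age) rows.
partitions = [
    [("Ram", 16), ("Sita", 25), ("Amit", 12)],
    [("Ram", 16), ("Neha", 17), ("Raj", 40)],
]


def narrow_filter(parts, predicate):
    """Narrow step: each partition is filtered on its own; no rows move."""
    return [[row for row in part if predicate(row)] for part in parts]


def wide_distinct(parts, num_partitions):
    """Wide step: route rows by hash so equal rows meet in one bucket, then dedupe."""
    shuffled = [set() for _ in range(num_partitions)]
    for part in parts:
        for row in part:
            shuffled[hash(row) % num_partitions].add(row)
    return [sorted(bucket) for bucket in shuffled]


minors = narrow_filter(partitions, lambda row: row[1] < 18)

# Appending the per-partition results keeps the duplicate row.
appended = [row for part in minors for row in part]

# Only an explicit distinct (a shuffle in this model) removes it.
deduped = [row for part in wide_distinct(minors, 2) for row in part]
```

In this model the duplicate survives the filter because each partition never sees the other's rows; it disappears only after equal rows are routed to the same bucket.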

@MsSubhrajeet 9 months ago

Awesome 👏

@swarupsarangi734 7 months ago

You should make another video in which you address whether the spark.read methods are transformations or actions.

@younevano 1 month ago

spark.read is neither a transformation nor an action in the strict Spark sense. It is a data source entry point used to define a DataFrame from an external source (such as a file, table, or API).

Here's a breakdown of Spark operations for context:
- Transformations (like map(), filter()) define a new dataset based on the current one but are lazily evaluated, meaning they don't execute until an action is called.
- Actions (like count(), collect()) trigger the execution of the transformations defined on the DataFrame.

In short, spark.read sets up the DataFrame for subsequent transformations or actions and, in general, does not trigger execution by itself. One caveat: some read options, such as schema inference on a CSV file, can make Spark scan the data at read time.
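The lazy-evaluation idea in this reply can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's implementation: `LazyDataset`, `read_source`, and the `reads` list are hypothetical names used only to show that transformations record a plan and that an action is what actually reads the source.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (not Spark's DataFrame)."""

    def __init__(self, source, plan=()):
        self._source = source  # callable that yields rows when executed
        self._plan = plan      # recorded transformations, not yet run

    # Transformations: record a step and return a new dataset.
    def filter(self, predicate):
        return LazyDataset(self._source, self._plan + (("filter", predicate),))

    def map(self, fn):
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def _run(self):
        rows = self._source()
        for kind, fn in self._plan:
            rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
        return rows

    # Actions: execute the recorded plan.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


reads = []


def read_source():
    reads.append("scan")  # note every time the source is really read
    return iter(range(10))


ds = LazyDataset(read_source)  # defining the dataset reads nothing
evens_squared = ds.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
scans_before_action = len(reads)

result = evens_squared.collect()  # the action triggers the scan
scans_after_action = len(reads)
```

Here `scans_before_action` stays at zero even though two transformations were chained; the source is read only when `collect()` runs.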

@dattak-gb7ez 6 months ago

Sir, it would be great if you also did some sessions on Azure, because your teaching technique is very good.

@Finoboost 1 year ago

Hi Manish Bhai, csv file was downloaded by the steps which you guided, you are doing great work, i am following and applying all the concepts which you taught.

@manish_kumar_1 1 year ago

Good

@RajvirKumar-n1p 6 months ago

Great, bhai, you are teaching the concepts really well. Please bring a series on Azure Databricks too.

@jbb6906 1 year ago

Awesome

@CctnsHelpdesk 8 months ago

well explained

@tnmyk_ 9 months ago

Very well explained! I have a doubt, though. Suppose one partition can handle only 100 records, but the input data contains 150 records that all have the same ID. How will groupBy() work in that case, since it won't be possible to bring all the data into the same partition even after shuffling?

@EVSprakash 9 months ago

What is the answer?

@younevano 1 month ago

When using groupBy() in Spark, the goal is to bring all records with the same key (in this case, the same ID) to the same partition to perform the grouping operation. However, as you've pointed out, if the data for a single key is larger than a single partition's capacity (here, 100 records per partition), Spark can encounter memory issues. Here's how Spark addresses this:

1. Spill to disk: Spark can spill data to disk when the in-memory size limit is reached. If all records for a single ID cannot fit in memory within one partition, Spark writes some of the data to disk temporarily. This allows Spark to handle larger datasets without running out of memory, though it may be slower than purely in-memory processing.

2. Tungsten execution engine optimization: Spark's Tungsten execution engine is optimized to handle large data processing tasks. It uses techniques like off-heap memory management and binary processing to manage memory efficiently. This can alleviate some memory pressure by keeping only essential data in memory and managing the rest on disk.

3. Alternative aggregations with approximate solutions: If exact grouping isn't strictly necessary, using approximate aggregations like approxQuantile() or countApproxDistinct() may help reduce memory usage and avoid data skew issues.

4. Custom partitioning strategies: For cases where specific keys have significantly larger data sizes, a custom partitioning strategy can sometimes help by pre-processing the data to manage key distribution more effectively.

In practice, large shuffles with skewed keys can lead to inefficiencies. Optimizing partition sizes, monitoring skew, and using aggregations designed for big data can often prevent these issues from becoming bottlenecks.
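One way to picture the custom partitioning point is key salting, a commonly used technique for skewed keys. The reply does not name salting, so treat this as an assumption about what such a strategy could look like. The sketch below is plain Python, not Spark code; `salted_count`, `num_salts`, and the `id_1`/`id_2` records are hypothetical.

```python
import random
from collections import defaultdict


def salted_count(records, num_salts, seed=0):
    """Two-stage count per key: partial counts on (key, salt), then merge the salts away."""
    rng = random.Random(seed)

    # Stage 1: attach a random salt so one hot key spreads over several groups.
    partial = defaultdict(int)
    for key in records:
        partial[(key, rng.randrange(num_salts))] += 1

    # Stage 2: drop the salt and combine the much smaller partial results.
    final = defaultdict(int)
    for (key, _salt), count in partial.items():
        final[key] += count

    return dict(final), dict(partial)


# Hypothetical skewed input: 150 records share one ID.
records = ["id_1"] * 150 + ["id_2"] * 10
totals, partials = salted_count(records, num_salts=4)
largest_group = max(partials.values())
```

The totals match an unsalted count, but no single group in the first stage has to hold all 150 records. This works for aggregations that can be combined from partial results (counts, sums); it does not help when an operation truly needs every row for a key in one place.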