Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For the Edureka Apache Spark Certification Training curriculum, visit the website: bit.ly/2KHSmII
@arunasingh8617 2 years ago
The concept of lazy evaluation is very well explained!
@edurekaIN 2 years ago
Good to know our videos are helping you learn better :) Stay connected with us and keep learning! Do subscribe to the channel for more updates :)
@sarthakverma5921 4 years ago
His teaching is pure gold.
@ajanthanmani1 7 years ago
1.5x for people who don't have 2 hours to watch this video :)
@edurekaIN 7 years ago
Whatever rocks your boat, Ajanthan! :) Since we have learners from all backgrounds and requirements, we make our tutorials as detailed as possible. Thanks for checking out our tutorial. Do subscribe to stay posted on upcoming tutorials. We will be coming up with shorter tutorial formats too in the future. Cheers!
@draxutube A year ago
SO GOOD TO WATCH, I UNDERSTOOD SO MUCH
@moview69 7 years ago
You are undoubtedly the king of all instructors... you rock, man!
@AdalarasanSachithanantham 5 months ago
First time I'm really impressed by the way you are teaching. God bless you!
@vinulovesutube 7 years ago
Before starting this session I had no clue about big data or Spark. Now I have pretty decent insight. Thanks!
@edurekaIN 7 years ago
Thank you for watching our videos and appreciating our work. Do subscribe to our channel and stay connected with us. Cheers :)
@JanacMeena 5 years ago
Jump to 21:25 for the example.
@daleoking1 6 years ago
This makes things clearer after my Data Science class lol. Thank you so much for a great tutorial, I think this will sharpen me up.
@edurekaIN 6 years ago
Hey, thank you for watching our video. Do subscribe and stay connected with us. Cheers :)
@ranjeetkumar2051 2 years ago
Thank you sir for making this video.
@edurekaIN 2 years ago
Most welcome!
@kag1984007 7 years ago
So far this is the 4th course I am watching; the instructors from Edureka are amazing. RDD was very well explained in the first half. Worth watching!!!
@edurekaIN 7 years ago
Hey Kunal, thanks for the wonderful feedback! We're glad we could be of help. We thought you might also like this tutorial: kzbin.info/www/bejne/q3XComeIopmcaLM. You can also check out our blogs here: www.edureka.co/blog Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!
@Successtalks2244 4 years ago
I love these Edureka tutorials very much.
@nshettys 4 years ago
Brilliant explanation!!! Thank you.
@leojames22 6 years ago
One of the best videos I have ever watched. MapReduce was not explained this way anywhere else I checked. Really, thank you for posting this. The use cases are really good. Worth the time watching almost 2 hrs. 5 stars to the instructor. Very impressed.
@taniakhan71 7 years ago
Thank you so much for this wonderful tutorial. I have a question: while discussing lazy evaluation, you mentioned that memory is allocated for RDDs B1 to B6, but they remain empty till collect is invoked. My question is: what is the size of the memory that is allocated for each RDD? How does the framework predict the size beforehand for each RDD without processing the data? E.g., B4, B5, B6 might have different sizes, smaller than or equal to B1, B2, B3 respectively... I didn't get this part. Could you please clarify?
@edurekaIN 7 years ago
What is the size of the memory that is allocated for each RDD?
1. There is no easy way to estimate an RDD's size exactly; approximate methods are used (Spark's SizeEstimator methods).
2. By default, Spark uses 60% of the configured executor memory (--executor-memory) to cache RDDs. The remaining 40% of memory is available for any objects created during task execution. In case your tasks slow down due to frequent garbage collection in the JVM, or if the JVM is running out of memory, lowering this value will help reduce memory consumption.
How does the framework predict the size beforehand for each RDD without processing the data?
1. One can determine how much memory is allocated to each RDD by looking at the SparkContext logs on the driver program.
2. A recommended approach when using YARN would be --num-executors 30 --executor-cores 4 --executor-memory 24G, which would result in YARN allocating 30 executor containers, 5 containers per node, each using 4 executor cores. On a node with about 124 GB available, the RAM per container is 124/5 ≈ 24 GB (roughly). Hope this helps :)
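To make the lazy-evaluation side of this concrete, here is a minimal spark-shell sketch in Scala (the file path is hypothetical): the transformations only build a lineage graph, nothing is materialised until the action runs, and Spark's SizeEstimator can give a rough after-the-fact estimate of an object's in-memory size, in line with the approximate methods mentioned above.

import org.apache.spark.util.SizeEstimator

// Transformations are lazy: they only record the lineage, no blocks are filled yet.
val lines    = sc.textFile("file:///home/edureka/Desktop/example.txt") // hypothetical path
val nonEmpty = lines.filter(_.nonEmpty)
val cached   = nonEmpty.cache() // marks the RDD for caching; still nothing in memory

// The action is what actually reads the file and materialises the cached blocks.
println(cached.count())

// Rough estimate (in bytes) of an object's in-memory footprint; an approximation, as noted above.
println(SizeEstimator.estimate(Array("apple", "banana", "orange")))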
@kadhirn4792 4 years ago
Great video. He is my tutor for ML.
@yitayewsolomon4906 3 years ago
Thanks very much. I'm a beginner in data science and I got a clear explanation of Spark. Thanks a lot!
@edurekaIN 3 years ago
Thank you so much for the review, we appreciate your efforts :) We are glad that you have enjoyed your learning experience with us. Thank you for being a part of our Edureka team :) Do subscribe to the channel for more updates :) Hit the bell icon to never miss an update from our channel :)
@arunasingh8617 2 years ago
I have a question here: if we have almost 60M data, will creating an RDD while processing help in handling such huge data, or are some other processing steps required?
@mix-fz7ln A year ago
Awesome session! Hats off to the instructor. I was searching hard to understand Spark and nothing that popped up explained it; this session did, amazingly. I love how the instructor clarifies every concept and frame. You are amazing!
@edurekaIN A year ago
Good to know our contents and videos are helping you learn better. We are glad to have you with us! Please share your mail id so we can send the data sheets to help you learn better :) Do subscribe to the channel for more updates 😊 Hit the bell icon to never miss an update from our channel.
@Yashyennam 4 years ago
This is top notch 👍👍👌
@suvradeepbanerjee6801 A year ago
Great tutorial. Really cleared things up! Thanks a lot.
@edurekaIN A year ago
You're welcome 😊 Glad you liked it!! Keep learning with us.
@areejabdelaal4446 5 years ago
Thanks a lot!
@shamla08 7 years ago
Very detailed presentation and a very good instructor! Thank you!
@niveditha-7555 4 years ago
Wow!! Extremely impressed with this explanation.
@moneymaker2328 6 years ago
Excellent session, no words to describe it... the trainer is too good... worth watching.
@edurekaIN 6 years ago
Hey Apurv, thank you for watching our video and appreciating our effort. Do subscribe and stay connected with us. Cheers :)
@sahanashenoy5895 4 years ago
Amazing way of explanation. Crystal clear. Way to go, Edureka!
@krutikachauhan3299 3 years ago
It was a totally new topic for me, but I was still able to grasp it easily. Thanks to the whole team.
@edurekaIN 3 years ago
Hey :) Thank you so much for your sweet words :) Really means a lot! Glad to know that our content/courses are helping you learn better :) Our team is striving hard to give the best content. Keep learning with us - Team Edureka :) Don't forget to like the video and share it with maximum people :) Do subscribe to the channel :)
@rmuru 7 years ago
Excellent session... very informative... the trainer is too good and explained all concepts in detail... thanks a lot.
@hymavathikalva8959 2 years ago
Very helpful session. Now I have some idea of Hadoop. Nice explanation, sir. Tq
@edurekaIN 2 years ago
Thank you so much :) We are glad to be a part of your learning journey. Do subscribe to the channel for more updates :) Hit the bell icon to never miss an update from our channel :)
@sunithachalla7840 4 years ago
Awesome session...
@nileshdhamanekar4545 6 years ago
Awesome session! Hats off to the instructor, you are amazing! The RDD explanation was the best.
@edurekaIN 6 years ago
Hey Nilesh, we are delighted to know that you liked our video. Do subscribe to our channel and stay connected with us. Cheers :)
@AdalarasanSachithanantham 5 months ago
Superb 🎉
@ramsp35 5 years ago
This is one of the best and most simplified Spark tutorials I have come across. 5 stars...!!!
@edurekaIN 5 years ago
Thank you for appreciating our efforts, Ramanathan. We strive to provide quality tutorials so that people can learn easily. Do subscribe, like and share to stay connected with us. Cheers!
@girish90 4 years ago
Excellent session!
@joa1paulo_ 5 years ago
Thanks for sharing!
@srividyaus 7 years ago
This is the best Spark demo I have ever heard. Very clear and planned way of explaining things! I have taken the Hadoop basics classes with Edureka, which are great! Planning to enroll for Spark as well. Would you explain more real-time use cases in the Spark training? Hadoop basics doesn't have use-case explanations, which is the only drawback of the course! Great going, thanks a lot for this video.
@edurekaIN 7 years ago
+Srividyaus thanks for the thumbs up! :) We're glad you liked our tutorial and the learning experience with Edureka! We have communicated your feedback to our team and will work towards coming up with more real-time use-case videos on top of the existing hands-on projects. Meanwhile, you might also find this video relevant: kzbin.info/www/bejne/sJanhquVf8tka5Y. Do subscribe to our channel to stay posted on upcoming videos and please feel free to reach out in case you need any assistance. Cheers!
@theabhishekkumardotcom 7 years ago
Thank you for the quick introduction to the architecture of Spark.
@edurekaIN 7 years ago
Hey Abhishek, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@seenaiahpedipina1165 7 years ago
Good explanation and a useful tutorial. Conveyed a lot in just two hours. Thank you Edureka!
@edurekaIN 7 years ago
Hey Srinu, thanks for the wonderful feedback! We're glad we could be of help. Here's another video that we thought you might like: kzbin.info/www/bejne/q3XComeIopmcaLM. Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!
@laxmipriyapradhan8087 2 years ago
Thank you sir, I just love your teaching style. Are there any other videos of yours on YouTube? Please share the link.
@edurekaIN 2 years ago
Hi Laxmipriya, glad to hear this from you. Please feel free to visit our channel for more informative videos, and don't forget to subscribe to get notified about our new videos.
@sagarsinghrajpoot3832 5 years ago
Awesome video sir 🙂
@manishdev71 5 years ago
Excellent session.
@tsuyoshikittaka6636 7 years ago
Wonderful tutorial! Thank you :)
@JanacMeena 5 years ago
26:34 Why do we consider the step of incrementing our indices a bottleneck, but not the sorting? EDIT: I think I understand the bottleneck now. If we don't know what all the possible words are, then we can't have a simple array-index-based counter. Instead we would use a hashmap and would need to check for the existence of each word in the hashmap:
for each word in the file:
    if the word is in our hashmap, increment its count
    if the word is not in our hashmap, create its entry and set the count to 1
This is a looping bottleneck for sure.
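In code, that lookup-per-word loop might look like this small single-machine Scala sketch (hypothetical, purely to illustrate the looping cost being described; it is not from the video):

import scala.collection.mutable

// One hash-map lookup (and possible insert) per word: the looping cost described above.
def wordCount(words: Seq[String]): mutable.Map[String, Int] = {
  val counts = mutable.Map.empty[String, Int]
  for (word <- words) {
    if (counts.contains(word)) counts(word) += 1 // seen before: increment its count
    else counts(word) = 1                        // new word: create its entry
  }
  counts
}

println(wordCount(Seq("apple", "banana", "apple"))) // e.g. Map(apple -> 2, banana -> 1)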
@gaurisharma9039 6 years ago
Kindly put up a video on Spark pipelining. I would really appreciate that. Thanks much in advance.
@gyanpattnaik520 7 years ago
It's an amazing video. Gives a complete concept of Spark as well as its implementation in the real world. Thanks.
@hemanthgowda5855 6 years ago
Good lecture. An action is the trigger for lazy evaluation to start, right? .collect() is not equivalent to printing...
@edurekaIN 6 years ago
Hey Hemanth, sorry for the delay. Yes, an action is the trigger for lazy evaluation to start. To print all elements on the driver, one can use the collect() method to first bring the RDD to the driver node. Hope this helps!
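A tiny Scala sketch of that distinction, assuming a running spark-shell (with sc available): the filter is lazy, collect() is the action that triggers evaluation and ships results to the driver, and only then does printing happen there.

val rdd      = sc.parallelize(1 to 20)
val filtered = rdd.filter(_ < 10)  // lazy: nothing has executed yet

val result = filtered.collect()    // action: triggers evaluation, returns an Array[Int] to the driver
result.foreach(println)            // printing happens on the driver, after collect()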
@taniakhan71 7 years ago
Thank you for the explanation.
@edurekaIN 7 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@coolprashantmailbox 7 years ago
Very useful video for beginners... awesome. Thank you.
@ManishKumar-ni4pi 6 years ago
The way of presentation is wonderful. Thank you.
@edurekaIN 6 years ago
Thanks for the compliment, Manish! We are glad you loved the video. Do subscribe to the channel and hit the bell icon to never miss an update from us in the future. Cheers!
@umashankarsaragadam8205 7 years ago
Excellent explanation... Thank you.
@sagnikmukherjee5108 4 years ago
It's an awesome session. The way you explain everything with examples is remarkable. Thanks, mate.
@edurekaIN 4 years ago
Thanks for the wonderful feedback! We are glad we could help. Do subscribe to our channel to stay posted on upcoming tutorials.
@chandan02srivastav 5 years ago
Very well explained!! Amazing tutor.
@SkandanKA 4 years ago
Nice, brief explanation @edureka. Keep going with more such good tutorials.. 👍
@IsabellaYuZhou 5 years ago
1:01:07
@nagendrag2441 5 years ago
The explanation is very good. Thank you. Now I understand the overview completely...
@shubhamshingi9618 4 years ago
Wow, such amazing content. Thanks, Edureka, for this.
@pritishkumar6514 6 years ago
Loved the way the trainer explained it. Watched it for the first time and it cleared all my doubts. Thanks, Edureka.
@edurekaIN 6 years ago
Thanks for the compliment, Pritish! We are glad you loved the video. Do subscribe to the channel and hit the bell icon to never miss an update from us in the future. Cheers!
@ankitas7293 5 years ago
This is Shivank sir's voice... he is a very, very good trainer.
@nikitagupta6174 7 years ago
Hi, I have a few questions: 1.) About the difference between Hadoop and Spark: you said that there are a lot of I/O operations in Hadoop, whereas in Spark I/O happens only once, when blocks are copied into memory, and the rest of the operations are performed in memory itself. So I wanted to ask: when the entire operation is completed, might an I/O operation again be required to copy the result to disk, or does the result stay in memory in the case of Spark? 2.) Also, when we use map and reduce functions in Spark with Python, how do those work? Are all the map operations done in memory like in Hadoop? But what about reduce, since reduce will merge the results of two blocks? Don't you think network overhead will occur again when we pass data from one node's disk to the node where we need to do the reduce operation, and that node will again copy the data into its memory? Can you explain how exactly this works in Spark?
@edurekaIN 7 years ago
Hey Nikita, thanks for checking out our tutorial! Here are the answers to your questions: 1. Spark doesn't work in a strict map-reduce manner, and map output is not written to disk unless necessary; only shuffle files are written to disk. That doesn't mean data after the shuffle is not kept in memory; shuffle files in Spark are written mostly to avoid re-computation in case of multiple downstream actions. The difference between Spark storing data locally (on executors) and Hadoop MapReduce is that: i. the partial results (after computing ShuffleMapStages) are saved on local hard drives, not on HDFS, which is a distributed file system where writes are very expensive; ii. only some files are saved to the local hard drive (after operations are pipelined), which does not happen in Hadoop MapReduce, which writes all map outputs to disk. 2. When we use map and reduce functions in Spark with Python: the Spark Python API (PySpark) exposes the Spark programming model to Python (see the Spark Programming Guide). PySpark is built on top of Spark's Java API; data is processed in Python and cached/shuffled in the JVM. 3. Are all the map operations done in memory like in Hadoop? Yes, all the operations will be done in memory, and the reduce operations work the same way, because data is processed in Python and cached/shuffled in the JVM. Hope this helps. Cheers!
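One way to see the shuffle boundary described above for yourself, as a hedged spark-shell sketch (the exact printed format varies by Spark version): reduceByKey forces a shuffle, and toDebugString prints the lineage with its stage split.

val pairs  = sc.parallelize(Seq("apple", "banana", "apple")).map(word => (word, 1))
val counts = pairs.reduceByKey(_ + _) // shuffle boundary: shuffle files go to executors' local disks, not HDFS

// The indentation levels in the printed lineage mark where the shuffle splits the job into stages.
println(counts.toDebugString)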
@1203santhu 7 years ago
The session is really fantastic and informative...
@edurekaIN 7 years ago
Hey Santhosh, thanks for checking out our tutorial! We're glad you found it useful. :) Here's another video that we thought you might like: kzbin.info/www/bejne/q3XComeIopmcaLM Do subscribe to our channel to stay posted on upcoming videos. Cheers!
@puneethunplugged 7 years ago
Thank you for the crisp session. Good content and flow. Appreciate it.
@kavyaa1053 6 years ago
Thanks for this video.
@edurekaIN 6 years ago
Hey Kavya, thank you for appreciating our work. Do subscribe and stay connected with us. Cheers :)
@deankommu3137 5 years ago
Nice video with a brief explanation.
@ainunabdullah2140 6 years ago
Very good tutorial.
@edurekaIN 6 years ago
Hey Abdullah, thanks for the wonderful feedback! We're glad we could be of help. You can check out our complete Apache Spark course here: www.edureka.co/apache-spark-scala-training. Do subscribe to our channel to stay posted on upcoming tutorials. Hope this helps. Cheers!
@iiitsrikanth 7 years ago
Good work, Edureka team! Really helpful for beginners.
@2007selvam 7 years ago
It is a very useful session.
@edurekaIN 7 years ago
+Rangasamy Selvam, thanks for checking out our tutorial! We're glad you found it useful. Here's another video that we thought you might like: kzbin.info/www/bejne/rn-kdWmZd7Csl6M. Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!
@ajiasahamed8814 6 years ago
Excellent session. The trainer is fantastic, and so is his attitude. Edureka, you are amazing at online coaching.
@SaimanoharBoidapu 7 years ago
Very well explained. Thank you :)
@darisanarasimhareddy4311 7 years ago
I completed Hadoop coaching a few days back. I would like to learn Spark and Scala. Are these 39 videos good enough for Spark and Scala training?
@edurekaIN 7 years ago
+Darisa NarasimhaReddy, thanks for choosing Edureka to learn Hadoop. About your query: these tutorials will give you a basic introduction to Spark, but you will miss out on the hands-on components, assignments and doubt clarification, since these are pre-recorded sessions. We'd suggest that you take up our Spark course as the next step in your learning path, since Hadoop + Spark will give you tremendous career growth. Would you like us to get in touch with you and assist you with your queries? Hope this helps. Cheers!
@dhruveshshah1872 7 years ago
Loved your video. Explained the basic details in the best possible way. Will wait for your new videos on this topic. Can you share the GitHub link for the earthquake project?
@edurekaIN 7 years ago
Hey Dhruvesh, thanks for checking out our tutorial. We're glad you liked it. Please check out this blog for the code: www.edureka.co/blog/spark-tutorial/ You can fill in your request on the Google form in the blog. Hope this helps. Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!
@theinsanify7802 5 years ago
Thank you very much, this was an amazing course.
@edurekaIN 5 years ago
Thanks for the compliment, Mahdi! We are glad you loved the video. Do subscribe to the channel and hit the bell icon to never miss an update from us in the future. Cheers!
@theinsanify7802 5 years ago
@@edurekaIN I sure did... can't miss this content.
@efgh7906 7 years ago
Great explanation and a great session.
@bobslave7063 6 years ago
Thanks for the amazing tutorials! Very well explained.
@u1l2t3r4a55 6 years ago
Good session!
@003vipul 7 years ago
Very useful post.
@edurekaIN 7 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@arpit006 6 years ago
Awesome learning.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@tejasshahpuri4827 6 years ago
The default size of HDFS data blocks is 64 MB, not 128 MB. 22:23
@edurekaIN 6 years ago
Hey Tejas, thank you for watching our video and pointing this out, you are right indeed! Cheers :)
@tejasshahpuri4827 6 years ago
Thanks Edureka!
@cherryandjaji5694 6 years ago
It's 128 MB according to Tom White's Hadoop: The Definitive Guide!
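Both commenters are right for different versions: the default block size was 64 MB in Hadoop 1.x and became 128 MB from Hadoop 2.x onwards. For anyone who wants to check their own cluster, a quick hedged sketch from a spark-shell on a Hadoop-backed setup (the dfs.blocksize key applies to Hadoop 2.x+; it may print null on a plain local setup without HDFS config files):

// Prints the configured HDFS block size in bytes; 134217728 bytes = 128 MB.
println(sc.hadoopConfiguration.get("dfs.blocksize"))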
@prabhathkota107 5 years ago
Very well explained overview of Spark.
@muhammadrizwanali907 6 years ago
Excellent video from the tutor. The concepts and technology are very well defined. Really appreciable.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@foradvait7591 4 years ago
Excellent. Dear trainer sir, you have an amazing hold on Spark concepts. Regards.
@umeshsawant135 7 years ago
Excellent session!! The trainer is well experienced and a good teacher as well. All the best, Edureka.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@tabitha3302 7 years ago
Excellent video, super explanation. We want more real-time examples and use cases like these. Worth it. Awesome!
@edurekaIN 7 years ago
Hey Tabitha, thanks for the wonderful feedback! We're glad you found it useful. Do follow our channel to stay posted on upcoming tutorials. You can also check out our complete training here: www.edureka.co/apache-spark-scala-training. Hope this helps. Cheers!
@balamuruganp2694 6 years ago
I don't know anything about the Hadoop ecosystem... can you give me some information about it as well?
@edurekaIN 6 years ago
Hey Balamurugan, you will find this video helpful, do give it a look: kzbin.info/www/bejne/o2rZap-hrpitmac Hope this helps :)
@sasikumar-gp9zd 7 years ago
Hi, useful information... What are the prerequisites for learning Apache Spark and Scala? Is it useful for a fresher to do this course?
@edurekaIN 7 years ago
+Sasi Kumar, thanks for checking out our tutorial! To learn Spark, a basic understanding of functional programming and object-oriented programming will come in handy. Knowledge of Scala will definitely be a plus, but is not mandatory. Spark is normally taken up by professionals with some knowledge of Hadoop. You could either up-skill with Hadoop and then follow the learning path to Apache Spark and Scala, or you can directly take up Spark training. Hadoop basics will be touched upon in our Spark training as well. You can find out more about our Hadoop training here: www.edureka.co/big-data-and-hadoop and learn more about our Spark training here: www.edureka.co/apache-spark-scala-training. Hope this helps. Cheers!
@rakesh4a1 5 years ago
Where can I read this kind of core information about Spark and Hadoop? Any links or ways to find documents?
@edurekaIN 5 years ago
Hi Rakesh, please check this link: www.edureka.co/blog/spark-tutorial/. Hope this is helpful.
@JohnWick-zc5li 6 years ago
Good job guys, thanks.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@manedinesh 6 years ago
Nicely explained. Thanks!
@edurekaIN 6 years ago
Hey Dinesh, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@nihanthsreyansh2480 7 years ago
Cheers to Edureka! Very well explained. Please upload "Using Python With Apache Spark" videos too!!
@edurekaIN 7 years ago
Hey Nihanth, thanks for checking out our tutorial. We're glad you liked it. We do not have such a tutorial at the moment, but we have communicated your request to our team and we might come up with it in the future. Do subscribe to our channel to stay posted. Cheers!
@nihanthsreyansh2480 7 years ago
Thanks for the reply!
@rahulmishra4111 6 years ago
Great session... very informative. Can you please share the sequence of videos in the Apache Spark and Scala learning playlist? Thanks in advance.
@safiaghani4078 7 years ago
Hi, this is a very informative lecture... I plan to write my thesis on Apache Spark... could you please suggest a good topic? It would be a great help, thanks.
@edurekaIN 7 years ago
Hey! You can refer to this thread on Quora: www.quora.com/I-want-to-do-my-thesis-in-Apache-Spark-What-are-a-few-topics-or-areas-for-that Hope this helps. Cheers :)
@sasidharasandcube6397 7 years ago
Good explanation, and informative.
@edurekaIN 7 years ago
Thank you for appreciating our work. Do subscribe, like and share to stay connected with us. Cheers :)
@MrAK92 7 years ago
Awesome class... thank you sir for providing very useful information.
@edurekaIN 7 years ago
Hey Arun! Thank you for the wonderful feedback. Do subscribe to our channel and check out our website to know more about Apache Spark training: www.edureka.co/apache-spark-scala-training Hope this helps. Thanks :)
@rajashekarpantangi9673 7 years ago
Very good explanation. Awesome content. I have a question. When the map function is executed, the results are given as a block in memory. This is fine. In the example provided in the video, the map function doesn't require any further computation (since the job is to take numbers less than 10). What about a job like word count? 1. What would the output of the map function be? Is it the same as the map function in MapReduce: (apple,1), (apple,1), (apple,1), (banana,1), (banana,1), (banana,1), (orange,1), (orange,1), (orange,1)? Or can we write the reducing code in the same map function too, giving output as (apple,3), (orange,3), (banana,3)? 2. And will the blocks from each data node be sent to a single data node to execute the further computation (as in reduce in MapReduce)? Thanks in advance.
@edurekaIN 7 years ago
Hey Rajashekar, thanks for the wonderful feedback! We're glad you liked our tutorial. This error (Unsupported major.minor version) generally appears because of using a higher JDK during compile time and a lower JDK during runtime. Your default Java version and Hadoop's Java version should match. To check your Java version, type java -version in a terminal; this will display your current Java version. To know the Java version used by Hadoop, you will have to find the hadoop-env.sh file (in the etc folder), which contains an entry for JAVA_HOME like "export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_67" or something like that. If the versions shown by the two commands are different, this error arises. Try setting JAVA_HOME correctly to the JDK path matching the version shown by java -version. Hope this helps. Cheers!
@rajashekarpantangi9673 7 years ago
I don't think you answered my question. Please read my question again and reply, thanks.
@edurekaIN 7 years ago
Hey Rajashekar, here's the explanation:
1. Word count code in Spark: the map function is similar to Hadoop MapReduce's, but not the same. map(func) returns a new distributed dataset formed by passing each element of the source through the function func. Consider the word count code in Scala:

val ip = sc.textFile("file:///home/edureka/Desktop/example.txt") // load the sample example file
val wordCounts = ip.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
// flatMap splits each line on the space delimiter; map assigns each word a value of 1;
// reduceByKey adds up the values that share the same key, i.e. the same word
wordCounts.collect // this gives output like:
// res: Array[(String, Int)] = Array((banana,2), (orange,6), (apple,4))

2. As Spark does in-memory processing, only the needed data is pushed to memory and processed. In this example, flatMap, map and reduceByKey are transformations, which are lazily evaluated: data is not pushed to memory/RAM immediately (the transformations only build a lineage graph of RDDs), and whenever an action (collect in the code example) happens on the final RDD, Spark uses the lineage details to push the required data to memory. Spark does not work like Hadoop: blocks are not sent to a single node for processing. Instead, computation happens in the memory of each node where the needed data exists, and the aggregated result is sent to the Spark master node / client. This way Spark is faster, with no disk I/O operations between steps as in Hadoop. Hope this helps. Cheers!
@rajashekarpantangi9673 7 years ago
Thanks!!
@pradeepp2009 6 years ago
Hi all, I have a doubt. I have 1 PB of data to be processed in Spark. If I try to read it, will the 1 PB of data be stored in memory or not? How will it be processed? Could anyone please help me?
@debk4516 7 years ago
Very useful session!!!
@JohnWick-zc5li 6 years ago
In case the file size is 5 GB or 10 GB, how would an RDD be helpful when there is less memory?
@edurekaIN 6 years ago
Hey John, sorry for the delay. First of all, you have distributed memory on different slave nodes, so you'll have a good amount of memory. But if the memory still fills up, Spark will place the RDDs on disk. Hope this helps!
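To illustrate that fallback, a short hedged Scala sketch (the input path is hypothetical): the storage level can also be chosen explicitly, so partitions that do not fit in memory spill to local disk instead of failing.

import org.apache.spark.storage.StorageLevel

val bigFile = sc.textFile("file:///data/10gb-input.txt") // hypothetical large input
bigFile.persist(StorageLevel.MEMORY_AND_DISK) // partitions that fit stay in memory; the rest spill to disk
println(bigFile.count())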
@JarinTasnimAva 6 years ago
Very well described! Amazing!
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@MrAlfonsogug 6 years ago
You're the best!
@deepikapatra1065 4 years ago
Amazing video! So many concepts got cleared in just 2 hours :) Keep up the good work, Edureka!