Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For the Edureka Apache Spark Certification Training curriculum, visit the website: bit.ly/2KHSmII
@arunasingh8617 2 years ago
The concept of lazy evaluation is very well explained!
@edurekaIN 2 years ago
Good to know our videos are helping you learn better :) Stay connected with us and keep learning! Do subscribe to the channel for more updates :)
@sarthakverma5921 4 years ago
His teaching is pure gold.
@ajanthanmani1 7 years ago
1.5x for people who don't have 2 hours to watch this video :)
@edurekaIN 7 years ago
Whatever rocks your boat, Ajanthan! :) Since we have learners from all backgrounds and requirements, we make our tutorials as detailed as possible. Thanks for checking out our tutorial. Do subscribe to stay posted on upcoming tutorials. We will be coming up with shorter tutorial formats too in the future. Cheers!
@draxutube A year ago
SO GOOD TO WATCH, I UNDERSTOOD SO MUCH
@moview69 7 years ago
You are undoubtedly the king of all instructors... you rock, man!
@AdalarasanSachithanantham 5 months ago
First time I'm really impressed by the way you are teaching. God bless you!
@vinulovesutube 7 years ago
Before starting this session I had no clue about big data or Spark. Now I have pretty decent insight. Thanks!
@edurekaIN 7 years ago
Thank you for watching our videos and appreciating our work. Do subscribe to our channel and stay connected with us. Cheers :)
@JanacMeena 5 years ago
Jump to 21:25 for the example.
@daleoking1 6 years ago
This makes things clearer after my Data Science class lol. Thank you so much for a great tutorial, I think this will sharpen me up.
@edurekaIN 6 years ago
Hey, thank you for watching our video. Do subscribe and stay connected with us. Cheers :)
@ranjeetkumar2051 2 years ago
Thank you sir for making this video.
@edurekaIN 2 years ago
Most welcome!
@kag1984007 7 years ago
So far this is the 4th course I am watching; the instructors from Edureka are amazing. RDD was very well explained in the first half. Worth watching!!!
@edurekaIN 7 years ago
Hey Kunal, thanks for the wonderful feedback! We're glad we could be of help. We thought you might also like this tutorial: kzbin.info/www/bejne/q3XComeIopmcaLM. You can also check out our blogs here: www.edureka.co/blog Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!
@Successtalks2244 4 years ago
I love these Edureka tutorials very much.
@nshettys 4 years ago
Brilliant explanation!!! Thank you.
@leojames22 6 years ago
One of the best videos I have ever watched. MapReduce was not explained this way anywhere else I checked. Really, thank you for posting this. The use cases are really good. Worth the time watching almost 2 hrs. 5 stars to the instructor. Very impressed.
@taniakhan71 7 years ago
Thank you so much for this wonderful tutorial. I have a question: while discussing lazy evaluation, you mentioned that memory is allocated for RDDs B1 to B6, but they remain empty till collect is invoked. My question is: what is the size of the memory that is allocated for each RDD? How does the framework predict the size beforehand for each RDD without processing the data? E.g., B4, B5, B6 might have different sizes, smaller than or equal to B1, B2, B3 respectively... I didn't get this part. Could you please clarify?
@edurekaIN 7 years ago
What is the size of the memory that is allocated for each RDD?
1. There is no easy way to estimate an RDD's size exactly; approximate methods are used (Spark's SizeEstimator methods).
2. By default, Spark uses 60% of the configured executor memory (--executor-memory) to cache RDDs. The remaining 40% of memory is available for any objects created during task execution. In case your tasks slow down due to frequent garbage collection in the JVM, or if the JVM is running out of memory, lowering this value will help reduce memory consumption.
How does the framework predict the size beforehand for each RDD without processing the data?
1. One can determine how much memory is allocated to each RDD by looking at the SparkContext logs on the driver program.
2. A recommended approach when using YARN would be --num-executors 30 --executor-cores 4 --executor-memory 24G, which would result in YARN allocating 30 executor containers, 5 containers per node, each using 4 executor cores. On a node with about 124 GB available, the RAM per container is 124/5 ≈ 24 GB (roughly). Hope this helps :)
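To make the lazy-evaluation side of this concrete, here is a minimal spark-shell sketch in Scala (the file path is hypothetical): the transformations only build a lineage graph, nothing is materialised until the action runs, and Spark's SizeEstimator can give a rough after-the-fact estimate of an object's in-memory size, in line with the approximate methods mentioned above.

import org.apache.spark.util.SizeEstimator

// Transformations are lazy: they only record the lineage, no blocks are filled yet.
val lines    = sc.textFile("file:///home/edureka/Desktop/example.txt") // hypothetical path
val nonEmpty = lines.filter(_.nonEmpty)
val cached   = nonEmpty.cache() // marks the RDD for caching; still nothing in memory

// The action is what actually reads the file and materialises the cached blocks.
println(cached.count())

// Rough estimate (in bytes) of an object's in-memory footprint; an approximation, as noted above.
println(SizeEstimator.estimate(Array("apple", "banana", "orange")))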
@kadhirn4792 4 years ago
Great video. He is my tutor for ML.
@yitayewsolomon4906 3 years ago
Thanks very much. I'm a beginner in data science and I got a clear explanation of Spark. Thanks a lot!
@edurekaIN 3 years ago
Thank you so much for the review, we appreciate your efforts :) We are glad that you have enjoyed your learning experience with us. Thank you for being a part of our Edureka team :) Do subscribe to the channel for more updates :) Hit the bell icon to never miss an update from our channel :)
@arunasingh8617 2 years ago
I have a question here: if we have almost 60M data, will creating an RDD while processing help in handling such huge data, or are some other processing steps required?
@mix-fz7ln A year ago
Awesome session! Hats off to the instructor. I was searching hard to understand Spark and nothing that popped up explained it; this session did, amazingly. I love how the instructor clarifies every concept and frame. You are amazing!
@edurekaIN A year ago
Good to know our contents and videos are helping you learn better. We are glad to have you with us! Please share your mail id so we can send the data sheets to help you learn better :) Do subscribe to the channel for more updates 😊 Hit the bell icon to never miss an update from our channel.
@Yashyennam 4 years ago
This is top notch 👍👍👌
@suvradeepbanerjee6801 A year ago
Great tutorial. Really cleared things up! Thanks a lot.
@edurekaIN A year ago
You're welcome 😊 Glad you liked it!! Keep learning with us.
@areejabdelaal4446 5 years ago
Thanks a lot!
@shamla08 7 years ago
Very detailed presentation and a very good instructor! Thank you!
@niveditha-7555 4 years ago
Wow!! Extremely impressed with this explanation.
@moneymaker2328 6 years ago
Excellent session, no words to describe it... the trainer is too good... worth watching.
@edurekaIN 6 years ago
Hey Apurv, thank you for watching our video and appreciating our effort. Do subscribe and stay connected with us. Cheers :)
@sahanashenoy5895 4 years ago
Amazing way of explanation. Crystal clear. Way to go, Edureka!
@krutikachauhan3299 3 years ago
It was a totally new topic for me, but I was still able to grasp it easily. Thanks to the whole team.
@edurekaIN 3 years ago
Hey :) Thank you so much for your sweet words :) Really means a lot! Glad to know that our content/courses are helping you learn better :) Our team is striving hard to give the best content. Keep learning with us - Team Edureka :) Don't forget to like the video and share it with maximum people :) Do subscribe to the channel :)
@rmuru 7 years ago
Excellent session... very informative... the trainer is too good and explained all concepts in detail... thanks a lot.
@hymavathikalva8959 2 years ago
Very helpful session. Now I have some idea of Hadoop. Nice explanation, sir. Tq
@edurekaIN 2 years ago
Thank you so much :) We are glad to be a part of your learning journey. Do subscribe to the channel for more updates :) Hit the bell icon to never miss an update from our channel :)
@sunithachalla7840 4 years ago
Awesome session...
@nileshdhamanekar4545 6 years ago
Awesome session! Hats off to the instructor, you are amazing! The RDD explanation was the best.
@edurekaIN 6 years ago
Hey Nilesh, we are delighted to know that you liked our video. Do subscribe to our channel and stay connected with us. Cheers :)
@AdalarasanSachithanantham 5 months ago
Superb 🎉
@ramsp35 5 years ago
This is one of the best and most simplified Spark tutorials I have come across. 5 stars...!!!
@edurekaIN 5 years ago
Thank you for appreciating our efforts, Ramanathan. We strive to provide quality tutorials so that people can learn easily. Do subscribe, like and share to stay connected with us. Cheers!
@girish90 4 years ago
Excellent session!
@joa1paulo_ 5 years ago
Thanks for sharing!
@srividyaus 7 years ago
This is the best Spark demo I have ever heard. Very clear and planned way of explaining things! I have taken the Hadoop basics classes with Edureka, which are great! Planning to enroll for Spark as well. Would you explain more real-time use cases in the Spark training? Hadoop basics doesn't have use-case explanations, which is the only drawback of the course! Great going, thanks a lot for this video.
@edurekaIN 7 years ago
+Srividyaus thanks for the thumbs up! :) We're glad you liked our tutorial and the learning experience with Edureka! We have communicated your feedback to our team and will work towards coming up with more real-time use-case videos on top of the existing hands-on projects. Meanwhile, you might also find this video relevant: kzbin.info/www/bejne/sJanhquVf8tka5Y. Do subscribe to our channel to stay posted on upcoming videos and please feel free to reach out in case you need any assistance. Cheers!
@theabhishekkumardotcom 7 years ago
Thank you for the quick introduction to the architecture of Spark.
@edurekaIN 7 years ago
Hey Abhishek, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@seenaiahpedipina1165 7 years ago
Good explanation and a useful tutorial. Conveyed a lot in just two hours. Thank you Edureka!
@edurekaIN 7 years ago
Hey Srinu, thanks for the wonderful feedback! We're glad we could be of help. Here's another video that we thought you might like: kzbin.info/www/bejne/q3XComeIopmcaLM. Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!
@laxmipriyapradhan8087 2 years ago
Thank you sir, I just love your teaching style. Are there any other videos of yours on YouTube? Please share the link.
@edurekaIN 2 years ago
Hi Laxmipriya, glad to hear this from you. Please feel free to visit our channel for more informative videos, and don't forget to subscribe to get notified about our new videos.
@sagarsinghrajpoot3832 5 years ago
Awesome video sir 🙂
@manishdev71 5 years ago
Excellent session.
@tsuyoshikittaka6636 7 years ago
Wonderful tutorial! Thank you :)
@JanacMeena 5 years ago
26:34 Why do we consider the step of incrementing our indices a bottleneck, but not the sorting? EDIT: I think I understand the bottleneck now. If we don't know what all the possible words are, then we can't have a simple array-index-based counter. Instead we would use a hashmap and would need to check for the existence of each word in the hashmap:
for each word in the file:
    if the word is in our hashmap, increment its count
    if the word is not in our hashmap, create its entry and set the count to 1
This is a looping bottleneck for sure.
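In code, that lookup-per-word loop might look like this small single-machine Scala sketch (hypothetical, purely to illustrate the looping cost being described; it is not from the video):

import scala.collection.mutable

// One hash-map lookup (and possible insert) per word: the looping cost described above.
def wordCount(words: Seq[String]): mutable.Map[String, Int] = {
  val counts = mutable.Map.empty[String, Int]
  for (word <- words) {
    if (counts.contains(word)) counts(word) += 1 // seen before: increment its count
    else counts(word) = 1                        // new word: create its entry
  }
  counts
}

println(wordCount(Seq("apple", "banana", "apple"))) // e.g. Map(apple -> 2, banana -> 1)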
@gaurisharma9039 6 years ago
Kindly put up a video on Spark pipelining. I would really appreciate that. Thanks much in advance.
@gyanpattnaik520 7 years ago
It's an amazing video. Gives a complete concept of Spark as well as its implementation in the real world. Thanks.
@hemanthgowda5855 6 years ago
Good lecture. An action is the trigger for lazy evaluation to start, right? .collect() is not equivalent to printing...
@edurekaIN 6 years ago
Hey Hemanth, sorry for the delay. Yes, an action is the trigger for lazy evaluation to start. To print all elements on the driver, one can use the collect() method to first bring the RDD to the driver node. Hope this helps!
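A tiny Scala sketch of that distinction, assuming a running spark-shell (with sc available): the filter is lazy, collect() is the action that triggers evaluation and ships results to the driver, and only then does printing happen there.

val rdd      = sc.parallelize(1 to 20)
val filtered = rdd.filter(_ < 10)  // lazy: nothing has executed yet

val result = filtered.collect()    // action: triggers evaluation, returns an Array[Int] to the driver
result.foreach(println)            // printing happens on the driver, after collect()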
@taniakhan71 7 years ago
Thank you for the explanation.
@edurekaIN 7 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@coolprashantmailbox 7 years ago
Very useful video for beginners... awesome. Thank you.
@ManishKumar-ni4pi 6 years ago
The way of presentation is wonderful. Thank you.
@edurekaIN 6 years ago
Thanks for the compliment, Manish! We are glad you loved the video. Do subscribe to the channel and hit the bell icon to never miss an update from us in the future. Cheers!
@umashankarsaragadam8205 7 years ago
Excellent explanation... Thank you.
@sagnikmukherjee5108 4 years ago
It's an awesome session. The way you explain everything with examples is remarkable. Thanks, mate.
@edurekaIN 4 years ago
Thanks for the wonderful feedback! We are glad we could help. Do subscribe to our channel to stay posted on upcoming tutorials.
@chandan02srivastav 5 years ago
Very well explained!! Amazing tutor.
@SkandanKA 4 years ago
Nice, brief explanation @edureka. Keep going with more such good tutorials.. 👍
@IsabellaYuZhou 5 years ago
1:01:07
@nagendrag2441 5 years ago
The explanation is very good. Thank you. Now I understand the overview completely...
@shubhamshingi9618 4 years ago
Wow, such amazing content. Thanks, Edureka, for this.
@pritishkumar6514 6 years ago
Loved the way the trainer explained it. Watched it for the first time and it cleared all my doubts. Thanks, Edureka.
@edurekaIN 6 years ago
Thanks for the compliment, Pritish! We are glad you loved the video. Do subscribe to the channel and hit the bell icon to never miss an update from us in the future. Cheers!
@ankitas7293 5 years ago
This is Shivank sir's voice... he is a very, very good trainer.
@nikitagupta6174 7 years ago
Hi, I have a few questions: 1.) About the difference between Hadoop and Spark: you said that there are a lot of I/O operations in Hadoop, whereas in Spark I/O happens only once, when blocks are copied into memory, and the rest of the operations are performed in memory itself. So I wanted to ask: when the entire operation is completed, might an I/O operation again be required to copy the result to disk, or does the result stay in memory in the case of Spark? 2.) Also, when we use map and reduce functions in Spark with Python, how do those work? Are all the map operations done in memory like in Hadoop? But what about reduce, since reduce will merge the results of two blocks? Don't you think network overhead will occur again when we pass data from one node's disk to the node where we need to do the reduce operation, and that node will again copy the data into its memory? Can you explain how exactly this works in Spark?
@edurekaIN 7 years ago
Hey Nikita, thanks for checking out our tutorial! Here are the answers to your questions: 1. Spark doesn't work in a strict map-reduce manner, and map output is not written to disk unless necessary; only shuffle files are written to disk. That doesn't mean data after the shuffle is not kept in memory; shuffle files in Spark are written mostly to avoid re-computation in case of multiple downstream actions. The difference between Spark storing data locally (on executors) and Hadoop MapReduce is that: i. the partial results (after computing ShuffleMapStages) are saved on local hard drives, not on HDFS, which is a distributed file system where writes are very expensive; ii. only some files are saved to the local hard drive (after operations are pipelined), which does not happen in Hadoop MapReduce, which writes all map outputs to disk. 2. When we use map and reduce functions in Spark with Python: the Spark Python API (PySpark) exposes the Spark programming model to Python (see the Spark Programming Guide). PySpark is built on top of Spark's Java API; data is processed in Python and cached/shuffled in the JVM. 3. Are all the map operations done in memory like in Hadoop? Yes, all the operations will be done in memory, and the reduce operations work the same way, because data is processed in Python and cached/shuffled in the JVM. Hope this helps. Cheers!
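One way to see the shuffle boundary described above for yourself, as a hedged spark-shell sketch (the exact printed format varies by Spark version): reduceByKey forces a shuffle, and toDebugString prints the lineage with its stage split.

val pairs  = sc.parallelize(Seq("apple", "banana", "apple")).map(word => (word, 1))
val counts = pairs.reduceByKey(_ + _) // shuffle boundary: shuffle files go to executors' local disks, not HDFS

// The indentation levels in the printed lineage mark where the shuffle splits the job into stages.
println(counts.toDebugString)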
@1203santhu 7 years ago
The session is really fantastic and informative...
@edurekaIN 7 years ago
Hey Santhosh, thanks for checking out our tutorial! We're glad you found it useful. :) Here's another video that we thought you might like: kzbin.info/www/bejne/q3XComeIopmcaLM Do subscribe to our channel to stay posted on upcoming videos. Cheers!
@puneethunplugged 7 years ago
Thank you for the crisp session. Good content and flow. Appreciate it.
@kavyaa1053 6 years ago
Thanks for this video.
@edurekaIN 6 years ago
Hey Kavya, thank you for appreciating our work. Do subscribe and stay connected with us. Cheers :)
@deankommu3137 5 years ago
Nice video with a brief explanation.
@ainunabdullah2140 6 years ago
Very good tutorial.
@edurekaIN 6 years ago
Hey Abdullah, thanks for the wonderful feedback! We're glad we could be of help. You can check out our complete Apache Spark course here: www.edureka.co/apache-spark-scala-training. Do subscribe to our channel to stay posted on upcoming tutorials. Hope this helps. Cheers!
@iiitsrikanth 7 years ago
Good work, Edureka team! Really helpful for beginners.
@2007selvam 7 years ago
It is a very useful session.
@edurekaIN 7 years ago
+Rangasamy Selvam, thanks for checking out our tutorial! We're glad you found it useful. Here's another video that we thought you might like: kzbin.info/www/bejne/rn-kdWmZd7Csl6M. Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!
@ajiasahamed8814 6 years ago
Excellent session. The trainer is fantastic, and so is his attitude. Edureka, you are amazing at online coaching.
@SaimanoharBoidapu 7 years ago
Very well explained. Thank you :)
@darisanarasimhareddy4311 7 years ago
I completed Hadoop coaching a few days back. I would like to learn Spark and Scala. Are these 39 videos good enough for Spark and Scala training?
@edurekaIN 7 years ago
+Darisa NarasimhaReddy, thanks for choosing Edureka to learn Hadoop. About your query: these tutorials will give you a basic introduction to Spark, but you will miss out on the hands-on components, assignments and doubt clarification, since these are pre-recorded sessions. We'd suggest that you take up our Spark course as the next step in your learning path, since Hadoop + Spark will give you tremendous career growth. Would you like us to get in touch with you and assist you with your queries? Hope this helps. Cheers!
@dhruveshshah1872 7 years ago
Loved your video. Explained the basic details in the best possible way. Will wait for your new videos on this topic. Can you share the GitHub link for the earthquake project?
@edurekaIN 7 years ago
Hey Dhruvesh, thanks for checking out our tutorial. We're glad you liked it. Please check out this blog for the code: www.edureka.co/blog/spark-tutorial/ You can fill in your request on the Google form in the blog. Hope this helps. Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!
@theinsanify7802 5 years ago
Thank you very much, this was an amazing course.
@edurekaIN 5 years ago
Thanks for the compliment, Mahdi! We are glad you loved the video. Do subscribe to the channel and hit the bell icon to never miss an update from us in the future. Cheers!
@theinsanify7802 5 years ago
@@edurekaIN I sure did... can't miss this content.
@efgh7906 7 years ago
Great explanation and a great session.
@bobslave7063 6 years ago
Thanks for the amazing tutorials! Very well explained.
@u1l2t3r4a55 6 years ago
Good session!
@003vipul 7 years ago
Very useful post.
@edurekaIN 7 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@arpit006 6 years ago
Awesome learning.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@tejasshahpuri4827 6 years ago
The default size of HDFS data blocks is 64 MB, not 128 MB. 22:23
@edurekaIN 6 years ago
Hey Tejas, thank you for watching our video and pointing this out, you are right indeed! Cheers :)
@tejasshahpuri4827 6 years ago
Thanks Edureka!
@cherryandjaji5694 6 years ago
It's 128 MB according to Tom White's Hadoop: The Definitive Guide!
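Both commenters are right for different versions: the default block size was 64 MB in Hadoop 1.x and became 128 MB from Hadoop 2.x onwards. For anyone who wants to check their own cluster, a quick hedged sketch from a spark-shell on a Hadoop-backed setup (the dfs.blocksize key applies to Hadoop 2.x+; it may print null on a plain local setup without HDFS config files):

// Prints the configured HDFS block size in bytes; 134217728 bytes = 128 MB.
println(sc.hadoopConfiguration.get("dfs.blocksize"))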
@prabhathkota107 5 years ago
Very well explained overview of Spark.
@muhammadrizwanali907 6 years ago
Excellent video from the tutor. The concepts and technology are very well defined. Really appreciable.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@foradvait7591 4 years ago
Excellent. Dear trainer sir, you have an amazing hold on Spark concepts. Regards.
@umeshsawant135 7 years ago
Excellent session!! The trainer is well experienced and a good teacher as well. All the best, Edureka.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@tabitha3302 7 years ago
Excellent video, super explanation. We want more real-time examples and use cases like these. Worth it. Awesome!
@edurekaIN 7 years ago
Hey Tabitha, thanks for the wonderful feedback! We're glad you found it useful. Do follow our channel to stay posted on upcoming tutorials. You can also check out our complete training here: www.edureka.co/apache-spark-scala-training. Hope this helps. Cheers!
@balamuruganp2694 6 years ago
I don't know anything about the Hadoop ecosystem... can you give me some information about it as well?
@edurekaIN 6 years ago
Hey Balamurugan, you will find this video helpful, do give it a look: kzbin.info/www/bejne/o2rZap-hrpitmac Hope this helps :)
@sasikumar-gp9zd 7 years ago
Hi, useful information... What are the prerequisites for learning Apache Spark and Scala? Is it useful for a fresher to do this course?
@edurekaIN 7 years ago
+Sasi Kumar, thanks for checking out our tutorial! To learn Spark, a basic understanding of functional programming and object-oriented programming will come in handy. Knowledge of Scala will definitely be a plus, but is not mandatory. Spark is normally taken up by professionals with some knowledge of Hadoop. You could either up-skill with Hadoop and then follow the learning path to Apache Spark and Scala, or you can directly take up Spark training. Hadoop basics will be touched upon in our Spark training as well. You can find out more about our Hadoop training here: www.edureka.co/big-data-and-hadoop and learn more about our Spark training here: www.edureka.co/apache-spark-scala-training. Hope this helps. Cheers!
@rakesh4a1 5 years ago
Where can I read this kind of core information about Spark and Hadoop? Any links or ways to find documents?
@edurekaIN 5 years ago
Hi Rakesh, please check this link: www.edureka.co/blog/spark-tutorial/. Hope this is helpful.
@JohnWick-zc5li 6 years ago
Good job guys, thanks.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@manedinesh 6 years ago
Nicely explained. Thanks!
@edurekaIN 6 years ago
Hey Dinesh, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@nihanthsreyansh2480 7 years ago
Cheers to Edureka! Very well explained. Please upload "Using Python With Apache Spark" videos too!!
@edurekaIN 7 years ago
Hey Nihanth, thanks for checking out our tutorial. We're glad you liked it. We do not have such a tutorial at the moment, but we have communicated your request to our team and we might come up with it in the future. Do subscribe to our channel to stay posted. Cheers!
@nihanthsreyansh2480 7 years ago
Thanks for the reply!
@rahulmishra4111 6 years ago
Great session... very informative. Can you please share the sequence of videos in the Apache Spark and Scala learning playlist? Thanks in advance.
@safiaghani4078 7 years ago
Hi, this is a very informative lecture... I plan to write my thesis on Apache Spark... could you please suggest a good topic? It would be a great help, thanks.
@edurekaIN 7 years ago
Hey! You can refer to this thread on Quora: www.quora.com/I-want-to-do-my-thesis-in-Apache-Spark-What-are-a-few-topics-or-areas-for-that Hope this helps. Cheers :)
@sasidharasandcube6397 7 years ago
Good explanation, and informative.
@edurekaIN 7 years ago
Thank you for appreciating our work. Do subscribe, like and share to stay connected with us. Cheers :)
@MrAK92 7 years ago
Awesome class... thank you sir for providing very useful information.
@edurekaIN 7 years ago
Hey Arun! Thank you for the wonderful feedback. Do subscribe to our channel and check out our website to know more about Apache Spark training: www.edureka.co/apache-spark-scala-training Hope this helps. Thanks :)
@rajashekarpantangi9673 7 years ago
Very good explanation. Awesome content. I have a question. When the map function is executed, the results are given as a block in memory. This is fine. In the example provided in the video, the map function doesn't require any further computation (since the job is to take numbers less than 10). What about a job like word count? 1. What would the output of the map function be? Is it the same as the map function in MapReduce: (apple,1), (apple,1), (apple,1), (banana,1), (banana,1), (banana,1), (orange,1), (orange,1), (orange,1)? Or can we write the reducing code in the same map function too, giving output as (apple,3), (orange,3), (banana,3)? 2. And will the blocks from each data node be sent to a single data node to execute the further computation (as in reduce in MapReduce)? Thanks in advance.
@edurekaIN 7 years ago
Hey Rajashekar, thanks for the wonderful feedback! We're glad you liked our tutorial. This error (Unsupported major.minor version) generally appears because of using a higher JDK during compile time and a lower JDK during runtime. Your default Java version and Hadoop's Java version should match. To check your Java version, type java -version in a terminal; this will display your current Java version. To know the Java version used by Hadoop, you will have to find the hadoop-env.sh file (in the etc folder), which contains an entry for JAVA_HOME like "export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_67" or something like that. If the versions shown by the two commands are different, this error arises. Try setting JAVA_HOME correctly to the JDK path matching the version shown by java -version. Hope this helps. Cheers!
@rajashekarpantangi9673 7 years ago
I don't think you answered my question. Please read my question again and reply, thanks.
@edurekaIN 7 years ago
Hey Rajashekar, here's the explanation:
1. Word count code in Spark: the map function is similar to Hadoop MapReduce's, but not the same. map(func) returns a new distributed dataset formed by passing each element of the source through the function func. Consider the word count code in Scala:

val ip = sc.textFile("file:///home/edureka/Desktop/example.txt") // load the sample example file
val wordCounts = ip.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
// flatMap splits each line on the space delimiter; map assigns each word a value of 1;
// reduceByKey adds up the values that share the same key, i.e. the same word
wordCounts.collect // this gives output like:
// res: Array[(String, Int)] = Array((banana,2), (orange,6), (apple,4))

2. As Spark does in-memory processing, only the needed data is pushed to memory and processed. In this example, flatMap, map and reduceByKey are transformations, which are lazily evaluated: data is not pushed to memory/RAM immediately (the transformations only build a lineage graph of RDDs), and whenever an action (collect in the code example) happens on the final RDD, Spark uses the lineage details to push the required data to memory. Spark does not work like Hadoop: blocks are not sent to a single node for processing. Instead, computation happens in the memory of each node where the needed data exists, and the aggregated result is sent to the Spark master node / client. This way Spark is faster, with no disk I/O operations between steps as in Hadoop. Hope this helps. Cheers!
@rajashekarpantangi9673 7 years ago
Thanks!!
@pradeepp2009 6 years ago
Hi all, I have a doubt. I have 1 PB of data to be processed in Spark. If I try to read it, will the 1 PB of data be stored in memory or not? How will it be processed? Could anyone please help me?
@debk4516 7 years ago
Very useful session!!!
@JohnWick-zc5li 6 years ago
In case the file size is 5 GB or 10 GB, how would an RDD be helpful when there is less memory?
@edurekaIN 6 years ago
Hey John, sorry for the delay. First of all, you have distributed memory on different slave nodes, so you'll have a good amount of memory. But if the memory still fills up, Spark will place the RDDs on disk. Hope this helps!
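To illustrate that fallback, a short hedged Scala sketch (the input path is hypothetical): the storage level can also be chosen explicitly, so partitions that do not fit in memory spill to local disk instead of failing.

import org.apache.spark.storage.StorageLevel

val bigFile = sc.textFile("file:///data/10gb-input.txt") // hypothetical large input
bigFile.persist(StorageLevel.MEMORY_AND_DISK) // partitions that fit stay in memory; the rest spill to disk
println(bigFile.count())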
@JarinTasnimAva 6 years ago
Very well described! Amazing!
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@MrAlfonsogug 6 years ago
You're the best!
@deepikapatra1065 4 years ago
Amazing video! So many concepts got cleared in just 2 hours :) Keep up the good work, Edureka!