This is by far the best explanation of the MapReduce technique that I have come across. I especially like how the technique was explained with the least amount of technical jargon. This is truly an ELI5 definition for MapReduce. Good work!
@jessetanderson 8 years ago
+Subramanian Iyer Thanks!
@smushti 5 years ago
An innovative idea to use a pack of cards to explain the concept. Getting fundamentals right with an example is great! Thank you.
@ekdumdesi 9 years ago
Great explanation!! You Mapped the Complexity and Reduced it to Simplicity = MapReduce :)
@djyotta 9 years ago
Very well done - not too slow, yet very clear and well structured.
@Useruytrw 10 years ago
Jesse, may you get all SUCCESS and BLESSINGS
@rodrigofuentealbafuentes695 4 years ago
Really good illustration... really easy to understand for people like me who are not computer experts. Thanks
@bit.blogger 10 years ago
6:16 Got a question! Would you please elaborate more on how that data moves? Since there are two separate reduce tasks on those two nodes, how do the two different reduce tasks combine together? How do we choose which cards move to which node?
@jessetanderson 10 years ago
That is called the shuffle and sort. See more about that here: www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort.
@chandrakanthpadi 3 years ago
Does the actual data on the node move, or are copies of the data moved?
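The question of which cards move to which node can be sketched in a few lines of Python. This is only an illustration, not Hadoop's code: the helper name `partition_for`, the sample card data, and the choice of `zlib.crc32` as a stable hash are all assumptions made for the example. The principle is the one Hadoop's default hash partitioner follows: hash the key and take it modulo the number of reduce tasks, so every record with the same key lands on the same reducer.

```python
import zlib

def partition_for(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key (hypothetical helper).

    A stable hash is used so the same key always maps to the same
    reducer, no matter which mapper node emitted it.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Mapper output on two nodes: (suit, card value) pairs (made-up sample data).
node1_output = [("hearts", 5), ("spades", 9), ("clubs", 2)]
node2_output = [("hearts", 7), ("diamonds", 3), ("clubs", 10)]

num_reducers = 2
reducer_inputs = {i: [] for i in range(num_reducers)}
for key, value in node1_output + node2_output:
    # Route each pair by its key only; the value plays no part in the choice.
    reducer_inputs[partition_for(key, num_reducers)].append((key, value))

for index, pairs in reducer_inputs.items():
    print(index, pairs)
```

Because routing depends only on the key, both hearts cards end up on one reducer even though they started on different nodes.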
@mmuuuuhh 9 years ago
To wrap this up:
Map = Split data
Reduce = Perform calculations on small chunks of data in parallel
Then combine the subresults from each reduced chunk.
Is that correct?
@jessetanderson 9 years ago
+mmuuuuhh Somewhat correct. I'd suggest buying the screencast to learn more about the code and how it works.
@alphacat03 8 years ago
+mmuuuuhh merge-sort maybe?
@ienjoysandwiches 7 years ago
divide and conquer
@BULLSHXTYT 6 years ago
Map transforms data too
@dennycrane2938 6 years ago
No no... Map = Reduce the Data, Reduce = Map the Data...
@sukanyaswamy 10 years ago
Great presentation. The visualization makes it so much easier to understand.
@kabirkanha 4 years ago
Never trust a man whose deck of playing cards has two 7s of Diamonds.
@vamsikrishnachiguluri8510 3 years ago
What a great effort, I am astonished by your teaching skills. We need teachers like you. Thanks for the best explanation.
@LetsBeHuman 5 years ago
4:51 - I'm kind of lost. You presented the two papers as two nodes: left is node1 and right is node2. Then you said, "I have two nodes, where each node has 4 stacks of cards". I also understood that you are merging two varieties of cards in node1 and another two varieties of cards in node2. "A cluster is made of tens, hundreds or even thousands of nodes all connected by a network". So in this example, let's say the two papers (nodes) are one cluster. The part where I get confused is when you say, "The mapper on a node operates on that smaller part. The magic takes the mapper data from every node and brings it together on nodes all around the cluster. The reducer runs on a node and knows it has access to everything with the same key". So if there are two nodes A and B that have mapper data, then the reducer part will happen on two other nodes C and D. I'm confused when you say "on nodes all around the cluster".
@scottzeta3067 2 years ago
The only one I've watched that clearly introduces MapReduce to a newbie
@amitprakashpandeysonu 3 years ago
Loved the idea. Now I understand how MapReduce works. Thank you.
@furkanyigitozgoren3847 2 years ago
It was very nice. But I could not find the video where you show the shuffling "magic part".
@menderbrains 5 years ago
Great explanation! This is how a tutor should simplify the understanding! Thanks
@doud12011990 9 years ago
really cool one. It is always nice to come back to the basics. Thanks for that one
@vscid 8 years ago
and that's how you explain any technical concept. simple is beautiful!
@victorburnett6329 3 years ago
If I understand correctly, the mapper divvies up the data among nodes of the cluster and subsequently organizes the data on each node into key-value pairs, and the reducer collates the key-value pairs and distributes the pairs among the nodes.
@jessetanderson 3 years ago
Almost. Hadoop divvies up the data, the mapper creates key-value pairs, and the reducer processes the collated pairs.
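The division of labor described in this reply (the framework splits the data, the mapper creates key-value pairs, the reducer processes the collated pairs) can be sketched in plain Python. This is a single-process toy under stated assumptions, not Hadoop code: the function names `mapper`, `shuffle`, and `reducer`, the sample cards, and the choice to skip face cards and sum the number cards per suit are all made up for the illustration.

```python
from collections import defaultdict

def mapper(card):
    """Emit a (key, value) pair for a number card; skip face cards (assumed rule)."""
    suit, rank = card
    if isinstance(rank, int):
        yield suit, rank

def shuffle(pairs):
    """Collate values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Process all values that share one key."""
    return key, sum(values)

# The framework splits the input; here two lists stand in for two nodes.
splits = [
    [("hearts", 2), ("spades", 9), ("hearts", "K")],
    [("hearts", 5), ("spades", 4), ("clubs", "Q")],
]

mapped = [pair for split in splits for card in split for pair in mapper(card)]
totals = dict(reducer(key, values) for key, values in shuffle(mapped).items())
print(totals)  # {'hearts': 7, 'spades': 13}
```

Note that the mapper never sees more than one card at a time, and the reducer never sees more than one key at a time; everything that crosses nodes happens in the shuffle step.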
@vivek3350 8 years ago
Really liked your way of presentation... "Simple" and "Informative". Thanks for sharing!!
@rahulx411 10 years ago
an ounce of example is better than a ton of precept! --Thanks, this was great!
@ahmedatallahatallahabobakr8712 9 years ago
Your explanation is magic! Well done
@rohitgupta025 9 years ago
Just wow...very nicely explained
@davidy2535 3 years ago
amazing explanation! I love it. Huge Thanks!
@mgpradeepa554 10 years ago
The explanation is wonderful.. You made me understand things easily.
@nkoradia 7 years ago
Brilliant approach to teach the concept
@prasann26 10 years ago
Wow.. You have made this look so simple and easy... Thanks a ton !!!
@abhishekgowlikar 10 years ago
Nice video explaining MapReduce practically.
@hazelmiranda8587 8 years ago
Good to understand for a layman! So it's quite crucial to identify the basis of the grouping, i.e. the parameters based on which the data should be stored in each node. Is it possible to revisit that at a later stage?
@thezimfonia 7 years ago
That was very helpful Jesse. Thank you for sharing this!!
@asin0136-y6g 5 years ago
Wonderful explanation! Made it very simple to understand! Thanks a ton!
@user-or7ji5hv8y 6 years ago
Best explanation of MapReduce. Thanks!
@mahari999 8 years ago
Superb. Thank you Jesse Anderson
@arnavanuj 3 years ago
Good illustration. 😃
@sarthakmane2977 4 years ago
Dude, what's the name of that magic??
@tkousek1 7 years ago
Great explanation!! Worth a bookmark. Thank you sir!
@TheDeals2buy 10 years ago
Good illustration using a practical example...
@gboyex 6 years ago
Great video with good explanation technique.
@anandsib 10 years ago
Good Explanation with simple example
@urvisharma7243 1 year ago
What if the node with clubs and hearts breaks down during the reduce operation? Will data be lost? Or will the complete Map Reduce job be repeated using the replicated data?
@jessetanderson 1 year ago
The data is replicated and the reduce would be re-run on a different node.
@amirkazemi2517 10 years ago
Great video. Why are there performance issues with Hadoop, though?
@jessetanderson 10 years ago
I'm not sure what you mean by performance issues.
@hexenkingTV 6 years ago
So it follows mainly the principle of divide and conquer?
@jessetanderson 6 years ago
Following that analogy, it would be divide, reassemble, and conquer.
@AnirudhJas 5 years ago
Thanks Jesse! This is a wonderful video! I have 2 doubts.
1. Instead of sum, if it is a sort function, how will splitting it into nodes work? Because then every data point should be treated in one go.
2. The last part on scaling: how will different nodes working on a file and then combining based on key be more efficient than one node working on one file?
I am new to this and would appreciate some guidance and help on the same.
@jessetanderson 5 years ago
1. This example goes more into sorting: github.com/eljefe6a/CardSecondarySort
2. It isn't more efficient, but more scalable.
@AnirudhJas 5 years ago
@jessetanderson Thank you!
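On the sorting question in this thread, one general way to sort across nodes is range partitioning: each reducer receives a contiguous range of keys, sorts only its own range, and the sorted ranges are concatenated in order. The sketch below is a hedged illustration of that idea in Python and is not taken from the linked CardSecondarySort repository. The names `range_partition` and `distributed_sort` are hypothetical, and the boundaries are picked by hand here; a real system would typically choose them by sampling the input.

```python
import bisect

def range_partition(value, boundaries):
    """Return the index of the partition whose key range contains value."""
    return bisect.bisect_right(boundaries, value)

def distributed_sort(splits, boundaries):
    """Sort values spread over several splits using range partitions."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for split in splits:
        for value in split:
            partitions[range_partition(value, boundaries)].append(value)
    # Each partition is sorted on its own (in parallel on a real cluster);
    # concatenating them in partition order gives a globally sorted result.
    return [value for part in partitions for value in sorted(part)]

splits = [[9, 2, 7, 4], [3, 10, 5, 8]]  # card values on two nodes (sample data)
boundaries = [4, 7]  # partitions: below 4, from 4 up to 6, 7 and above
print(distributed_sort(splits, boundaries))  # [2, 3, 4, 5, 7, 8, 9, 10]
```

No single node ever has to hold the whole data set, which is the sense in which the approach is more scalable rather than more efficient.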
@rogerzhao1158 8 years ago
Nice tutorial! Easy to understand
@piyushmajgawali1611 4 years ago
I actually did this with cards. Thanks
@sebon11 3 years ago
Great explanation!
@arindamdalal3988 9 years ago
Really nice video that explains the terms in a simple way...
@patrickamato8839 10 years ago
Great summary - thanks!
@tichaonamiti4616 10 years ago
That's wonderful... you are a great teacher
@LetsBeHuman 5 years ago
When you say nodes and clusters, does an input file of 1 TB definitely have to be processed on more than one computer, or can we install Hadoop on a single laptop and create virtual nodes and clusters?
@abdulrahmankerim2377 8 years ago
Very useful explanation.
@trancenut81 10 years ago
Excellent explanation!
@grahul007 9 years ago
Excellent video explanation
@Dave-lc3cd 4 years ago
Thanks for the great video!
@rodrigoborjas7727 4 years ago
Thank u very much for the explanation.
@gypsyry 5 years ago
Best explanation. Thanks a lot
@moofymoo 9 years ago
huge 1Tb file.. anyone watching this in 2065?
@NuEnque 6 years ago
February 2019 (Go RAMS)
@anmjubaer 5 years ago
@NuEnque July 21 2019
@devanshsrivastava4265 4 years ago
feb 2020
@jonathannimmo9293 4 years ago
more like 2025
@omrajpurkar 4 years ago
August 11, 2020!!
@MrSpun1090 8 years ago
Thanks, this really helped me for my exam!!
@MuhammadFarhan-ny7tj 3 years ago
What music is this at the start of this video?
@jessetanderson 3 years ago
I'm not sure where they got it from.
@IvanRubinson 7 years ago
Well, that explains the interview question: How would you sort a ridiculously large amount of data?
@sarthakmane2977 4 years ago
great video by the way!!
@amandeepak8640 8 years ago
Thank You sir for such a wonderful explanation. :-)
@vincentvimard9019 9 years ago
just great explanation !
@vigneshrachha8362 7 years ago
Superb video....thanks a lot sir
@ZethWeissman 8 years ago
It might be a bit clearer to understand the advantage of this if, instead of having the same person run the cards on each node sequentially, you had two people do it at the same time. Or go further and have four people show it. Then each person can grab all the cards of one suit from each node and sum their values up, again, at the same time. Show a timer comparing how long it took for one person to do everything on one node with the time for all four running at the same time.
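The timer comparison suggested here can be mimicked in Python. This is a rough sketch under stated assumptions: threads stand in for the four people, `time.sleep` stands in for the time spent handling each card, and the helper name `sum_stack` and the card values are made up for the example. The measured times will vary from machine to machine, so none are quoted.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def sum_stack(stack):
    """Sum one stack of cards, pausing briefly per card to mimic handling time."""
    total = 0
    for value in stack:
        time.sleep(0.01)  # simulated time to pick up and read one card
        total += value
    return total

stacks = {
    "hearts": [2, 5, 9],
    "spades": [4, 7, 10],
    "clubs": [3, 6, 8],
    "diamonds": [2, 3, 4],
}

# One person works through every stack in turn.
start = time.perf_counter()
one_person = {suit: sum_stack(cards) for suit, cards in stacks.items()}
sequential_seconds = time.perf_counter() - start

# Four people each take one stack at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    four_people = dict(zip(stacks, pool.map(sum_stack, stacks.values())))
parallel_seconds = time.perf_counter() - start

print(f"one person: {sequential_seconds:.2f}s, four people: {parallel_seconds:.2f}s")
```

Both runs produce the same totals; only the elapsed time differs, which is the point the timer would make on camera.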
@rrckguy 10 years ago
Great lesson. Thanks..
@MincongHuang 9 years ago
Great video, thanks for sharing !
@alextz4307 6 years ago
Very nice, thanks a lot.
@irishakazakyavichyus 6 years ago
thanks! that is an easy explanation!
@ajuhaseeb 9 years ago
Aiwa. Simply explained.
@Luismunoz-jf2zv 10 years ago
Now I get it, thanks!
@SamHopperton 7 years ago
Brilliant - thanks!
@wetterauerbub 7 years ago
Hi Jesse, can I use map reduce only on document-oriented DBs, or also e.g. on Graph databases?
@jessetanderson 7 years ago
Hessebub, you can use it for both, but the processing algorithms are very different between them.
@wetterauerbub 7 years ago
Alright, thanks very much for answering & doing the video in the first place!
@周大鹏-o1j 9 years ago
Great video
@abdellahi.heiballa 5 years ago
my friend: i wish i had ur calm we having an exam tomorrow you watching how playing cards....
@devalpatel7243 5 years ago
Hats off, man. Very well understood
@hemanthpeddi4129 5 years ago
awesome explanation super
@bijunair3807 10 years ago
Good explanation
@guessmedude9636 6 years ago
i like this technique nice keep it up
@logiprabakar 9 years ago
Wonderful, you have used the right tool (cards) and made it simpler. Thank you. Am I correct in saying that, in this manual shuffle and sort, the block size is 52 cards, whereas in a node it would be 128?
@__-to3hq 5 years ago
wow this was great
@thiery572 7 years ago
Interesting. Now I want to request a bunny comes out from a hat.
@iperezgenius 7 years ago
Brilliant!
@yash6680 7 years ago
awesome
@RawwestHide 7 years ago
thanks
@pamgg1663 9 years ago
excellent!!!
@ZFlyingVLover 5 years ago
The 'scalability' of Hadoop has to do with the fact that the data being processed CAN be broken up and processed in parallel in chunks, and then the results can be tallied by key. It's not an inherent ability of the tech other than HDFS itself. Like most technology, or jobs for that matter, the actual 'process' is simple; it's wading through the industry-specific terminology that makes it unnecessarily complicated. Hell, you can make boiling an egg or making toast complicated too if that's your intent.
@jessetanderson 5 years ago
Sorry, you misunderstood.
@ZFlyingVLover 5 years ago
@jessetanderson I didn't misunderstand you. Your explanation was great.
@mudassarm30 9 years ago
spade clubs ... I think you used the wrong suit names for them :)
@niamatullahbakhshi9371 8 years ago
so nice
@lerneninverschiedenenforme7513 10 years ago
little bit long explanation. could be done faster (e.g. card-sorting). But after watching, you know what's happening. So all thumbs up!
@covelus 7 years ago
awesome
@sumantabanerjee9728 6 years ago
Easiest explanation.
@Nyocurio 6 years ago
Why did they come up with such a terribly unintuitive name as "MapReduce" ??? It's basically just "bin by attribute, then process each bin in parallel". BinProcess.
@jessetanderson 6 years ago
It's a well-known functional programming paradigm.
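For readers who have not met the functional programming origin of the name, here is a minimal Python illustration using the built-in `map` and `functools.reduce`. The card values and the doubling step are arbitrary choices for the example; the distributed framework adds a grouping-by-key step between the two, which plain `map` and `reduce` do not have.

```python
from functools import reduce

cards = [2, 5, 9, 4]

# map: apply the same function to every element independently.
doubled = list(map(lambda value: value * 2, cards))

# reduce: fold the mapped elements into a single result.
total = reduce(lambda acc, value: acc + value, doubled, 0)

print(doubled)  # [4, 10, 18, 8]
print(total)    # 40
```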
@varshamehra8164 5 years ago
Cool
@haroonrasheed9739 9 years ago
Great
@kart00nher0 8 years ago
IMO the key takeaway from the video is that MR only works when:
a. There is one really large data set (e.g. a giant stack of playing cards)
b. Each row in the data set can be processed independently (e.g. sorting or counting playing cards does not require knowing the sequence of cards in the deck; each card is processed based on information on the face of the card)
To process real-world problems using MR, the data sets will need to be massaged and joined to satisfy the criteria listed above. This is where all the challenges lie. MR itself is the easy part.
@jessetanderson 8 years ago
+Subramanian Iyer Agreed. MR is difficult, but understanding how to use and manipulate the data is far more complex. This is why I think data engineering should be a specific discipline and job title. www.jesse-anderson.com/big-data-engineering/
@glennt1962 5 years ago
This is a great example video without the accent to deal with.
@gregrell2441 8 years ago
This is just a sales pitch
@jessetanderson 8 years ago
I think the description is pretty clear that it's an extended preview of the screencast.