Hadoop

Views: 30,480

Altamira TC

Day ago

Comments: 25

@ravibharathiii 12 years ago

One of the best Hadoop presentations. Thanks a lot!

@misterbruno 12 years ago

Good information. I liked the overlay of slides over the video. I wouldn't think that would work but it does. Sound is excellent except for questions from some members of the audience and when the speaker turns his back.

@srinivassr1985 12 years ago

Thanks. Best MapReduce tutorial I have ever watched.

@piyushmishra1289 12 years ago

Ultimate video for a Hadoop overview. Must watch.

@mailvkjain 12 years ago

Awesome video. Loved the presentation and the ease with which it's presented.

@scottleber 11 years ago

Karthik, in Hadoop the replication is for data redundancy. It also provides the map/reduce framework with multiple places to schedule mappers, right? i.e. with default replication of 3, a mapper for a given data block can be scheduled on any of the 3 different machines where that block is located. As for how Hadoop does block splits, it basically splits at the block size, regardless of natural record boundaries. The record readers in the map phase know how to retrieve records that were split.
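
To illustrate the point about record readers, here is a minimal Python sketch of how a line-oriented reader can recover records that straddle a split boundary. It is modeled on the general idea only and is not Hadoop's actual code: the function names `read_split` and `read_all_splits` are hypothetical, and the whole file is assumed to fit in memory as a `bytes` object. The rule it follows is that every reader except the first skips the (possibly partial) first line of its split, and every reader keeps reading past the end of its split until it finishes the line it started.

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the newline-delimited records owned by the split [start, start+length).

    Hypothetical helper for illustration; not part of any Hadoop API.
    """
    end = start + length
    pos = start
    if start != 0:
        # The line covering `start` belongs to the previous split's reader,
        # so discard everything up to and including the next newline.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []
        pos = newline + 1

    records = []
    # A line that begins at or before `end` is owned by this split, even if
    # its bytes run past `end` into the next block.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            records.append(data[pos:])  # final line without trailing newline
            pos = len(data)
        else:
            records.append(data[pos:newline])
            pos = newline + 1
    return records


def read_all_splits(data: bytes, block_size: int) -> list[list[bytes]]:
    """Cut `data` at fixed block_size offsets and read each split independently."""
    return [
        read_split(data, start, min(block_size, len(data) - start))
        for start in range(0, len(data), block_size)
    ]


if __name__ == "__main__":
    sample = b"alpha\nbravo\ncharlie\ndelta\n"
    for index, records in enumerate(read_all_splits(sample, block_size=8)):
        print(index, records)
```

With an 8-byte block size the cuts fall in the middle of "bravo" and "charlie", yet each record is returned exactly once, by the reader of the split in which the record begins.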

@MohammadAdnanRaza 11 years ago

What a presentation. Very nice. Thanks for sharing.

@scottleber 11 years ago

Karthik, in general the fact that the data is replicated 3 times doesn't affect performance, since map/reduce processes each block only once in the map phase. But generally yes, the more data you have, and thus the more data which must be scanned by the mappers, the longer your map/reduce job will take to run. However, performance depends on many factors such as the size of your cluster, how busy the cluster is at the moment, etc.
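
The arithmetic behind this answer can be sketched in a few lines of Python. The function name `storage_and_map_work` is hypothetical, and the 64 MB block size is an assumption (the block size is configurable per cluster). The sketch only shows that replication multiplies the bytes stored, while the bytes scanned by the map phase and the number of blocks to process depend on the logical input size alone.

```python
PB = 1024 ** 5  # one pebibyte in bytes
MB = 1024 ** 2  # one mebibyte in bytes


def storage_and_map_work(input_bytes: int,
                         replication: int = 3,
                         block_size: int = 64 * MB) -> dict[str, int]:
    """Compare bytes stored on disk with bytes the map phase has to scan.

    Hypothetical helper for illustration; block_size is an assumed value.
    """
    # Integer ceiling division: a final partial block still counts as a block.
    blocks = (input_bytes + block_size - 1) // block_size
    return {
        "stored_bytes": input_bytes * replication,  # every block is kept N times
        "scanned_bytes": input_bytes,               # each block is mapped once
        "blocks_to_map": blocks,                    # independent of replication
    }


if __name__ == "__main__":
    result = storage_and_map_work(20 * PB)
    print(result["stored_bytes"] // PB, "PB stored")
    print(result["scanned_bytes"] // PB, "PB scanned by mappers")
    print(result["blocks_to_map"], "blocks to map")
```

Changing `replication` in this sketch changes only `stored_bytes`; the other two values stay the same.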

@psjrajarajan 12 years ago

Thank you so much, great overview of Hadoop.

@saikarthik16 11 years ago

Thanks for such an informative demo! I have a couple of questions. 1. The data itself is very big (for example, Google processes 20 PB of data per day). In Hadoop we replicate the data 3 times, so it becomes 60 PB of data. Won't that affect processing performance? I'm new to this, so if my understanding is wrong, please correct me. 2. Can you please give me an example of how unstructured data is split into blocks and stored, and how it is queried? Thanks

@scottleber 12 years ago

The description now includes a link to the code samples on GitHub

@stholy32 12 years ago

super good vid !!! many thx !!!

@nebzero1990 12 years ago

Is the code available?

@scottleber 12 years ago

For some reason I am having a hard time pasting the actual URL and getting it to work properly (it keeps expanding into a bunch of hex characters). If you go to github.com / sleberknight then choose the project called basic-hadoop-examples that should get you there

@scottleber 12 years ago

The sample code is available on GitHub at github.com/sleberknight/basic-hadoop-examples

@mjshaheed 10 years ago

It's been more than 3 years since this video was uploaded, but in the MapReduce wordcount program, line 31 is unnecessary. 'word' is not used anywhere. The code would work just fine without that line!

@paderborner5213 9 years ago

mjshaheed You're right. Some guy in the audience noticed it as well @30:30 :)