Already read this in Hadoop: The Definitive Guide. Can you explain how partitioning takes place during a spill? Thanks
@nsb5467 9 years ago
Hi, can you explain why using three files for the first merge round on the reducer side increases disk I/O efficiency?
@judesoosai8648 6 years ago
@Nachiket Bhoyar I understand that merging of files on the reducer side happens in multiple rounds, with a maximum of 10 files in each round (configurable, known as the merge factor). The final merge happens in reducer memory, and the number of files in the final round is kept equal to the merge factor (default 10). To achieve this, the merge logic groups the files accordingly. With 40 files it goes like this:
merge 4 files -> 1 file (round 1)
merge 10 files -> 1 file (round 2)
merge 10 files -> 1 file (round 3)
merge 10 files -> 1 file (round 4)
At this point we have 4 merged files and 6 unmerged files (10 in total). In round 5, these 10 files are merged in reducer memory. However, I am not clear how this logic makes the disk I/O efficient.
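For anyone puzzling over how the first round's odd group size (4) is chosen: here is a minimal Java sketch of that pass-factor arithmetic, modeled on the grouping behavior described in the comment above. The class and method names here are hypothetical, not Hadoop's actual API; the driver just replays the 40-file example.

```java
// MergeSchedule.java -- a minimal sketch of the merge-pass grouping described
// above. Illustrative only: class/method names are hypothetical, not Hadoop's API.
public class MergeSchedule {

    // How many files to merge in a given pass. After the first pass, every
    // pass merges exactly `factor` files, so the final pass ends up with
    // exactly `factor` segments.
    static int passFactor(int factor, int passNo, int numSegments) {
        if (passNo > 1 || numSegments <= factor || factor == 1) {
            return factor;
        }
        // First pass: merge just enough files that all later passes
        // operate on full groups of `factor`.
        int mod = (numSegments - 1) % (factor - 1);
        return (mod == 0) ? factor : mod + 1;
    }

    public static void main(String[] args) {
        int factor = 10;    // merge factor (mapreduce.task.io.sort.factor)
        int remaining = 40; // map-output files fetched by the reducer
        int pass = 1;
        while (remaining > factor) {
            int toMerge = passFactor(factor, pass, remaining);
            System.out.printf("round %d: merge %d files -> 1 file%n", pass, toMerge);
            remaining = remaining - toMerge + 1; // merged inputs replaced by one output
            pass++;
        }
        System.out.printf("final round: merge %d segments directly into the reduce%n",
                remaining);
    }
}
```

As for why this helps disk I/O: per Hadoop: The Definitive Guide, the scheme does not reduce the number of rounds; it minimizes the amount of data written to disk, because the final round merges directly into the reduce. Trimming only the first round to a small batch means the 6 untouched files are never rewritten to disk at all.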
@akashgaikwad6847 7 years ago
How is disk I/O efficiency increased by merging the first 3 files into one and then processing later batches of ten? The files have already been moved over the network, so how does this improve I/O efficiency? And how is the example given at the end related? Please elaborate.
@its_joel7324 3 years ago
Thank you very much for this.
@mahendarkusuma 7 years ago
Very good presentation. Can you please tell me which tool you are using to generate the simulations?
@shaikhmohammedatif2391 3 years ago
Have you made another channel?
@kirantvbk 6 years ago
When files spill over to disk, the data gets partitioned and sorted. Does it need to read the data into memory again, sort it, and write it back? Or does the sort happen on disk?
@mohammadsadaquat3624 8 years ago
Very nice explanation. Keep posting new content. Thanks.
@rytmf 4 years ago
Great explanation. Thank you.
@JMK2928 2 years ago
Are there any notes?
@charleygrossman8368 9 years ago
Hello, I have a question. Regarding the sort phase: would you consider the theoretical sort (the first one), with three even splits, to be a bucket sort? And for the actual sort (the second one) that is implemented, why does it begin with three partitions, then 10, 10, and finally the remaining 7 files? Thank you, sir.
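One hedged reading of the 3/10/10/7 split, assuming the video's example starts from 30 files with a merge factor of 10 (the counts 3 + 10 + 10 + 7 = 30 suggest this): the first round merges (30 - 1) mod (10 - 1) + 1 = 3 files, the next two rounds merge 10 each, and the 7 files that were never merged plus the 3 merged outputs make exactly 10 segments for the final in-memory round. This is the same pass-factor arithmetic sketched earlier in the thread.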
@VibeWithSingh 9 years ago
Nice explanation, though I didn't understand the last splitting part. But still, kudos. :)