Already read this in Hadoop: The Definitive Guide. Can you explain how partitioning takes place during a spill? Thanks
@nsb5467 9 years ago
Hi, can you explain why using three files for the first merge round on the reducer side increases disk I/O efficiency?
@judesoosai8648 6 years ago
@Nachiket Bhoyar I understand that merging of files on the reducer side happens in multiple rounds, with a maximum of 10 files in each round (configurable, known as the merge factor). The final merge happens in reducer memory, and the number of files in the final round is kept equal to the merge factor (default 10). To achieve this, the merge logic groups the files accordingly. With 40 files it goes like this:
merge 4 files -> 1 file (round 1)
merge 10 files -> 1 file (round 2)
merge 10 files -> 1 file (round 3)
merge 10 files -> 1 file (round 4)
At this point we have 4 merged files and 6 unmerged files (10 in total). In round 5, these 10 files are merged in reducer memory. However, I am not clear how this logic makes the disk I/O efficient.
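For anyone puzzling over how the first round's odd group size (4) is chosen: here is a minimal Java sketch of that pass-factor arithmetic, modeled on the grouping behavior described in the comment above. The class and method names here are hypothetical, not Hadoop's actual API; the driver just replays the 40-file example.

```java
// MergeSchedule.java -- a minimal sketch of the merge-pass grouping described
// above. Illustrative only: class/method names are hypothetical, not Hadoop's API.
public class MergeSchedule {

    // How many files to merge in a given pass. After the first pass, every
    // pass merges exactly `factor` files, so the final pass ends up with
    // exactly `factor` segments.
    static int passFactor(int factor, int passNo, int numSegments) {
        if (passNo > 1 || numSegments <= factor || factor == 1) {
            return factor;
        }
        // First pass: merge just enough files that all later passes
        // operate on full groups of `factor`.
        int mod = (numSegments - 1) % (factor - 1);
        return (mod == 0) ? factor : mod + 1;
    }

    public static void main(String[] args) {
        int factor = 10;    // merge factor (mapreduce.task.io.sort.factor)
        int remaining = 40; // map-output files fetched by the reducer
        int pass = 1;
        while (remaining > factor) {
            int toMerge = passFactor(factor, pass, remaining);
            System.out.printf("round %d: merge %d files -> 1 file%n", pass, toMerge);
            remaining = remaining - toMerge + 1; // merged inputs replaced by one output
            pass++;
        }
        System.out.printf("final round: merge %d segments directly into the reduce%n",
                remaining);
    }
}
```

As for why this helps disk I/O: per Hadoop: The Definitive Guide, the scheme does not reduce the number of rounds; it minimizes the amount of data written to disk, because the final round merges directly into the reduce. Trimming only the first round to a small batch means the 6 untouched files are never rewritten to disk at all.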
@akashgaikwad6847 7 years ago
How is disk I/O efficiency increased by merging the first 3 files into one and then processing later batches of ten? The files have already been moved over the network, so how does this improve I/O efficiency? And how is the example given at the end related? Please elaborate.
@its_joel7324 3 years ago
Thank you very much for this.
@mahendarkusuma 7 years ago
Very good presentation. Can you please tell me which tool you are using to generate the simulations?
@shaikhmohammedatif2391 3 years ago
Have you made another channel?
@kirantvbk 6 years ago
When files spill over to disk, the data gets partitioned and sorted. Does it need to read the data into memory again, sort it, and write it back? Or does the sort happen on disk?
@mohammadsadaquat3624 8 years ago
Very nice explanation. Keep posting new content. Thanks.
@rytmf 4 years ago
Great explanation. Thank you.
@JMK2928 2 years ago
Are there any notes?
@charleygrossman8368 9 years ago
Hello, I have a question. Regarding the sort phase: would you consider the theoretical sort (the first one), with three even splits, to be a bucket sort? And for the actual sort (the second one) that is implemented, why does it begin with three partitions, then 10, 10, and finally the remaining 7 files? Thank you, sir.
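One hedged reading of the 3/10/10/7 split, assuming the video's example starts from 30 files with a merge factor of 10 (the counts 3 + 10 + 10 + 7 = 30 suggest this): the first round merges (30 - 1) mod (10 - 1) + 1 = 3 files, the next two rounds merge 10 each, and the 7 files that were never merged plus the 3 merged outputs make exactly 10 segments for the final in-memory round. This is the same pass-factor arithmetic sketched earlier in the thread.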
@VibeWithSingh 9 years ago
Nice explanation, though I didn't understand the last splitting part. But still, kudos. :)