Learn about the impact of Apache Hadoop YARN on Hadoop, and how it transforms Hadoop 2 into a Data Operating System.
Comments: 9
@roguelitedev 9 years ago
I literally have goosebumps I'm so excited!! :D
@jameskpl 11 years ago
Hortonworks, thank you so much for the video. A quick question - is there a way to manage the data going into HDFS, for instance to check for duplicates? For example: we upload data (several GB, all structured) for the day, and a couple of weeks later we are asked to upload data again. Is there a way to compare the data being uploaded now against the data that was uploaded before, so we don't end up with 6 copies of the same data (limited to 3 with replication)? Would really appreciate any feedback. Thank you, James.
@dukegaming2231 7 years ago
jameskpl, in Hadoop 2, if blocks are over-replicated among the DataNodes, it will throw an overReplicatedBlock exception, therefore replication balancers should be run, i.e. define a threshold or specify the DataNodes.
@charleygrossman8368 9 years ago
One cluster to store them all.
@homoudalshammari9139 11 years ago
Hi, I like the question that jameskpl posted. I would add a simple point: what if the data source file has the same name and needs to be uploaded to the same NameNode? Would that be considered a duplicate, or would it be overwritten? Thank you... Hamoud
@MAZEN_TAEMIN 8 months ago
Here cuz I'm studying Hadoop and its versions at the moment, in 2024.
@vivek2319 7 years ago
Arun looks pissed :D What's the matter Arun? Somebody give him Hadoop to play with ;) #IYKWIM :D
@sn20 11 years ago
For the life of me... I still cannot understand why in the hell they call YARN MR2. To me it sounds like a layer of abstraction for resource management. And now you have to go through YARN if you need something done on HDFS. (Maybe another secondary NameNode in the making...) In other words: dismantle the existing MR and reorganize it. More importantly, open up the processing unit underlying HDFS to other applications. Let them all fight for CPU time via YARN.