Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For the Edureka Hadoop Training and Certification curriculum, visit our website: bit.ly/2Ozdh1I
@srikrishnarr6553 · 5 years ago
Speaking with authority and making sure the audience has understood... that's too good.
@adrianmarin3967 · 4 years ago
Easy to understand. Thank you! Great job keeping everyone in the audience paying attention.
@aecosta1981 · 5 years ago
Thank you so much for helping us understand the fantastic world of Hadoop.
@edurekaIN · 5 years ago
Thanks for watching the video! We are glad that our video was helpful. Cheers!
@abhishekkaushik5614 · 5 years ago
Worth the 58:14 minutes... Nice one
@hamza-the-big-data-lad · 5 years ago
This is exactly what I've been looking for. Thank you! :)
@edurekaIN · 5 years ago
Thanks for being a part of our community! Cheers!
@venkateswarlub6365 · 2 years ago
Excellent session, sir. Very useful for me. Thank you.
@edurekaIN · 2 years ago
You are welcome 😃 Glad it was helpful!!
@sameergpta · 7 years ago
Very simple and extremely informative session.
@edurekaIN · 7 years ago
Thank you for watching our videos. Do subscribe to our YouTube channel and stay updated with our content. Cheers :)
@mahen5782 · 11 months ago
Truly helpful with live examples... keep up the good job!!
@edurekaIN · 11 months ago
Glad it was helpful!
@snehalgandham · 6 years ago
Awesome tutorial. The architecture has been explained precisely. Thanks.
@edurekaIN · 6 years ago
Hey Snehal, we are glad that you found our lectures useful. Do subscribe and stay connected with us. Cheers :)
@mca-hod7476 · 7 years ago
It is very useful... Thanks
@edurekaIN · 7 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@mariei7445 · 3 years ago
You are the best. Made it so easy to understand. Best!
@edurekaIN · 3 years ago
Hey :) Thank you so much for your sweet words :) Really means a lot! Glad to know that our content/courses are helping you learn better :) Our team is striving hard to give the best content. Keep learning with us - Team Edureka :) Don't forget to like the video and share it with as many people as possible :) Do subscribe to the channel :)
@girish90 · 4 years ago
Thanks, this is a great tutorial!
@duven2089 · 6 years ago
Very well explained.😃
@vatsala1388 · 3 years ago
Very helpful, easy to understand!!
@devsatheesh7287 · 7 years ago
Very good explanation, very useful for my studies.
@anuradhag4717 · 3 years ago
Awesome tutorial!
@sitaluk21 · 7 years ago
Very well organized & explained
@edurekaIN · 7 years ago
Hey Seeta! Thank you for appreciating our work. Do subscribe, like and share to stay connected with us. Cheers :)
@pavankumar-nm9yu · 3 years ago
Awesome explanation of the HDFS architecture.
@edurekaIN · 3 years ago
Thank you for taking the time to give us feedback :) We are glad that you are learning from our videos! Stay connected with our channel :)
@RaviYadav-nj8zh · 3 years ago
Amazing session 👍👍 loved it ❤️ Hare Krishna ♥️🙏
@shanisankar5345 · 3 years ago
Very, very helpful and easy to understand. 🙏 Thank you for such a wonderful presentation
@edurekaIN · 3 years ago
Thank you so much :) We are glad to be a part of your learning journey. Do subscribe to the channel for more updates :) Hit the bell icon to never miss an update from our channel :)
@babus2241 · 5 years ago
Good explanation. Good job
@saibadrish1248 · 7 years ago
Thanks for the detailed explanation!! Really appreciated!
@1982sangeetha · 8 years ago
Very nice explanation!! Just a quick question on the HDFS multi-block write mechanism explained at the 40th minute. Here the 1st and 2nd copies of block B are written to the same rack (Rack 5). Wasn't the 2nd copy of block B supposed to go to a different rack? The 2nd and 3rd copies can be on the same rack, but not the 1st and 2nd.
@edurekaIN · 8 years ago
+sang, thanks for checking out our tutorial! We're glad you found it useful. You are right. The copy of block B should first go to DataNode 3 on Rack 1 and then to DataNode 9 on Rack 3. Cheers!
@barbobrien9318 · 4 years ago
Love the graphics; they are essential to learning.
@satyabetha637 · 6 years ago
Excellent introduction and very absorbing
@edurekaIN · 6 years ago
Hey Satya, we are glad you loved the video. Do subscribe and hit the bell icon to never miss an update from us in the future. Cheers!
@amitbukshet4160 · 8 years ago
Very good explanation, thanks.
@edurekaIN · 8 years ago
Hey Amit, thanks for your wonderful feedback. We thought you might be interested in learning through Hadoop use cases. You can check out the videos here: kzbin.info/aero/PL9ooVrP1hQOGh5sIXY_E6JE4zknuzxleF. Hope this helps. Cheers!
@amitd16 · 7 years ago
Thank you so much... very well explained
@edurekaIN · 7 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@kaushalsingh601 · 7 years ago
Awesome explanation😊
@edurekaIN · 7 years ago
Thank you, Kaushal! Do subscribe, like and share to stay connected with us. Cheers :)
@shakeer6808 · 3 years ago
Very simple and amazing explanation. The figures especially helped us understand very clearly, and the questions in between made us check whether we had understood or not. Thank you, sir
@sanjeetkumar7646 · 6 years ago
Wonderful explanation. It was very helpful for me.
@anishasingh8055 · 7 years ago
Thank you, it was very useful
@gopireddytalatala9772 · 7 years ago
Anisha Singh, the way he teaches makes it easy for everyone to understand... are you learning Hadoop?
@sivagurusubarmaniyan1434 · 6 years ago
Good explanation! By the way, at minute 43:45 it was mentioned that reading different data blocks from DataNodes within the same rack can reduce network bandwidth usage. I'm just wondering how the network bandwidth usage gets reduced. Correct me if I'm wrong, but if data is read from different racks, that can reduce the load on the switches and increase their performance.
@edurekaIN · 6 years ago
Hey Sivaguru, you are right. If you read data from DataNodes residing on different racks, the load on the rack switches gets distributed, which actually contributes to performance tuning. Hope this helps!
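For context on the bandwidth point: an HDFS client generally reads each block from the replica that is closest to it in the network topology, so a same-node or same-rack replica avoids crossing the core switch. The Python sketch below is a simplification, assuming Hadoop's usual hop-count distances for a two-level rack/node topology (0 for the same node, 2 for the same rack, 4 for a different rack); the node and rack names are hypothetical, and real HDFS additionally shuffles replicas that are equally close.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    rack: str  # hypothetical rack path, e.g. "/rack1"

def distance(a: Node, b: Node) -> int:
    """Hop-count distance in a simple two-level (rack/node) topology."""
    if a == b:
        return 0   # same node: purely local read
    if a.rack == b.rack:
        return 2   # same rack: node -> rack switch -> node
    return 4       # different racks: traffic goes through the core switch

def order_replicas(reader: Node, replicas: list[Node]) -> list[Node]:
    """Sort replicas closest-first; ties keep their input order (sorted is stable)."""
    return sorted(replicas, key=lambda r: distance(reader, r))

client = Node("dn3", "/rack1")
replicas = [Node("dn9", "/rack3"), Node("dn1", "/rack1"), Node("dn3", "/rack1")]
print([r.name for r in order_replicas(client, replicas)])  # ['dn3', 'dn1', 'dn9']
```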
@manjunathckadani7732 · 7 years ago
Good explanation
@suryag7597 · 6 years ago
Great session
@edurekaIN · 6 years ago
Hey Surya, thank you for watching our video. We are glad to know that you liked our tutorial. Do subscribe and stay connected with us. Cheers :)
@svdfxd · 7 years ago
Simply awesome!!!
@jaymishra102 · 7 years ago
Why does the client node ask the DataNodes for permission/status (whether they are ready or not) before performing a write operation? The NameNode must already have the status of each DataNode, and only then will it send the node numbers, right? Kindly explain briefly.
@edurekaIN · 6 years ago
DataNodes send heartbeats to the NameNode periodically, i.e. at fixed intervals. A DataNode can crash during such an interval without the NameNode being aware of it. Also, the NameNode is the master node and needs to be available all the time. Hence, to reduce the load on the NameNode, the actual read/write operations are handled by the DataNodes. Hope this helps :)
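The size of that blind spot can be estimated. This is a minimal Python sketch, assuming the expiry rule and defaults used by recent Hadoop releases (dfs.heartbeat.interval = 3 s, dfs.namenode.heartbeat.recheck-interval = 5 minutes); please verify both against your own hdfs-site.xml. Under these assumptions a crashed DataNode is only declared dead after about 10.5 minutes, which is why the client still confirms that the chosen DataNodes are actually up.

```python
# Assumed defaults (check your cluster's hdfs-site.xml):
HEARTBEAT_INTERVAL_S = 3   # dfs.heartbeat.interval
RECHECK_INTERVAL_S = 300   # dfs.namenode.heartbeat.recheck-interval (300000 ms)

def expiry_interval_s(heartbeat_s: float = HEARTBEAT_INTERVAL_S,
                      recheck_s: float = RECHECK_INTERVAL_S) -> float:
    """Silence after which the NameNode treats a DataNode as dead:
    2 x recheck interval + 10 x heartbeat interval."""
    return 2 * recheck_s + 10 * heartbeat_s

def is_considered_dead(last_heartbeat_s: float, now_s: float) -> bool:
    return now_s - last_heartbeat_s > expiry_interval_s()

print(expiry_interval_s())        # 630 (seconds, i.e. 10.5 minutes)
print(is_considered_dead(0, 60))  # False: crashed 60 s ago, not yet detected
print(is_considered_dead(0, 700)) # True
```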
@nandinip1657 · 6 years ago
Please explain more about the NameNode and DataNode.
@Giridhar1534 · 2 years ago
Why is HDFS designed as "write once, read many" in its architecture?
@mubashshirrizvi3024 · 7 years ago
Very well explained... Thank you... I have a few questions: 1) Is it mandatory to have an equal number of nodes in all the racks? 2) In the write mechanism, why is the first copy created on DN1 and not on DN4/DN6? Please answer.
@edurekaIN · 7 years ago
+Mubashshir Rizvi, thanks for the wonderful feedback! Here are the answers to your queries: 1) No, you can define the number of nodes as per your requirement. 2) It is not mandatory that the first copy be created on DN1. The DataNode is selected by the Hadoop system itself. In this video, for demonstration purposes, the data is shown being copied to DN1, but in a real deployment the Hadoop system handles this internally. Hope this helps. Cheers!
@jayasharma7616 · 8 years ago
I have gone through the videos and all of them are very useful. I have a doubt here: a rack means different machines at one physical location, connected to each other. As you said, a rack has DataNodes. Then would it be correct to say that the different computers in a rack are DataNodes?
@edurekaIN · 8 years ago
+Jaya Sharma, thanks for checking out our tutorial! We're glad you found it useful. Yes: a rack is like a container that holds DataNodes, and each DataNode is nothing but a computing machine that stores the actual data. When a large amount of data arrives, it is distributed among the DataNodes, and it can be read back as a single unit (a merged output from all the DataNodes that hold parts of it). The Hadoop framework manages these things: how the data is divided among the DataNodes, and which rack stores the data for a certain region or group. In real-world scenarios there can be many racks, holding DataNodes for different regions or groups. Hope this helps. Cheers!
@kishorekumar2769 · 6 years ago
What is the difference between hadoop dfs -ls / and hdfs dfs -ls /?
@edurekaIN · 6 years ago
hadoop fs: fs refers to a generic file system and can point to any file system, such as the local file system, HDFS, WebHDFS, S3 FS, etc. hadoop dfs / hdfs dfs: dfs points to the Distributed File System and is specific to HDFS; you can use it to execute operations on HDFS. hadoop dfs is now deprecated, so you should use hdfs dfs instead. Hope this helps :)
@ankitjain2416 · 7 years ago
Hi, it's very well explained, but I have a small query. As per the HDFS architecture, we create three replicas of a block. The 1st copy is stored on one rack and the other two on a different rack (the same one for both). You already explained that storing them on the same rack saves network bandwidth, but my question is why we create 2 copies there. If there is any issue with that rack, then either both will be inaccessible or both will be accessible. In that case, aren't we occupying/consuming more disk space, keeping big data in mind? What's the purpose of the third copy? Please reply. Thanks
@edurekaIN · 6 years ago
This is done to prevent data loss and provide more fault tolerance. As you mentioned, in case a rack fails, the data can be retrieved from the replica residing on the other rack. Also, HDFS periodically checks for under-replicated and corrupt blocks and adds more replicas if required, to ensure that the configured replication factor is maintained. Corrupt replicas are handled the same way. Hope this helps :)
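A toy version of that periodic check might look like the following. It is a minimal Python sketch, not NameNode code: it assumes the NameNode already knows where each block's replicas live (in HDFS this comes from DataNode block reports) and which DataNodes are alive; the block and node names are hypothetical.

```python
def under_replicated(block_locations: dict[str, set[str]],
                     live_nodes: set[str],
                     replication: int = 3) -> dict[str, int]:
    """Map each under-replicated block to the number of replicas it is missing.
    Replicas on dead DataNodes do not count."""
    missing = {}
    for block, nodes in block_locations.items():
        live = len(nodes & live_nodes)
        if live < replication:
            missing[block] = replication - live
    return missing

blocks = {
    "blk_A": {"dn1", "dn4", "dn5"},
    "blk_B": {"dn1", "dn2", "dn7"},
}
live = {"dn1", "dn2", "dn3", "dn4", "dn5", "dn6"}  # dn7 has failed
print(under_replicated(blocks, live))  # {'blk_B': 1}
```

A block whose count of live replicas drops to zero is simply lost; re-replication can only copy from a surviving replica, which is why the replicas are spread across racks in the first place.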
@naveenreddy5064 · 7 years ago
Good explanation.
@edurekaIN · 7 years ago
Hey Naveen, thanks for checking out our tutorial! We're glad you found it useful. Here's another video that we thought you might like: kzbin.info/www/bejne/qqaan3apfa6gmKs. Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!
@sandykoolz · 7 years ago
Thanks for the good explanation, Vineet. I have a question: is it configurable to write the replicas in parallel? Writing the replicas to the racks sequentially takes more time, and the NameNode also has to wait for the ack from the last replica commit.
@edurekaIN · 6 years ago
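No, it is not configurable, as the whole process is driven through a pipeline. However, the data is streamed through the pipeline in small packets, so each DataNode forwards a packet to the next one while it is still receiving later packets; the replicas are therefore written largely in parallel rather than strictly one after another. Also, the acknowledgements travel back along the pipeline to the client, not to the NameNode, so the NameNode does not block on them and keeps serving other client requests.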
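An idealized timing model shows why the pipeline costs little extra time. The Python sketch below assumes every hop moves one packet per time step and ignores acknowledgements, disk speed and network contention; the 64 KB packet size is an assumption (the usual default of dfs.client-write-packet-size).

```python
def sequential_steps(packets: int, replicas: int) -> int:
    """Each replica is written completely before the next one starts."""
    return packets * replicas

def pipelined_steps(packets: int, replicas: int) -> int:
    """Each DataNode forwards packet i to the next node while receiving packet i+1,
    so the last replica finishes replicas - 1 steps after the first one."""
    return packets + replicas - 1

packets = (128 * 1024) // 64          # one 128 MB block in 64 KB packets
print(packets)                        # 2048
print(sequential_steps(packets, 3))   # 6144
print(pipelined_steps(packets, 3))    # 2050
```

Under these assumptions, writing three pipelined replicas takes barely longer than writing one.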
@chetanpaithane6560 · 8 years ago
Very nice explanation. How does HDFS manage metadata on the NameNode? A quick explanation would certainly help.
@edurekaIN · 8 years ago
Hey Chethan, thanks for checking out our tutorial! We're glad you liked it. Here's the answer to your query: The HDFS namespace is stored by the NameNode. The NameNode uses a transaction log called the EditLog to persistently record every change that occurs to file system metadata. For example, creating a new file in HDFS causes the NameNode to insert a record into the EditLog indicating this. Similarly, changing the replication factor of a file causes a new record to be inserted into the EditLog. The NameNode uses a file in its local host OS file system to store the EditLog. The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called the FsImage. The FsImage is stored as a file in the NameNode’s local file system too. The NameNode keeps an image of the entire file system namespace and file Blockmap in memory. This key metadata item is designed to be compact, such that a NameNode with 4 GB of RAM is plenty to support a huge number of files and directories. When the NameNode starts up, it reads the FsImage and EditLog from disk, applies all the transactions from the EditLog to the in-memory representation of the FsImage, and flushes out this new version into a new FsImage on disk. It can then truncate the old EditLog because its transactions have been applied to the persistent FsImage. This process is called a checkpoint. Hope this helps. Cheers!
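The checkpoint cycle described above can be illustrated with a toy model. This is a minimal Python sketch, not HDFS code: the class and method names are hypothetical, the namespace is reduced to a dict of path to replication factor, and the "on-disk" FsImage and EditLog are just two attributes.

```python
class ToyNameNode:
    """Hypothetical, heavily simplified model of FsImage + EditLog persistence."""

    def __init__(self):
        self.fsimage = {}    # "on disk": namespace as of the last checkpoint
        self.editlog = []    # "on disk": changes made since that checkpoint
        self.namespace = {}  # in memory: what clients actually see

    @staticmethod
    def _apply(image, edits):
        for op, path, value in edits:
            if op == "delete":
                image.pop(path, None)
            else:                # "create" and "setrep" both store the value
                image[path] = value
        return image

    def _log_and_apply(self, op, path, value=None):
        self.editlog.append((op, path, value))             # record the change first
        self._apply(self.namespace, [(op, path, value)])   # then update memory

    def create(self, path, replication=3):
        self._log_and_apply("create", path, replication)

    def set_replication(self, path, replication):
        self._log_and_apply("setrep", path, replication)

    def delete(self, path):
        self._log_and_apply("delete", path)

    def checkpoint(self):
        """Merge the EditLog into a new FsImage, then truncate the EditLog."""
        self.fsimage = self._apply(dict(self.fsimage), self.editlog)
        self.editlog = []

    def restart(self):
        """Startup: load the FsImage, replay the EditLog in memory, then checkpoint."""
        self.namespace = self._apply(dict(self.fsimage), self.editlog)
        self.checkpoint()


nn = ToyNameNode()
nn.create("/data/a.txt")
nn.create("/data/b.txt")
nn.set_replication("/data/a.txt", 2)
nn.checkpoint()             # FsImage now holds both files
nn.delete("/data/b.txt")    # recorded only in the EditLog
nn.namespace = {}           # simulate a crash: in-memory state is lost
nn.restart()
print(nn.namespace)         # {'/data/a.txt': 2}
print(nn.editlog)           # []
```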
@chetanpaithane6560 · 8 years ago
Thanks for the reply. My question was a bit different, though. Let me elaborate on it with the example of ReiserFS. 1. If one wants to create a file or directory in ReiserFS, the ReiserFS B-tree code creates an inode. 2. When the inode is written to disk (stat data is the on-disk representation of an inode), the stat data item is inserted into the B+ tree. 3. A dirent is inserted into the parent directory. 4. Whenever a lookup for the file happens, the B+ tree is searched by key to retrieve the information. ======= My question: how does HDFS manage the metadata of files/directories on the NameNode? An explanation would be helpful. Thanks, Chetan
@edurekaIN · 8 years ago
Hey Chetan, maybe this will help. Persistence of HDFS metadata broadly breaks down into 2 categories of files:
1) fsimage - An fsimage file contains the complete state of the file system at a point in time. Every file system modification is assigned a unique, monotonically increasing transaction ID. An fsimage file represents the file system state after all modifications up to a specific transaction ID.
2) edits - An edits file is a log that lists each file system change (file creation, deletion or modification) that was made after the most recent fsimage.
* Checkpointing is the process of merging the content of the most recent fsimage with all edits applied after that fsimage in order to create a new fsimage. Checkpointing is triggered automatically by configuration policies or manually by HDFS administration commands.
Here is an example of an HDFS metadata directory taken from a NameNode. It shows the output of running the tree command on the metadata directory, which is configured by setting dfs.namenode.name.dir in hdfs-site.xml:
data/dfs/name
├── current
│   ├── VERSION
│   ├── edits_0000000000000000001-0000000000000000007
│   ├── edits_0000000000000000008-0000000000000000015
│   ├── edits_0000000000000000016-0000000000000000022
│   ├── edits_0000000000000000023-0000000000000000029
│   ├── edits_0000000000000000030-0000000000000000030
│   ├── edits_0000000000000000031-0000000000000000031
│   ├── edits_inprogress_0000000000000000032
│   ├── fsimage_0000000000000000030
│   ├── fsimage_0000000000000000030.md5
│   ├── fsimage_0000000000000000031
│   ├── fsimage_0000000000000000031.md5
│   └── seen_txid
└── in_use.lock
In this example, the same directory has been used for both fsimage and edits. Alternatively, configuration options are available that allow separating fsimage and edits into different directories.
Each file within this directory serves a specific purpose in the overall scheme of metadata persistence:
• VERSION - Text file that contains:
  • layoutVersion - The version of the HDFS metadata format. When we add new features that require changing the metadata format, we change this number. An HDFS upgrade is required when the current HDFS software uses a layout version newer than what is currently tracked here.
  • namespaceID/clusterID/blockpoolID - These are unique identifiers of an HDFS cluster. The identifiers are used to prevent DataNodes from accidentally registering with an incorrect NameNode that is part of a different cluster. These identifiers are also particularly important in a federated deployment. Within a federated deployment, there are multiple NameNodes working independently. Each NameNode serves a unique portion of the namespace (namespaceID) and manages a unique set of blocks (blockpoolID). The clusterID ties the whole cluster together as a single logical unit. It’s the same across all nodes in the cluster.
  • storageType - This is either NAME_NODE or JOURNAL_NODE; the latter identifies the metadata stored on a JournalNode in an HA deployment.
  • ctime - Creation time of the file system state. This field is updated during HDFS upgrades.
• edits_<start transaction ID>-<end transaction ID> - These are finalized (unmodifiable) edit log segments. Each of these files contains all of the edit log transactions in the range defined by the file name’s <start transaction ID> through <end transaction ID>.
• edits_inprogress_<start transaction ID> - This is the current edit log in progress. All transactions starting from <start transaction ID> are in this file, and all new incoming transactions will get appended to this file. HDFS pre-allocates space in this file in 1 MB chunks for efficiency, and then fills it with incoming transactions. You’ll probably see this file’s size as a multiple of 1 MB. When HDFS finalizes the log segment, it truncates the unused portion of the space that doesn’t contain any transactions, so the finalized file’s space will shrink down.
• fsimage_<end transaction ID> - This contains the complete metadata image up through <end transaction ID>.
• seen_txid - This contains the last transaction ID of the last checkpoint (merge of edits into an fsimage) or edit log roll (finalization of the current edits_inprogress and creation of a new one). Note that this is not the last transaction ID accepted by the NameNode. The file is not updated on every transaction, only on a checkpoint or an edit log roll. The purpose of this file is to try to identify whether edits are missing during startup. It’s possible to configure the NameNode to use separate directories for fsimage and edits files. If the edits directory accidentally gets deleted, then all transactions since the last checkpoint would go away, and the NameNode would start up using just the fsimage at an old state. To guard against this, NameNode startup also checks seen_txid to verify that it can load transactions at least up through that number. It aborts startup if it can’t.
• in_use.lock - This is a lock file held by the NameNode process, used to prevent multiple NameNode processes from starting up and concurrently modifying the directory.
Hope this helps. Cheers!
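The file-naming scheme above is what makes a missing-edits check possible. The following Python sketch is a simplified, hypothetical version of such a check, not the NameNode's actual algorithm: it starts from the newest fsimage, requires the finalized edit segments after it to be contiguous, counts an in-progress segment as holding at least its first transaction, and requires coverage at least up through seen_txid. The seen_txid value of 32 used with the example listing is an assumption, since the listing does not show that file's contents.

```python
import re

FINALIZED = re.compile(r"edits_(\d+)-(\d+)")
IN_PROGRESS = re.compile(r"edits_inprogress_(\d+)")
IMAGE = re.compile(r"fsimage_(\d+)")

def loadable_up_to(filenames, seen_txid):
    """Return the last loadable transaction ID, or raise ValueError (simplified check)."""
    images = [int(m[1]) for f in filenames if (m := IMAGE.fullmatch(f))]
    if not images:
        raise ValueError("no fsimage found")
    next_txid = max(images) + 1
    segments = sorted((int(m[1]), int(m[2]))
                      for f in filenames if (m := FINALIZED.fullmatch(f)))
    for start, end in segments:
        if end < next_txid:
            continue                   # already contained in the fsimage
        if start != next_txid:
            raise ValueError(f"gap: expected txid {next_txid}, segment starts at {start}")
        next_txid = end + 1
    # Assume an in-progress segment holds at least its first transaction.
    if any(int(m[1]) == next_txid for f in filenames if (m := IN_PROGRESS.fullmatch(f))):
        next_txid += 1
    last = next_txid - 1
    if last < seen_txid:
        raise ValueError(f"edits reach only txid {last}, but seen_txid is {seen_txid}")
    return last

listing = [
    "VERSION", "seen_txid",
    "edits_0000000000000000001-0000000000000000007",
    "edits_0000000000000000008-0000000000000000015",
    "edits_0000000000000000016-0000000000000000022",
    "edits_0000000000000000023-0000000000000000029",
    "edits_0000000000000000030-0000000000000000030",
    "edits_0000000000000000031-0000000000000000031",
    "edits_inprogress_0000000000000000032",
    "fsimage_0000000000000000030", "fsimage_0000000000000000030.md5",
    "fsimage_0000000000000000031", "fsimage_0000000000000000031.md5",
]
print(loadable_up_to(listing, seen_txid=32))  # 32
```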
@chetanpaithane6560 · 8 years ago
Thanks for the information.
@phillybruce · 7 years ago
A 43-minute job on one machine takes exactly 4.3 minutes on 10 machines: don't the lower level of parallelism in the reduce phase, the overhead of the master name server and the fact that the DataNodes may not hold equal slices of the data make this an approximation?
@edurekaIN · 6 years ago
Yes Bruce, you are absolutely correct. It is just an approximation to help you understand the benefits of parallelism. Hope this helps :)
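One standard way to quantify why 4.3 minutes is only an ideal is Amdahl's law (our illustration, not something the video uses): if a fraction s of the job cannot be spread across machines (for example a single reducer or coordination overhead), the runtime on n machines is roughly T1 × (s + (1 − s)/n). The 10% serial fraction below is a made-up number for illustration, not a measurement.

```python
def ideal_minutes(single_machine_min: float, machines: int) -> float:
    """Perfect linear speedup, as in the video's example."""
    return single_machine_min / machines

def amdahl_minutes(single_machine_min: float, machines: int,
                   serial_fraction: float) -> float:
    """Amdahl's law: only the parallelizable part shrinks with more machines."""
    return single_machine_min * (serial_fraction + (1 - serial_fraction) / machines)

print(round(ideal_minutes(43, 10), 2))        # 4.3
print(round(amdahl_minutes(43, 10, 0.1), 2))  # 8.17
```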
@laxmikantdhond3284 · 7 years ago
What is the need for 3 replicas, given that we copy 2 of them to the same rack? Can you please explain?
@edurekaIN · 6 years ago
It is done to provide more fault tolerance. Also, in general, individual DataNodes are much more likely to fail than an entire rack. Besides this, having two replicas on the same rack helps improve network performance because, in general, there is greater network bandwidth between machines in the same rack than between machines in different racks. Hope this answers your query! :)
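The bandwidth argument can be made concrete by counting how many hops of the write pipeline (client -> DN1 -> DN2 -> DN3) cross a rack boundary. This is a minimal Python sketch, assuming the client runs on the DataNode that receives the first replica; the rack names are hypothetical.

```python
def cross_rack_hops(pipeline_racks: list[str]) -> int:
    """Count inter-rack transfers along a write pipeline.
    The first entry is the client's rack, followed by each DataNode's rack in order."""
    return sum(1 for a, b in zip(pipeline_racks, pipeline_racks[1:]) if a != b)

# Local node, then two nodes on one remote rack:
print(cross_rack_hops(["/rack1", "/rack1", "/rack2", "/rack2"]))  # 1
# Every replica on a different rack:
print(cross_rack_hops(["/rack1", "/rack1", "/rack2", "/rack3"]))  # 2
```

Both layouts survive the loss of any single rack, but the first sends each block through the core switch only once.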
@ramjadhav6942 · 6 years ago
Many thanks...
@pushpendrasharma91 · 6 years ago
What is the difference between MapReduce and YARN? And why do we need YARN?
@edurekaIN · 6 years ago
Hey, sorry for the delay. YARN handles resource allocation in Hadoop, while MapReduce is a programming model for processing big data with parallel, distributed algorithms on a cluster. Hope this helps!
@devendranehra9840 · 6 years ago
Very good
@subhamkumargupta2712 · 6 years ago
What is the difference between the commands hadoop fs -ls and hadoop dfs -ls?
@edurekaIN · 6 years ago
Hey Subham, the File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as Local FS, HFTP FS, S3 FS and others. So hadoop fs can perform operations from/to the local file system, HDFS or any other supported file system, whereas hadoop dfs operations relate only to HDFS. Hope this helps!
@BSS2UA9901S · 6 years ago
Nice tutorial :)
@edurekaIN · 6 years ago
Hey, thank you for watching our video. We are glad to know that you liked our tutorial. Do subscribe and stay connected with us. Cheers :)
@ganeshsundar1484 · 7 years ago
Good explanation. I have a question: who creates the blocks?
@edurekaIN · 7 years ago
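Hey Ganesh, thanks for checking out our tutorial! We're glad you liked it. When a file is stored in HDFS, it is the client (the HDFS client library writing the file) that splits the file into blocks of the configured size (dfs.blocksize, called dfs.block.size in older releases). For each block, the NameNode allocates a block ID and chooses the DataNodes that will store its replicas, and the client then streams the block's data to those DataNodes. Hope this helps. Cheers!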
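To illustrate the splitting step, here is a minimal Python sketch assuming a 128 MB block size; the function name is hypothetical. Every block is full except possibly the last one, which holds only the remaining bytes.

```python
MB = 1024 * 1024

def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> list[int]:
    """Return the size in bytes of each block a file of file_size bytes is split into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

print([b // MB for b in split_into_blocks(300 * MB)])  # [128, 128, 44]
```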
@gundaanil5001 · 7 years ago
Who creates the racks? And how do we do the rack configuration?
@edurekaIN · 6 years ago
Rack configuration is done by the cluster administrator. For more information about rack awareness configuration, refer to this link: hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/RackAwareness.html Hope this helps :)
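One common way administrators describe racks is a topology script, configured through net.topology.script.file.name in core-site.xml: Hadoop passes one or more IP addresses or hostnames as arguments and reads back one rack path per argument. The Python sketch below is hypothetical, assuming each rack is its own /24 subnet; please check the Rack Awareness page linked above for the exact contract before using anything like it.

```python
#!/usr/bin/env python3
"""Hypothetical rack topology script.

Assumption (for illustration only): each rack is its own IPv4 /24 subnet,
so 10.1.5.23 and 10.1.5.40 share a rack, while 10.1.6.7 is on another one."""
import ipaddress
import sys

UNKNOWN_RACK = "/rack-unknown"

def rack_for(host: str) -> str:
    """Map an IPv4 address to a rack path named after its /24 network."""
    try:
        net = ipaddress.IPv4Network(f"{host}/24", strict=False)
    except ValueError:  # not an IPv4 address, e.g. an unresolved hostname
        return UNKNOWN_RACK
    return f"/{net.network_address}"

if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(rack_for(arg))  # one rack path per argument
```

Run as topology.py 10.1.5.23 10.1.6.7, it would print /10.1.5.0 and /10.1.6.0 on separate lines.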
@1983akj · 5 years ago
Hi, thanks for the very informative video. I have a question: why are we creating 2 replicas of the same block on a single rack? Isn't the second one redundant? If the rack is not available, there is no point in having 2 copies on the same rack.
@edurekaIN · 5 years ago
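Yes, you are correct: if that rack fails, both copies on it become unavailable, and it is the replica on the other rack that protects the data. For the purpose of the video, we simply placed them that way. Note that HDFS's default placement policy also keeps two of the three replicas on one rack (on different nodes), because individual node failures are far more common than whole-rack failures, and keeping two replicas together reduces inter-rack write traffic.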
@pranjitbharali6605 · 7 years ago
What if the number of replicas is decided dynamically?
@edurekaIN · 6 years ago
No, the number of replicas is not decided dynamically. By default, it is specified in hdfs-site.xml (the dfs.replication property), but you can also explicitly set the replication factor for an individual file. Hope this helps :)
@shreeprakashagrahari9762 · 7 years ago
Hi, if we put a file from the local machine into HDFS, will it still create 3 replicas of each block of the file across the racks?
@edurekaIN · 6 years ago
Yes. According to the default replication factor, whatever content you put into HDFS will be replicated and stored on different racks. Hope this helps :)
@komalkale6205 · 7 years ago
What is a Hadoop cluster?
@edurekaIN · 6 years ago
When talking about Hadoop clusters, we first need to define two terms: cluster and node. A cluster is a collection of nodes. A node is a process running on a virtual or physical machine or in a container. We say process because the machine may be running other programs besides Hadoop. There are two types of cluster setup for Hadoop: a single-node cluster (normal setup) and a multi-node cluster.
@gummadavellisaiavinash6480 · 7 years ago
Thanks for the information. Please tell me which playlist we should select for Hadoop and big data: 1. Hadoop Training Videos (26 videos) 2. Big Data Hadoop Tutorial Videos (46 videos). Please reply... Both are from #edureka
@edurekaIN · 7 years ago
Hey Avinash, the Hadoop Training Videos playlist (26 videos) only has the latest videos. The other playlist, Big Data Hadoop Tutorial Videos (46 videos), has all the big data Hadoop videos by Edureka, so I would suggest you follow that one. Hope this helps. Cheers!
@gummadavellisaiavinash6480 · 7 years ago
Thanks
@aparnasen4095 · 7 years ago
Very nice explanation indeed!!! Thanks a lot. I still have a doubt from the video: in the HDFS multi-block write mechanism, the first replica of BLK 2 is shown being created on the same rack (Rack 5), while earlier it was explained that replicas are created on different racks. How correct is this? Please clear my doubt. Thanks in advance...
@edurekaIN · 7 years ago
Hey Aparna, thanks for the wonderful feedback! We're glad you liked our tutorial. Regarding your query, the block placement policy is something that can be customised, but the default block placement algorithm works fine. It states that if the client itself is on a DataNode, the first replica is stored on that machine. The second replica is stored on a different rack, and the third replica on the same rack as the second replica but on a different node. If the replication factor is more than 3, further replicas are placed randomly (not truly randomly: load balancing and network bandwidth have to be taken into account). But the above is the best-case scenario: if there isn't enough space on the local machine for the first replica, Hadoop will try to store it on a node in the same rack that has plenty of free space. So it depends on more factors than described here. Hope this helps. Cheers!
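The default rule described above can be sketched as follows. This is a minimal Python illustration, not Hadoop's BlockPlacementPolicyDefault: the function name and topology are hypothetical, it assumes at least two racks with at least two nodes on the chosen remote rack, it picks any random node when the writer is not a DataNode (HDFS would prefer the writer's rack), and it ignores free space, load and node health.

```python
import random

def choose_targets(writer, topology, replication=3, rng=random):
    """Pick DataNodes for one block following the default 3-replica rule.

    topology: dict mapping rack name -> list of node names (hypothetical).
    writer:   node name of the client if it runs on a DataNode, else None."""
    rack_of = {n: r for r, nodes in topology.items() for n in nodes}
    all_nodes = list(rack_of)
    first = writer if writer in rack_of else rng.choice(all_nodes)
    targets = [first]
    if replication >= 2:   # second replica: a node on a different (remote) rack
        second = rng.choice([n for n in all_nodes if rack_of[n] != rack_of[first]])
        targets.append(second)
    if replication >= 3:   # third replica: another node on the second replica's rack
        targets.append(rng.choice([n for n in topology[rack_of[second]] if n != second]))
    while len(targets) < replication:  # extra replicas: any unused node
        targets.append(rng.choice([n for n in all_nodes if n not in targets]))
    return targets

topology = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5", "dn6"],
    "/rack3": ["dn7", "dn8", "dn9"],
}
print(choose_targets("dn1", topology, rng=random.Random(7)))
```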
@aparnasen4095 · 7 years ago
edureka! Thanks a lot again for replying to my query. Please keep up the good work, and I wish you all the very best!!!
@sarangsirsikar3443 · 7 years ago
Should the block size be a multiple of 64 MB only?
@edurekaIN · 7 years ago
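Hey Sarang, thanks for checking out our tutorial. No, it does not have to be a multiple of 64 MB. The default block size is 128 MB in Hadoop 2.x and later (the YARN-based releases) and 64 MB in Hadoop 1.x, but you can configure other values. HDFS does require the block size to be a multiple of the checksum chunk size (dfs.bytes-per-checksum, 512 bytes by default). Hope this helps. Cheers!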
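A simplified Python version of that rule is sketched below. The 512-byte default of dfs.bytes-per-checksum and the 1 MB default of dfs.namenode.fs-limits.min-block-size are assumptions about recent releases; check hdfs-default.xml for your version.

```python
MB = 1024 * 1024

def validate_block_size(block_size: int,
                        bytes_per_checksum: int = 512,
                        min_block_size: int = 1 * MB) -> None:
    """Raise ValueError if this simplified check says HDFS would reject block_size.

    bytes_per_checksum: assumed default of dfs.bytes-per-checksum.
    min_block_size:     assumed default of dfs.namenode.fs-limits.min-block-size."""
    if block_size % bytes_per_checksum != 0:
        raise ValueError(f"{block_size} is not a multiple of {bytes_per_checksum} bytes")
    if block_size < min_block_size:
        raise ValueError(f"{block_size} is below the minimum block size {min_block_size}")

for size_mb in (64, 96, 128, 200):     # 96 and 200 MB are not multiples of 64 MB...
    validate_block_size(size_mb * MB)  # ...but all of them pass

try:
    validate_block_size(100_000_000)   # 10**8 bytes is not a multiple of 512
except ValueError as err:
    print("rejected:", err)
```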
@r4hu1gunner · 6 years ago
Sir, in the HDFS multi-block write pipeline, why is block B copied twice to Rack 5? The first copy was copied to Rack 5; shouldn't the second and third copies be replicated/copied to the same rack?
@edurekaIN · 6 years ago
Hey Rahul, "The replication factor is basically the number of times we replicate every single data block. In Hadoop, the replication factor is 3 by default, and replication is not a drawback of Hadoop; in fact, it makes Hadoop effective and efficient by providing fault tolerance. You are free to change the replication factor: it can be lowered to 2 or raised above 3. However, a replication factor of 3 is considered ideal because if one of your nodes goes down, you still have fault tolerance with 2 nodes, and your critical data is safely stored on those two nodes. You also have ample time to alert the NameNode and recover the failed node's replicas onto a new node. And if, in the meantime, the 2nd node also fails unexpectedly, you still have one active node with your critical data to process. Hence a replication factor of 3 is considered the best fit; anything less could be challenging during data recovery, and more replicas are known to be costly." Hope this helps!
@omarayman5478 · 8 years ago
How can I download the Hadoop software, or where can I find it? Thanks in advance.
@edurekaIN · 8 years ago
Hey Omar, thanks for checking out our tutorial! Kindly use the link below to download the Hadoop software: www-eu.apache.org/dist/hadoop/common/ Cheers!
@vinothinijawahar6938 · 5 years ago
What is the core switch here?
@edurekaIN · 5 years ago
Hey, a great deal of chatter takes place between the master nodes and slave nodes in a Hadoop cluster, and it is essential for keeping the cluster running, so enterprise-class switches are definitely recommended. The core switch connects the rack switches to each other, so it handles massive amounts of traffic; 40GbE is therefore a necessity.
@prateeksingh3636 · 7 years ago
Hi, it's great. I have a small doubt: at the time of block writing, the client writes the first block to a DataNode, the other replicas are created automatically, and the client gets the acknowledgement. But for this process the client must have all the blocks of the file up front, so who creates these blocks? Is it the client itself? At what point are the blocks created and handed to the client? Who maintains the sequence of the blocks in case the file needs to be reassembled? As a user, I will give only a BIG file as input. I appreciate your help.
@edurekaIN · 7 years ago
Hey Prateek, thanks for checking out our tutorial. You can check out this blog for a detailed explanation of storing a file in a Hadoop environment. It will give you all the info you need: www.edureka.co/blog/apache-hadoop-hdfs-architecture/ Hope this helps. Cheers!
@gummadavellisaiavinash6480 · 7 years ago
I want to learn Hadoop.
@edurekaIN · 7 years ago
Hey Avinash, thanks for checking out our tutorial! Our instructor-led Big Data Hadoop Certification Training will help you learn Hadoop. You can check out the details of this training here: www.edureka.co/big-data-and-hadoop Hope this helps. Cheers!
@vaibhavkumar3351 · 8 years ago
Hi Team, greetings!!! Please let me know if there is any upcoming batch with the instructor in the video. I need to join ASAP. Thanks..
@edurekaIN · 8 years ago
Hey Vaibhav, thanks for checking out our tutorial and for your interest. While we do not have any upcoming batches led by this instructor, we have upcoming batches by other top-rated instructors who have trained hundreds of professionals. You can check out the batch dates here: www.edureka.co/big-data-and-hadoop. If you would like to take a look at the sample class recordings of the other instructors, please share your contact details with us here (we will not publish the comment) or inbox us on FB and we will send you the links. Alternatively, you can also call us at +91 88808 62004. Hope this helps. Cheers!
@edurekaIN · 8 years ago
Hey Vaibhav, we have shared your contact details with the relevant team. You can expect to hear from them very soon. Since this instructor does not have any batches coming up, they will share sample class recordings for instructors who have upcoming batches. You can take a look and decide. :) Please feel free to get in touch if you have any questions. Hope this helps. Cheers!
@azadbulla3556 · 6 years ago
Great!!!!!!!!!!!!
@mohammedabdulbari3460 · 8 years ago
Hi edureka, I want to take the Hadoop course that you are offering. Is there an email address I can use to contact you?
@edurekaIN · 8 years ago
+Mohammed Abdul Bari, thanks for checking out our tutorial and for your interest. We can definitely help you there. You can get in touch with us at +91 88808 62004 or simply write to us at sales@edureka.co. You can even register online here: www.edureka.co/big-data-and-hadoop. Alternatively, you can share your contact details with us (we will not make the comment public) and we will get in touch with you. Hope this helps. Cheers!
@Manishkumar-zj9zw · 7 years ago
Are the videos in the proper sequence??
@edurekaIN · 7 years ago
Hey Manish, thanks for checking out our tutorials. For Hadoop Developer training, you can follow this playlist: kzbin.info/aero/PL9ooVrP1hQOEmUPq5vhWfLYJH_b9jFBbR. You can skip videos #3, 4 and 5, as there may be repetition of concepts. For a structured training programme that includes practicals, 24x7 support and lifetime access to learning material, please check out our course here: www.edureka.co/big-data-and-hadoop. Hope this helps. Cheers!
@kajapraneetha2885 · 6 years ago
If I install Hadoop on my laptop, it will run on top of the OS. How does HDFS run on top of it? Where is the NameNode created? How will the hard disk be partitioned? How will the CPU be distributed for processing among the DataNodes?
@edurekaIN · 6 years ago
Hey, please take a look at this blog: www.edureka.co/blog/interview-questions/hadoop-interview-questions-hadoop-cluster/ Cheers!
@indrapadmaja · 7 years ago
How can I install Hadoop on Windows 7?
@edurekaIN · 7 years ago
Hey Indrakanth, thanks for checking out our tutorial. Installing Hadoop natively on a Windows machine is not straightforward, so please install a virtual machine running CentOS (a Linux operating system) and install Hadoop on CentOS there. Please go through the blog below, which has detailed steps for installing Hadoop on CentOS: www.edureka.co/blog/install-hadoop-single-node-hadoop-cluster Hope this helps. Cheers!
@truthsjourney3994 7 years ago
Awesome
@teju7907 6 years ago
Hi team, 1) Does one DataNode mean 1 CPU/RAM? Please give me an answer. 2) Where are the racks configured, i.e. how many racks are going to be created?
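@edurekaIN 6 years ago
Hey Chenna, a small Hadoop cluster includes a single master and multiple worker nodes. The master node consists of a JobTracker, TaskTracker, NameNode and DataNode. Though it is possible to have data-only worker nodes and compute-only worker nodes, a slave or worker node acts as both a DataNode and a TaskTracker. To answer your questions directly: 1) A DataNode is a daemon process, not a CPU. A worker machine normally runs one DataNode, and that machine's CPU cores and RAM are shared by the DataNode and the compute processes running on it. 2) Racks are not created automatically. The administrator maps each node to a rack, typically with a topology script configured through net.topology.script.file.name, and any node without a mapping is placed in /default-rack, so the number of racks is whatever that mapping defines. Hope this helps!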
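To make the rack question concrete, here is a minimal sketch of a rack-topology script in Python. Hadoop's rack awareness can call the script named in the `net.topology.script.file.name` property with one or more host names or IP addresses as arguments and reads one rack path per argument from stdout; unmapped nodes end up in `/default-rack`. The IP addresses, the `RACK_MAP` table and the `resolve` helper are hypothetical, for illustration only.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script for Hadoop rack awareness.

Hadoop passes host names / IPs as arguments and expects one rack path
per argument on stdout.
"""
import sys

# Hypothetical host -> rack table maintained by the cluster administrator.
RACK_MAP = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "10.0.2.22": "/dc1/rack2",
}

DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, falling back to the default rack."""
    return [RACK_MAP.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # One line of output per argument, in the same order as the arguments.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

With this table the cluster has two racks, `/dc1/rack1` and `/dc1/rack2`; adding entries for a new rack path is what "creates" another rack from Hadoop's point of view.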
@graphe.l5911 6 years ago
Suppose we maintain 3 copies of the data (copy 1 in Rack A, copies 2 and 3 in Rack B), and Rack B fails due to some network problem. Hadoop can still access the data from Rack A, which is fine. But my doubt is: if Rack A also fails before we fix Rack B, how do we get the data? Is there any mechanism to maintain the replication factor of 3 when some copies fail, i.e. does Hadoop create those 2 copies from the Rack A copy to restore the replication factor to 3 before we fix the problem with Rack B?
@edurekaIN 6 years ago
Hey, "Failure of a complete rack is very rare. Generally individual nodes within the racks fail, and yes, it is possible that all 3 nodes on the two racks holding a data block fail. So you need to know how critical the data is and set the replication factor accordingly. If one of the DataNodes fails, the NameNode, once it notices the missing heartbeats, starts re-replicating all the blocks that were present on that DataNode from the surviving copies, so the replication factor is restored without waiting for the failed node to be fixed." Hope this helps!
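The re-replication described above can be illustrated with a small simulation. This is a simplified sketch, not HDFS's actual block placement policy: it keeps the surviving replicas and tops the block back up to 3 copies from live nodes, preferring racks that currently hold the fewest copies. The six-node cluster, the node names and the `re_replicate` helper are hypothetical.

```python
from collections import Counter

# Hypothetical 6-node cluster: DataNode name -> rack.
RACK_OF = {
    "dn1": "/rackA", "dn2": "/rackA", "dn3": "/rackA",
    "dn4": "/rackB", "dn5": "/rackB", "dn6": "/rackB",
}


def re_replicate(replicas, live_nodes, target=3):
    """Return a replica list topped up to `target` copies.

    Simplified sketch: surviving replicas are kept, then live nodes are
    added, preferring the rack with the fewest replicas so far.
    """
    surviving = [n for n in replicas if n in live_nodes]
    if not surviving:
        # No copy left anywhere: the block cannot be recovered.
        raise RuntimeError("all replicas lost; block cannot be recovered")
    result = list(surviving)
    candidates = sorted(n for n in live_nodes if n not in result)
    while len(result) < target and candidates:
        per_rack = Counter(RACK_OF[n] for n in result)
        # Least-used rack first; ties broken by node name for determinism.
        best = min(candidates, key=lambda n: (per_rack[RACK_OF[n]], n))
        result.append(best)
        candidates.remove(best)
    return result
```

In the scenario from the question (one copy on `/rackA`, two on `/rackB`), losing Rack B leaves only `dn1`, and the sketch copies the block from it to `dn2` and `dn3`. Only if every replica is lost before re-replication finishes does the block become unrecoverable, which is why the replication factor should match how critical the data is.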
@sriharshagudi6769 6 years ago
If a block of 128 MB is stored with only 50 MB or 100 MB of data, what happens to the remaining storage space in the block? Will it be used by another file or will it be wasted?
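@edurekaIN 6 years ago
Hey Sriharsha, the remaining space is not wasted. A block is a logical unit: a block smaller than the configured block size occupies only as much disk space as its actual data, so a 50 MB block uses about 50 MB on the DataNode's disk and the rest of that disk stays free for other blocks. Another file's data is never added to that block, though; it always goes into blocks of its own. The large block size exists because this architecture is optimized for parallel processing: enormous amounts of data are split into big blocks that are processed simultaneously, which saves a lot of time and improves efficiency. Hope this helps!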
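A quick worked example of how a file is split into blocks may help. The sketch below assumes the 128 MB default block size of Hadoop 2.x and later (the `dfs.blocksize` setting); every block is full except possibly the last one, which holds only the leftover bytes. The `split_into_blocks` helper is hypothetical, written for illustration.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed dfs.blocksize (Hadoop 2.x+ default)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    if file_size == 0:
        return []  # an empty file has no blocks
    full, last = divmod(file_size, block_size)
    sizes = [block_size] * full
    if last:
        # Only the final block is partial; it holds just the leftover bytes.
        sizes.append(last)
    return sizes
```

For a 300 MB file this gives blocks of 128 MB, 128 MB and 44 MB, and a 50 MB file becomes a single 50 MB block. The block sizes add up to exactly the file size, which is the point: the partial last block does not reserve a full 128 MB on disk.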
@ubaidmukati1532 7 years ago
What if the NameNode fails?
@edurekaIN 7 years ago
+Ubaid Mukati, thanks for checking out our tutorial! If the NameNode process or machine fails, the entire cluster is unavailable until the NameNode is restarted or started on another machine. A restarted NameNode is not available until it receives heartbeats and block reports from the DataNodes listing the blocks they store. This can take hours for large clusters, which reduces availability after an unexpected outage. The single NameNode contains the metadata about all of the file blocks stored in HDFS: a registry of which blocks make up each HDFS file. Without this registry, there is no way to know which blocks belong to which HDFS files. The locations of the blocks are sent to the NameNode in block reports from the DataNodes. Since HDFS file blocks are normally not stored on the NameNode, a NameNode failure by itself does not lose any of the blocks that make up HDFS files. As mentioned, the NameNode keeps the registry of all blocks in HDFS in an image file called fsimage, together with an edit log that records every change to the file system. If these files are lost or corrupted, there is no record of which blocks belong to which HDFS file, which amounts to losing the data of the entire cluster. Hadoop does have built-in mechanisms and administration practices to protect against this case, such as writing the metadata to several directories (including a remote NFS mount), periodic checkpointing, and, since Hadoop 2, NameNode High Availability with a standby NameNode. Hope this helps. Cheers!
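The difference between the two kinds of NameNode metadata can be sketched in a few lines. In this simplified model (hypothetical names, not Hadoop's real data structures), the file-to-blocks registry is persisted, as fsimage and the edit log are, while the block-to-DataNode map is rebuilt in memory from what the DataNodes report after a restart. Losing the reports only delays startup; losing `namespace` would leave no way to map blocks back to files.

```python
from collections import defaultdict

# Persisted on the NameNode's disk (fsimage + edit log in real HDFS):
# file path -> ordered list of block IDs. Paths and IDs are hypothetical.
namespace = {
    "/logs/day1.txt": ["blk_1", "blk_2"],
    "/logs/day2.txt": ["blk_3"],
}


def rebuild_block_map(block_reports):
    """Rebuild the in-memory block -> DataNodes map from block reports.

    block_reports: dict of DataNode name -> iterable of block IDs it stores.
    """
    locations = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for blk in blocks:
            locations[blk].add(datanode)
    return locations


def locate(path, locations):
    """Return (block ID, sorted DataNodes) pairs for one file."""
    return [(blk, sorted(locations.get(blk, ()))) for blk in namespace[path]]
```

Usage, after a simulated restart: `rebuild_block_map({"dn1": ["blk_1", "blk_3"], "dn2": ["blk_1", "blk_2"], "dn3": ["blk_2", "blk_3"]})` followed by `locate("/logs/day1.txt", ...)` yields `blk_1` on `dn1`/`dn2` and `blk_2` on `dn2`/`dn3`.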
@prachiagrawalcipher 8 years ago
How does DataNode1 know about DataNode4?
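@edurekaIN 8 years ago
Hey Prachi, thanks for checking out our tutorial! DataNode1 learns about DataNode4 through the write pipeline. When a client writes a block, it asks the NameNode which DataNodes should hold the replicas. The client then sends the data to the first DataNode together with the list of the remaining target DataNodes; DataNode1 stores the data and forwards it to the next DataNode in that list (DataNode4 in your example), which forwards it to the next one, and acknowledgements travel back along the same chain. The ApplicationMaster is a YARN component that manages the tasks of a running job; it does not handle this HDFS data transfer. Hope this helps. Cheers!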
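The HDFS write pipeline can be pictured with a toy model. In this simplified sketch (hypothetical classes and node names, no real networking), the client hands the first DataNode a packet plus the list of downstream targets it got from the NameNode; each node stores the packet and forwards it to the next node in the list, and acknowledgements flow back upstream. This is how a DataNode "knows" which node comes after it.

```python
class DataNode:
    """Toy DataNode that stores a packet and forwards it down the pipeline."""

    def __init__(self, name, registry):
        self.name = name
        self.registry = registry  # name -> DataNode; stands in for network addresses
        self.stored = []

    def receive(self, packet, downstream):
        """Store the packet, then forward it to the next node in `downstream`."""
        self.stored.append(packet)
        acks = [self.name]
        if downstream:
            nxt, rest = downstream[0], downstream[1:]
            # Downstream acks arrive first, then this node acknowledges.
            acks = self.registry[nxt].receive(packet, rest) + acks
        return acks


def client_write(packet, targets, registry):
    """Send to the first target, passing the remaining targets along."""
    first, rest = targets[0], targets[1:]
    return registry[first].receive(packet, rest)
```

Usage: build `registry = {}` with `registry[n] = DataNode(n, registry)` for `dn1` to `dn4`, then `client_write(b"packet-0", ["dn1", "dn4", "dn2"], registry)` returns the acks `["dn2", "dn4", "dn1"]`; `dn3`, which was not in the target list, stores nothing.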
@prateekjaiswal7230 7 years ago
Sir, please explain every topic in depth... I am not able to understand any topic of Hadoop.
@edurekaIN 7 years ago
Hey Prateek, thanks for checking out our tutorial! We suggest that you start with this tutorial kzbin.info/www/bejne/sJbdY4esYseWjrs and work your way down this playlist: kzbin.info/aero/PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD. You can also sign up for our structured instructor-led training to get support and doubt clearance: www.edureka.co/big-data-and-hadoop. Hope this helps. Cheers!
@RohitRoy-ji9kv 3 years ago
sweet
@lakshmidurga1406 7 years ago
great
@412sahil 7 years ago
What is the Secondary NameNode?
@edurekaIN 6 years ago
The Secondary NameNode is a helper to the primary NameNode, not a replacement for it. Its job is to merge the edit logs with the fsimage from the NameNode: 1. It fetches the edit logs from the NameNode at regular intervals and applies them to the fsimage. 2. Once it has a new fsimage, it copies it back to the NameNode. 3. The NameNode uses this fsimage on its next restart, which reduces the startup time. Hope this helps :)
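The checkpoint steps above boil down to "replay the edit log on top of the last image". The sketch below models that merge with a plain dict as the fsimage (path -> block IDs) and a list of edit records; the record format and the `apply_edit`/`checkpoint` helpers are hypothetical simplifications, not HDFS's on-disk formats.

```python
import copy


def apply_edit(image, edit):
    """Apply one edit-log record to an fsimage-like dict (path -> block IDs)."""
    op, path, *args = edit
    if op == "add":
        image[path] = list(args[0])
    elif op == "delete":
        image.pop(path, None)
    elif op == "rename":
        image[args[0]] = image.pop(path)
    else:
        raise ValueError(f"unknown op {op!r}")


def checkpoint(fsimage, edit_log):
    """Merge edit_log into a copy of fsimage and return the new image.

    The old image is left untouched, like the NameNode's current fsimage
    while the checkpoint is being built.
    """
    new_image = copy.deepcopy(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image
```

For example, starting from `{"/a": ["blk_1"]}` and the edits `("add", "/b", ["blk_2"])`, `("rename", "/a", "/c")`, `("delete", "/b")`, the merged image is `{"/c": ["blk_1"]}`. On restart the NameNode can load this image directly instead of replaying all of those edits, which is where the shorter startup time comes from.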