Name Node Federation, Checkpoint, Backup, and Snapshots in Hadoop Cluster – Big Data Analytics
This article discusses basic Hadoop concepts in Big Data Analytics, namely:
- Name Node Federation
- Checkpoint and Backup
- Snapshots
1. Name Node Federation
Earlier versions of HDFS (that is, Hadoop 1.x) provided a single namespace for the entire Hadoop cluster, managed by a single Name Node. The resources of that one Name Node, chiefly its memory, therefore bounded the size of the entire cluster namespace, which limited cluster size and scalability. Later versions address this by allowing multiple Name Nodes, each managing its own independent namespace, in the same cluster. This is known as Name Node Federation.
The key benefits of Name Node Federation are as follows:
Namespace scalability: “HDFS cluster storage scales horizontally without placing a burden on one Name Node.”
Better performance: “Adding more Name Nodes to the cluster scales the file system read and write operations throughput by separating the total namespace.”
System isolation: “Multiple Name Nodes enable different categories of applications to be distinguished, and users can be isolated to different namespaces.”
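The idea of splitting one namespace across several Name Nodes can be illustrated with a small sketch. This is a toy model, not HDFS code: the mount points, the Name Node names, and the function `resolve_namenode` are all hypothetical, and the longest-prefix lookup is just one plausible way a client could decide which Name Node owns a path.

```python
# Toy model of a federated namespace: each Name Node owns one subtree.
# The mount points and Name Node names below are hypothetical.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/data": "namenode-2",
    "/data/logs": "namenode-3",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the Name Node that owns `path`, using longest-prefix match."""
    best = None
    for mount_point in mount_table:
        # Match the mount point itself or anything strictly below it.
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError(f"no namespace is mounted for {path}")
    return mount_table[best]


if __name__ == "__main__":
    for p in ["/user/alice/report.csv", "/data/sales/2023", "/data/logs/app.log"]:
        print(p, "->", resolve_namenode(p))
```

Because each subtree is served by a different Name Node, metadata operations on /user and /data do not compete for the same Name Node's memory or request queue, which is the scalability and isolation benefit described above.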
2. Checkpoint and Backup
The Name Node stores the metadata of the HDFS file system in a file called fsimage. File system modifications are written to an edits log file, and at startup or restart the Name Node merges the edits into a new fsimage.
The Secondary Name Node or Checkpoint Node periodically downloads the current fsimage and edits log from the Name Node, merges them into a new fsimage, and returns the updated fsimage to the Name Node.
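The merge step can be sketched in a few lines. This is a simplified model under stated assumptions: the namespace is a plain dictionary, the edits log is a list of `(operation, path, metadata)` tuples, and only two hypothetical operations, `create` and `delete`, exist. The real fsimage and edits formats are binary and far richer.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace held as {path: metadata}."""
    op, path, meta = edit
    if op == "create":
        namespace[path] = meta
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edits):
    """Merge the edits into a copy of fsimage; return (new_fsimage, empty_edits)."""
    merged = copy.deepcopy(fsimage)
    for edit in edits:
        apply_edit(merged, edit)
    # After a checkpoint the merged edits are no longer needed.
    return merged, []


fsimage = {"/a.txt": {"blocks": [1, 2]}}
edits = [
    ("create", "/b.txt", {"blocks": [3]}),
    ("delete", "/a.txt", None),
]
new_fsimage, new_edits = checkpoint(fsimage, edits)
print(new_fsimage)  # {'/b.txt': {'blocks': [3]}}
```

Running the merge on a separate node keeps the edits log short, so a Name Node restart does not have to replay a long history.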
An HDFS Backup Node provides the same checkpointing function as a Checkpoint Node, but it also receives the stream of edits from the Name Node, persists them to disk, and applies them to its own up-to-date, in-memory copy of the file system namespace.
Unlike a Checkpoint Node, the Backup Node does not need to download the fsimage and edits files from the active Name Node, because it already holds an up-to-date namespace state in memory. A Name Node supports only one Backup Node at a time. If a Backup Node is in use in the Hadoop cluster, there is no need for Checkpoint Nodes.
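The difference from a Checkpoint Node can be sketched as follows. Again this is a toy model, not the HDFS implementation: the classes `NameNode` and `BackupNode`, their methods, and the two edit operations are hypothetical names chosen for illustration.

```python
def apply_edit(namespace, edit):
    """Apply one (operation, path, metadata) edit to a {path: metadata} dict."""
    op, path, meta = edit
    if op == "create":
        namespace[path] = meta
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


class BackupNode:
    """Keeps an in-memory namespace current by applying each streamed edit."""

    def __init__(self):
        self.namespace = {}

    def receive(self, edit):
        apply_edit(self.namespace, edit)

    def save_checkpoint(self):
        # No download is needed: the in-memory state is already up to date.
        return dict(self.namespace)


class NameNode:
    def __init__(self):
        self.namespace = {}
        self.backup = None

    def register_backup(self, node):
        # Models the one-Backup-Node-at-a-time rule.
        if self.backup is not None:
            raise RuntimeError("a Backup Node is already registered")
        node.namespace = dict(self.namespace)  # initial synchronization
        self.backup = node

    def log_edit(self, edit):
        apply_edit(self.namespace, edit)
        if self.backup is not None:
            self.backup.receive(edit)  # stream the edit as it happens


nn, bn = NameNode(), BackupNode()
nn.register_backup(bn)
nn.log_edit(("create", "/a.txt", {"blocks": [1]}))
print(bn.save_checkpoint() == nn.namespace)  # True
```

Since every edit reaches the Backup Node as it is logged, creating a checkpoint is a purely local operation on that node.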
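3. Snapshots
HDFS snapshots are read-only, point-in-time copies of the file system. Unlike Backup Nodes, which keep a copy of the Name Node's metadata, snapshots preserve the state of the files under a directory. An administrator first marks a directory as snapshottable with the hdfs dfsadmin -allowSnapshot <path> command; a snapshot of that directory is then created with hdfs dfs -createSnapshot <path> [<snapshotName>].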
HDFS snapshots offer the following features:
1. HDFS snapshots can be taken of the entire file system or of a subtree of it.
2. HDFS snapshots can be used for data backup, protection against user errors, and disaster recovery.
3. HDFS snapshot creation is instantaneous.
4. Blocks on the Data Nodes are not copied: a snapshot only records each file's block list and file size. No data is copied, even though the snapshot appears to the user as a complete, separate copy of the file system.
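Why a snapshot needs no data copying can be shown with a small model. This is a sketch under stated assumptions: `BLOCK_STORE` stands in for Data Node storage, `ToyFile` and the two functions are hypothetical names, and files are only ever appended to. Note that this toy records metadata for every file eagerly, whereas HDFS tracks changes relative to the snapshot, which is what makes real snapshot creation instantaneous.

```python
class ToyFile:
    """File metadata as a Name Node might hold it: block list plus length."""

    def __init__(self, block_ids, length):
        self.block_ids = list(block_ids)
        self.length = length


BLOCK_STORE = {1: b"hello ", 2: b"world"}  # stands in for Data Node storage
files = {"/docs/a.txt": ToyFile([1, 2], 11)}


def create_snapshot(files):
    """Record (block list, length) for each file; block data is not touched."""
    return {path: (tuple(f.block_ids), f.length) for path, f in files.items()}


def read_from_snapshot(snapshot, path):
    """Rebuild the file contents as they were when the snapshot was taken."""
    block_ids, length = snapshot[path]
    data = b"".join(BLOCK_STORE[b] for b in block_ids)
    return data[:length]


snap = create_snapshot(files)

# Append to the live file: a new block is added, old blocks stay as they are.
BLOCK_STORE[3] = b"!!!"
files["/docs/a.txt"].block_ids.append(3)
files["/docs/a.txt"].length += 3

print(read_from_snapshot(snap, "/docs/a.txt"))  # b'hello world'
```

The snapshot and the live file share blocks 1 and 2; only the recorded block list and file size differ, so the snapshot still reads back the original contents.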
This article discussed Name Node Federation, Checkpoint, Backup, and Snapshots in a Hadoop cluster.