Hello. In the last video,
I skimmed through the topic of HDFS recovery process.
If you are going to be a data engineer,
never just skim through information.
Study it in depth to get a good understanding of the process.
Right now it might be a good idea to re-watch my last video.
Of course, I'm kidding.
To be precise and avoid misunderstandings,
let me introduce a few concepts:
block and replica.
A replica is a physical copy of data stored on a data node.
There are usually several replicas with the same content on different data nodes.
A block is meta-information stored on the name node;
it provides information about the replicas' locations and their states.
Both replicas and blocks have their own states.
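To make the distinction concrete, here is a minimal sketch of the two concepts. The class names, field names, and node names are hypothetical and are not the HDFS API; the state strings are just placeholders for the states described next.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Physical copy of a block's bytes on one data node (hypothetical model)."""
    block_id: int
    data_node: str
    data: bytes
    state: str


@dataclass
class Block:
    """Name node metadata: replica locations and states, but no file bytes."""
    block_id: int
    state: str
    replica_states: dict = field(default_factory=dict)  # data node -> replica state

    def register(self, replica: Replica) -> None:
        # Record what a data node reported about its replica of this block.
        if replica.block_id != self.block_id:
            raise ValueError("replica belongs to a different block")
        self.replica_states[replica.data_node] = replica.state

    def locations(self) -> list:
        return sorted(self.replica_states)


# Three replicas with the same content on three different data nodes.
replicas = [Replica(1, node, b"same bytes", "finalized") for node in ("dn1", "dn2", "dn3")]
block = Block(block_id=1, state="complete")
for r in replicas:
    block.register(r)
print(block.locations())
```

Note that the block object holds no data at all: it only knows where the replicas are and what state each one reported.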
Data node replicas can be in the following states.
If a replica is in the finalized state,
it means that the content of this replica is cool and icy.
Technically speaking, it is frozen.
The latter means that the meta-information for this block on the name node
is aligned with the states and data of all the corresponding replicas.
For instance, you can safely read data from
any data node and you will get exactly the same content.
This property preserves read consistency.
Each block of data has a version number called Generation Stamp or GS for short.
For finalized replicas, you have a guarantee that all of
them have the same GS number, which can only increase over time.
The GS increases during the error recovery process or when data is appended to a block.
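The read consistency property can be sketched as a simple check: all finalized replicas of a block agree on the generation stamp and on the bytes. This is only an illustration; the names are hypothetical, and incrementing the stamp by one is an assumption about how a new stamp is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinalizedReplica:
    data_node: str
    generation_stamp: int
    data: bytes


def is_read_consistent(replicas) -> bool:
    """True when all finalized replicas share one GS and one byte content."""
    return len({(r.generation_stamp, r.data) for r in replicas}) <= 1


def next_generation_stamp(current: int) -> int:
    """A GS never decreases; here it is bumped by one (an assumption)."""
    return current + 1


replicas = [FinalizedReplica(dn, 7, b"abc") for dn in ("dn1", "dn2", "dn3")]
stale = FinalizedReplica("dn4", 6, b"ab")  # an outdated copy with an older GS

print(is_read_consistent(replicas))
print(is_read_consistent(replicas + [stale]))
```

A replica with an older generation stamp stands out immediately, which is how outdated copies can be recognized.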
RBW stands for Replica Being Written to.
It is the state of the last block of
an open file or of a file which was reopened for appending.
In this state, different data nodes can return different sets of bytes to a user.
In short, bytes that are acknowledged by
the downstream data nodes in a pipeline are visible for a reader of this replica.
Moreover, the on-disk data of a data node and
the name node meta-information may not match in this state.
In case of any failure, a data node will try to preserve as many bytes as possible.
It is a design goal called data durability.
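Here is a small sketch of why two data nodes can show different bytes for the same RBW replica. It is a simplified model under stated assumptions: a three-node pipeline, hypothetical class and node names, and the last node in the pipeline acknowledging its own write.

```python
from dataclasses import dataclass


@dataclass
class RbwReplica:
    data_node: str
    bytes_on_disk: int = 0   # everything this node has written so far
    bytes_acked: int = 0     # prefix confirmed by the downstream nodes

    def receive(self, n: int) -> None:
        self.bytes_on_disk += n

    def ack(self, upto: int) -> None:
        # An ack can never cover more bytes than the node has on disk.
        self.bytes_acked = max(self.bytes_acked, min(upto, self.bytes_on_disk))

    def visible_length(self) -> int:
        # Readers only see the acknowledged prefix.
        return self.bytes_acked


pipeline = [RbwReplica("dn1"), RbwReplica("dn2"), RbwReplica("dn3")]
for dn in pipeline:
    dn.receive(64)       # the packet has reached every node in the pipeline

pipeline[2].ack(64)      # last node: nothing downstream, so it acks itself
pipeline[1].ack(64)      # the ack has travelled back to dn2, but not yet to dn1

visible = {dn.data_node: dn.visible_length() for dn in pipeline}
print(visible)
```

All three nodes hold the same 64 bytes on disk, yet a reader of dn1 sees nothing while a reader of dn2 or dn3 sees the full packet.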
Replica Waiting to be Recovered, or RWR for short,
is the state of all RBW replicas after a data node failure and recovery.
For instance, after a system reboot or after Pacer.sys or BSOD,
which are quite likely from a programming point of view.
RWR replicas will not be in
any data node pipeline and therefore will not receive any new data packets.
So they either become outdated and should be discarded,
or they will participate in
a special recovery process called a lease recovery if the client also dies.
An HDFS client requests a lease from the name node to
have exclusive access to write or append data to a file.
When an HDFS client's lease expires,
replicas transition to the RUR state.
RUR stands for Replica Under Recovery.
Lease expiration usually happens on a client-side failure.
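The lease idea can be sketched as a tiny bookkeeping class: one writer per file, and a lease that expires if the client stops renewing it. Everything here is hypothetical (the class, the method names, the 60-unit timeout); it is not the name node's actual lease manager, and time is passed in explicitly to keep the example deterministic.

```python
class LeaseManager:
    """Toy model of exclusive write leases (not the HDFS implementation)."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self._leases = {}  # path -> (client, time of last renewal)

    def acquire(self, path: str, client: str, now: float) -> bool:
        holder = self._leases.get(path)
        if holder is not None and holder[0] != client:
            return False  # someone else already has exclusive access
        self._leases[path] = (client, now)
        return True

    def renew(self, path: str, client: str, now: float) -> None:
        if self._leases.get(path, (None, 0))[0] == client:
            self._leases[path] = (client, now)

    def expired(self, now: float) -> list:
        # Files whose writer went silent; their replicas would move to RUR.
        return sorted(p for p, (_, t) in self._leases.items() if now - t > self.timeout)


leases = LeaseManager(timeout=60)
first = leases.acquire("/logs/app.log", "client-A", now=0)
second = leases.acquire("/logs/app.log", "client-B", now=10)
print(first, second, leases.expired(now=30), leases.expired(now=100))
```

A live client keeps calling `renew`; a dead client cannot, so its file eventually shows up in `expired` and recovery can start.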
As data grows and different nodes are added or removed from a cluster,
data can become unevenly distributed over the cluster nodes.
A Hadoop administrator can start a data re-balancing process, or
a data engineer can request an increase of
the replication factor of data for the sake of durability.
In these cases, newly generated replicas will be in a state called temporary.
It is pretty much the same state as RBW, except
that the data is not visible to users until the replica is finalized.
In case of failure,
the whole chunk of data is removed without any intermediate recovery state.
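The replica states described so far can be summarized as a small transition table. This is a sketch of what was said above, not the data node's real code: the event names are made up for illustration, and the RBW to finalized transition on closing a block is an assumption.

```python
from enum import Enum


class ReplicaState(Enum):
    FINALIZED = "finalized"
    RBW = "replica being written"
    RWR = "replica waiting to be recovered"
    RUR = "replica under recovery"
    TEMPORARY = "temporary"


# (state, event) -> next state; None means the replica is deleted.
# Event names are hypothetical labels, not HDFS identifiers.
TRANSITIONS = {
    (ReplicaState.RBW, "block_closed"): ReplicaState.FINALIZED,
    (ReplicaState.FINALIZED, "append"): ReplicaState.RBW,
    (ReplicaState.RBW, "data_node_restart"): ReplicaState.RWR,
    (ReplicaState.RBW, "lease_expired"): ReplicaState.RUR,
    (ReplicaState.RWR, "lease_expired"): ReplicaState.RUR,
    (ReplicaState.RWR, "outdated"): None,
    (ReplicaState.TEMPORARY, "copy_done"): ReplicaState.FINALIZED,
    (ReplicaState.TEMPORARY, "failure"): None,
}


def step(state: ReplicaState, event: str):
    """Return the next replica state, or raise if the move is not in the table."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


after_reboot = step(ReplicaState.RBW, "data_node_restart")
after_client_death = step(after_reboot, "lease_expired")
print(after_reboot.name, after_client_death.name)
```

Notice that a temporary replica has no recovery path in the table: on failure it simply disappears.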
In addition to the replica transition table,
a name node block has its own collection of states and transitions.
Unlike data node replica states,
a block state is stored in memory; it is not persisted on any disk.
As soon as a user opens a file for writing,
the name node creates the corresponding block in the under_construction state.
When a user opens a file for append, the name node
also transitions this block to the under_construction state.
It is always the last block of a file;
its length and generation stamp are mutable.
An under_construction block keeps track of the write pipeline.
It means that it contains information about all RBW and RWR replicas.
It is quite vindictive and watches every step.
Replicas transition from RWR to the RUR state when the client dies.
More generally, it happens when a client's lease expires.
Consequently, the corresponding block
transitions from under_construction to under_recovery state.
An under_construction block transitions to the committed state when a client
successfully asks the name node to close
a file or to create a new consecutive block of data.
The committed state means that there are already
some finalized replicas, but not all of them are finalized.
For this reason, in order to serve read requests,
the committed block needs to keep track of RBW replicas
until all the replicas have transitioned to the finalized state.
Only then will the HDFS client be able to close the file;
until then, it has to retry its request.
The final, complete state of a block is the state where all the replicas are in
the finalized state and therefore have
identical visible lengths and generation stamps.
Only when all the blocks of a file are complete can the file be closed.
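The block side of the story can be sketched in the same way. The function names below are hypothetical, and the rule "complete only when every known replica is finalized" simply follows the description above; treat it as an illustration, not as the name node's code.

```python
from enum import Enum


class BlockState(Enum):
    UNDER_CONSTRUCTION = "under_construction"
    UNDER_RECOVERY = "under_recovery"
    COMMITTED = "committed"
    COMPLETE = "complete"


def on_lease_expired(state: BlockState) -> BlockState:
    # The writer is gone: the block being written goes into recovery.
    return BlockState.UNDER_RECOVERY if state is BlockState.UNDER_CONSTRUCTION else state


def on_client_commit(state: BlockState) -> BlockState:
    # The client asked to close the file or to start the next block.
    if state is not BlockState.UNDER_CONSTRUCTION:
        raise ValueError(f"cannot commit a block in state {state.name}")
    return BlockState.COMMITTED


def on_replica_report(state: BlockState, replica_states: list) -> BlockState:
    # A committed block becomes complete once every replica is finalized.
    if state is BlockState.COMMITTED and replica_states and all(
        s == "finalized" for s in replica_states
    ):
        return BlockState.COMPLETE
    return state


def can_close_file(block_states: list) -> bool:
    return all(s is BlockState.COMPLETE for s in block_states)


last = on_client_commit(BlockState.UNDER_CONSTRUCTION)
still_waiting = on_replica_report(last, ["finalized", "rbw", "finalized"])
done = on_replica_report(last, ["finalized", "finalized", "finalized"])
print(still_waiting.name, done.name)
```

While `can_close_file` returns false, the client's close request fails and has to be retried.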
In case of a name node restart,
the name node has to restore the open file state.
All the blocks of an unclosed file are loaded as
complete except the last block, which is loaded as under_construction.
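The restart rule is short enough to write down directly. The function below is a hypothetical helper that only mirrors the sentence above; it assumes blocks are listed in file order.

```python
def load_block_states(num_blocks: int, file_closed: bool) -> list:
    """Block states as they would be restored after a name node restart."""
    states = ["complete"] * num_blocks
    if num_blocks and not file_closed:
        # Only the last block of an open file may still be changing.
        states[-1] = "under_construction"
    return states


open_file = load_block_states(3, file_closed=False)
closed_file = load_block_states(3, file_closed=True)
print(open_file)
print(closed_file)
```

Because block states live only in memory, this reconstruction is the starting point for the recovery procedures.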
Then recovery procedures will start to work.
There are several types of them, replica recovery,
block recovery, lease recovery, and pipeline recovery.