HDFS Read and Write Mechanisms

The figure below shows the events, and their order, that occur among the client, the NameNode, and the DataNodes when a file on HDFS is read.

Suppose an HDFS client wants to write a 248 MB file named example.txt.

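Assume the system block size is configured as 128 MB (the default). The client will therefore split example.txt into two blocks: one of 128 MB (Block A) and one of 120 MB (Block B).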
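The block arithmetic above can be checked with a few lines of Python. This is only a sketch of the size calculation, not of how the HDFS client actually cuts a stream into blocks; the function name `split_into_blocks` is made up for illustration.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the size in MB of each block for a file of the given size."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block only occupies as much space as the data it holds.
        blocks.append(remainder)
    return blocks


print(split_into_blocks(248))  # [128, 120]
```

Only the last block may be smaller than the configured block size; a 256 MB file would yield two full 128 MB blocks.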

Now, the following protocol will be followed whenever the data is written into HDFS:

At first, the HDFS client will reach out to the NameNode with a write request for the two blocks, Block A and Block B.
The NameNode will then grant the client write permission and provide the IP addresses of the DataNodes to which the file blocks will eventually be copied.
The NameNode selects these DataNodes, with some randomness, based on availability, replication factor, and rack awareness, which we discussed earlier.
Let's say the replication factor is set to the default, i.e., 3. For each block, the NameNode will therefore give the client a list of three DataNode IP addresses. The list is unique to each block.
Suppose the NameNode provided the following lists of IP addresses to the client:
For Block A, list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}
For Block B, list B = {IP of DataNode 3, IP of DataNode 7, IP of DataNode 9}
Each block will be copied to three different DataNodes to keep the replication factor consistent throughout the cluster.
Now the whole data copy process will happen in three stages:

Set up of Pipeline
Data streaming and replication
Shutdown of Pipeline (Acknowledgement stage)
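Returning briefly to how the NameNode might pick three DataNodes for a block: the sketch below is a toy policy, not the NameNode's real placement code. It assumes one replica goes on one rack and the other two go on two different nodes of a second rack, which is a simplification of rack-aware placement. The topology, the node names, and the helpers `choose_targets` and `racks_used` are all hypothetical.

```python
import random


def choose_targets(topology, rng):
    """Pick three distinct DataNodes that span exactly two racks (toy policy)."""
    # Racks that can host the second and third replicas need at least two nodes.
    remote_candidates = [rack for rack, nodes in topology.items() if len(nodes) >= 2]
    if not remote_candidates:
        raise ValueError("no rack has two DataNodes for the 2nd and 3rd replicas")
    remote_rack = rng.choice(remote_candidates)
    # The first replica goes on any non-empty rack other than the remote one.
    first_candidates = [rack for rack, nodes in topology.items()
                        if rack != remote_rack and nodes]
    if not first_candidates:
        raise ValueError("need a second rack for the first replica")
    first = rng.choice(topology[rng.choice(first_candidates)])
    second, third = rng.sample(topology[remote_rack], 2)
    return [first, second, third]


def racks_used(topology, targets):
    """Return the set of racks that host the chosen DataNodes."""
    return {rack for rack, nodes in topology.items()
            for node in targets if node in nodes}


# Hypothetical three-rack cluster with three DataNodes per rack.
topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
    "rack3": ["dn7", "dn8", "dn9"],
}
targets = choose_targets(topology, random.Random(42))
print(targets, racks_used(topology, targets))
```

Spreading replicas over two racks means the loss of a single rack cannot destroy every copy of a block, while keeping two replicas on the same rack limits cross-rack traffic during the write.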

1.1. Set up of Pipeline

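Before writing the blocks, the client confirms that the DataNodes in each IP list are ready to receive data. To do so, the client creates a pipeline for each block by connecting the DataNodes in that block's list. Consider Block A. The list of DataNodes provided by the NameNode is:

For Block A, list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}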

So, for block A, the client will be performing the following steps to create a pipeline:

The client will choose the first DataNode in the list (the DataNode IPs for Block A), which is DataNode 1, and will establish a TCP/IP connection.
The client will inform DataNode 1 to be ready to receive the block. It will also give DataNode 1 the IPs of the next two DataNodes (4 and 6), where the block is supposed to be replicated.
DataNode 1 will connect to DataNode 4, inform it to be ready to receive the block, and give it the IP of DataNode 6. Then, DataNode 4 will tell DataNode 6 to be ready to receive the data.
Next, the acknowledgement of readiness will follow the reverse sequence, i.e., from DataNode 6 to 4 and then to 1.
At last, DataNode 1 will inform the client that all the DataNodes are ready, and a pipeline will be formed between the client and DataNodes 1, 4, and 6.
Now the pipeline setup is complete, and the client will finally begin the data copy or streaming process.
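The setup handshake above can be sketched as a small recursive simulation. It is a toy model with no networking: each DataNode is just a name, and `setup_pipeline` is a made-up helper that records the order in which readiness requests go down the pipeline and acknowledgements come back.

```python
def setup_pipeline(targets):
    """Return the list of (event, node) pairs produced while setting up a pipeline."""
    events = []

    def contact(index):
        if index == len(targets):
            return  # past the last DataNode: nothing more to contact
        node = targets[index]
        events.append(("ready?", node))  # request travels downstream
        contact(index + 1)               # this node contacts the next one
        events.append(("ack", node))     # ack is sent once downstream is ready

    contact(0)
    return events


for event, node in setup_pipeline(["DataNode 1", "DataNode 4", "DataNode 6"]):
    print(event, node)
```

The requests are recorded in the order 1, 4, 6, and the acknowledgements in the order 6, 4, 1, matching the reverse sequence described in the steps.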

1.2. Data Streaming

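After the pipeline is created, the client pushes the data into it. Remember that in HDFS, data is replicated according to the replication factor, so here Block A will be stored on three DataNodes, assuming a replication factor of 3. The client itself copies the block (A) only to DataNode 1; the replication is always carried out sequentially by the DataNodes.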

So, the following steps will take place during replication:

Once the block has been written to DataNode 1 by the client, DataNode 1 will connect to DataNode 4.
Then, DataNode 1 will push the block into the pipeline, and the data will be copied to DataNode 4.
Again, DataNode 4 will connect to DataNode 6 and will copy the last replica of the block.
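The replication steps above can be illustrated with an in-memory chain of nodes. This is a sketch under simplifying assumptions: `ToyDataNode` and `build_pipeline` are hypothetical names, a whole block is passed in one call, and no real DataNode protocol is involved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyDataNode:
    name: str
    downstream: Optional["ToyDataNode"] = None
    blocks: dict = field(default_factory=dict)

    def receive(self, block_id, data):
        self.blocks[block_id] = data          # store the local replica
        if self.downstream is not None:       # then forward it down the pipeline
            self.downstream.receive(block_id, data)


def build_pipeline(names):
    """Chain the named nodes so each forwards to the next; return them in order."""
    nodes = [ToyDataNode(name) for name in names]
    for node, next_node in zip(nodes, nodes[1:]):
        node.downstream = next_node
    return nodes


pipeline = build_pipeline(["DataNode 1", "DataNode 4", "DataNode 6"])
# The client writes to the head of the pipeline only.
pipeline[0].receive("Block A", b"contents of block A")
print([node.name for node in pipeline if "Block A" in node.blocks])
# ['DataNode 1', 'DataNode 4', 'DataNode 6']
```

The client makes a single call, yet all three nodes end up holding the block, which is the point of letting the DataNodes do the replication themselves.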

1.3. Shutdown of Pipeline or Acknowledgement stage

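Once the block has been copied to all three DataNodes, a series of acknowledgements takes place to assure the client and the NameNode that the data has been written successfully. The client then closes the pipeline to end the TCP session.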

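The overall process, in detail, is as follows:

The client calls the create method of the DistributedFileSystem object, which creates a file output stream (FSDataOutputStream) object;
Through the DistributedFileSystem object, an RPC call is made to the cluster's NameNode to create a file entry in the HDFS namespace. At this point the entry has no blocks, and the NameNode returns the addresses of the DataNodes to which each block of the data should be copied;
Through the FSDataOutputStream object, the client starts writing data to the DataNodes. The data is first written to a data queue inside the FSDataOutputStream object. The data queue is consumed by the DataStreamer, which asks the NameNode to allocate new blocks by selecting a suitable list of DataNodes to store the replicas;
The DataStreamer streams the packets to the first assigned DataNode, which stores each packet and forwards it to the second DataNode; the second DataNode in turn forwards the packet to the third DataNode;
The DataNodes confirm that the data transfer is complete, and finally the first DataNode notifies the client that the data was written successfully;
Having finished writing data to the file, the client calls the close method on the file output stream (FSDataOutputStream) object to complete the write;
A complete call then notifies the NameNode that the file was written successfully, and the NameNode records the result in the edit log.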
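The queueing behaviour in the steps above can be modelled roughly as follows. This is a toy sketch, not the Hadoop client: `ToyOutputStream` is a hypothetical class, each DataNode is represented by a plain list, the 4-byte packet size is arbitrary, and the ack queue is an assumption added here to show where unacknowledged packets would wait.

```python
from collections import deque


class ToyOutputStream:
    """Toy model of an output stream that packetizes data for a pipeline."""

    def __init__(self, pipeline, packet_size=4):
        if packet_size <= 0:
            raise ValueError("packet_size must be positive")
        self.pipeline = pipeline      # one list per DataNode, in pipeline order
        self.packet_size = packet_size
        self.data_queue = deque()     # packets waiting to be sent
        self.ack_queue = deque()      # packets sent but not yet acknowledged

    def write(self, data):
        # Split the bytes into packets and park them in the data queue.
        for start in range(0, len(data), self.packet_size):
            self.data_queue.append(data[start:start + self.packet_size])

    def close(self):
        # Drain the data queue, playing the role of the DataStreamer.
        while self.data_queue:
            packet = self.data_queue.popleft()
            self.ack_queue.append(packet)
            for storage in self.pipeline:   # stored, then forwarded node to node
                storage.append(packet)
            self.ack_queue.popleft()        # every replica has acknowledged


pipeline = [[], [], []]
stream = ToyOutputStream(pipeline)
stream.write(b"example.txt")
stream.close()
print(b"".join(pipeline[2]))  # b'example.txt'
```

A packet leaves the ack queue only after every node in the pipeline has stored it, so anything still in that queue when a failure occurs is known to be unconfirmed.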

2. Client Reads a File

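As shown in the figure below, the acknowledgements happen in reverse order, i.e., from DataNode 6 to 4 and then to 1. Finally, DataNode 1 pushes the three acknowledgements (including its own) into the pipeline and sends them to the client. The client then informs the NameNode that the data has been written successfully. The NameNode updates its metadata, and the client closes the pipeline.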

Now, the following steps take place while reading the file:

The client will reach out to the NameNode asking for the block metadata of the file "example.txt".
The NameNode will return the list of DataNodes where each block (Block A and Block B) is stored.
After that, the client will connect to the DataNodes where the blocks are stored.
The client starts reading data in parallel from the DataNodes (Block A from DataNode 1 and Block B from DataNode 3).
Once the client has all the required file blocks, it will combine them to form the file.
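The read steps above can be sketched with an in-memory stand-in for the cluster. Everything here is hypothetical: `cluster` maps DataNode names to the blocks they hold, `locations` plays the role of the block metadata returned by the NameNode, and a thread pool stands in for the parallel reads the text describes.

```python
from concurrent.futures import ThreadPoolExecutor


def read_block(cluster, block_id, replicas):
    """Return the block from the first listed DataNode that holds it."""
    for node in replicas:
        data = cluster.get(node, {}).get(block_id)
        if data is not None:
            return data
    raise IOError(f"no replica of {block_id} is available")


def read_file(cluster, locations):
    """Fetch all blocks in parallel and join them in file order."""
    if not locations:
        return b""
    with ThreadPoolExecutor(max_workers=len(locations)) as pool:
        # map() yields results in the order of `locations`, so the blocks
        # are reassembled correctly even if they finish out of order.
        parts = pool.map(lambda loc: read_block(cluster, loc[0], loc[1]), locations)
        return b"".join(parts)


cluster = {
    "DataNode 1": {"Block A": b"first part, "},
    "DataNode 3": {"Block B": b"second part"},
}
locations = [("Block A", ["DataNode 1"]), ("Block B", ["DataNode 3"])]
print(read_file(cluster, locations))  # b'first part, second part'
```

Because each block lists several replicas, `read_block` can fall through to the next DataNode in the list when the first one does not have the block.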

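The process can be briefly summarized as follows:

The client makes an RPC call to the cluster's NameNode through the DistributedFileSystem object to obtain the locations of the file's blocks;
The NameNode returns the list of DataNodes that store each block;
The client connects to the nearest DataNode in the list;
The client starts reading data from the DataNodes in parallel;
Once the client has obtained all the required blocks, it combines them to form the file.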

When serving a client's read request, HDFS uses rack awareness to select the replica closest to the client, which reduces read latency and bandwidth consumption.
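One way to picture "closest" is as a distance in a tree of data centers, racks, and nodes. The sketch below assumes locations are written as paths such as /d1/r1/n1 and counts the hops from each location up to their closest common ancestor. The paths and the helpers `distance` and `nearest_replica` are illustrative assumptions, not the HDFS implementation.

```python
def distance(a, b):
    """Count the tree hops between two locations such as '/d1/r1/n1'."""
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")
    common = 0
    for part_a, part_b in zip(path_a, path_b):
        if part_a != part_b:
            break
        common += 1
    # Hops up from each location to the closest common ancestor.
    return (len(path_a) - common) + (len(path_b) - common)


def nearest_replica(client, replicas):
    """Return the replica location with the smallest distance to the client."""
    return min(replicas, key=lambda replica: distance(client, replica))


client = "/d1/r1/n1"
print(distance(client, "/d1/r1/n1"))  # 0: same node
print(distance(client, "/d1/r1/n2"))  # 2: same rack
print(distance(client, "/d1/r2/n3"))  # 4: different rack, same data center
print(nearest_replica(client, ["/d1/r2/n3", "/d1/r1/n2", "/d2/r3/n4"]))  # /d1/r1/n2
```

With this measure a replica on the client's own rack always wins over one on another rack, which is where the savings in latency and cross-rack bandwidth come from.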
