Corresponding Component Names
Apache Hadoop Project
Design Assumptions and Goals
HDFS Architecture
Master/Slave Architecture
NameNode
DataNode
Files and Directories
Data Replication
Replica Placement
Replica Placement: Hardware Settings
Placement Policy: Simple and non-optimal
Placement Policy: HDFS default policy
Replication factor greater than 3:
DataNodes are not allowed to have multiple replicas of the same block
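The default placement policy can be sketched roughly as follows. This is a simplified illustration based on the behavior described in the HDFS architecture guide, not the actual NameNode code. The function name `place_replicas`, the `topology` dictionary, and the node and rack names are hypothetical. Treating the documented per-rack limit `(replicas - 1) / racks + 2` as an inclusive cap is also an assumption of this sketch.

```python
import random
from collections import Counter


def place_replicas(writer, topology, replication=3, rng=random):
    """Choose DataNodes for one block.

    topology maps a rack name to its DataNode names (hypothetical layout).
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    nodes = sorted(rack_of)
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    if replication > len(nodes):
        # A DataNode may hold at most one replica of a given block.
        raise ValueError("replication factor exceeds the number of DataNodes")

    chosen = []

    def pick(preferred):
        # Choose among preferred nodes not used yet; fall back to any unused node.
        unused = [n for n in nodes if n not in chosen]
        pool = [n for n in preferred if n in unused] or unused
        return rng.choice(pool)

    # Replica 1: the writer's own node if it is a DataNode, else a random node.
    chosen.append(writer if writer in rack_of else rng.choice(nodes))

    # Replica 2: a node on a different rack than replica 1.
    if replication >= 2:
        first_rack = rack_of[chosen[0]]
        chosen.append(pick([n for n in nodes if rack_of[n] != first_rack]))

    # Replica 3: a different node on the same rack as replica 2.
    if replication >= 3:
        chosen.append(pick(topology[rack_of[chosen[1]]]))

    # Replicas 4 and up: random nodes, subject to a per-rack cap.
    cap = (replication - 1) // len(topology) + 2
    while len(chosen) < replication:
        per_rack = Counter(rack_of[n] for n in chosen)
        chosen.append(pick([n for n in nodes if per_rack[rack_of[n]] < cap]))

    return chosen


if __name__ == "__main__":
    racks = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]}
    print(place_replicas("dn1", racks, replication=3))
```

Because every pick is drawn from the unused nodes only, the sketch never places two replicas of the same block on one DataNode.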
Run the following (lngo/goram)
$ ssh -p 22 lngo@ms0225.utah.cloudlab.us
$ sudo su
$ /opt/hadoop-3.1.1/bin/hdfs dfs -ls /
$ /opt/hadoop-3.1.1/bin/hdfs fsck /test/t8.shakespeare.txt -blocks -files -racks
$ /opt/hadoop-3.1.1/bin/hdfs fsck /test2/t8.shakespeare.txt -blocks -files -racks
To view the available commands:
$ /opt/hadoop-3.1.1/bin/hdfs
On each DataNode, the actual data blocks are located at:
/tmp/hadoop-root/dfs/data/current
HDFS Writes
Staging
Pipelining
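Pipelining can be illustrated with a small sketch: the block is cut into packets, and each DataNode in the chain stores a packet locally and forwards it to the next DataNode. This is a toy model, not the HDFS wire protocol. The names `split_packets`, `datanode`, and `pipeline_write` are hypothetical, the tiny `packet_size` is chosen only for readability, and acknowledgements are reduced to a byte count.

```python
def split_packets(block, packet_size):
    """Yield the block in small packets, as the client would stream them."""
    for offset in range(0, len(block), packet_size):
        yield block[offset:offset + packet_size]


def datanode(name, upstream, stores):
    """One pipeline stage: store each incoming packet, then forward it."""
    for packet in upstream:
        stores[name].extend(packet)  # write to this node's local storage
        yield packet                 # pass the same packet downstream


def pipeline_write(block, datanodes, packet_size=4):
    """Stream one block through a chain of DataNodes."""
    stores = {name: bytearray() for name in datanodes}
    stream = split_packets(block, packet_size)
    for name in datanodes:
        stream = datanode(name, stream, stores)
    # Draining the last stage pulls every packet through the whole chain.
    delivered = sum(len(packet) for packet in stream)
    return delivered, {name: bytes(data) for name, data in stores.items()}


if __name__ == "__main__":
    count, replicas = pipeline_write(b"to be or not to be", ["dn1", "dn4", "dn5"])
    print(count, replicas)
```

Each packet reaches the second and third DataNode before the next packet leaves the client, so every node in the chain ends up with an identical copy of the block.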
First resource manager/job controller on HDFS
Execution Progress
Conceptual Design/Differences
Step 1: A client program submits the application.
Step 2: The Resource Manager negotiates a container in which to start the Application Master, then launches the Application Master.
Step 3: The Application Master, on boot-up, registers with the Resource Manager. This allows the client to query the Resource Manager for the details it needs to interact directly with its Application Master.
Step 4: The Application Master negotiates resource containers via the resource-request protocol.
Step 5: After successful allocation, the Application Master launches each container by providing a container launch specification to the Node Manager. The specification includes the command line to launch, environment variables, local resources (jars, shared objects, ...), and security-related tokens.
Step 6: The application code executing within the container then provides logging info back to its Application Master via an application-specific protocol.
Step 7: During application execution, the client that submitted the program communicates directly with the Application Master to get status, progress, and updates via an application-specific protocol.
Step 8: Upon completion, the Application Master deregisters from the Resource Manager and shuts down, allowing its own container to be repurposed.
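The eight steps above can be traced with a toy simulation. The classes below are hypothetical stand-ins written for this illustration; they do not use or mirror the real YARN client API, and the launch specification is reduced to a single Python callable.

```python
class ResourceManager:
    """Toy stand-in: hands out container ids and tracks Application Masters."""

    def __init__(self):
        self.apps = {}      # app_id -> registered Application Master
        self._next_id = 0

    def allocate(self, count):
        ids = list(range(self._next_id, self._next_id + count))
        self._next_id += count
        return ids

    def register(self, am):
        self.apps[am.app_id] = am

    def deregister(self, am):
        del self.apps[am.app_id]

    def lookup(self, app_id):
        return self.apps.get(app_id)


class NodeManager:
    """Toy stand-in: runs whatever the launch specification describes."""

    def launch(self, container_id, launch_spec):
        # A real specification carries a command line, environment
        # variables, local resources, and security tokens.
        return launch_spec["task"](container_id)


class ApplicationMaster:
    def __init__(self, app_id, rm, nm):
        self.app_id, self.rm, self.nm = app_id, rm, nm
        self.progress = []
        rm.register(self)                           # Step 3

    def run(self, task, workers):
        containers = self.rm.allocate(workers)      # Step 4
        for cid in containers:
            result = self.nm.launch(cid, {"task": task})  # Step 5
            self.progress.append(result)            # Step 6
        return containers

    def status(self):                               # Step 7
        return list(self.progress)

    def finish(self):                               # Step 8
        self.rm.deregister(self)


def submit(app_id, task, workers):
    """Step 1: the client submits the application."""
    rm, nm = ResourceManager(), NodeManager()
    am_container = rm.allocate(1)[0]                # Step 2
    am = ApplicationMaster(app_id, rm, nm)
    containers = am.run(task, workers)
    report = am.status()
    am.finish()
    return am_container, containers, report, rm.lookup(app_id)


if __name__ == "__main__":
    print(submit("app-1", lambda cid: f"container {cid} done", 2))
```

After `finish()`, looking up the application in the Resource Manager returns `None`, which corresponds to the Application Master having deregistered in Step 8.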