This notebook was prepared by Donne Martin. Source and license info is on GitHub.

HDFS

Run an HDFS command:


In [ ]:
!hdfs

Run a file system command on the file system (FsShell); with no arguments, it prints usage:


In [ ]:
!hdfs dfs

List the contents of the user's HDFS home directory:


In [ ]:
!hdfs dfs -ls

List the HDFS root directory:


In [ ]:
!hdfs dfs -ls /

Copy a local file to the user's directory on HDFS:


In [ ]:
!hdfs dfs -put file.txt file.txt

Display the contents of the specified HDFS file:


In [ ]:
!hdfs dfs -cat file.txt

Print the last 10 lines of the file to the terminal:


In [ ]:
!hdfs dfs -cat file.txt | tail -n 10

View the contents of all files in a directory:


In [ ]:
!hdfs dfs -cat dir/* | less

Copy an HDFS file to local:


In [ ]:
!hdfs dfs -get file.txt file.txt

Create a directory on HDFS:


In [ ]:
!hdfs dfs -mkdir dir

Recursively delete the specified directory and all of its contents:


In [ ]:
!hdfs dfs -rm -r dir

Specify an HDFS file in Spark. With a full hdfs:// URI the path is absolute; a path with no scheme is resolved relative to the user's HDFS home directory:


In [ ]:
data = sc.textFile("hdfs://hdfs-host:port/path/file.txt")
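The full URI above breaks down into a scheme, a NameNode host, an RPC port, and an absolute path. A minimal sketch of that structure using the standard library (the host, port, and path here are hypothetical placeholders, not values from this notebook):

```python
from urllib.parse import urlparse

# Hypothetical HDFS URI; "namenode", 8020, and the path are placeholders.
uri = "hdfs://namenode:8020/user/alice/file.txt"
parts = urlparse(uri)

print(parts.scheme)    # hdfs
print(parts.hostname)  # namenode
print(parts.port)      # 8020
print(parts.path)      # /user/alice/file.txt
```

A path like "file.txt" with no scheme would instead be resolved against the user's HDFS home directory (e.g. /user/&lt;username&gt;), matching the relative paths used in the dfs commands above.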