HDFS Commands Cheat Sheet

Shashank Singhal · Published in Geek Culture · Mar 31, 2022 · 9 min read

A quick guide for HDFS Commands

HDFS is the main hub of the Hadoop ecosystem: it stores large data sets, both structured and unstructured, across multiple nodes and maintains the metadata in the form of log files. To work with such a system, we need to be well versed in, or at least aware of, the most common commands and processes. With that in mind, we have consolidated some of the most commonly used HDFS commands that one should know to work with HDFS.

To begin with, we need to check the below list.

1. Install Hadoop

2. Run Hadoop — we can use the ‘start-all.cmd’ script (or ‘start-all.sh’ on Linux), or start the daemons directly from the Hadoop directory.

3. Verify Hadoop services — we can check whether Hadoop is up and running using the command below.

jps
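If the services are running, jps typically lists the Hadoop daemons similar to the output below (the process IDs shown here are only illustrative):

4368 NameNode
4521 DataNode
4710 SecondaryNameNode
4893 ResourceManager
5037 NodeManager
5204 Jps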

Great! Now we are ready to execute and learn the commands.

Note: These commands are case-sensitive. Take special care with uppercase and lowercase letters while writing them.

1. version — this command is used to check the version of our Hadoop installation, along with compilation information and the location of the Hadoop jar on the local file system.

hadoop version

2. mkdir — this command is used to create a new directory if it does not already exist. If the directory exists, it gives a “File exists” error.

hadoop fs -mkdir <Directory Path/Name>
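For example, to create a hypothetical directory named ‘sample_dir’ under ‘/user/hadoop’ (the -p flag also creates any missing parent directories):

hadoop fs -mkdir -p /user/hadoop/sample_dir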

3. ls — this command is used to list the files and directories in HDFS. It shows the name, permissions, owner, size, and modification date of each file or directory in the specified directory.

hadoop fs -ls
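For example, to list a hypothetical directory, or to list it recursively with the -R flag:

hadoop fs -ls /user/hadoop
hadoop fs -ls -R /user/hadoop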

4. put — this command is used to copy the data from the local file system to HDFS.

hadoop fs -put <Local File Path> <HDFS file path>
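For example, assuming a local file ‘sample.txt’ in the current directory and a target HDFS directory ‘/user/hadoop’:

hadoop fs -put sample.txt /user/hadoop/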

We can verify the same from HDFS WebUI.

5. get — this command is used to copy the data from HDFS to the local file system. This command is the reverse of the ‘put’ command.

hadoop fs -get <HDFS file path> <Local File Path>

We can verify the same from our local file system.

6. cat — this command is used to view the contents of a file in HDFS.

hadoop fs -cat <HDFS file path with file name>

7. mv — this command is used to move a file from one location in HDFS to another location in HDFS.

hadoop fs -mv <Source HDFS path> <Destination HDFS path>
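For example, to move a hypothetical file into another HDFS directory (this is also the usual way to rename a file in HDFS):

hadoop fs -mv /user/hadoop/sample.txt /user/hadoop/archive/sample.txt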

We can verify the same from the WebUI.

8. cp — this command is used to copy a file from one location in HDFS to another location within HDFS.

hadoop fs -cp <Source HDFS path> <Destination HDFS path>

We can verify the same from the WebUI.

9. copyFromLocal — this command is used to copy data from the local file system to HDFS.

hadoop fs -copyFromLocal <Local File Path> <HDFS file path>

We can verify the copied file from WebUI.

10. copyToLocal — this command is used to copy data from HDFS to the local file system.

hadoop fs -copyToLocal <HDFS file path> <Local File Path>

We can check the file in our local file system.

11. moveFromLocal — this command is used for moving a file or directory from the local file system to HDFS.

hadoop fs -moveFromLocal <Local File Path> <HDFS file path>

12. moveToLocal — this command is intended to move a file or directory from HDFS to the local file system, but it has not been implemented yet; running it currently just displays a “Not implemented yet” message.

hadoop fs -moveToLocal <HDFS file path> <Local File Path>

13. rm — this command is used to delete/remove a file from HDFS.

hadoop fs -rm <HDFS file path>

14. tail — this command is used to display the tail/end part (the last kilobyte) of a file in HDFS. It has an optional flag -f, which keeps the command running and shows data as it is appended to the file.

hadoop fs -tail [-f] <HDFS file path>
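For example, to keep following a hypothetical log file as new data is appended:

hadoop fs -tail -f /user/hadoop/logs/app.log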

15. expunge — this command is used to empty the trash.

hadoop fs -expunge

16. chown — we should use this command when we want to change the owner of a file or directory in HDFS.

hadoop fs -chown [-R] <Owner>[:<Group>] <HDFS file path>
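For example, to make a hypothetical user ‘hduser’ (and group ‘hdgroup’) the owner of a file, or to change ownership of a directory recursively with -R:

hadoop fs -chown hduser:hdgroup /user/hadoop/sample.txt
hadoop fs -chown -R hduser /user/hadoop/sample_dir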

We can verify whether the owner changed using the hadoop fs -ls command or from the WebUI.

17. chgrp — we should use this command when we want to change the group of a file or directory in HDFS.

hadoop fs -chgrp [-R] <Group> <HDFS file path>

We can verify whether the group changed using the hadoop fs -ls command or from the WebUI.

18. setrep — this command is used to change the replication factor of a file in HDFS.

hadoop fs -setrep <Replication Factor> <HDFS file path>
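For example, to set the replication factor of a hypothetical file to 2 and wait (-w) for the replication to complete:

hadoop fs -setrep -w 2 /user/hadoop/sample.txt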

We can check it from the WebUI.

19. du — this command is used to check the amount of disk space used by a file or directory.

hadoop fs -du <HDFS file path>

20. df — this command shows the capacity, free space, and used space of the HDFS file system. It has an optional flag -h to display the figures in a human-readable format.

hadoop fs -df [-h] <HDFS file path>
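For example, to see the usage of the whole file system in human-readable units:

hadoop fs -df -h /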

21. fsck — this command is used to check the health of the files present in the HDFS file system.

hadoop fsck <HDFS file path>

It also has some options, such as -files, -blocks, -locations, and -delete, to modify what the command reports or does.

22. touchz — this command creates a new empty file (size 0) at the specified HDFS path.

hadoop fs -touchz <HDFS file path>

The new file can be seen in the WebUI.

23. test — this command answers various questions about the given HDFS path (for example, whether it exists, is a directory, or has zero length), returning the result via its exit status.

hadoop fs -test -[defsz] <HDFS file path>
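For example, to check whether a hypothetical file exists (-e) and then inspect the exit status (0 means the test passed):

hadoop fs -test -e /user/hadoop/sample.txt
echo $?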

24. text — this is a simple command used to print the data of an HDFS file on the console in text format; it can also decode compressed and sequence-file sources.

hadoop fs -text <HDFS file path>

25. stat — this command prints statistics about a file or directory in HDFS in the specified format.

hadoop fs -stat <HDFS file path>
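For example, to print the name, replication factor, and modification time of a hypothetical file using the %n, %r, and %y format specifiers:

hadoop fs -stat "%n %r %y" /user/hadoop/sample.txt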

It supports several format specifiers, such as %n (name), %r (replication), %b (size in bytes), and %y (modification time). By default, it uses ‘%y’.

26. usage — displays the usage for a given command, or for all commands if none is specified.

hadoop fs -usage <command>

27. help — displays help for a given command, or for all commands if none is specified.

hadoop fs -help <command>

28. chmod — this command is used to change the permissions of a file or directory in the HDFS file system; the -R flag applies the change recursively.

hadoop fs -chmod [-R] <Permission> <HDFS file path>
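For example, to give a hypothetical file 755 permissions, or to apply permissions recursively to a directory:

hadoop fs -chmod 755 /user/hadoop/sample.txt
hadoop fs -chmod -R 700 /user/hadoop/sample_dir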

We can compare the old and new permissions using the hadoop fs -ls command or from the WebUI.

29. appendToFile — this command is used to append one or more files from the local file system to a single file in HDFS; if the destination file does not exist, it is created.

hadoop fs -appendToFile <Local file path1> <Local file path2> <HDFS file path>

30. checksum — this command is used to return the checksum information of a file in the HDFS file system.

hadoop fs -checksum <HDFS file Path>

31. count — it counts the number of directories, files, and bytes under a particular path.

hadoop fs -count [options] <HDFS directory path>
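For example, to count a hypothetical directory and show sizes in human-readable form:

hadoop fs -count -h /user/hadoop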

This command also has a few options, such as -q for quota information and -h for human-readable sizes, to modify the output as needed.

32. find — this command is used to find files in the HDFS file system. We need to provide the expression we are looking for, and we can also provide a path if we want to search within a particular directory.

hadoop fs -find <HDFS directory path> <Expression>
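For example, to search a hypothetical directory for files matching a name pattern:

hadoop fs -find /user/hadoop -name "sample*" -print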

33. getmerge — this command is used to merge the contents of a directory from HDFS to a file in the local file system.

hadoop fs -getmerge <HDFS directory> <Local file path>
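For example, to merge all the files in a hypothetical HDFS directory into a single local file:

hadoop fs -getmerge /user/hadoop/sample_dir merged_output.txt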

The merged file can be seen in the local file system.

Summary

We learned the most common and frequently used HDFS commands. We have seen how to use them and learned the practical aspects as well.

So don’t just read: practice along with us, and let us know about any issues or challenges you face.

Until Then… This is Quick Data Science Team signing off. See you with another article.
