
Tuesday, February 15, 2022

Shell Scripts Commands in Hadoop HDFS (Class -5)

A shell is the interface between a user and a particular operating system (e.g. Linux).

In the ingestion phase, we extract data from the source layer with tools such as Sqoop, Spark, or Talend and pull it into Hadoop (HDFS), where the data is distributed in the form of blocks.

Sqoop commands in the ingestion phase can be placed in shell scripts that load the data into Hadoop.

Once the data is in Hadoop, shell scripts are used again to create tables over the blocks and to run queries against them.

When shell commands are saved in a file and executed from that file, this is called shell scripting.

Data in the blocks is analysed and processed with frameworks such as Hive, Spark, and Spark SQL, and shell scripts are used to launch those jobs.
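As a rough sketch of how ingestion and processing commands can sit together in one shell script, the example below wraps a Sqoop import and a Hive query. The JDBC URL, user name, password file, table name, and target directory are all placeholders and do not come from this class. The run helper is also hypothetical: it only prints a command when the tool is not installed, so the script can be tried without a cluster.

```shell
#!/bin/sh
# Run a command if its program is installed; otherwise print it (dry run).
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "[dry run] $*"
    fi
}

# Placeholder connection details (assumptions, not values from the post).
jdbc_url="jdbc:mysql://dbhost.example.com/sales"
table="orders"
target_dir="/user/cloudera/orders"

# Ingestion: pull one table from the source database into HDFS.
run sqoop import \
    --connect "$jdbc_url" \
    --username report_user \
    --password-file /user/cloudera/.db_password \
    --table "$table" \
    --target-dir "$target_dir" \
    -m 1

# Processing: run a Hive query from the same script.
# Assumes a Hive table with this name has already been defined.
run hive -e "SELECT COUNT(*) FROM $table"
```

Keeping both steps in one file means the whole flow can be started with a single sh command on the edge node.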

All commands on the edge node are run through the shell, usually from shell scripts. The shell interprets the user's input (commands) and passes the requests to the Linux kernel to carry out.

Different kinds of shell are:

a) Bourne shell (sh)

b) Bash (Bourne Again shell)

c) C shell (csh)

d) TENEX C shell (tcsh)

e) Korn shell (ksh)

Bash is the most widely used shell and is the default on most Linux distributions.

$: the $ sign is used in the shell to retrieve the value of a variable.

echo: the echo command prints text or a string to the shell or to an output file.
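A minimal sketch of both ideas, assuming a made-up variable called name and a temporary file as the output file:

```shell
#!/bin/sh
name="Hadoop"                         # hypothetical variable for the demo

echo "Hello from the shell"           # print a literal string
echo "Learning $name"                 # $name is replaced by the variable's value

out_file=$(mktemp)                    # temporary output file for the demo
echo "Learning $name" > "$out_file"   # '>' sends echo's output into the file
cat "$out_file"                       # read the file back
```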

Shell scripts are simply groups of Linux commands saved in a file, usually with the extension .sh.

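a) Suppose you want to create 4 directories. You could type the mkdir command four times at the prompt.

b) Instead, put the commands in a script file opened with vi.

c) Press “i” to enter insert mode and type the commands.

d) Then press Escape and type :wq! to save and quit.

e) Use the “cat” command to read the data inside the file.

f) Now, execute the shell script using sh.

g) To check the directories created by the shell script, list them with ls.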

Now you can see that shell scripts are useful for keeping all the commands in one file and executing them together.
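The steps above can be sketched as one non-interactive run. The file name make_dirs.sh and the directory names dir1 to dir4 are assumptions for illustration, and a here-document stands in for typing the commands in vi.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write the script file (done interactively with vi in the steps above).
cat > make_dirs.sh <<'EOF'
mkdir dir1
mkdir dir2
mkdir dir3
mkdir dir4
EOF

cat make_dirs.sh      # read the data inside the file
sh make_dirs.sh       # execute the shell script
ls -d dir*            # check that the directories were created
```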

In a Unix shell script, there are 2 types of variables (containers that store data):

A) System variables such as BASH, BASH_VERSION, HOME, PWD

B) User variables
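A small sketch of the difference, assuming two made-up user variables (course and class_no). BASH_VERSION is set only when the script runs under bash, so the example falls back to a message otherwise.

```shell
#!/bin/sh
# System variables: already set by the shell or the login environment.
echo "Home directory:    $HOME"
echo "Current directory: $PWD"
# BASH_VERSION exists only when the script runs under bash.
echo "Bash version:      ${BASH_VERSION:-not running under bash}"

# User variables: defined by the script author (no spaces around '=').
course="Hadoop"      # hypothetical value
class_no=5           # hypothetical value
summary="$course class $class_no"
echo "$summary"
```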

Shell Commands in Hadoop:

1) echo (to print text)

Similarly, echo can print the value of a variable.

To save the file name,

A) System Variables:

Create a File:

[cloudera@quickstart ~]$ vi

Press “i” to enter insert mode and type the commands:

Save this by pressing Escape and typing :wq!

Read the file.

Execute the shell script using sh.

Writing shell scripts so that they handle their inputs dynamically is called parameterization, i.e. we create the shell script only once and, without changing it, run it with different inputs (for example, once a day).
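A minimal sketch of a parameterized script, assuming a hypothetical file load_data.sh that takes a run date as its first argument ($1):

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Write the script once; it reads its input from the command line.
cat > "$workdir/load_data.sh" <<'EOF'
#!/bin/sh
# $1 is the first argument; stop with a usage message if it is missing.
run_date=${1:?usage: load_data.sh <run_date>}
echo "Loading data for $run_date"
EOF

# Same script, different inputs; the file itself never changes.
sh "$workdir/load_data.sh" 2022-02-14
sh "$workdir/load_data.sh" 2022-02-15
```

In a real job the echo line would be replaced by the ingestion or query commands that use the date.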

/home/cloudera --- home path of edge node

/user/cloudera --- home path of HDFS

We cannot edit a file in place in HDFS (there is no vi for HDFS); we create the file on the local file system and copy it into HDFS. A user can delete only the data stored under their own user id in Cloudera.

As an example here, we use:

a) Class10_Practice --- Directory (new)

b) mark.csv --- File (existing); to create a new file, use the ‘vi’ command.

2) ls [to list all files/directories under a given HDFS path, e.g. the home path]

3) mkdir [to create a new directory]

4) put [to copy a file from the local file system to HDFS]

5) cat [to read the data inside an HDFS file]

6) vi [to create a new file in a given directory on the local file system]

[cloudera@quickstart ~]$ vi employee_data.txt

Press “i” to enter insert mode and type the data.

After pressing Escape, type :wq! to save and quit.

7) copyFromLocal [to copy a file from the local file system to HDFS]

8) get [to copy files from HDFS to the local file system]

9) cp [to copy files from one HDFS location to another HDFS location]

Say these two HDFS directories are used:

a) /user/cloudera/Class10_dir2 (new directory created)

b) /user/cloudera/Class10_Practice/ (existing directory)

10) mv [to move data from one HDFS location to another]

11) du [disk usage]

12) rm -r [to remove a directory and everything inside it]

13) dfsadmin -report [to know the capacity of the cluster]

14) yarn node -list -all [to list all nodes, which shows the size of the cluster]

15) touchz [to create an empty (zero-length) file]
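The file commands in this list can be sketched as one script using the paths from this class. The run helper is hypothetical: it prints each command instead of running it when the hdfs client is not installed. The sample rows, the names mark_copy.csv, mark_moved.csv, and empty.txt, and the -h option on du are assumptions for illustration.

```shell
#!/bin/sh
# Run a command if its program is installed; otherwise print it (dry run).
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "[dry run] $*"
    fi
}

hdfs_home=/user/cloudera                       # home path of HDFS
practice_dir=$hdfs_home/Class10_Practice
second_dir=$hdfs_home/Class10_dir2

# Create small local files to upload (sample rows are made up).
local_dir=$(mktemp -d)
printf 'id,marks\n1,80\n2,75\n' > "$local_dir/mark.csv"
printf 'id,name\n1,Asha\n' > "$local_dir/employee_data.txt"

run hdfs dfs -ls "$hdfs_home"                                   # list files/directories
run hdfs dfs -mkdir "$practice_dir"                             # create a directory
run hdfs dfs -put "$local_dir/mark.csv" "$practice_dir/"        # local to HDFS
run hdfs dfs -cat "$practice_dir/mark.csv"                      # read the HDFS file
run hdfs dfs -copyFromLocal "$local_dir/employee_data.txt" "$practice_dir/"
run hdfs dfs -get "$practice_dir/mark.csv" "$local_dir/mark_copy.csv"   # HDFS to local
run hdfs dfs -mkdir "$second_dir"
run hdfs dfs -cp "$practice_dir/mark.csv" "$second_dir/"        # HDFS to HDFS copy
run hdfs dfs -mv "$second_dir/mark.csv" "$second_dir/mark_moved.csv"    # move or rename
run hdfs dfs -du -h "$practice_dir"                             # disk usage
run hdfs dfs -touchz "$second_dir/empty.txt"                    # zero-length file
run hdfs dfs -rm -r "$second_dir"                               # remove directory and contents
```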

Shell Scripts in Hadoop (hdfs dfs)






to check whether all Hadoop services are running or not


list all files / directories


Displays free space


no. of directories, files and bytes


to check the Health of Hadoop file system


Run a cluster balancing utility.


to create the directory


Copy file from one location to another


to Read data inside file


to copy the files from HDFS to Local File System.


to copy the files from Local File System to HDFS.


to move files from source to destination


Disk Usage

rm -r

to remove an entire directory and all of its contents.

dfsadmin -report

to know the Capacity of the cluster

yarn node -list -all

to know the Size of the cluster.


makes the trash empty


change the Permissions of Files
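Several command names in this table did not survive, so the sketch below uses standard Hadoop commands that match the descriptions. They are assumptions and not recovered from the original table; in particular, jps is only one common way to check running services. The run helper is hypothetical and prints a command when its program is not installed.

```shell
#!/bin/sh
# Run a command if its program is installed; otherwise print it (dry run).
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "[dry run] $*"
    fi
}

target=/user/cloudera/Class10_Practice      # directory used earlier in the post

run jps                                     # running Java processes, incl. Hadoop daemons
run hdfs dfs -df -h /                       # free space
run hdfs dfs -count "$target"               # no. of directories, files and bytes
run hdfs fsck /                             # health of the file system
run hdfs dfs -chmod 644 "$target/mark.csv"  # change the permissions of a file
run hdfs dfs -expunge                       # empty the trash
run hdfs dfsadmin -report                   # capacity of the cluster
run yarn node -list -all                    # all nodes in the cluster
run hdfs balancer                           # balancing utility; can run for a long time
```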
