
Tuesday, February 15, 2022

Shell Script Commands in Hadoop HDFS (Class 5)

A shell is the interface between a human user and a particular operating system (e.g. Linux).

In the ingestion phase, we extract data from the source layer with tools such as Sqoop, Spark, and Talend and pull it into Hadoop (HDFS), where the data is distributed in the form of blocks.

Sqoop commands in the ingestion phase can be run from shell scripts to load data into Hadoop.

Once the data is in Hadoop, we create tables and run queries over those blocks, again by using shell scripts.

When shell commands are stored in a file and executed from that file, this is called shell scripting.

Data in the blocks is analysed and processed with frameworks such as Hive, Spark, and Spark SQL, which are also launched from shell scripts.

All commands on the edge node are executed through the shell, usually from shell scripts. The shell interprets user input (commands) and passes it to the Linux operating system to execute.

The different kinds of shell are:

a) Bourne shell (sh)

b) Bash (Bourne Again shell)

c) C shell (csh)

d) Tenex C shell (tcsh)

e) Korn shell (ksh)

Bash is the most widely used shell and is the default on most Linux distributions.

$: The $ sign is used in the shell to retrieve the value of a variable.

echo: The echo command prints text or a string to the shell or to an output file.
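As a small illustration of both points, the sketch below assigns a variable, reads it back with $, and uses echo to print to the shell and to a file. The variable name course and the file echo_demo.txt are made up for this example.

```shell
# Hypothetical variable and file names, chosen only for illustration.
course="Hadoop"

# Print a plain string to the shell.
echo "Hello from the shell"

# Use $ to retrieve the value stored in the variable.
echo "Course: $course"

# Redirect echo's output into a file instead of the terminal, then read it.
demo_file="$(mktemp -d)/echo_demo.txt"
echo "Course: $course" > "$demo_file"
cat "$demo_file"
```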


A shell script is simply a group of Linux commands saved in a file, by convention with the extension .sh.

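a) Suppose you want to create 4 directories. You could type a separate command for each one.

b) Instead, put all the commands in one script file, opened with vi.

c) Press Esc, then i, to enter insert mode, and type the commands.

d) Then press Esc and type :wq! to save the file and exit.

e) Use the cat command to read the data inside the file.

f) Now execute the shell script using sh.

g) To check the directories created by the shell script, list the current directory.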


Now you know that shell scripts are useful for keeping all the commands in one file and executing them together.
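The steps above can be sketched end to end as follows. The script name make_dirs.sh and the directory names are assumptions, since the original commands are not shown; a here-document stands in for typing the lines in vi so that the example can be pasted as is.

```shell
# Work in a scratch directory so nothing in the home path is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Write the script (equivalent to typing these lines in vi and saving).
# File and directory names are hypothetical.
cat > make_dirs.sh <<'EOF'
mkdir dir1
mkdir dir2
mkdir dir3
mkdir dir4
EOF

# Read the file, execute it with sh, then list the new directories.
cat make_dirs.sh
sh make_dirs.sh
ls -d dir*
```

Running the four mkdir lines from one file gives the same result as typing them one by one, but the file can be kept and run again later.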

In Unix shell scripts, there are 2 types of variables (containers that store data):

A) System variables such as BASH, BASH_VERSION, HOME, PWD

B) User variables

Shell Commands in Hadoop:

1) echo [to print text]

Similarly, echo can print the value of a variable.


To save the file name,



A) System Variables:

Create a File:

[cloudera@quickstart ~]$ vi req_2.sh


Press Esc, then i, to enter insert mode, and type the script.

Save the file by pressing Esc and typing :wq!

Read the file with cat.

Execute the shell script using sh.
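Since the contents of req_2.sh are not shown here, the following is only a guess at what such a script might contain: it prints the four system variables named earlier. The sketch runs it with bash rather than sh because BASH and BASH_VERSION are set only by Bash; on systems where sh is itself Bash, the two behave alike for this script.

```shell
scratch="$(mktemp -d)"

# Assumed contents of req_2.sh: print four system variables.
cat > "$scratch/req_2.sh" <<'EOF'
echo "Shell path   : $BASH"
echo "Bash version : $BASH_VERSION"
echo "Home path    : $HOME"
echo "Current dir  : $PWD"
EOF

# Read the file, then execute it.
cat "$scratch/req_2.sh"
bash "$scratch/req_2.sh"
```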


Making a shell script handle its inputs dynamically is called parameterization: the script is written only once and left unchanged, and it is then run with different inputs each time (for example, every day).
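A minimal sketch of parameterization, assuming a hypothetical script load_day.sh that takes a date as its first argument: the script text stays fixed and only the argument changes between runs.

```shell
scratch="$(mktemp -d)"

# Hypothetical script: $1 is the first argument passed on the command line.
cat > "$scratch/load_day.sh" <<'EOF'
run_date="$1"
echo "Processing data for $run_date"
EOF

# Same script, different inputs.
sh "$scratch/load_day.sh" 2022-02-14
sh "$scratch/load_day.sh" 2022-02-15
```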


/home/cloudera --- home path of edge node

/user/cloudera --- home path of HDFS

A file cannot be created or edited directly in HDFS with an editor such as vi; it is created on the local file system (edge node) and then copied into HDFS. A user can delete the data under their own user ID in Cloudera.

As an example here, we are given:

a) Class10_Practice --- directory (new)

b) mark.csv --- file (existing); a new file has to be created with the vi command.

2) ls [to list all files/directories under a given HDFS path, e.g. the home path]

3) mkdir [to create a new directory]

4) put [to copy a file from the local file system to HDFS]

5) cat [to read the data inside an HDFS file]
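Items 2) to 5) above might be used together as in this sketch. The paths reuse the Class10_Practice directory and the mark.csv file from the example; it is an assumption that mark.csv sits in the current local directory. The small run helper is not part of Hadoop: it prints each command and executes it only when the hdfs client is installed, so the sketch can also be dry-run on a machine without a cluster.

```shell
# Helper (not a Hadoop command): show the command, and run it only
# when a Hadoop client (hdfs) is available on this machine.
run() {
    echo "+ $*"
    if command -v hdfs >/dev/null 2>&1; then "$@"; fi
}

# List the HDFS home path.
run hdfs dfs -ls /user/cloudera

# Create a new directory in HDFS.
run hdfs dfs -mkdir /user/cloudera/Class10_Practice

# Copy a local file into that directory (local location is assumed).
run hdfs dfs -put mark.csv /user/cloudera/Class10_Practice/

# Read the data inside the HDFS file.
run hdfs dfs -cat /user/cloudera/Class10_Practice/mark.csv
```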


6) vi [to create a new file in a given local directory]

[cloudera@quickstart ~]$ vi employee_data.txt

Press Esc, then i, to enter insert mode, and type the data.

After pressing Esc, type :wq! to save and exit.

7) copyFromLocal [to copy a file from the local file system to HDFS]

8) get [to copy files from HDFS to the local file system]


9) cp [to copy files from one HDFS location to another HDFS location]

Say these are two HDFS directories:

a) /user/cloudera/Class10_dir2 (new directory created)

b) /user/cloudera/Class10_Practice/ (existing directory)

10) mv [to move data from one HDFS location to another]
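A sketch of items 7) to 10), using the employee_data.txt file created with vi and the two HDFS directories named above. The local target name employee_copy.txt is made up, and the run helper (not a Hadoop command) only executes a command when the hdfs client is present.

```shell
# Helper (not a Hadoop command): show the command, and run it only
# when a Hadoop client (hdfs) is available on this machine.
run() {
    echo "+ $*"
    if command -v hdfs >/dev/null 2>&1; then "$@"; fi
}

# Local file system -> HDFS.
run hdfs dfs -copyFromLocal employee_data.txt /user/cloudera/Class10_Practice/

# HDFS -> local file system (hypothetical local file name).
run hdfs dfs -get /user/cloudera/Class10_Practice/employee_data.txt employee_copy.txt

# Copy within HDFS: the source file stays in place.
run hdfs dfs -cp /user/cloudera/Class10_Practice/mark.csv /user/cloudera/Class10_dir2/

# Move within HDFS: the source file no longer exists afterwards.
run hdfs dfs -mv /user/cloudera/Class10_Practice/employee_data.txt /user/cloudera/Class10_dir2/
```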



11) du [disk usage]

12) rm -r [to remove a directory and all of its contents]

13) dfsadmin -report [to know the capacity of the cluster]

14) yarn node -list -all [to list all nodes and so know the size of the cluster]

15) touchz [to create an empty file]
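Items 11) to 15) might look like this on the command line. The file name empty_file.txt is hypothetical, and the run helper is not part of Hadoop; it prints each command and executes it only when the hdfs client is installed. Note that rm -r deletes data, so the path should be checked before running it on a real cluster.

```shell
# Helper (not a Hadoop command): show the command, and run it only
# when a Hadoop client (hdfs) is available on this machine.
run() {
    echo "+ $*"
    if command -v hdfs >/dev/null 2>&1; then "$@"; fi
}

# Disk usage of the HDFS home path, in human-readable units.
run hdfs dfs -du -h /user/cloudera

# Create an empty (zero-length) file; the file name is hypothetical.
run hdfs dfs -touchz /user/cloudera/Class10_dir2/empty_file.txt

# Remove a directory and everything inside it (destructive).
run hdfs dfs -rm -r /user/cloudera/Class10_dir2

# Capacity report for the cluster.
run hdfs dfsadmin -report

# All nodes known to YARN, in every state.
run yarn node -list -all
```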


Shell Scripts in Hadoop (hdfs dfs)

Command --- Meaning
echo --- print text
jps --- check whether all Hadoop services are running
ls --- list all files / directories
df --- display free space
count --- number of directories, files and bytes
fsck --- check the health of the Hadoop file system
balancer --- run a cluster balancing utility
mkdir --- create a directory
put --- copy a file from the local file system to HDFS
cat --- read the data inside a file
get --- copy files from HDFS to the local file system
copyFromLocal --- copy files from the local file system to HDFS
mv --- move files from source to destination
du --- disk usage
rm -r --- remove an entire directory and all of its contents
dfsadmin -report --- show the capacity of the cluster
yarn node -list -all --- list all nodes, to know the size of the cluster
expunge --- empty the trash
chmod --- change the permissions of files
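The commands in the table that were not walked through earlier can be tried in the same way. This is a sketch only: the paths and the permission mode 644 are assumptions, balancer is left out because it can run for a long time, and the run helper (not a Hadoop command) executes a command only when the hdfs client is installed.

```shell
# Helper (not a Hadoop command): show the command, and run it only
# when a Hadoop client (hdfs) is available on this machine.
run() {
    echo "+ $*"
    if command -v hdfs >/dev/null 2>&1; then "$@"; fi
}

# Java processes on this machine, including Hadoop daemons.
run jps

# Free space in HDFS, in human-readable units.
run hdfs dfs -df -h

# Number of directories, files and bytes under a path.
run hdfs dfs -count /user/cloudera

# Health check of the file system under a path.
run hdfs fsck /user/cloudera

# Change permissions of a file (mode chosen only as an example).
run hdfs dfs -chmod 644 /user/cloudera/Class10_Practice/mark.csv

# Empty the trash.
run hdfs dfs -expunge
```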

