A shell is the interface between a human user and a particular operating system (e.g. Linux).
In the ingestion phase, we extract data from the source layer with tools such as Sqoop, Spark and Talend, and load it into Hadoop (HDFS), where the data is distributed in the form of blocks.
Sqoop commands in the ingestion phase can be implemented as shell scripts that load the data into Hadoop.
Once the data is in Hadoop, we again use shell scripts to create tables over the data in the blocks and to query it.
Running shell commands from a file, instead of typing them one at a time, is called shell scripting.
The data in the blocks is analysed and processed with frameworks such as Hive, Spark and Spark SQL, and shell scripts are used to launch these jobs.
Commands on the edge node are typically executed through shell scripts. The shell interprets the user's input (commands) and passes it to the Linux operating system, which runs the corresponding programs.
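The ingestion-then-processing flow described above can be sketched as a single shell script. This is a minimal sketch, not the author's script: the JDBC URL, database, user name, password file, table name and HDFS path are hypothetical placeholders, and the `run` helper is also hypothetical. By default (`DRY_RUN=1`) it only prints the commands it would run, so it can be read and tried without a cluster.

```shell
#!/usr/bin/env bash
# Sketch: one script that ingests a table with Sqoop and then queries it with Hive.
# Every connection detail, table name and path below is a hypothetical placeholder.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print each command; 0 = really execute it

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Ingestion: copy one source table into an HDFS directory.
ingest_table() {
  table="$1"        # source table name
  target_dir="$2"   # HDFS directory that will hold the imported data
  run sqoop import \
    --connect "jdbc:mysql://dbhost/retail_db" \
    --username retail_user \
    --password-file /user/cloudera/.db_password \
    --table "$table" \
    --target-dir "$target_dir" \
    -m 1
}

# Processing: run a HiveQL statement non-interactively.
run_hive_query() {
  run hive -e "$1"
}

ingest_table orders /user/cloudera/orders
# Assumes a Hive table named "orders" has been defined over that directory.
run_hive_query "SELECT COUNT(*) FROM orders;"
```

Saved under a hypothetical name such as ingest.sh, `DRY_RUN=0 bash ingest.sh` would execute the commands for real. `--password-file` is used instead of a password on the command line so that the script can run unattended.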
Different kinds of shell are:
a) Bourne shell (sh)
b) Bash, the Bourne Again shell (bash)
c) C shell (csh)
d) TENEX C shell (tcsh)
e) Korn shell (ksh)
Bash is the most widely used shell and is the default shell on most Linux distributions.
$: the $ sign is used in the shell to retrieve the value of a variable (e.g. $HOME).
echo: the echo command prints text or a string to the terminal, or to an output file when its output is redirected.
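A short example of `$` and `echo` working together; the variable names and the temporary file are illustrative only.

```shell
#!/usr/bin/env bash
# Assign values to variables (no spaces are allowed around "=").
course="Hadoop"
batch=10

# $ retrieves a variable's value; double quotes keep the text as one string.
echo "Welcome to the $course class, batch $batch"

# ${...} marks where the variable name ends when text follows it directly.
echo "Practice directory: Class${batch}_Practice"

# echo writes to a file instead of the terminal when its output is redirected.
out_file="$(mktemp)"
trap 'rm -f "$out_file"' EXIT   # remove the temporary file when the script exits
echo "course=$course" > "$out_file"
cat "$out_file"
```

Without the braces, `Class$batch_Practice` would be read as the variable `batch_Practice`, which is unset, so only `Class` would be printed.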
a) For example, suppose you want to create 4 directories with a shell script.
b) Create the script file with vi.
c) Press Escape, then I, to enter insert mode and type the commands.
d) Then press Escape and type :wq! to save the file and quit.
e) Use the cat command to read the data inside the file.
f) Now, execute the shell script.
g) Check the directories created by the script.
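The text does not show the script itself. A minimal sketch of what such a script might contain, assuming a hypothetical file name create_dirs.sh and four illustrative directory names dir1 to dir4:

```shell
#!/usr/bin/env bash
# Sketch of create_dirs.sh (hypothetical name): create four directories in one run.

create_dirs() {
  base_dir="$1"                    # where the directories should be created
  for name in dir1 dir2 dir3 dir4; do
    mkdir -p "$base_dir/$name"     # -p: no error if the directory already exists
    echo "created $base_dir/$name"
  done
}

# Use the first command-line argument as the base directory,
# or the current directory when no argument is given.
create_dirs "${1:-.}"
```

It could be run with `bash create_dirs.sh` (or `sh create_dirs.sh`, since the script uses only POSIX features), after which `ls -l` would list the four new directories.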
In a Unix shell script, there are 2 types of variables (containers that store data):
A) System variables, such as BASH, BASH_VERSION, HOME and PWD, which are set by the shell and the operating system
B) User variables, which are defined by the user in the script
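A short sketch of user variables, including one filled from the user's input with the `read` built-in; all variable and function names here are illustrative.

```shell
#!/usr/bin/env bash
# User variables are defined by the script author (names here are illustrative).
course="Hadoop"          # no spaces around "="
file_name="mark.csv"
echo "Course: $course, file: $file_name"

# A user variable can also be filled from input with the read built-in.
ask_file_name() {
  printf 'Enter a file name: '
  read -r input_name     # stores one line of input in input_name
  echo "Saved file name: $input_name"
}

# Interactively you would just call ask_file_name and type a name;
# here the answer is piped in so the script runs unattended.
echo "employee_data.txt" | ask_file_name
```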
Shell Commands in Hadoop:
1) echo (to print text)
Similarly, echo can print the value of a variable, for example one that holds a file name entered by the user.
A) System Variables:
Create a file:
[cloudera@quickstart ~]$ vi req_2.sh
Press Escape, then I, to enter insert mode and type the script.
Save it by pressing Escape and typing :wq!
Read the file with the cat command.
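The contents of req_2.sh are not shown in the text. Assuming it prints the system variables listed earlier, a minimal sketch could look like this; USER is an extra, commonly set variable, and the `print_system_vars` function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of req_2.sh: print system variables that the shell and OS set.
# ":-not set" prints a placeholder when a variable is unset
# (for example BASH and BASH_VERSION when the script is run by a non-Bash sh).
print_system_vars() {
  echo "BASH: ${BASH:-not set}"
  echo "BASH_VERSION: ${BASH_VERSION:-not set}"
  echo "HOME: ${HOME:-not set}"
  echo "PWD: ${PWD:-not set}"
  echo "USER: ${USER:-not set}"
}

print_system_vars
```

Run it with `bash req_2.sh`; with `sh req_2.sh` on systems where sh is not Bash, BASH and BASH_VERSION would normally show as not set.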
Writing shell scripts so that they handle their inputs dynamically is called parameterization: the script is written once and left unchanged, and each run (for example, once a day) receives different inputs.
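A minimal sketch of parameterization: the hypothetical `load_day` function takes the run date and the file name as arguments (`$1`, `$2`), so the same code serves every day's run. The landing path under /user/cloudera is an assumption, and the HDFS calls are left as comments so the sketch runs anywhere.

```shell
#!/usr/bin/env bash
# Sketch of a parameterized load step (hypothetical name: load_day).
# The code never changes; each run supplies different inputs:
#   $1 = run date (e.g. 2024-01-31), $2 = local file to load
load_day() {
  if [ "$#" -ne 2 ]; then
    echo "usage: load_day <run_date> <file_name>" >&2
    return 1
  fi
  run_date="$1"
  file_name="$2"
  target_dir="/user/cloudera/landing/$run_date"   # hypothetical HDFS path
  echo "Loading $file_name into $target_dir"
  # On a cluster, the real work might be:
  #   hdfs dfs -mkdir -p "$target_dir"
  #   hdfs dfs -put "$file_name" "$target_dir"
}

# The same code, run on two different days with different inputs.
load_day 2024-01-31 mark.csv
load_day 2024-02-01 employee_data.txt
```

In a standalone script, the last two lines would typically be replaced by `load_day "$@"`, so that a run such as `bash load_day.sh 2024-01-31 mark.csv` passes its inputs straight through.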
/home/cloudera --- home path on the edge node (local file system)
/user/cloudera --- home path in HDFS
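We cannot create a file in HDFS with an editor such as vi, and existing HDFS files cannot be edited in place; instead, the file is created on the edge node and copied into HDFS. A user can delete only the data under his own user ID (here, cloudera).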
As an example, take:
a) Class10_Practice --- directory (new)
b) mark.csv --- file (existing); to create a new file, use the vi command.
2) ls [to see all files/directories at the given HDFS destination / home path]
3) mkdir [to create a new directory]
4) put [to copy a file from the local file system to HDFS]
5) cat [to read the data inside an HDFS file]
6) vi [to create a new file in the given local directory on the edge node]
[cloudera@quickstart ~]$ vi employee_data.txt
Press Escape, then I, to enter insert mode and type the data.
After pressing Escape, type :wq! to save the file and quit.
7) copyFromLocal [to copy a file from the local file system to HDFS]
8) get [to copy files from HDFS to the local file system]
9) cp [to copy files from one HDFS location to another]
Say the 2 HDFS locations are:
a) /user/cloudera/Class10_dir2 (new directory created)
b) /user/cloudera/Class10_Practice/ (existing directory)
10) mv [to move data from one HDFS location to another]
11) du [disk usage of files and directories]
12) rm -r [to remove a directory and all of its contents]
13) dfsadmin -report [to know the capacity of the cluster]
14) yarn node -list -all [to list all nodes, i.e. to know the size of the cluster]
15) touchz [to create an empty file]
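The commands in this list can be combined into one script run on the edge node. A minimal sketch follows, using the paths from the examples above; it assumes mark.csv and employee_data.txt exist in the local working directory, and mark_copy.csv and empty.txt are hypothetical names. The `hdfs_dfs` helper is also hypothetical: with `DRY_RUN=1` (the default here) it prints each command instead of running it, so the script can be read without a cluster.

```shell
#!/usr/bin/env bash
# Sketch: the HDFS commands from the list above, in one script.
DRY_RUN="${DRY_RUN:-1}"        # 1 = print commands only; 0 = run them
HDFS_HOME=/user/cloudera       # home path of the user in HDFS
PRACTICE="$HDFS_HOME/Class10_Practice"
DIR2="$HDFS_HOME/Class10_dir2"

hdfs_dfs() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "hdfs dfs $*"
  else
    hdfs dfs "$@"
  fi
}

hdfs_dfs -ls "$HDFS_HOME"                               # list the HDFS home path
hdfs_dfs -mkdir -p "$PRACTICE" "$DIR2"                  # create both directories
hdfs_dfs -put mark.csv "$PRACTICE/"                     # local -> HDFS
hdfs_dfs -copyFromLocal employee_data.txt "$PRACTICE/"  # local -> HDFS
hdfs_dfs -cat "$PRACTICE/mark.csv"                      # read the file
hdfs_dfs -cp "$PRACTICE/mark.csv" "$DIR2/"              # HDFS -> HDFS copy
hdfs_dfs -mv "$DIR2/mark.csv" "$DIR2/mark_copy.csv"     # move / rename inside HDFS
hdfs_dfs -get "$PRACTICE/employee_data.txt" /tmp/       # HDFS -> local
hdfs_dfs -du -h "$PRACTICE"                             # disk usage, human-readable
hdfs_dfs -touchz "$PRACTICE/empty.txt"                  # create an empty file
hdfs_dfs -rm -r "$DIR2"                                 # remove directory and contents
```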
Shell commands in Hadoop (hdfs dfs and related tools)

| Command | Meaning |
|---|---|
| echo | Print text |
| jps | Check whether all Hadoop services (daemons) are running |
| ls | List all files / directories |
| df | Display free space |
| count | Count the number of directories, files and bytes |
| fsck | Check the health of the Hadoop file system |
| balancer | Run the cluster balancing utility, which spreads blocks evenly across DataNodes |
| mkdir | Create a directory |
| put | Copy a file from the local file system to HDFS |
| cat | Read the data inside a file |
| get | Copy files from HDFS to the local file system |
| copyFromLocal | Copy files from the local file system to HDFS |
| mv | Move files from source to destination |
| du | Disk usage |
| rm -r | Remove an entire directory and all of its contents |
| dfsadmin -report | Know the capacity of the cluster |
| yarn node -list -all | Know the size of the cluster (lists all nodes) |
| expunge | Empty the trash |
| chmod | Change the permissions of files |
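Several commands in the table are typically used together to check a cluster. A minimal sketch of such a check script follows; the `check` helper is hypothetical, and it skips any tool that is not installed on the machine, so the script also runs outside a cluster. On a real cluster, dfsadmin -report and a full fsck may need HDFS superuser rights.

```shell
#!/usr/bin/env bash
# Sketch: a cluster health check built from commands in the table above.

check() {
  desc="$1"; shift                 # first argument: description; rest: the command
  echo "== $desc: $*"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" || echo "   (command failed)"
  else
    echo "   ($1 not found on this machine, skipped)"
  fi
}

check "Hadoop daemons running"  jps
check "Cluster capacity"        hdfs dfsadmin -report
check "File system health"      hdfs fsck /
check "Free space"              hdfs dfs -df -h /
check "Dirs, files and bytes"   hdfs dfs -count /user/cloudera
check "Cluster nodes"           yarn node -list -all
```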