
Saturday, April 30, 2022

Spark Core RDD Operations (Class -41)

Resilient Distributed Datasets (RDDs) are created by starting with a file in the Hadoop file system (or any other Hadoop-supported file system), or an existing Scala collection in the driver program, and transforming it.

At a high level, every Spark application consists of a driver program that runs the user’s main function and executes various parallel operations on a cluster. The main abstraction Spark provides is a resilient distributed dataset (RDD), which is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. 

Apache Spark RDD supports two types of Operations-

·         Transformations

·         Actions
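A minimal sketch of both operation types, assuming a SparkContext named sc (as provided by spark-shell). Transformations are lazy and only describe a new RDD; actions trigger the computation and return a result to the driver:

    val nums = sc.parallelize(1 to 10)                  // build an RDD from a Scala collection
    val doubled = nums.map(_ * 2)                       // transformation: lazy, returns a new RDD
    val multiplesOfFour = doubled.filter(_ % 4 == 0)    // another lazy transformation
    println(multiplesOfFour.count())                    // action: triggers computation, returns 5
    println(multiplesOfFour.collect().mkString(", "))   // action: 4, 8, 12, 16, 20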

 


Friday, April 29, 2022

Spark Ecosystem (Class -40)

Apache Spark is an open-source analytical framework for large-scale distributed data processing and machine learning applications. Spark became a top-level Apache project in February 2014. Spark is a general-purpose, in-memory, fault-tolerant, distributed processing engine that allows you to process data efficiently, up to 100x faster than traditional MapReduce-based systems.

Using Spark we can process data from Hadoop HDFS, AWS S3, Databricks DBFS, Azure Blob Storage, and many other file systems. Spark is also used to process real-time data using Spark Streaming and Kafka.


The Spark ecosystem consists of five tightly integrated components:

·         Spark Core

·         Spark SQL

·         Spark Streaming

·         MLlib

·         GraphX

Sunday, April 24, 2022

SPARK Introduction (Class -39)

Spark, introduced by the Apache Software Foundation, is a lightning-fast cluster computing technology designed to speed up Hadoop's computational processing.

Spark is not a modified version of Hadoop and not entirely dependent on Hadoop because it has its own cluster management. The main feature of Spark is its in-memory cluster computing that increases the processing speed of an application.

Spark is not merely a part of the Hadoop ecosystem, as it can run on its own. When Spark does use Hadoop, it uses it in two ways: one is storage (HDFS) and the other is processing (MapReduce).

Hadoop frameworks are known for analyzing datasets based on a simple programming model (MapReduce); their main concern is maintaining speed when processing large datasets, both in the waiting time between queries and the waiting time to run a program.


Earlier Hadoop Versions:

Hadoop 1.0 was introduced in 2006 and used up to 2012, when Hadoop 2.0 (YARN) came into the picture.

The main drawbacks of Hadoop 1.0 were:

a)    Single Point of Failure

b)    Block Size

c)    Relying on MapReduce (MR) as both the resource manager and the processing engine.

In 2008, Cloudera became the commercial distribution of Hadoop, offered in open-source and enterprise editions.

Spark began as one of Hadoop's sub-projects, developed in 2009 in UC Berkeley's AMPLab by Matei Zaharia, originally to test a resource manager called Mesos rather than to process data.

Spark was open-sourced in 2010 under a BSD license. It was donated to the Apache Software Foundation in 2013, and Apache Spark became a top-level Apache project in February 2014.

Saturday, April 23, 2022

Aggregate Functions in SQL (Part -7)

An SQL aggregate function performs a calculation over multiple rows of a single column of a table and returns a single value. It is also used to summarize the data.

We often use aggregate functions with the GROUP BY and HAVING clauses of the SELECT statement (an aggregate cannot appear in the WHERE clause; filtering on an aggregated value is done with HAVING).


1)     SUM:

The SUM function is used to calculate the sum of all values of a selected column. It works on numeric fields only.


2)  COUNT:

The COUNT function is used to count the number of rows in a database table. It can work on both numeric and non-numeric data types.


3) MAX:

MAX function is used to find the maximum value of a certain column. This function determines the largest value of all selected values of a column.


4) MIN:

MIN function is used to find the minimum value of a certain column. This function determines the smallest value of all selected values of a column.


5)     AVG:

 The AVG function is used to calculate the average value of the numeric type. AVG function returns the average of all non-Null values.
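A minimal sketch of these functions using Spark SQL from Scala; the employees data, column names, and HAVING threshold below are made up for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("agg-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    Seq(("Sales", 3000), ("Sales", 4500), ("HR", 2800), ("HR", 3200))
      .toDF("dept", "salary")
      .createOrReplaceTempView("employees")

    // Aggregates computed per group; HAVING filters on the aggregated value.
    spark.sql("""
      SELECT dept,
             SUM(salary) AS total_salary,
             COUNT(*)    AS num_employees,
             MAX(salary) AS max_salary,
             MIN(salary) AS min_salary,
             AVG(salary) AS avg_salary
      FROM employees
      GROUP BY dept
      HAVING COUNT(*) >= 2
    """).show()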


Thursday, April 21, 2022

Operators in SQL (Part -6)

SQL operators are special words or characters used to perform mathematical and logical computations on operands, most commonly within the WHERE clause of a SQL query / statement.

There are six types of SQL operators that we are going to cover: Arithmetic, Bitwise, Comparison, Compound, Logical and String.

Every database administrator and user uses SQL queries for manipulating and accessing the data of database tables and views with the help of reserved words and characters, which are used to perform arithmetic operations, logical operations, comparison operations, compound operations, etc.

SQL operators and their symbols:

Arithmetic: Add (+), Subtract (-), Multiply (*), Divide (/), Modulo (%)

Bitwise: AND (&), OR (|), Exclusive OR (^)

Comparison: Equal to (=), Greater than (>), Less than (<), Greater than or equal to (>=), Less than or equal to (<=), Not equal to (<>)

Compound: Add equals (+=), Subtract equals (-=), Multiply equals (*=), Divide equals (/=), Modulo equals (%=), Bitwise AND equals (&=), Bitwise exclusive OR equals (^=), Bitwise OR equals (|=)
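A minimal sketch of arithmetic, comparison, and logical operators in a WHERE clause, again using Spark SQL from Scala; the products data is made up for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("operators-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    Seq(("pen", 5, 20), ("book", 150, 3), ("bag", 800, 7))
      .toDF("name", "price", "qty")
      .createOrReplaceTempView("products")

    // Arithmetic (*), comparison (>=, <>) and logical (AND) operators in one query.
    spark.sql("""
      SELECT name, price * qty AS stock_value
      FROM products
      WHERE price >= 100 AND name <> 'bag'
    """).show()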


Wednesday, April 20, 2022

SQL Constraints (Part -5)

SQL constraints are conditions / rules applied to the data columns of a table, used to limit the type of data that can go into the table.

Constraints are used to specify the rules concerning data in the table. They can be applied to single or multiple fields of an SQL table, either during creation of the table or afterwards using the ALTER TABLE command.

There are 3 types of Constraints:

A)     Key Constraint

B)     Domain Constraint

C)     Referential Integrity Constraint



The PRIMARY KEY constraint uniquely identifies each row in a table. It must contain UNIQUE values and has an implicit NOT NULL constraint.

A table in SQL is strictly restricted to have one and only one primary key, which is comprised of single or multiple fields (columns).

A UNIQUE constraint ensures that all values in a column are different. This provides uniqueness for the column(s) and helps identify each row uniquely. Unlike the primary key, there can be multiple unique constraints defined per table. The syntax for UNIQUE is quite similar to that of PRIMARY KEY.

A FOREIGN KEY comprises a single field or a collection of fields in a table that refers to the PRIMARY KEY of another table. The foreign key constraint ensures referential integrity in the relation between two tables.

The table with the foreign key constraint is labelled as the child table, and the table containing the candidate key is labelled as the referenced or parent table.
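A minimal sketch of PRIMARY KEY, UNIQUE, and FOREIGN KEY constraints, written as Scala/JDBC calls; it assumes a running MySQL server with the Connector/J driver on the classpath, and the database name, credentials, and table names are hypothetical:

    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/testdb", "user", "password")
    val stmt = conn.createStatement()

    // Parent (referenced) table: PRIMARY KEY and UNIQUE constraints.
    stmt.executeUpdate("""
      CREATE TABLE departments (
        dept_id   INT         NOT NULL,
        dept_name VARCHAR(50) NOT NULL UNIQUE,
        PRIMARY KEY (dept_id)
      )""")

    // Child table: FOREIGN KEY referring to the parent's primary key.
    stmt.executeUpdate("""
      CREATE TABLE employees (
        emp_id  INT         NOT NULL,
        name    VARCHAR(50) NOT NULL,
        dept_id INT,
        PRIMARY KEY (emp_id),
        FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
      )""")

    conn.close()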

Monday, April 18, 2022

Create a Table in MySQL (Part -4)

 A MySQL table stores and organizes data in columns and rows as defined during table creation.


In the process of creating a table, you need to specify the following information:

  • Column names – We are creating the title, genre, director, and release year columns for our table.
  • Varchar of the columns containing characters – Specifies the maximum number of characters stored in the column.
  • The integer of the columns containing numbers – Defines numeric variables holding whole numbers.
  • Not null rule – Indicates that each new record must contain information for the column.
  • Primary key – Sets a column that defines a record.
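A minimal sketch of the table described above, created from Scala over JDBC; the connection details are hypothetical and the column sizes are just examples:

    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/testdb", "user", "password")
    val stmt = conn.createStatement()

    stmt.executeUpdate("""
      CREATE TABLE movies (
        id           INT          NOT NULL AUTO_INCREMENT,  -- primary key: defines each record
        title        VARCHAR(100) NOT NULL,                  -- VARCHAR: max characters stored
        genre        VARCHAR(50),
        director     VARCHAR(60),
        release_year INT,                                    -- INT: whole numbers
        PRIMARY KEY (id)
      )""")

    conn.close()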

SQL in RDBMS (Part -3)

RDBMS (Relational Database Management System) is a type of DBMS in which data is stored in related tables, in the form of rows and columns.

In other words, Relational database is a type of database that allows us to identify and access data in relation to another piece of data in the database. It stores data in rows and columns in a series of tables to make processing and querying efficient.


Rows (Tuples) are Horizontal while Columns (Headers are called Attributes – ID, Name, etc.) are Vertical in a Table.

Degree of Relation = No. of Columns

Cardinality = No. of Rows

Data present inside the Column is called ‘Domain Values’.

Databases in SQL (Part -2)

Data usually arrives in a raw, unorganized format; data that has been processed and organized is called 'information'.

Our data is stored in databases (DBs), which consist of sets of tables (rows and columns). A table is an organized collection of data stored in the form of rows and columns. Columns can be categorized as vertical, while rows are horizontal. The columns in a table are called fields, while the rows can be referred to as unique records.

For example, a Table that contains Employee data for a company might contain a row for each employee and columns representing employee information such as employee number, name, address, job title, etc.

A Computer can have one or more than one instance of SQL Server installed. Each instance of SQL Server can contain one or many databases. Within a database, there are one or many object ownership groups called schemas. Within each schema there are database objects such as tables, views, and stored procedures. Some objects such as certificates and asymmetric keys are contained within the database, but are not contained within a schema.



SQL Server databases are stored in the file system in files. Files can be grouped into file-groups. 

At a minimum, every SQL Server database has two operating system files: a data file and a log file. Data files contain data and objects such as tables, indexes, stored procedures, and views. Log files contain the information that is required to recover all transactions in the database. Data files can be grouped together in file-groups for allocation and administration purposes.

The number of tables in a database is limited only by the number of objects allowed in a database (2,147,483,647). A standard user-defined table can have up to 1,024 columns. The number of rows in the table is limited only by the storage capacity of the server.

SQL vs MySQL: Key Difference (Part -1)

SQL stands for Structured Query Language and enables the user to design and manage databases, while MySQL is a relational database management system that allows a user to store and retrieve data from a database.

SQL is a standard language for retrieving and manipulating structured databases. On the contrary, MySQL is a relational database management system, like SQL Server, Oracle or IBM DB2, that is used to manage SQL databases.

Both the technologies work on the concept of storing data as per schema (table storage). MySQL is inclined more towards selecting the data to facilitate data display, update and save the data again. It is a bit weaker than SQL Server in terms of data insertion and deletion.


Many famous web-based applications and companies use MySQL, such as WordPress, YouTube, Joomla, etc. SQL itself is used by many platforms, such as MySQL, Oracle, Microsoft SQL Server, etc.

Thursday, April 14, 2022

Access Modifiers in Scala (Class -38)

Access modifiers in Scala are used to define the access scope of members of packages, classes, or objects. To use an access modifier, you include its keyword in the definition of a member of a package, class, or object. These modifiers restrict access to the members to specific regions of code.

There are 3 types of Access Modifiers in Scala:

a)      Public

b)      Private

c)       Protected
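A minimal sketch of the three levels; the class and member names below are made up for illustration:

    class Account {
      private val pin: String = "1234"        // private: visible only inside Account
      protected val branch: String = "HYD"    // protected: visible in Account and its subclasses
      val owner: String = "Ravi"              // no modifier: public, visible everywhere
    }

    class SavingsAccount extends Account {
      def branchName: String = branch         // OK: protected member used in a subclass
      // def pinCode = pin                    // would not compile: pin is private to Account
    }

    object AccessDemo extends App {
      val acc = new SavingsAccount
      println(acc.owner)                      // OK: public member
      // println(acc.branch)                  // would not compile: protected outside the hierarchy
    }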

Singleton Object in Scala (Class -37)

A singleton object is a class with only one instance, created using the keyword 'object'. An object that can exist without a class is a singleton object. The object keyword is used to define it.

 A Scala object is a singleton object that is accessible to any Scala code that has visibility of that object definition. The term singleton here refers to the fact that there is a single instance of the object definition within the virtual machine executing the Scala program. This is guaranteed by the language itself and does not require any additional programmer intervention.


Objects can extend classes, mix in traits, and define methods and functions as well as properties (both vals and vars), but they cannot define constructors.
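A minimal sketch of a singleton object; the Logger name and its members are made up for illustration:

    object Logger {                            // exactly one instance, created by the runtime
      private var count = 0                    // a var property on the single instance
      val prefix: String = "[LOG]"             // a val property

      def log(msg: String): Unit = {           // a method defined on the object
        count += 1
        println(s"$prefix $count: $msg")
      }
    }

    object SingletonDemo extends App {
      Logger.log("application started")        // no `new` needed
      Logger.log("application finished")
    }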

Class and Objects in Scala (Class -36)

A class is nothing but a blueprint for creating an object.

         e.g.:- variables, function objects, methods, etc.

 A Class is one of the basic building blocks of Scala. Classes act as templates which are used to construct instances. Classes allow programmers to specify the structure of an instance (i.e. its instance variables or fields) and the behaviour of an instance (i.e. its methods and functions) separately from the instance itself. This is important, as it would be extremely time-consuming (as well as inefficient) for programmers to define each instance individually. Instead, they define classes and create instances of those classes.
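A minimal sketch of a class acting as a blueprint for two instances; the names are made up for illustration:

    class Employee(val name: String, var salary: Double) {   // structure: fields
      def raise(amount: Double): Unit = salary += amount      // behaviour: methods
      def describe(): String = s"$name earns $salary"
    }

    object ClassDemo extends App {
      val e1 = new Employee("Asha", 50000)      // two instances from the same blueprint
      val e2 = new Employee("Kiran", 60000)
      e1.raise(5000)
      println(e1.describe())                    // Asha earns 55000.0
      println(e2.describe())                    // Kiran earns 60000.0
    }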


Currying function in Scala (Class -35)

The name Currying may seem obscure but the technique is named after Haskell Curry (for whom the Haskell programming language is named). Grouping of Parameters together is called Currying.

Currying transforms a function that takes multiple arguments into a chain of functions that each take a single argument. For example, a curried function of two arguments becomes a function of one argument that returns another function of one argument.


There are two syntaxes to define the currying functions in Scala.

def functionName (arg1) = (arg2) => operation

  def functionName(arg1) (arg2) = operation



In the first syntax, the function takes arg1 and returns an anonymous function that takes arg2 and performs the operation.

The first single argument is the original function's first argument. The function returns another function that takes the second of the original function's arguments. This chaining continues for all arguments of the function.

The last function in this chain does the actual work of the function call.
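A minimal sketch using both syntaxes; the function names and values are made up for illustration:

    object CurryDemo extends App {
      def add(a: Int) = (b: Int) => a + b       // syntax 1: returns another function
      def multiply(a: Int)(b: Int) = a * b      // syntax 2: multiple parameter lists

      val addFive = add(5)                      // fix the first argument
      println(addFive(3))                       // 8
      println(add(5)(3))                        // 8
      println(multiply(4)(6))                   // 24

      val double: Int => Int = multiply(2)      // partially apply the curried method (eta-expansion)
      println(double(10))                       // 20
    }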

Higher Order Functions (HOF) in Scala (Class -34)

 A Function that takes a function as a parameter is referred to as HOF. In other words, passing a function to another function is called Higher Order Function.

In Scala higher-order functions are functions that do at least one of the following (and may do both):

• Takes one or more functions as arguments (parameters).

• Returns a function as output.


Example:

                 def f1() = println("I am god")

                 def f2(f: () => Unit) = f()

                  f2(f1)

 Here f1 is an ordinary first-class function.

   Since f2 takes f1 (a function) as an argument, f2 is a HOF.
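A further minimal sketch covering both cases (taking a function as an argument and returning one as output); the names are made up for illustration:

    object HofDemo extends App {
      def applyTwice(f: Int => Int, x: Int): Int = f(f(x))        // takes a function as a parameter
      println(applyTwice(_ + 3, 10))                              // 16

      def multiplier(factor: Int): Int => Int = x => x * factor   // returns a function
      val triple = multiplier(3)
      println(triple(7))                                          // 21
    }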

Tuesday, April 12, 2022

Functional Programming (FP) in Scala (Class -33)

When programs get larger, you need some way to divide them into smaller, more manageable pieces. For dividing up control flow, Scala offers an approach familiar to all experienced programmers: divide the code into functions.

 In fact, Scala offers several ways to define functions that are not present in Java.

 The most common way to define a function is as a member of some object; such a function is called a ‘method’.

 The main function in Scala is defined as,

def main(args: Array[String]): Unit = {
  // program body goes here
}


 Functions in Scala are called First Class Citizens. Not only can you define functions and call them, but you can write down functions as unnamed literals and then pass them around as values. In other words, If you can treat a Function as a Value, it is a First Class Function.

a)      We can assign a function to a value, which gives a function object.

b)      We can assign one function object to another function object.

c)       Function object can be passed as a parameter to another function / method.

d)      Function objects can be returned from a method or function.

 Point’s c & d are Higher Order Functions. 

Match Expressions in Scala (Class -32)

 Scala has a concept of a match expression. This is also called “Pattern Matching”.

    Here, the "match" keyword is used instead of a switch statement. "match" is defined in Scala's root class to make it available to all objects. A match expression can contain a sequence of alternatives. Each alternative starts with the case keyword and includes a pattern and one or more expressions, which get evaluated if the specified pattern matches. To separate the pattern from the expressions, the arrow symbol (=>) is used.
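A minimal sketch of a match expression; the cases and values are made up for illustration:

    object MatchDemo extends App {
      def describe(x: Any): String = x match {
        case 1         => "the number one"
        case "hello"   => "a greeting"
        case d: Double => s"a double: $d"
        case _         => "something else"      // default alternative
      }

      println(describe(1))          // the number one
      println(describe("hello"))    // a greeting
      println(describe(3.14))       // a double: 3.14
      println(describe(true))       // something else
    }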





Collections in Scala (Class -31)

A collection in programming is a simple object used to collect data. It groups elements together into a single entity (object). You can add, delete, and update data using collections. Collections in Scala can be mutable as well as immutable.

A mutable collection is a collection whose elements can be updated in place, and elements can be added to or removed from it. It allows all these operations.

An immutable collection does not allow the user to update, add, or remove elements in place. These operations are still available, but each one creates a new collection with the updated values while the old one is left unchanged.

Scala collections have a rich hierarchy. The Traversable trait is at the root of the hierarchy; all collection classes inherit the traits required for the general functioning of collections.
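A minimal sketch contrasting a mutable and an immutable collection; the values are made up for illustration:

    import scala.collection.mutable

    object CollectionsDemo extends App {
      val buffer = mutable.ListBuffer(1, 2, 3)   // mutable: updated in place
      buffer += 4                                // add an element
      buffer -= 2                                // remove an element
      println(buffer)                            // ListBuffer(1, 3, 4)

      val original = List(1, 2, 3)               // immutable (the default)
      val extended = original :+ 4               // every "update" builds a new list
      println(original)                          // List(1, 2, 3), unchanged
      println(extended)                          // List(1, 2, 3, 4)
    }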


Monday, April 11, 2022

Operators in Scala (Class -30)

 An Operator is a symbol that represents an operation that is to be performed in the program. 

      Operators tell the compiler to perform a specific operation; each operator is associated with an operation and has a unique meaning. Operators play an important role in programming: they are used to build expressions that execute to perform a task.

Scala has a huge range of operators:

a)      Arithmetic operators

b)     Relational operators

c)      Logical operators

d)     Bitwise operators

e)      Assignment operators
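A minimal sketch touching each operator group; the values are made up for illustration:

    object OperatorsDemo extends App {
      val a = 10
      val b = 3

      println(s"arithmetic: ${a + b}, ${a % b}")    // 13, 1
      println(s"relational: ${a > b}, ${a == b}")   // true, false
      println(s"logical: ${a > 5 && b < 5}")        // true
      println(s"bitwise: ${a & b}, ${a | b}")       // 2, 11

      var c = a
      c += 5                                        // assignment (compound): c is now 15
      println(c)
    }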


Data Types in Scala (Class -29)

In a programming language, a data type (also known simply as a type) tells the compiler about the kind of data used by the programmer.

The Scala data types are taken as-is from Java, and the storage sizes are the same. There are many different data types in Scala.

1 Byte = 8 Bits.

Data types at a glance (values; storage; default value; usage):

Boolean: true / false; 2 bytes; default false; only two values.

Int: -2147483648 to 2147483647; 4 bytes; default 0; commonly used in programming.

Float: IEEE 754 single-precision float; 4 bytes; default 0.0F; for decimal point numbers.

Double: IEEE 754 double-precision float; 8 bytes; default 0.0D; for decimal point numbers that need a larger range and more precision.

Char: 0 to 2^16 - 1 (Unicode); 2 bytes; default '\u0000'; for character values, stored as unsigned Unicode characters.

String: any length; 40 bytes for an empty String; default null; used to store a character sequence in a program.
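A minimal sketch declaring a value of each type listed above; the values themselves are made up for illustration:

    object DataTypesDemo extends App {
      val flag: Boolean = true
      val count: Int    = 2147483647       // largest Int value
      val ratio: Float  = 3.14F            // single precision
      val price: Double = 199.99           // double precision (the default for decimals)
      val grade: Char   = 'A'
      val name: String  = "Scala"

      println(s"$flag $count $ratio $price $grade $name")
    }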