Let us help you find the training program you are looking for.

If you can't find what you are looking for, contact us and we'll help you find it. We have over 800 training programs to choose from.


  • Course Skill Level:

    Foundational to Intermediate

  • Course Duration:

    5 days

  • Course Delivery Format:

    Live, instructor-led

  • Course Category:

    Big Data & Data Science

  • Course Code:

    BIGDATL21E09

Who should attend & recommended skills:

Those with basic IT and traditional database skills

  • This course is geared toward those who want to use an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data.
  • Skill level: Foundation-level Big Data skills for intermediate-level team members. This is not a basic class.
  • IT skills: Basic to Intermediate (1-5 years’ experience)
  • Traditional databases: Basic (1-2 years’ experience) helpful
  • Prior experience with large-scale data analysis and NoSQL tools is not necessary

About this course

Web-scale applications like social networks, real-time analytics, and e-commerce sites handle data whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data at any size or speed. Fortunately, scale and simplicity are not mutually exclusive.

This course teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. It presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You’ll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you’ll learn specific technologies like Hadoop, Storm, and NoSQL databases.
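
The heart of the Lambda Architecture discussed here fits in a few lines: batch views are precomputed from an immutable master dataset, a speed layer keeps small realtime views for recent data, and every query merges the two. The sketch below is only an illustration of that merge (the class, field names, and in-memory maps are ours, not course material):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal illustration of the Lambda Architecture's query-time merge:
 * a query combines a precomputed batch view with a small realtime view.
 * The maps stand in for the batch-layer and speed-layer view databases.
 */
public class PageviewQuery {

    // Batch view: recomputed periodically from the immutable master dataset.
    private final Map<String, Long> batchView = new ConcurrentHashMap<>();

    // Realtime view: incremented by the speed layer for data the batch
    // layer has not processed yet, and expired after each batch run.
    private final Map<String, Long> realtimeView = new ConcurrentHashMap<>();

    /** Total pageviews for a URL = batch view + realtime view. */
    public long pageviews(String url) {
        return batchView.getOrDefault(url, 0L) + realtimeView.getOrDefault(url, 0L);
    }

    public static void main(String[] args) {
        PageviewQuery query = new PageviewQuery();
        query.batchView.put("example.com/home", 1_000_000L);     // from the batch layer
        query.realtimeView.put("example.com/home", 42L);          // from the speed layer
        System.out.println(query.pageviews("example.com/home"));  // 1000042
    }
}
```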

Skills acquired & topics covered

Working in a hands-on learning environment, led by our Big Data expert instructor, participants will learn about and explore:

  • Using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data
  • A scalable, easy-to-understand approach to big data systems that can be built and run by a small team
  • The theory of big data systems and how to implement them in practice
  • How to deploy and operate these systems once they are built
  • Introduction to big data systems
  • Real-time processing of web-scale data
  • Tools like Hadoop, Cassandra, and Storm
  • Extensions to traditional database skills

Course breakdown / modules

  • The properties of data
  • The fact-based model for representing data
  • Graph schemas
  • A complete data model for SuperWebAnalytics.com
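
As a taste of the fact-based model covered in this module, the small class below shows data represented as raw, immutable, timestamped facts rather than mutable records. It is an illustrative sketch; the actual SuperWebAnalytics.com schema built in class uses a graph schema serialized with Apache Thrift.

```java
import java.time.Instant;

/**
 * Illustration of the fact-based model: data is stored as raw,
 * immutable, timestamped facts instead of mutable records.
 * A change in a user's location is a new fact, never an update.
 */
public final class LocationFact {
    private final long userId;
    private final String location;
    private final Instant timestamp;

    public LocationFact(long userId, String location, Instant timestamp) {
        this.userId = userId;
        this.location = location;
        this.timestamp = timestamp;
    }

    public long userId() { return userId; }
    public String location() { return location; }
    public Instant timestamp() { return timestamp; }

    public static void main(String[] args) {
        // Two facts about the same user: the newest one wins at query time,
        // but both remain in the master dataset forever.
        LocationFact f1 = new LocationFact(42, "Chicago", Instant.parse("2020-01-01T00:00:00Z"));
        LocationFact f2 = new LocationFact(42, "Berlin",  Instant.parse("2021-06-01T00:00:00Z"));
        System.out.println(f1.location() + " -> " + f2.location());
    }
}
```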

  • Why a serialization framework?
  • Apache Thrift
  • Limitations of serialization frameworks

  • Storage requirements for the master dataset
  • Choosing a storage solution for the batch layer
  • How distributed filesystems work
  • Storing a master dataset with a distributed filesystem
  • Vertical partitioning
  • Low-level nature of distributed filesystems
  • Storing the SuperWebAnalytics.com master dataset on a distributed filesystem

  • Using the Hadoop Distributed File System
  • Data storage in the batch layer with Pail
  • Storing the master dataset for SuperWebAnalytics.com
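
To give a feel for the module above, here is a minimal round trip through the Hadoop FileSystem API, which Pail builds on for batch-layer storage. The path and record format are made-up examples, not the course's lab data:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Writes one record to HDFS and reads it back via the FileSystem API. */
public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/tmp/master-dataset/pageviews/part-00000");

        // Write a single (illustrative) pageview record.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("alice\thttp://example.com/home\t2021-06-01T12:00:00Z\n"
                    .getBytes(StandardCharsets.UTF_8));
        }

        // Read it back.
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
    }
}
```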

  • Motivating examples
  • Computing on the batch layer
  • Recomputation algorithms vs. incremental algorithms
  • Scalability in the batch layer
  • MapReduce: a paradigm for Big Data computing
  • Low-level nature of MapReduce
  • Pipe diagrams: a higher-level way of thinking about batch computation
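
MapReduce is the batch-computation paradigm introduced in this module. The self-contained job below follows the classic pattern, counting pageviews per URL: the mapper emits (url, 1) pairs and the reducer sums them. The tab-separated record layout is an assumption for illustration only:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Counts pageviews per URL: map emits (url, 1), reduce sums the counts. */
public class PageviewCount {

    public static class PageviewMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text url = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            // Assumed record layout: user <TAB> url <TAB> timestamp
            String[] fields = value.toString().split("\t");
            if (fields.length >= 2) {
                url.set(fields[1]);
                context.write(url, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws java.io.IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "pageview-count");
        job.setJarByClass(PageviewCount.class);
        job.setMapperClass(PageviewMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```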

  • An illustrative example
  • Common pitfalls of data-processing tools
  • An introduction to JCascalog
  • Composition

  • Design of the SuperWebAnalytics.com batch layer
  • Workflow overview
  • Ingesting new data
  • URL normalization
  • User-identifier normalization
  • Deduplicate pageviews
  • Computing batch views

  • Starting point
  • Preparing the workflow
  • Ingesting new data
  • URL normalization
  • User-identifier normalization
  • Deduplicate pageviews
  • Computing batch views
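
URL normalization appears in both batch-layer modules above: equivalent URLs (mixed-case hosts, default ports, trailing slashes) must map to one canonical form before pageviews can be deduplicated and counted. The helper below is a simplified sketch of that idea; the exact rules applied in class may differ:

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Simplified URL normalization: lowercase scheme/host, drop default ports,
 *  fragments, and trailing slashes so equivalent URLs compare equal. */
public final class UrlNormalizer {

    public static String normalize(String raw) {
        try {
            URI uri = new URI(raw.trim());
            String scheme = uri.getScheme() == null ? "http" : uri.getScheme().toLowerCase();
            String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase();

            // Drop default ports (80 for http, 443 for https).
            int port = uri.getPort();
            if ((port == 80 && scheme.equals("http")) || (port == 443 && scheme.equals("https"))) {
                port = -1;
            }

            // Strip a trailing slash from the path; the fragment is dropped entirely.
            String path = uri.getPath() == null ? "" : uri.getPath();
            if (path.endsWith("/")) {
                path = path.substring(0, path.length() - 1);
            }

            String portPart = port == -1 ? "" : ":" + port;
            String queryPart = uri.getQuery() == null ? "" : "?" + uri.getQuery();
            return scheme + "://" + host + portPart + path + queryPart;
        } catch (URISyntaxException e) {
            return raw; // leave unparseable URLs untouched
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("HTTP://Example.COM:80/Home/"));  // http://example.com/Home
        System.out.println(normalize("https://example.com/home#top")); // https://example.com/home
    }
}
```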

  • Performance metrics for the serving layer
  • The serving layer solution to the normalization/denormalization problem
  • Requirements for a serving layer database
  • Designing a serving layer for SuperWebAnalytics.com
  • Contrasting with a fully incremental solution

  • Basics of ElephantDB
  • Building the serving layer for SuperWebAnalytics.com

  • Computing realtime views
  • Storing realtime views
  • Challenges of incremental computation
  • Asynchronous versus synchronous updates
  • Expiring realtime views
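
This module is about keeping realtime views small and incremental, then expiring them once the batch layer catches up. The in-memory sketch below shows that shape with hourly buckets and an expire step; the course's speed layer uses Cassandra and Storm rather than a HashMap:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Toy realtime view: pageview counts per (url, hour bucket).
 * The speed layer only needs to cover data the batch layer has not
 * absorbed yet, so old buckets are expired after each batch run.
 */
public class RealtimePageviews {
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    private static String key(String url, long hourBucket) {
        return url + "|" + hourBucket;
    }

    /** Incremental update: called for every incoming pageview. */
    public void recordPageview(String url, long epochMillis) {
        long hourBucket = epochMillis / 3_600_000L;
        counts.merge(key(url, hourBucket), 1L, Long::sum);
    }

    public long pageviews(String url, long hourBucket) {
        return counts.getOrDefault(key(url, hourBucket), 0L);
    }

    /** Expire every bucket at or before the hour the batch layer now covers. */
    public void expireThrough(long batchCoveredHour) {
        Iterator<Map.Entry<String, Long>> it = counts.entrySet().iterator();
        while (it.hasNext()) {
            String k = it.next().getKey();
            long bucket = Long.parseLong(k.substring(k.lastIndexOf('|') + 1));
            if (bucket <= batchCoveredHour) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        RealtimePageviews view = new RealtimePageviews();
        long now = System.currentTimeMillis();
        view.recordPageview("example.com/home", now);
        view.recordPageview("example.com/home", now);
        System.out.println(view.pageviews("example.com/home", now / 3_600_000L)); // 2
    }
}
```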

  • Cassandra’s data model
  • Using Cassandra
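
The module above covers Cassandra's data model (partition keys, clustering columns, counters) and its role as a speed-layer store. The sketch below uses the DataStax Java driver and a counter table to show an incremental realtime view; the keyspace, table, and driver choice are our assumptions, not the course's lab setup:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

/** Incrementing and reading a counter-backed realtime view in Cassandra. */
public class CassandraSpeedLayer {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS analytics "
                    + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");

            // Partition by url, cluster by hour bucket; the counter column
            // is incremented as pageviews stream in.
            session.execute("CREATE TABLE IF NOT EXISTS analytics.pageviews_by_hour ("
                    + "url text, hour_bucket bigint, views counter, "
                    + "PRIMARY KEY (url, hour_bucket))");

            session.execute("UPDATE analytics.pageviews_by_hour SET views = views + 1 "
                    + "WHERE url = 'example.com/home' AND hour_bucket = 451234");

            Row row = session.execute("SELECT views FROM analytics.pageviews_by_hour "
                    + "WHERE url = 'example.com/home' AND hour_bucket = 451234").one();
            System.out.println("views = " + row.getLong("views"));
        }
    }
}
```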

  • Queuing
  • Stream processing
  • Higher-level, one-at-a-time stream processing
  • SuperWebAnalytics.com speed layer

  • Defining topologies with Apache Storm
  • Apache Storm clusters and deployment
  • Guaranteeing message processing
  • Implementing the SuperWebAnalytics.com uniques-over-time speed layer
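
To give a flavor of topology definition with Apache Storm, here is a small topology run in local mode using the built-in TestWordSpout and a basic counting bolt. It is a generic sketch and does not reproduce the SuperWebAnalytics.com uniques-over-time speed layer built in class:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.HashMap;
import java.util.Map;

/** A minimal Storm topology: spout -> counting bolt, run in local mode. */
public class WordCountTopology {

    /** Counts words per task; fieldsGrouping sends each word to one task. */
    public static class CountBolt extends BaseBasicBolt {
        private final Map<String, Long> counts = new HashMap<>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            long count = counts.merge(word, 1L, Long::sum);
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 2);
        builder.setBolt("count", new CountBolt(), 4)
               .fieldsGrouping("words", new Fields("word"));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", new Config(), builder.createTopology());
        Thread.sleep(10_000);  // let the topology run briefly
        cluster.shutdown();
    }
}
```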

  • Achieving exactly-once semantics
  • Core concepts of micro-batch stream processing
  • Extending pipe diagrams for micro-batch processing
  • Finishing the speed layer for SuperWebAnalytics.com
  • Another look at the bounce-rate-analysis example

  • Using Trident
  • Finishing the SuperWebAnalytics.com speed layer
  • Fully fault-tolerant, in-memory, micro-batch processing
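
Trident, Storm's micro-batch API, is how the course achieves exactly-once speed-layer state. The sketch below follows the standard stream -> groupBy -> persistentAggregate pattern, using Trident's testing spout and in-memory state rather than the course's production setup; class and field names are illustrative:

```java
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

/** Micro-batch word count with Trident: exactly-once counts kept in state. */
public class TridentCounts {

    /** Splits a sentence tuple into one tuple per word. */
    public static class Split extends BaseFunction {
        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            for (String word : tuple.getString(0).split(" ")) {
                collector.emit(new Values(word));
            }
        }
    }

    public static org.apache.storm.generated.StormTopology buildTopology() {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
                new Values("the cow jumped over the moon"),
                new Values("the man went to the store"));
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();
        topology.newStream("sentences", spout)
                .each(new Fields("sentence"), new Split(), new Fields("word"))
                .groupBy(new Fields("word"))
                // persistentAggregate keeps exactly-once counts in the state backend
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));
        return topology.build();
    }
}
```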

  • Defining data systems
  • Batch and serving layers
  • Speed layer
  • Query layer