Let us help you find the training program you are looking for.

If you can't find what you're looking for, contact us and we'll help you find it. We have over 800 training programs to choose from.

Big Data

  • Course Code: Big Data
  • Course Dates: Contact us to schedule.
  • Course Category: Big Data & Data Science
  • Duration: 5 Days
  • Audience: This course is geared for those who want to use an architecture that takes advantage of clustered hardware, along with new tools designed specifically to capture and analyze web-scale data.

Course Snapshot 

  • Duration: 5 days 
  • Skill-level: Foundation-level Big Data skills for intermediate-level team members. This is not a basic class. 
  • Targeted Audience: This course is geared for those who want to use an architecture that takes advantage of clustered hardware, along with new tools designed specifically to capture and analyze web-scale data. 
  • Hands-on Learning: This course is approximately a 50/50 ratio of hands-on labs to lecture, combining engaging lecture, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required. 
  • Delivery Format: This course is available for onsite private classroom presentation. 
  • Customizable: This course may be tailored to target your specific training skills objectives, tools of choice and learning goals. 

Web-scale applications like social networks, real-time analytics, and e-commerce sites deal with data whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size or speed. Fortunately, scale and simplicity are not mutually exclusive. 

This course teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. It presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm, and NoSQL databases. 
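The core idea behind the Lambda Architecture described above can be sketched in a few lines: a query merges a complete-but-stale batch view with a realtime view covering only recent data. This is an illustrative sketch, not code from any course lab; all names here are hypothetical.

```python
# Minimal sketch of the Lambda Architecture's query pattern:
# answer = f(batch view) merged with f(realtime view).
# The dictionaries stand in for views served by systems like
# ElephantDB (batch) and Cassandra (realtime).

def query_pageviews(url, batch_view, realtime_view):
    """Total pageviews = batch view (complete but stale)
    + realtime view (covers only data since the last batch run)."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)

# Batch view: periodically recomputed over the entire master dataset.
batch_view = {"example.com/home": 10_000}
# Realtime view: incrementally updated as new pageviews stream in.
realtime_view = {"example.com/home": 42}

print(query_pageviews("example.com/home", batch_view, realtime_view))  # 10042
```

Because the batch layer periodically recomputes everything from the immutable master dataset, any errors that creep into the incrementally updated realtime view are eventually overwritten, which is what keeps the overall system simple to operate.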

Working in a hands-on learning environment, led by our Big Data expert instructor, students will learn about and explore: 

  • Using an architecture that takes advantage of clustered hardware, along with new tools designed specifically to capture and analyze web-scale data 
  • A scalable, easy-to-understand approach to big data systems that can be built and run by a small team 
  • The theory of big data systems and how to implement them in practice 
  • How to deploy and operate these systems once they're built 

Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed agenda below. 

  • Introduction to big data systems 
  • Real-time processing of web-scale data 
  • Tools like Hadoop, Cassandra, and Storm 
  • Extensions to traditional database skills 

Audience & Pre-Requisites 

This course is geared for attendees who want to use an architecture that takes advantage of clustered hardware, along with new tools designed specifically to capture and analyze web-scale data. 

Pre-Requisites:  Students should have  

  • Basic to intermediate IT skills. 
  • Familiarity with traditional databases (helpful, but not required). 

No previous exposure to large-scale data analysis or NoSQL tools is required.

Course Agenda / Topics 

  1. A new paradigm for Big Data 
  2. Data model for Big Data 
  • The properties of data 
  • The fact-based model for representing data 
  • Graph schemas 
  • A complete data model for SuperWebAnalytics.com 
  3. Data model for Big Data: Illustration 
  • Why a serialization framework? 
  • Apache Thrift 
  • Limitations of serialization frameworks 
  4. Data storage on the batch layer 
  • Storage requirements for the master dataset 
  • Choosing a storage solution for the batch layer 
  • How distributed filesystems work 
  • Storing a master dataset with a distributed filesystem 
  • Vertical partitioning 
  • Low-level nature of distributed filesystems 
  • Storing the SuperWebAnalytics.com master dataset on a distributed filesystem 
  5. Data storage on the batch layer: Illustration 
  • Using the Hadoop Distributed File System 
  • Data storage in the batch layer with Pail 
  • Storing the master dataset for SuperWebAnalytics.com 
  6. Batch layer 
  • Motivating examples 
  • Computing on the batch layer 
  • Recomputation algorithms vs. incremental algorithms 
  • Scalability in the batch layer 
  • MapReduce: a paradigm for Big Data computing 
  • Low-level nature of MapReduce 
  • Pipe diagrams: a higher-level way of thinking about batch computation 
  7. Batch layer: Illustration 
  • An illustrative example 
  • Common pitfalls of data-processing tools 
  • An introduction to JCascalog 
  • Composition 
  8. An example batch layer: Architecture and algorithms 
  • Design of the SuperWebAnalytics.com batch layer 
  • Workflow overview 
  • Ingesting new data 
  • URL normalization 
  • User-identifier normalization 
  • Deduplicate pageviews 
  • Computing batch views 
  9. An example batch layer: Implementation 
  • Starting point 
  • Preparing the workflow 
  • Ingesting new data 
  • URL normalization 
  • User-identifier normalization 
  • Deduplicate pageviews 
  • Computing batch views 
  10. Serving layer 
  • Performance metrics for the serving layer 
  • The serving layer solution to the normalization/denormalization problem 
  • Requirements for a serving layer database 
  • Designing a serving layer for SuperWebAnalytics.com 
  • Contrasting with a fully incremental solution 
  11. Serving layer: Illustration 
  • Basics of ElephantDB 
  • Building the serving layer for SuperWebAnalytics.com 
  12. Realtime views 
  • Computing realtime views 
  • Storing realtime views 
  • Challenges of incremental computation 
  • Asynchronous versus synchronous updates 
  • Expiring realtime views 
  13. Realtime views: Illustration 
  • Cassandra’s data model 
  • Using Cassandra 
  14. Queuing and stream processing 
  • Queuing 
  • Stream processing 
  • Higher-level, one-at-a-time stream processing 
  • SuperWebAnalytics.com speed layer 
  15. Queuing and stream processing: Illustration 
  • Defining topologies with Apache Storm 
  • Apache Storm clusters and deployment 
  • Guaranteeing message processing 
  • Implementing the SuperWebAnalytics.com uniques-over-time speed layer 
  16. Micro-batch stream processing 
  • Achieving exactly-once semantics 
  • Core concepts of micro-batch stream processing 
  • Extending pipe diagrams for micro-batch processing 
  • Finishing the speed layer for SuperWebAnalytics.com 
  • Another look at the bounce-rate-analysis example 
  17. Micro-batch stream processing: Illustration 
  • Using Trident 
  • Finishing the SuperWebAnalytics.com speed layer 
  • Fully fault-tolerant, in-memory, micro-batch processing 
  18. Lambda Architecture in depth 
  • Defining data systems 
  • Batch and serving layers 
  • Speed layer 
  • Query layer 
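The batch-layer sessions above center on MapReduce as a computing paradigm. As a hedged sketch of that paradigm (not actual Hadoop code, which would run distributed across a cluster), the classic word count can be modeled in-process with explicit map, shuffle, and reduce phases:

```python
# Toy word count modeling MapReduce's three phases in plain Python.
# A real job would run on Hadoop over HDFS; this only illustrates
# the programming model covered in the "Batch layer" sessions.
from collections import defaultdict

def map_phase(docs):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values (here, sum the counts).
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big systems", "data systems"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'systems': 2}
```

Because the map and reduce functions are pure and operate on key/value pairs, the framework can parallelize each phase across machines, which is the scalability property the course's pipe-diagram discussion builds on.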