Data Science Boot Camp

Course Skill Level:

Foundational

Course Duration:

6 days

Course Delivery Format:

Live, instructor-led.

Course Category:

Big Data & Data Science

Course Code:

DSBOCAL21E09

Who should attend & recommended skills:

Those with basic IT, programming, Python, & Linux skills

Who should attend & recommended skills

This course is for those who know the basics of Python.
No prior data science or machine learning skills required.
It is designed to test and build your knowledge of Python and to teach you to handle the kind of open-ended problems that professional data scientists work on daily.
Downloadable data sets and thoroughly explained solutions help you lock in what you’ve learned, building your confidence and preparing you for an exciting new data science career.
Skill level: Foundation-level Data Science Boot Camp skills for team members with intermediate skills.
This is not a basic class.
IT skills: Basic to Intermediate (1-5 years’ experience)
Linux: Basic (1-2 years’ experience), including familiarity with command-line commands such as ls, cd, cp, and su
Programming: Attendees without a programming background in a language such as Python may treat the labs as follow-along exercises or team up with others to complete them.

About this course

Data Science Boot Camp is a comprehensive set of challenging projects carefully designed to grow your data science skills from novice to master. Veteran data scientist Leonard Apeltsin sets 10 increasingly difficult exercises that test your abilities against the kind of problems you’d encounter in the real world. As you solve each challenge, you’ll acquire and expand the data science and Python skills you’ll use as a professional data scientist. Ranging from text processing to machine learning, each project comes complete with a unique downloadable data set and a fully explained step-by-step solution. Because these projects come from Dr. Apeltsin’s extensive experience, each solution highlights the most likely failure points along with practical advice for getting past unexpected pitfalls. When you wrap up these 10 exercises, you’ll have a diverse, relevant skill set that transfers directly to working in industry.

Skills acquired & topics covered

Working in a hands-on learning environment, led by a Data Science Boot Camp expert instructor, students will learn about and explore:
Visualizing complex multi-variable datasets
Training a decision tree machine learning algorithm
10 in-depth Python exercises with full downloadable data sets
Web scraping for text and images
Organizing data sets with clustering algorithms

Course breakdown / modules

Sample Space Analysis: An Equation-Free Approach for Measuring Uncertainty in Outcomes
Computing Non-Trivial Probabilities
Computing Probabilities Over Interval Ranges
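
As a rough illustration of the sample-space approach named in this module, the sketch below enumerates every outcome of four fair coin flips and counts the outcomes that match an event. The function name `event_probability` and the four-flip example are assumptions chosen for illustration, not course material.

```python
from fractions import Fraction
from itertools import product


def event_probability(sample_space, condition):
    """Return the fraction of equally likely outcomes that satisfy `condition`."""
    outcomes = list(sample_space)
    matching = [outcome for outcome in outcomes if condition(outcome)]
    return Fraction(len(matching), len(outcomes))


# Sample space for 4 fair coin flips: every possible sequence of H and T.
four_flips = list(product("HT", repeat=4))

# Probability of exactly 2 heads.
p_two_heads = event_probability(four_flips, lambda o: o.count("H") == 2)

# Probability over an interval range: between 1 and 3 heads, inclusive.
p_one_to_three = event_probability(four_flips, lambda o: 1 <= o.count("H") <= 3)

print(p_two_heads, p_one_to_three)
```

Counting outcomes directly avoids any closed-form equation, at the cost of a sample space that grows exponentially with the number of flips.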

Basic Matplotlib Plots
Plotting Coin-Flip Probabilities

Simulating Random Coin-Flips and Dice-Rolls Using NumPy
Computing Confidence Intervals Using Histograms and NumPy Arrays
Leveraging Confidence Intervals to Analyze a Biased Deck of Cards
Using Permutations to Shuffle Cards

Overview
Predicting Red Cards within a Shuffled Deck
Optimizing Strategies using the Sample Space for a 10-Card Deck
Key Takeaways
Case Study 2: Assessing Online Ad-Clicks for Significance

Exploring the Relationships between Data and Probability Using SciPy
Mean as a Measure of Centrality
Variance as a Measure of Dispersion

Manipulating the Normal Distribution Using SciPy
Determining Mean and Variance of a Population through Random Sampling
Making Predictions Using Mean

Assessing the Divergence Between Sample Mean and Population Mean
Data Dredging: Coming to False Conclusions through Oversampling
Bootstrapping with Replacement: Testing a Hypothesis When the Population Variance is Unknown
Permutation Testing: Comparing Means of Samples when the Population Parameters are Unknown
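
One possible shape of a permutation test for comparing two sample means is sketched below with NumPy. The function name `permutation_test`, the shuffle count, and the toy samples are illustrative assumptions; the course may present the technique differently.

```python
import numpy as np


def permutation_test(sample_a, sample_b, num_shuffles=2000, seed=0):
    """Estimate a two-sided p-value for the difference between two sample means.

    The pooled data is reshuffled repeatedly, and we count how often a random
    split produces a mean difference at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(list(sample_a), dtype=float)
    b = np.asarray(list(sample_b), dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    extreme_count = 0
    for _ in range(num_shuffles):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            extreme_count += 1
    return extreme_count / num_shuffles


# Identical samples: every shuffle is at least as extreme as a difference of 0.
p_same = permutation_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])

# Clearly separated samples: random splits almost never match the observed gap.
p_different = permutation_test(range(0, 20), range(100, 120))

print(p_same, p_different)
```

No population mean or variance is needed here, which is the situation this topic describes.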

Storing Tables Using Basic Python
Exploring Tables Using Pandas
Retrieving Table Columns
Retrieving Table Rows
Modifying Table Rows and Columns
Saving and Loading Table Data
Visualizing Tables Using Seaborn

Processing the Ad-Click Table in Pandas
Computing P-values from Differences in Means
Determining Statistical Significance
Shades of Blue: A Real-Life Cautionary Tale
Key Takeaways
Case Study 3: Tracking Disease Outbreaks Using News Headlines

Using Centrality to Discover Clusters
K-Means: A Clustering Algorithm for Grouping Data into K Central Groups
Using the Elbow Method
Using Density to Discover Clusters
DBSCAN: A Clustering Algorithm for Grouping Data Based on Spatial Density
Analyzing Clusters Using Pandas

The Great-Circle Distance: A Metric for Computing Distances Between 2 Global Points
Plotting Maps Using Basemap
Location Tracking Using GeoNamesCache
Matching Location Names in Text
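
The great-circle distance mentioned in this module can be sketched with the haversine formula, as below. This is one common formulation and may not be the one used in the course. The function name and the mean Earth radius of 6371 km are assumptions.

```python
from math import asin, cos, pi, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Distance along a sphere's surface between two (latitude, longitude) points.

    Coordinates are given in degrees; the result is in the units of `radius`.
    """
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine term; clamp to 1.0 to guard against floating-point overshoot.
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * radius * asin(sqrt(min(1.0, h)))


# Sanity checks: pole to pole is half the circumference, and a quarter turn
# along the equator is a quarter of the circumference.
pole_to_pole = great_circle_distance(90, 0, -90, 0)
quarter_equator = great_circle_distance(0, 0, 0, 90)

print(pole_to_pole, pi * EARTH_RADIUS_KM)
print(quarter_equator, pi * EARTH_RADIUS_KM / 2)
```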

Overview
Extracting Locations from Headline Data
Visualizing and Clustering the Extracted Location Data
Extracting Insights from Location Clusters
Key Takeaways
Case Study 4: Using Online Job Postings to Improve Your Data Science Resume

Simple Text Comparison
Vectorizing Texts Using Word Counts
Matrix Multiplication for Efficient Similarity Calculation
Computational Limits of Matrix Multiplication
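
To give a sense of how word-count vectors and matrix multiplication combine into a similarity calculation, here is a minimal sketch using NumPy. The helper names `count_matrix` and `cosine_similarities`, the whitespace tokenizer, and the toy texts are assumptions for illustration. The sketch also assumes every text contains at least one word.

```python
import numpy as np


def count_matrix(texts):
    """Build a (texts x vocabulary) matrix of word counts with a naive tokenizer."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({word for words in tokenized for word in words})
    column = {word: i for i, word in enumerate(vocabulary)}
    counts = np.zeros((len(texts), len(vocabulary)))
    for row, words in enumerate(tokenized):
        for word in words:
            counts[row, column[word]] += 1
    return counts, vocabulary


def cosine_similarities(counts):
    """Return all pairwise cosine similarities with one matrix product."""
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    unit_rows = counts / norms  # assumes no all-zero rows
    return unit_rows @ unit_rows.T


texts = ["the cat sat", "the cat sat", "dogs bark loudly"]
counts, vocabulary = count_matrix(texts)
similarities = cosine_similarities(counts)
print(similarities.round(2))
```

The product of an N x V matrix with its transpose yields an N x N result, so memory use grows quadratically with the number of texts.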

Clustering 2D Data in 1-Dimension
Dimension Reduction Using PCA and Scikit-Learn
Clustering 4D Data in 2-Dimensions
Computing Principal Components Without Rotation
Efficient Dimension Reduction Using SVD and Scikit-Learn

The Structure of HTML Documents
Parsing HTML using Beautiful Soup
Downloading and Parsing Online Data

Overview
Extracting Skill Requirements from Job Posting Data
Filtering Jobs by Relevance
Clustering Skills in Relevant Job Postings
Conclusion
Key Takeaways

Course breakdown / modules

Computing Probabilities Using Python

Plotting Probabilities Using Matplotlib

Running Random Simulations in NumPy

Case Study 1 Solution

Basic Probability and Statistical Analysis Using SciPy

Making Predictions Using the Central Limit Theorem and SciPy

Statistical Hypothesis Testing

Analyzing Tables Using Pandas

Case Study 2 Solution Overview

Clustering Data into Groups

Geographic Location Visualization and Analysis

Case Study 3 Solution

Measuring Text Similarities

Dimension Reduction of Matrix Data

NLP Analysis of Large Text Datasets

Extracting Text from Web Pages
