Data Science Boot camp

  • Course Code: Data Science - Data Science Boot camp
  • Course Dates: Contact us to schedule.
  • Course Category: Big Data & Data Science

Course Snapshot 

  • Duration: 6 days 
  • Skill-level: Foundation-level data science skills for team members with intermediate experience. This is not a basic class. 
  • Targeted Audience: This course is geared to test and build your knowledge of Python and to teach you to handle the kind of open-ended problems that professional data scientists work on daily. Downloadable data sets and thoroughly explained solutions help you lock in what you’ve learned, building your confidence and preparing you for a new data science career. 
  • Hands-on Learning: This course is approximately 50% hands-on lab and 50% lecture, combining engaging lecture, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required. 
  • Delivery Format: This course is available for onsite private classroom presentation, or remote instructor led delivery, or CBT/WBT (by request). 
  • Customizable: This course may be tailored to target your specific training skills objectives, tools of choice and learning goals. 

Data Science Boot camp is a comprehensive set of challenging projects carefully designed to grow your data science skills from novice to master. Veteran data scientist Leonard Apeltsin sets 10 increasingly difficult exercises that test your abilities against the kind of problems you’d encounter in the real world. As you solve each challenge, you’ll acquire and expand the data science and Python skills you’ll use as a professional data scientist. Ranging from text processing to machine learning, each project comes complete with a unique downloadable data set and a fully explained step-by-step solution. Because these projects come from Dr. Apeltsin’s extensive experience, each solution highlights the most likely failure points along with practical advice for getting past unexpected pitfalls. When you wrap up these 10 exercises, you’ll have a diverse, relevant skill set that’s transferable to working in industry. 

Working in a hands-on learning environment led by an expert Data Science Boot camp instructor, students will learn to: 

  • Visualize complex multi-variable datasets 
  • Train a decision tree machine learning algorithm 

Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed agenda below. 

  • 10 in-depth Python exercises with full downloadable data sets 
  • Web scraping for text and images 
  • Organize data sets with clustering algorithms 
  • Visualize complex multi-variable datasets 
  • Train a decision tree machine learning algorithm 

Audience & Pre-Requisites 

This course is for students who know the basics of Python. No prior data science or machine learning skills are required. 

Pre-Requisites:  Students should have  

  • Basic to intermediate IT skills. Attendees without a background in a programming language such as Python may treat the labs as follow-along exercises or team up with others to complete them. 
  • Good foundational mathematics or logic skills 
  • Basic Linux skills, including familiarity with commands such as ls, cd, cp, and su 

Course Agenda / Topics 

  1. Computing Probabilities Using Python 
  • Sample Space Analysis: An Equation-Free Approach for Measuring Uncertainty in Outcomes 
  • Computing Non-Trivial Probabilities 
  • Computing Probabilities Over Interval Ranges 
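
As a rough illustration of the sample-space approach named in this module, the sketch below counts favorable outcomes in an explicit sample space. It is not taken from the course materials; the function name `event_probability` and the two-dice example are assumptions chosen for illustration, and every outcome is assumed to be equally likely.

```python
from fractions import Fraction
from itertools import product


def event_probability(event_condition, sample_space):
    """Return P(event), assuming every outcome in sample_space is equally likely."""
    favorable = [outcome for outcome in sample_space if event_condition(outcome)]
    return Fraction(len(favorable), len(sample_space))


# Sample space for rolling two fair six-sided dice: 36 ordered pairs.
two_dice = list(product(range(1, 7), repeat=2))

# Probability that the two dice sum to exactly 7.
p_seven = event_probability(lambda roll: sum(roll) == 7, two_dice)
print(p_seven)  # 1/6

# Probability over an interval range: the sum falls between 10 and 12 inclusive.
p_high = event_probability(lambda roll: 10 <= sum(roll) <= 12, two_dice)
print(p_high)  # 1/6
```

Using `Fraction` keeps the results exact, which makes it easy to check them by hand against the counted outcomes.
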
  1. Plotting Probabilities Using Matplotlib 
  • Basic Matplotlib Plots 
  • Plotting Coin-Flip Probabilities 
  1. Running Random Simulations in NumPy 
  • Simulating Random Coin-Flips and Dice-Rolls Using NumPy 
  • Computing Confidence Intervals Using Histograms and NumPy Arrays 
  • Leveraging Confidence Intervals to Analyze a Biased Deck of Cards 
  • Using Permutations to Shuffle Cards 
  1. Case Study 1 Solution 
  • Overview 
  • Predicting Red Cards within a Shuffled Deck 
  • Optimizing Strategies using the Sample Space for a 10-Card Deck 
  • Key Takeaways 
Part 2. Case Study 2: Assessing Online Ad-Clicks for Significance 
  1. Basic Probability and Statistical Analysis Using SciPy 
  • Exploring the Relationships between Data and Probability Using SciPy 
  • Mean as a Measure of Centrality 
  • Variance as a Measure of Dispersion 
  1. Making Predictions Using the Central Limit Theorem and SciPy 
  • Manipulating the Normal Distribution Using SciPy 
  • Determining Mean and Variance of a Population through Random Sampling 
  • Making Predictions Using Mean 
  1. Statistical Hypothesis Testing 
  • Assessing the Divergence Between Sample Mean and Population Mean 
  • Data Dredging: Coming to False Conclusions through Oversampling 
  • Bootstrapping with Replacement: Testing a Hypothesis When the Population Variance is Unknown 
  • Permutation Testing: Comparing Means of Samples when the Population Parameters are Unknown 
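
To give a sense of what a permutation test involves, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the course's own solution: the function name `permutation_test`, the shuffle count, and the two small samples are all hypothetical, and the data is made up.

```python
import random
from statistics import mean


def permutation_test(sample_a, sample_b, num_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in sample means.

    The pooled values are shuffled repeatedly and split into two groups of
    the original sizes; the p-value is the fraction of shuffles whose mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    size_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:size_a]) - mean(pooled[size_a:]))
        if shuffled_diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_shuffles


# Made-up samples for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [101, 102, 103, 104, 105]

p_separated = permutation_test(group_a, group_b)
p_identical = permutation_test(group_a, group_a)
print(p_separated, p_identical)
```

Clearly separated samples should yield a small p-value, while identical samples yield a p-value of 1.0 because every shuffle matches or exceeds an observed difference of zero.
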
  1. Analyzing Tables Using Pandas 
  • Storing Tables Using Basic Python 
  • Exploring Tables Using Pandas 
  • Retrieving Table Columns 
  • Retrieving Table Rows 
  • Modifying Table Rows and Columns 
  • Saving and Loading Table Data 
  • Visualizing Tables Using Seaborn 
  1. Case Study 2 Solution Overview 
  • Processing the Ad-Click Table in Pandas 
  • Computing P-values from Differences in Means 
  • Determining Statistical Significance 
  • Shades of Blue: A Real-Life Cautionary Tale 
  • Key Takeaways 
Part 3. Case Study 3: Tracking Disease Outbreaks Using News Headlines 
  1. Clustering Data into Groups 
  • Using Centrality to Discover Clusters 
  • K-Means: A Clustering Algorithm for Grouping Data into K Central Groups 
  • Using the Elbow Method 
  • Using Density to Discover Clusters 
  • DBSCAN: A Clustering Algorithm for Grouping Data Based on Spatial Density 
  • Analyzing Clusters Using Pandas 
  1. Geographic Location Visualization and Analysis 
  • The Great-Circle Distance: A Metric for Computing Distances Between 2 Global Points 
  • Plotting Maps Using Basemap 
  • Location Tracking Using GeoNamesCache 
  • Matching Location Names in Text 
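
The great-circle distance listed in this module can be sketched with the haversine formula, which is one common way to compute it; the course may use a different formulation. The function name `great_circle_km` and the constant `EARTH_RADIUS_KM` are assumptions made for this example, and the Earth is treated as a perfect sphere.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Return the haversine distance in kilometers between two points.

    Latitudes and longitudes are given in degrees.
    """
    phi1, phi2 = radians(lat1), radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = radians(lon2 - lon1)
    a = sin(delta_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(delta_lambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


# A quarter of the way around the equator.
quarter_equator = great_circle_km(0, 0, 0, 90)
print(round(quarter_equator, 1))  # 10007.5
```

Distances like this can serve as the metric when clustering locations on a globe, where straight-line Euclidean distance on latitude and longitude would be misleading.
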
  1. Case Study 3 Solution 
  • Overview 
  • Extracting Locations from Headline Data 
  • Visualizing and Clustering the Extracted Location Data 
  • Extracting Insights from Location Clusters 
  • Key Takeaways 
Part 4. Case Study 4: Using Online Job Postings to Improve Your Data Science Resume 
  1. Measuring Text Similarities 
  • Simple Text Comparison 
  • Vectorizing Texts Using Word Counts 
  • Matrix Multiplication for Efficient Similarity Calculation 
  • Computational Limits of Matrix Multiplication 
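
As a small illustration of vectorizing texts by word counts and comparing them, the sketch below computes cosine similarity in plain Python. The helper names `word_count_vector` and `cosine_similarity` and the three sample texts are hypothetical, and cosine similarity is only one possible similarity measure; the course may use another.

```python
from collections import Counter
from math import sqrt


def word_count_vector(text, vocabulary):
    """Return the word counts of text, ordered by the shared vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(vec_a, vec_b):
    """Return the cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = sqrt(sum(a * a for a in vec_a))
    norm_b = sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Made-up texts for illustration only.
texts = [
    "data science with python",
    "python data analysis",
    "cooking with fresh herbs",
]
vocabulary = sorted({word for text in texts for word in text.lower().split()})
vectors = [word_count_vector(text, vocabulary) for text in texts]

# All-by-all similarities: entry [i][j] compares text i with text j.
similarity_matrix = [[cosine_similarity(u, v) for v in vectors] for u in vectors]
for row in similarity_matrix:
    print([round(value, 2) for value in row])
```

When the vectors are stored as rows of a normalized matrix, the same all-by-all table is the product of that matrix with its transpose, which is where matrix multiplication enters and where its memory cost becomes a limit for large text collections.
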
  1. Dimension Reduction of Matrix Data 
  • Clustering 2D Data in 1-Dimension 
  • Dimension Reduction Using PCA and Scikit-Learn 
  • Clustering 4D Data in 2-Dimensions 
  • Computing Principal Components Without Rotation 
  • Efficient Dimension Reduction Using SVD and Scikit-Learn 
  1. NLP Analysis of Large Text Datasets 
  1. Extracting Text from Web Pages 
  • The Structure of HTML Documents 
  • Parsing HTML using Beautiful Soup 
  • Downloading and Parsing Online Data 
  1. Case Study 4 Solution 
  • Overview 
  • Extracting Skill Requirements from Job Posting Data 
  • Filtering Jobs by Relevance 
  • Clustering Skills in Relevant Job Postings 
  • Conclusion 
  • Key Takeaways 