
Course Skill Level:


Course Duration:

1 day

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:


Who should attend & recommended skills:

Intermediate Python programmers with basic data science, Jupyter Notebooks, pandas, Scikit-learn, k-means clustering, & TF-IDF skills

Who should attend & recommended skills

  • This course is geared toward intermediate Python programmers who know basic data science techniques and want to use online job postings to understand the data science job market by examining the major themes (groups) of postings based on their skill requirements.
  • Skill level: Foundation-level Decoding Data Science for intermediate-skilled team members. This is not a basic class.
  • Jupyter Notebooks: Basic (1-2 years’ experience)
  • pandas: Basic (1-2 years’ experience)
  • Scikit-learn: Basic (1-2 years’ experience)
  • k-means clustering: Basic (1-2 years’ experience)
  • TF-IDF: Basic (1-2 years’ experience)

About this course

In this course, you are a budding data scientist who has drafted a resume. You want to apply for data science jobs, so you would like to find the jobs you have the best shot at and optimize your resume to improve your chances of getting one of them. We will use NLP and text analytics to search online job postings for the most relevant data science jobs and to optimize our resume for those postings. The job posting HTML pages have already been web-scraped; we will load them into Python and process the text data from there. More than one thousand job postings were collected, so we will need to process them with data science methods in Python. We will use text similarity methods to find the job postings most similar to our resume and to identify key skills our resume is missing. We’ll summarize our findings by printing highlights of the text results and by displaying plots and word clouds of the data.
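As a rough illustration of the loading step, the sketch below parses two made-up postings with BeautifulSoup and stores the results in a pandas DataFrame. The inline HTML, the file names, the `parse_posting` helper, and the assumption that skill requirements appear as `<li>` items are all hypothetical stand-ins; the actual scraped pages may be laid out differently and should be inspected first.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-ins for scraped job posting pages (not the course data).
raw_pages = {
    "job_001.html": (
        "<html><head><title>Data Scientist</title></head><body>"
        "<h1>Data Scientist</h1><p>Build models for our analytics team.</p>"
        "<h2>Requirements</h2><ul><li>Python and pandas</li>"
        "<li>Machine learning with scikit-learn</li></ul></body></html>"
    ),
    "job_002.html": (
        "<html><head><title>Data Engineer</title></head><body>"
        "<h1>Data Engineer</h1><p>Maintain our data pipelines.</p>"
        "<h2>Requirements</h2><ul><li>SQL</li><li>Spark</li></ul></body></html>"
    ),
}


def parse_posting(html):
    """Extract the title, the body text, and the bulleted skills from one page."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    body = soup.body.get_text(" ", strip=True) if soup.body else ""
    # Assumption: each skill requirement is a list item somewhere in the page.
    skills = [li.get_text(" ", strip=True) for li in soup.find_all("li")]
    return {"title": title, "body": body, "skills": skills}


# One row per posting, keyed by the (hypothetical) file name.
postings = pd.DataFrame(
    [{"file": name, **parse_posting(html)} for name, html in raw_pages.items()]
)
print(postings[["file", "title", "skills"]])
```

With the real data, `raw_pages` would be filled by reading the scraped HTML files from disk rather than from inline strings.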

Skills acquired & topics covered

  Working in a hands-on learning environment led by our Data Science expert instructor, students will learn about and explore:
  • Parsing webpages with the BeautifulSoup library
  • Storing and processing data with pandas DataFrames
  • Converting raw text to numeric features (TF-IDF vectors) with the sklearn (scikit-learn) library
  • Measuring text similarity with a cosine distance function from the sklearn library
  • Dimensionality reduction with singular value decomposition (SVD) using sklearn
  • K-means clustering using sklearn
  • Creating word clouds with the WordCloud library for text cluster visualization
  • Collect HTML job postings and extract the relevant sections for the next steps (primarily the HTML body and the skill requirements; you can find where the skill requirements are located by inspecting a few of the HTML pages)
  • Find which jobs are most similar to our resume based on the full text of our resume and the full texts of the job posts. Future steps in the project will use this group of most similar job postings.
  • Understand the major themes of job postings via the skill requirements of groups of jobs
  • Find which skills our resume is missing compared with what organizations are looking for (i.e., the skill requirements from job postings)
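
To make the text-similarity topics above more concrete, here is a minimal sketch that converts a resume and a few job posts to TF-IDF vectors and ranks the posts by cosine similarity to the resume. The resume text and the three postings are invented for illustration, and the vectorizer settings (English stop words) are one reasonable choice, not necessarily the ones used in class. scikit-learn also provides `cosine_distances`, which is simply one minus the similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical texts; the course uses a real resume and scraped postings.
resume = "Python programmer with pandas, scikit-learn, and machine learning experience"
job_posts = [
    "Seeking data scientist skilled in Python, pandas, and machine learning",
    "Registered nurse needed for night shifts at a regional hospital",
    "Java backend engineer to build payment microservices",
]

# Fit one vocabulary over the resume and the postings so the vectors are comparable.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([resume] + job_posts)

# Row 0 is the resume; the remaining rows are the job posts.
scores = cosine_similarity(tfidf[0], tfidf[1:]).ravel()

# Indices of the job posts, most similar first.
order = scores.argsort()[::-1]
for idx in order:
    print(f"{scores[idx]:.3f}  {job_posts[idx]}")
```

In this toy example only the first posting shares vocabulary with the resume, so it ranks first and the other two score zero.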

Course breakdown / modules

  • Our first step is to take the raw HTML job postings and extract relevant information from them, such as the skill requirements for each job.
  • Next, we will find the jobs that are most similar to our resume using cosine similarity.
  • After that, we’ll use the most similar job postings to analyze which types of skills are typically asked for by clustering the skill requirements from the job postings.
  • Finally, we’ll use our most similar job postings to find which skills are missing from our resume, so we can work on those skills and add them to our resume. This should give us a better shot at getting our dream data science job.
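
The last two steps can be sketched as follows, under several assumptions: the six skill strings and the resume text are invented, two SVD components and two clusters are chosen only because the toy data has two obvious themes, and "missing skills" is approximated here as posting vocabulary that never appears in the resume, ranked by total TF-IDF weight. The approach used in class may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical skill requirements, one string per job posting.
skills = [
    "python pandas numpy statistics",
    "python statistics regression modeling",
    "pandas numpy python visualization",
    "spark hadoop sql pipelines",
    "sql spark airflow pipelines",
    "hadoop sql spark kafka",
]
resume_text = "python pandas statistics"  # hypothetical resume skills

# Convert the skill text to TF-IDF vectors.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(skills)

# Reduce dimensionality with SVD, then rescale rows to unit length so that
# k-means (which uses Euclidean distance) behaves more like cosine clustering.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = normalize(svd.fit_transform(tfidf))

# Group the postings into two themes.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(reduced)
print("cluster labels:", labels)

# Terms the postings use that the resume does not, heaviest TF-IDF weight first.
resume_terms = set(vectorizer.build_analyzer()(resume_text))
weights = np.asarray(tfidf.sum(axis=0)).ravel()
missing = sorted(
    (term for term in vectorizer.vocabulary_ if term not in resume_terms),
    key=lambda term: weights[vectorizer.vocabulary_[term]],
    reverse=True,
)
print("skills missing from resume:", missing)
```

Each cluster's postings could then be joined into one string and passed to the WordCloud library to visualize that theme.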