Let us help you find the training program you are looking for.

If you can't find what you are looking for, contact us, we'll help you find it. We have over 800 training programs to choose from.


Course Skill Level:

Foundational to Intermediate

Course Duration:

4 day/s

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:


Who should attend & recommended skills:

Those with basic Python skills

Who should attend & recommended skills

  • Python experienced developers, analysts or others with Python skills who wish to master the spreadsheet disambiguation technique and create credible and well-organized spreadsheets.
  • Skill-level: Foundation-level Spreadsheets skills for Intermediate skilled team members. This is not a basic class.
  • Python developers: Basic (1-2 years’ experience)

About this course

Spreadsheets are a popular way to store and communicate business data, but, although they are easy to create and update, they are not reliable enough to be used for making important corporate decisions. With this course, you can gain insight into how to maintain spreadsheets, how to format them, and then convert them into a database of reliable and useful information. Turning Spreadsheets into Corporate Data starts with a quick history of spreadsheet usage. You’ll learn the basics of formatting spreadsheets, including how to handle special characters and column headings, and how to convert the spreadsheet first into an intermediate database and then into corporate data. You will also learn how to utilize the mnemonic dictionary that is created along with the intermediate database. The later chapters discuss the immutability of data and the importance of organizational and political considerations during the data transformation. By the end of this course, you’ll have the skills and knowledge needed to convert your spreadsheets into reliable corporate data.

Skills acquired & topics covered

  • The different factors that affect the transformation of spreadsheets into data
  • The basics of spreadsheet formatting in detail
  • Ways to handle non-standard spreadsheet structures
  • The two ways of viewing spreadsheets: internal and external
  • Establishing guidelines for the end-user while submitting spreadsheets
  • Ways to compensate for the shortcomings of the .pdf formatting
  • How to convert intermediate data from spreadsheets into corporate data
  • Working with the mnemonic dictionary
  • Data modeling through the entity-relationship diagram

Course breakdown / modules

  • The IT Labyrinth
  • End User Acceptance of the Spreadsheet
  • Spreadsheet Hell
  • A Tradeoff
  • Responsibility — The Flip Side of Control
  • Management’s Problem
  • Differences Between Two Types of Data

  • Public Data
  • The Spreadsheet as a Medium of Exchange
  • Recurring/Non-Recurring Spreadsheets
  • The Spectrum of Spreadsheets
  • The Cost of Transforming a Spreadsheet
  • Factors Other than Cost
  • Transcription of Data
  • Cell Formula
  • Spreadsheet Descriptors
  • Artificially Supplying Descriptors

  • Simple Demarcation-xlstab
  • Other Special Characters-eold and Linefeed
  • The Internal View of a Spreadsheet
  • A Missing Column Heading
  • A Missing Value
  • A Multiline Row
  • The Standard Spreadsheet Format
  • Managing the User of the Spreadsheet
  • The ssdef Table
  • The Spreadsheet Processing Log
  • The Lineage of Spreadsheet Data
  • The Cell Formula
  • Relating to the Real World
  • Identifying the Header Line

  • The Importance of Special Characters
  • PDF and OCR
  • A Final Option

  • The System Name
  • Unreliability of Report Name
  • Multiple Sheets in a Spreadsheet
  • Other Special Characters
  • Identifying Column Headings
  • Similar Column Headings
  • Blocking Off Sections of a Spreadsheet
  • Non-Standard Spreadsheet Structures
  • A Spreadsheet that Cannot be Mapped
  • A Spreadsheet in a TXT Format

  • Selecting Spreadsheets for Inclusion into Corporate Data
  • Recasting the Spreadsheet
  • Logging the Spreadsheet for Transformation
  • Entry into the Path Queue
  • Defining the Spreadsheet Headings
  • Pairing the ssdef Specification to the Spreadsheet
  • Finding and Creating Database Definitions and Values
  • The Intermediate Database
  • Some Anomalies
  • What if an Error is Discovered?
  • Manual Effort Required
  • Spreadsheet Width
  • Subdividing a Spreadsheet
  • No Value for a Column Name
  • No Column Headings
  • Creating the ssdef Specification Once

  • Finding Errors
  • The Contents of the Intermediate Database
  • Functions Served by the Data Elements
  • Alternate Name
  • Adding Context to Data Values
  • Editing Data in the Intermediate Database

  • Organizing Data Inside the ssdef Table
  • Processing Using ssdef Records
  • Searching the Full Path Queue

  • From Intermediate Data to Corporate Data
  • Grouped Corporate Data
  • Tracing the Lineage

  • The Contents of the Mnemonic Dictionary
  • Grouping Like Data Elements
  • Applying Naming Conventions
  • Value of the Mnemonic Dictionary

  • Shifting Control
  • Immutability of Data
  • The Importance of Alternate Names
  • Limited Editing
  • Super Classifications of Data
  • The Lineage of Corporate Data
  • Relative Volumes of Data

  • The Entity Relationship Diagram
  • The Data Item Set
  • The Physical Model
  • The Data Model
  • The Data Model and Spreadsheet Data
  • Correctness of Data
  • Aligning Data from Different Spreadsheets
  • An Algorithmic Resolution
  • An Indexed Resolution
  • Resolution and the Data Model
  • Spreadsheet Data in the Data Warehouse
  • Changing Spreadsheet Data