Microsoft Excel — The BASICS of Data Transformation

Key Concepts Essential to Shaping Data For Analysis

Apple Mobile Product Sales History Charts

Whenever analysis and reporting is performed in Excel — and practically every company does — it is imperative that data is aggregated correctly and “transformed”.

For analysis to be efficient and effective, proper structure of the data and the development of additional fields (to aid analysis) is very important.

If interested, you can get a download of the Excel file I walk-through below.


The two charts above were developed for a recent training session I conducted.

Presentation to Penn State Weather Risk Management Group

A part of the session was to focus on working with a data set I compiled (Apple historical sales) and performing basic data “shaping” to enable flexible analysis and reporting.

Below, I highlight the steps taken to do this and touch on three areas of Excel that are typically used for basic data transformation and analysis:

  • Excel Data Tables
  • VLOOKUP function
  • Pivot Tables (future post…) — Part 2

Data Transformation / Shaping Process

Excel Data Tables

Here is a excellent summary from TechRepublic of key reasons you want to use Excel Tables. Key considerations mentioned include:

  • Effortless data entry
  • Always visible headers
  • Formula autofill (my favorite!)
  • Dynamic range

Step 1-Convert raw data input into an Excel Table

From this —

Raw data compilation…

To this —

Raw data converted to an Excel Table

Step 2 — Create additional fields in the data table

This will enable the analysis we are potentially interested in getting. In this session, I have anticipated a need for three additional fields.

Data Table — 3 extra fields to be populated
  • (1) — Calendar year (Apple reports on a fiscal year). Why a calendar year? It is likely that when data analysis begins, information on a calendar year basis might be useful. The YEAR function is used for this.
  • (2) — Category Summary. This field will “condense” Apple’s classification (which changes over the years) to a more summary grouping. This will enable higher level insight which would not be available from the raw dataset. The VLOOKUP function, with EXACT match, is used.
  • (3) — Stratification of the Amount field. This may be useful to report aggregated data and develop charts regarding frequency of occurrence. The VLOOKUP function, with APPROXIMATE match, is used.

Step 3 — Populate the Calendar Year field

Enter YEAR function and reference Date field
  • As soon as the formula is “Entered”, the Table populates the entire column immediately. This is known as “formula auto fill” and is a very useful reason to use tables — a lot.
Table “formula auto fill” automatically populates all rows

Step 4 — Populate the Category Summary field

We will use a VLOOKUP with EXACT match to do this. Here is the formula syntax for VLOOKUP.

VLOOKUP Function syntax
  • Here is the flow, with the LOOKUP table referenced (which is located on another worksheet).
Table VLOOKUP to return a Summary Category field
  • The formula is “=VLOOKUP([@Category2],SummaryLookupTable2,2,FALSE)”
  • Translation…Look at the Category2 field, go to the SummaryLookupTable2, if you find a match — return the value from column 2, and I want an EXACT match.
VLOOKUP populates the CategorySummary field

Step 5 — Populate the Stratification field

Getting this field populated is more complicated. Here’s why:

  • It involves 2 lookup tables. Since the Amount field includes $’s AND Units, separate lookup tables are needed for DOLLARS grouping and UNITS grouping.
  • Nesting functions is used to resolve this. The formula performs the following:

— Use the IF Function to check whether the TYPE field is “Dollars”.

— If TYPE = “Dollars”, then use the specified lookup table (result if IF statement is TRUE).

— If TYPE is NOT “Dollars”, then use the Units lookup table (result if IF statement is FALSE).

Nested (IF, VLOOKUPs — Approximate Match) used to populate Stratification field

Our data set is complete…for now.

Next step is to start to build analysis and visualizations of the data. This is the fun part — and where pivot tables make it simple!

I will cover the basics tips I provided in the presentation in a future post…stay tuned!


About Don

“On a mission to challenge the status quo to a more productive and effective end…”

Don is passionate about helping professionals and organizations keep up and adapt to the changing business world that we operate in.

Connect with Don!

LinkedIn, Flipboard, Twitter, Snapchat

Or, just Google me…I’m everywhere

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.