Data Analysis with Excel: Fundamentals Part-1

Abhijit
8 min readNov 22, 2023

--

Introduction:

In the ever-evolving realm of digital tools, Microsoft Excel stands tall as a ubiquitous force, transforming data management since its inception in 1985. Initially tailored for financial tasks, Excel’s user-friendly interface and robust capabilities have propelled it beyond accounting, finding a vital place in industries spanning finance, engineering, healthcare, and education.

Excel’s legacy is not just historical but continues to shape the professional landscape today. Its demand is unwavering, recognized as a critical skill across diverse sectors. From financial analysts crunching numbers to educators crafting interactive lesson plans, Excel’s versatility knows no bounds. Beyond its traditional spreadsheet role, Excel is a canvas for creativity and problem-solving. As we embark on this journey from basics to advanced features in our blog series, we’ll uncover the dynamic capabilities that make Excel an indispensable tool in the digital age. Discover how it can elevate your proficiency in handling data and driving informed decisions.

Over a period of five blogs, we will look into Data Preparation, Aggregation, Logical Functions, and Lookup functions. After covering the fundamentals of excel, we’ll go into advanced topics for data visualization.

This is the first part of excel fundamentals journey where we’ll look at the business tasks from the eyes of a Data Analyst.

Objectives: Cleaning and manipulating text

  • Apply a range of text functions to manipulate and restructure data.
  • Solve issues of removing and replacing unwanted characters.
  • Develop confidence working with advanced formula techniques and nested functions.

Tool Used: Microsoft Excel 2016

Topics Covered:

  1. Joining Text Data (CONCATENATE. &, CONCAT, TEXT JOIN)
  2. Splitting text data (LEFT, RIGHT, MID)
  3. Combining text functions (FIND, LEN)
  4. Cleaning data and changing cases (CLEAN, TRIM, UPPER, LOWER, PROPER)
  5. Removing and replacing text characters (SUBSTITUTE)

Excel Data File

  1. Joining Text Data (CONCATENATE. &, CONCAT, TEXT JOIN): Combining data from different sources into a single cell.

You have received a statement from the supplier detailing all the invoices that have been paid this month.

This data needs to be cross-checked with the data available within your system data, which is now downloaded into an Excel file. The problem is a lot of the data doesn’t match. So, for example with your system, the Payment Reference is made up of the Invoice Number and the Payment Number.

On the supplier statement, however, the same data is in two separate columns. Before these values can be compared, you are going to need to join the ‘Document Number’ and the ‘Payment Number’ in a new column called ‘Payment No’

Supplier Invoice Statement

To do this, you will use the function ‘CONCATENATE’:

If you go to the table ‘MC Invoice Report’, you can see that the format of ‘Payment_Ref’ column is ‘24673_1’. We’ll use the same format for our table ‘Payment No.’ in the ‘Supplier Invoice Statement’.

There is another way to combine columns by using ‘Concenate Operator — &’. We’ll use it to join ABN, Acct, and Check columns in the column named ‘Bank Details’.

Similarly, we can also introduce another character, in this case ‘-’ using Concatenate Operator — ‘&’:

2. Splitting text data (LEFT, RIGHT, MID):

i. Your next task is to find out in which month the invoice was raised in. To this, you will need to extract the first three letters from the ‘Invoiced’ column into the column named ‘Inv Month’:

To do this task, we’ll use ‘LEFT’ function. The ‘LEFT’ functions returns the leftmost characters from a text value.

Format: =LEFT(column_no, number_of_characters_to_be_extracted)

ii. Now we want to extract the purchase order number from the ‘Customer PO’ column. It can be seen that the 6-digit purchase order number is joined with the location. Since we want to extract 6 characters from the right, we’ll use the ‘RIGHT’ function to extract it in the column ‘PO Number’.

iii. Next we want is the location from the column ‘Customer PO’. If you notice the location, it’s neither on left or on right, but actually in the middle.

To extract this value, we’ll use the ‘MID’ function. This function returns a specific number of characters from a text string starting at a specified position.

Format: =MID(text/column, start_num, num_characters)

To keep it simple, we’ll return ‘S’ for ‘Sydney’ and ‘M’ for ‘Melbourne’.

3. Combining text functions (FIND, LEN): From here on, things will start getting a bit more interesting as we look into combining the excel functions to solve complicated problems.

i. From out previous operation of ‘MID’ function, we extracted ‘S’ for ‘Sydney’ and ‘M’ for ‘Melbourne’, but we do need the complete names. The problem here is ‘Sydney’ is a 6-letter word, whereas ‘Melbourne’ is a 9-letter word. So, when using a formula, we can’t just use these numbers, we will let excel to calculate that length for us.

Excel is very good at working with patterns. So, as long as there are patterns in data, Excel will be able to execute what is needed.

In the column above, we can spot a pattern. The are always three characters ‘PO-’ before and a ‘-’ after the location. So, if we can find the position of ‘-’, we can work out how long our location is. For this operation, we’ll use ‘FIND’ function:

Format: =FIND(find_text, within_text, [start_num])

As it can be seen, the FIND function has extracted 10 characters for the first row, but we need 6. So, we need to subtract 4 to get to the actual location.

We’ll now copy this formula and paste in in MID function.

ii. FIND is very useful when there is some form of delimiter present that separates the data, but there are cases when there are no separators.

There’s another pattern that takes care of this. In the column ‘Customer PO’ we can see that the location is always preceded by three characters and has seven after. In other words, the length of string is always going to be: 10 + location. If we take the total length of the string and subtract it with 10, we can get the exact length of our location. To do this, we have a function called ‘LEN’.

Format: =LEN(column)

Now, we’ll subtract it by 10 which is sum of first three and last seven characters in the string.

We’ll use this function inside the MID function to get our values.

So, with that, for the same problem we were able to find solution in two ways using ‘FIND’ and ‘LEN’.

4. Cleaning data and changing cases (CLEAN, TRIM, UPPER, LOWER, PROPER):

i. The column in this that we are going to focus on is Inv/cr.

In the column we can see that there are some strange characters, in some rows there are extra spaces, and some have both. And so, this need to be tidied up. The first thing we will take care of the special characters. The function that we are going to use here is ‘CLEAN’.

Format: =CLEAN(column_name)

ii. The clean function only works for special character and not the spaces that was still there after using the ‘CLEAN’ function. To get rid of this, we’ll use the function called as ‘TRIM’. It dies not remove all spaces, but only removes extra spaces from the text which are: Leading spaces, Trailing spaces, and Extra mid spaces.

Format: =TRIM(text)

iii. We now need to make the text more consistent, by either making all upper case, lower case. The functions for it at ‘UPPER’, ‘LOWER’, and ‘PROPER’. ‘UPPER’ converts the text to upper case, ‘LOWER’ converts the text to lower case, where “PROPER’ case capitilises the first letter of each word.

Format: UPPER(data), LOWER(data), PROPER(text)

5. Removing and replacing text characters (SUBSTITUTE): In the previous case we have used ‘CLEAN’ and ‘TRIM’ to get rid of unwanted characters and spaces, but this is quite limited. Like in the case of the ‘S’ in the column ‘Paid Amount’. In this, the ‘S’ is supposed to be a ‘$’. For this we’ll use the function ‘SUBSTITUTE’ which replaces text-based content.

After we execute the function, we can see that there is space befor the character:

To get rid of that, we’ll use another substituted function. And since that space is between ‘S’ and amount we’ll use MID function to identify:

And that marks the end of Part 1.

In the next blog of “Data Analysis with Excel: Fundamentals Part-2” we’ll be working with numbers and dates.

Connect with me on LinkedIn

--

--