How to Analyze Nominal Data

Alejandra Budar
Published in Fields Data
May 5, 2021 · 4 min read
Photo by Isaac Smith on Unsplash

In my previous posts, I introduced you to Google Colaboratory and briefly presented some of the Python libraries that can be used for data analysis and data visualization. If you have not read those articles, I suggest reading them before proceeding. They can be found here. We will now build on that knowledge and explore how to use these packages to create a report from nominal data, that is, data composed of names and other non-numerical information.

Preparation

To create an accurate report, the data must be standardized. Take time to review your data and ensure that it adheres to the standards agreed upon in your data guidelines. Capitalization, language and spelling should be consistent throughout, or you risk reducing the accuracy of the analysis. Remember that computers read text very literally, so “apple” and “Apple” would be treated as two different values.
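
As a rough illustration of why this matters, here is a minimal sketch using pandas. The column name "Sector" and the sample values are made up for this example, and title case is only one possible convention; follow whatever your own data guidelines specify.

```python
import pandas as pd

# Hypothetical sample data; the column name "Sector" is an assumption.
df = pd.DataFrame(
    {"Sector": ["Health", "health ", " HEALTH", "Education", "education"]}
)

# Before cleaning, every spelling variant is counted as its own category.
variants_before = df["Sector"].nunique()

# Trim surrounding whitespace and apply a single capitalization convention.
df["Sector"] = df["Sector"].str.strip().str.title()

# After cleaning, the variants collapse into the categories you expect.
variants_after = df["Sector"].nunique()
print(variants_before, variants_after)
print(df["Sector"].value_counts())
```

Cleaning like this only handles whitespace and capitalization. Misspellings and entries in different languages still need to be reviewed by hand or mapped explicitly.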

Once your data is ready, click on the Colab link here, create a copy of the notebook, and upload your data file to it by clicking on the folder icon on the left-hand side.

Note that this report utilizes publicly available data collected by Fields Data from the Humanitarian Data Exchange website.

Data Visualization

Data visualization is one of the most efficient ways to convey insights quickly and clearly. Including a variety of graphs makes a report shorter and more engaging for readers. To visualize our data, we will use Seaborn to create a countplot. This lets us see at a glance how many times each entry appears in a specified column of a dataset. You will have to enter the information required for “x” and “data”. For example, “Source” should be replaced with the desired column, while keeping the quotation marks, and FD should be replaced with the name of your dataset. The rest can remain the same. Although nominal data cannot be input directly into formulas for calculation, counting entries at various levels can be equally telling. For this reason, countplots are my favorite tool for quickly understanding the variety of data and identifying discrepancies.
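
The notebook cell itself is not reproduced in this article, so the following is only a sketch of what such a countplot call might look like. The names "Source" and FD come from the description above, while the sample values are invented for illustration.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the uploaded dataset; in the notebook,
# FD would be loaded from your own data file instead.
FD = pd.DataFrame(
    {"Source": ["Survey", "Report", "Survey", "Interview", "Survey", "Report"]}
)

# One bar per distinct entry in the "Source" column,
# with the bar height equal to the number of rows for that entry.
ax = sns.countplot(x="Source", data=FD)
ax.set_ylabel("Number of entries")
# In Colab, the chart is displayed automatically at the end of the cell.
```

To adapt it, change the string passed to x and the dataframe passed to data, as described above.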

When the category names are too long, I recommend you use Dexplot (illustrated below and located in the final cell of the notebook). Dexplot creates a countplot like Seaborn, but makes longer names easier to read by wrapping the text. To use it, simply replace the column name within the quotation marks of x, “Organization Type”, with the new column name.

Data Analysis

I have found that groupby is one of the most useful methods for the analysis of nominal data (found in cells five and six). It not only groups data at various levels, but can also be combined with value_counts to obtain the counts of a given variable after it has been grouped multiple times. For example, cell five shows the frequency of each sector sorted by country > province > sector. An additional benefit of such a dataframe is that it can be downloaded as a csv file (cell seven) for better viewing and ease of sharing. This is extremely useful for data that has many entries and cannot be properly viewed in the cell.
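
The idea can be sketched roughly as follows. This is not the notebook's exact code: the column names, the placeholder values and the file name "sector_counts.csv" are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data with placeholder names.
FD = pd.DataFrame(
    {
        "Country": ["Country A", "Country A", "Country A", "Country B", "Country B"],
        "Province": ["Province 1", "Province 1", "Province 2", "Province 3", "Province 3"],
        "Sector": ["Health", "Health", "Education", "Health", "Education"],
    }
)

# Group by country, then province, and count how often each sector occurs.
counts = (
    FD.groupby(["Country", "Province"])["Sector"]
    .value_counts()
    .rename("Count")   # name the count column explicitly
    .reset_index()     # turn the grouped index back into ordinary columns
)
print(counts)

# Save the result as a csv file for easier viewing and sharing.
counts.to_csv("sector_counts.csv", index=False)
```

In Colab, the saved file appears in the file panel on the left, where it can be downloaded.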

To use this code, replace “FD” with the name of your dataset and the names within the brackets after groupby with the names of your desired columns. If you do not have multiple columns to group by, remove the extra column names, along with their commas, from cell five. In cell six, remove the column from the first set of brackets only. The other set of brackets (with .count() attached) identifies the column that will be used for counting each entry. If there is only one column of interest, both sets of brackets should name that same column.
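
For the single-column case, a call of the kind described for cell six might look like the sketch below. The exact form of the notebook cell is an assumption here, and the "Sector" column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with a single column of interest.
FD = pd.DataFrame({"Sector": ["Health", "Education", "Health", "Water", "Health"]})

# First brackets: the column to group by.
# Second brackets: the column whose entries are counted.
# With only one column of interest, both name the same column.
sector_counts = FD.groupby(["Sector"])["Sector"].count()
print(sector_counts)
```

With a single column, this gives the same counts as FD["Sector"].value_counts(), only sorted by category name instead of by frequency.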

Summary

To sum up, counting and grouping will be your best friends for nominal data. For higher-level groupings, a Seaborn countplot is a quick and easy option. If more granular groupings are desired, groupby and value_counts will achieve this. Once you have completed the analysis steps, you can also add descriptions and additional information as text by clicking the button labeled “+ Text” at the top left. Adding text will make your report more comprehensive and easier to follow. Finally, it is worth noting that the notebook can be shared easily with a link, similar to a Google Doc.

The notebook uses a simple format that I thought would be easier for beginners to understand. However, do not be afraid to experiment and be creative. Stack Overflow is a great resource for finding more challenging pieces of code that can help in communicating insights. One last piece of advice: before using a new piece of code, always ask yourself whether it will add value, confusion or redundancy to your analysis.

Now it is your turn. Good luck!
